YiJessePi opened 4 years ago
Roary is fast because it expects lots of very similar proteins and uses cd-hit to pre-cluster them quickly. After that step it falls back to an ALL vs ALL blastp. Metagenomes have lots of genes; let's say you have N. Then Roary will take N x N time to run, because the number of pairwise comparisons grows quadratically with N. In practice it will never finish. Consider other tools such as proteinortho, MMseqs2, or cd-hit directly.
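A toy sketch of why redundancy matters, assuming a simplified two-stage pipeline: greedy pre-clustering in the spirit of cd-hit, then all-vs-all comparisons among the cluster representatives only. The similarity score comes from Python's `difflib.SequenceMatcher` as a stand-in; it is not how cd-hit computes identity. `toy_similarity`, `greedy_precluster`, the 0.9 threshold, the sequences, and the gene counts are all hypothetical and illustrative, not Roary or cd-hit APIs.

```python
from difflib import SequenceMatcher


def toy_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]. Hypothetical stand-in only: cd-hit uses its own
    word-counting filter and alignment, not difflib."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_precluster(proteins: list[str], threshold: float = 0.9) -> list[str]:
    """Greedy incremental clustering in the spirit of cd-hit: visit sequences
    longest first; a sequence joins the first representative it matches at
    >= threshold, otherwise it becomes a new representative."""
    representatives: list[str] = []
    for seq in sorted(proteins, key=len, reverse=True):
        if not any(toy_similarity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return representatives


def all_vs_all_pairs(n: int) -> int:
    """Distinct unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical "isolate-like" input: one protein plus near-identical variants
# (only the last residue differs), so pre-clustering collapses them.
redundant = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRA",
    "MKTAYIAKQRQISFVKSHFSRK",
    "MKTAYIAKQRQISFVKSHFSRE",
]
# Hypothetical "metagenome-like" input: unrelated proteins, nothing collapses.
diverse = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "GSHMLEDPVDAFQLGTWEEPKN",
    "PPGWQRNCLLYEHTDVIAKSAM",
    "NDLCWYGRTHEPSVFAKMQIGE",
]

for name, proteins in [("redundant", redundant), ("diverse", diverse)]:
    reps = greedy_precluster(proteins)
    print(f"{name}: {len(proteins)} proteins, {len(reps)} representative(s), "
          f"{all_vs_all_pairs(len(reps))} all-vs-all pairs")

# Hypothetical set sizes, only to show the quadratic growth of the pair count.
for n in (6_000, 1_000_000):
    print(f"N = {n:>9,}: {all_vs_all_pairs(n):,} pairs")
```

With redundant input the pre-clustering step leaves almost nothing for the all-vs-all stage; with diverse input every sequence survives, and doubling N roughly quadruples the number of pairs.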
Thanks Torsten! So is it just a matter of time? I actually plan to run Roary on reconstructed bins of the same species, which I assume will have a similar number of genes to an isolate genome. (Is a pangenome analysis across different species meaningful at all?)
I've read in the documentation that "Roary is not intended for meta-genomics or for comparing extremely diverse sets of genomes". Could you please explain why? What are the drawbacks of using it on gene calls from metagenomic data?