sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Roary for metagenomes #470

Open YiJessePi opened 4 years ago

YiJessePi commented 4 years ago

I've read in the documentation that "Roary is not intended for meta-genomics or for comparing extremely diverse sets of genomes". Can you please explain why? What are the drawbacks of running it on genes called from metagenomic data?

tseemann commented 4 years ago

Roary is fast because it expects lots of very similar proteins and uses cd-hit to collapse them first. After that it falls back to all-vs-all blastp on whatever is left. Metagenomes have lots of distinct genes; say you have N after clustering. Then Roary's run time grows as N x N, and in practice it will never finish. Consider other tools such as proteinortho or MMseqs2, or use cd-hit directly.
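To put rough numbers on the N x N point, here is a minimal Python sketch that counts the pairs in a naive all-vs-all search. The sequence counts are made-up illustrations, not measurements from Roary, and the function names are hypothetical; real blastp run time also depends on sequence length, indexing, and threads.

```python
def all_vs_all_comparisons(n_sequences: int) -> int:
    """Query-vs-subject pairs in a naive all-vs-all search (N x N)."""
    return n_sequences * n_sequences


def relative_cost(n_small: int, n_large: int) -> float:
    """How many times more pairs the larger set needs than the smaller one."""
    return all_vs_all_comparisons(n_large) / all_vs_all_comparisons(n_small)


# Hypothetical sequence counts left over after clustering (assumptions).
examples = {
    "closely related isolates": 10_000,
    "metagenome gene catalogue": 1_000_000,
}

for label, n in examples.items():
    print(f"{label}: {n:,} sequences -> {all_vs_all_comparisons(n):,} pairs")

print(f"cost ratio: {relative_cost(10_000, 1_000_000):,.0f}x")
```

The point is only the scaling: 100 times more sequences means 10,000 times more pairs to compare.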

YiJessePi commented 4 years ago

Thanks Torsten! So is it just a matter of run time? I'm actually planning to run Roary on reconstructed bins of the same species (does pangenome analysis across different species make any sense?), which I assume will have a similar number of genes to an isolate genome.