sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Roary 3.5.8 works with -i 80 switch, but not with -i 90 or higher with large datasets? #234

Closed dutchscientist closed 8 years ago

dutchscientist commented 8 years ago

I am rerunning a dataset that was previously analysed with an earlier version of Roary, now with 3.5.8, and it no longer completes the whole set (497 genomes) with -i 90 or higher. It runs fine until the generation of the tree from the accessory genes, and then it stalls. After 3 days there is still no progress, whereas the same run previously finished in <6 h.

When I make the dataset smaller, it works fine again, and all the GFF files individually work fine. When using -i 70 or -i 60, it works fine.

Tried it both on a server (dual Xeon, 64 GB RAM, etc.) and on the Sanger Pathogens VM, with the same result. Any thoughts?

I haven't tried going back to the previous version yet. Is that easy with cpan?

dutchscientist commented 8 years ago

Update: have downgraded to 3.5.7 via: sudo cpanm -f AJPAGE/Bio-Roary-3.5.7

And then it works without a hitch again. Hence I think there is a bug somewhere in 3.5.8, I just don't know what it is!

andrewjpage commented 8 years ago

Hi, I have rolled back the changes and pushed it out as version 3.5.9. In 3.5.8 the entire accessory genome was used to create the binary tree and clusters; it now goes back to its previous behaviour, where it is limited to 4000 genes (with the top and bottom % removed).
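
To illustrate the kind of filtering described above, here is a minimal Python sketch. It is not Roary's actual code (Roary is written in Perl). The function name, the 5% cut-offs and the input layout (gene name mapped to a 0/1 list, one entry per genome) are assumptions for illustration only; the only figure taken from this thread is the 4000-gene cap.

```python
from typing import Dict, List


def select_accessory_genes(
    presence: Dict[str, List[int]],
    lower_frac: float = 0.05,   # assumed cut-off, not confirmed in this thread
    upper_frac: float = 0.95,   # assumed cut-off, not confirmed in this thread
    max_genes: int = 4000,      # the cap mentioned above
) -> List[str]:
    """Return gene names to use for a binary presence/absence tree.

    Genes found in nearly all genomes or in almost none are dropped,
    and the remaining list is truncated to at most `max_genes` entries.
    """
    selected: List[str] = []
    for gene, row in presence.items():
        if not row:
            continue  # no genomes recorded for this gene
        freq = sum(row) / len(row)  # fraction of genomes carrying the gene
        if lower_frac <= freq <= upper_frac:
            selected.append(gene)
    return selected[:max_genes]


if __name__ == "__main__":
    # Toy matrix with 10 genomes (hypothetical data).
    toy = {
        "core_like": [1] * 10,             # present everywhere: dropped
        "half": [1] * 5 + [0] * 5,         # kept
        "absent": [0] * 10,                # present nowhere: dropped
        "uncommon": [1, 1] + [0] * 8,      # kept
    }
    print(select_accessory_genes(toy))
```

Capping the number of genes bounds the size of the matrix passed to the tree builder, which would be consistent with the run time difference reported above, although the thread does not confirm the exact cause.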

dutchscientist commented 8 years ago

Thanks, glad it's working again :)

tseemann commented 8 years ago

Can this be made an option? As we need the true pan-genome binary tree.