sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Missing capability of "synteny" + "splitting paralogs" ? #395

Open tseemann opened 6 years ago

tseemann commented 6 years ago

We had a recent bioinformatics group meeting, and it became apparent that there may be some missing functionality in Roary with respect to its use of synteny and its treatment of paralogs.

The case in question is S. pyogenes, which has 100+ gene duplications and is also recombinant and rearranged. By default Roary produces too many clusters, possibly because of "synteny enforcement". If you use -s it removes synteny (?) but also forces paralogs into a single cluster.

I think what was wanted was a way to keep paralogs separate while still using synteny?

Does this make sense at all?

P.S. One approach I thought of was to pre-process the GFF files so that each CDS is placed on its own contig, thereby removing any synteny information (the suspected cause of too many clusters) without having to use -s.
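A rough sketch of that pre-processing idea, in case it helps anyone try it. It assumes Prokka-style GFF3 input with the contig sequences embedded after a `##FASTA` line. The function name `split_cds_into_contigs` and the `<contig>_cds<N>` naming scheme are made up for illustration. Non-CDS features are dropped, and whether Roary copes well with thousands of tiny contigs is untested.

```python
def split_cds_into_contigs(gff_text):
    """Return GFF3 text in which every CDS sits alone on its own contig.

    Assumption: input is Prokka-style GFF3 with an embedded ##FASTA section.
    """
    # Separate the annotation lines from the embedded sequences.
    head, _, fasta = gff_text.partition("##FASTA")

    # Parse the FASTA part into {contig_id: sequence}.
    seqs, name = {}, None
    for line in fasta.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    seqs = {k: "".join(v) for k, v in seqs.items()}

    features, records = [], []
    n = 0
    for line in head.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep CDS features only
        start, end = int(cols[3]), int(cols[4])
        n += 1
        new_id = f"{cols[0]}_cds{n}"          # hypothetical naming scheme
        sub = seqs[cols[0]][start - 1:end]    # GFF coordinates are 1-based, inclusive
        # Move the feature onto its own contig; the strand column is unchanged
        # because the slice is taken in the original forward orientation.
        cols[0], cols[3], cols[4] = new_id, "1", str(len(sub))
        features.append("\t".join(cols))
        records.append((new_id, sub))

    out = ["##gff-version 3"]
    out += [f"##sequence-region {cid} 1 {len(seq)}" for cid, seq in records]
    out += features
    out.append("##FASTA")
    for cid, seq in records:
        out.append(f">{cid}")
        out.append(seq)
    return "\n".join(out) + "\n"
```

Each input file would be read, passed through the function, and written to a new `.gff` before running Roary on the rewritten files. With every gene on its own contig, no gene has a neighbour, so there is no gene-order context left for a synteny-based step to use.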

felipelira commented 5 years ago

As a suggestion, you could try get_homologues. I have used it, and it works for this purpose. Any other suggestions?

tseemann commented 5 years ago

@felipelira What is get_homologues?

cwbcm commented 4 years ago

@tseemann I think felipelira was referring to this: https://github.com/eead-csic-compbio/get_homologues