sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

paralogs not being split #290

Closed SWittouck closed 8 years ago

SWittouck commented 8 years ago

Hi,

I have some quick questions about paralog splitting in Roary:

Thanks in advance.

andrewjpage commented 8 years ago

Hi, when splitting paralogs, it's a best effort. When you have multiple insertions of the same gene into the same place, bioinformatics generally becomes painful (particularly when using short reads), as you have observed. It is difficult to split out the paralogs in this instance, so Roary is conservative and doesn't split them.

Clusters which contain paralogs are not included in the core genome alignment. So for a cluster to be included, the column 'Avg sequences per isolate' in the gene presence and absence spreadsheet must equal 1, and the number of isolates in the cluster must meet the threshold for the core definition percentage (core = in 99% of isolates by default). If you've found otherwise, then it's a bug.

Andrew
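
For anyone who wants to check their own output against the two conditions above, here is a minimal sketch in Python. It is not Roary's own code. It assumes the spreadsheet is a CSV with columns named 'Gene', 'No. isolates' and 'Avg sequences per isolate', and that the threshold is applied as a simple "at least N percent of isolates" comparison; how Roary rounds the threshold internally is not stated in this thread. The function name and the example rows are made up for illustration.

```python
import csv
import io


def core_candidate_clusters(csv_text, total_isolates, core_percent=99):
    """Return the names of clusters that satisfy both conditions:
    no paralogs (average of exactly 1 sequence per isolate) and
    presence in at least `core_percent` percent of all isolates.

    Assumption: column names match the gene presence/absence spreadsheet.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        isolates = int(row["No. isolates"])
        avg_per_isolate = float(row["Avg sequences per isolate"])

        no_paralogs = avg_per_isolate == 1.0
        # Integer-friendly form of: isolates / total_isolates >= core_percent / 100
        meets_threshold = isolates * 100 >= core_percent * total_isolates

        if no_paralogs and meets_threshold:
            selected.append(row["Gene"])
    return selected


# Made-up example rows for a hypothetical run with 100 isolates.
example = """Gene,No. isolates,No. sequences,Avg sequences per isolate
geneA,100,100,1
geneB,100,130,1.3
geneC,60,60,1
geneD,99,99,1
"""

print(core_candidate_clusters(example, total_isolates=100))
# prints ['geneA', 'geneD']
```

In the example, geneB is dropped because it averages more than one sequence per isolate (it contains unsplit paralogs), and geneC is dropped because it is present in only 60% of isolates. To run this on a real file, read its contents into a string and pass the number of isolates in your analysis as `total_isolates`.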

SWittouck commented 8 years ago

Hi Andrew,

Thank you for the quick and clear answer.

Stijn