Closed by SWittouck 8 years ago
Hi, paralog splitting is a best effort. When you have multiple insertions of the same gene into the same place, bioinformatics generally becomes painful (particularly when using short reads), as you have observed. It is difficult to split out the paralogs in this instance, so Roary is conservative in this situation and doesn't split them.
Clusters which contain paralogs are not included in the core genome alignment. So for a cluster to be included, the column 'Avg sequences per isolate' in the gene presence and absence spreadsheet must equal 1, and the number of isolates in the cluster must meet the threshold for the core definition percentage (core = in 99% of isolates by default). If you've found otherwise, then it's a bug. Andrew
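For anyone wanting to check this on their own output, here is a minimal sketch of the two conditions described above, not Roary's actual implementation. It assumes the spreadsheet is a CSV with columns named 'Gene', 'No. isolates' and 'Avg sequences per isolate'; the function name `core_clusters`, the sample rows, and the use of `>=` for the threshold comparison are assumptions made for illustration.

```python
import csv
import io


def core_clusters(csv_text, total_isolates, core_fraction=0.99):
    """Return names of clusters that pass both checks described in the reply.

    Hypothetical helper: column names and the >= comparison are assumptions.
    """
    core = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        avg = float(row["Avg sequences per isolate"])
        n_isolates = int(row["No. isolates"])
        # Condition 1: no paralogs, i.e. exactly one sequence per isolate.
        # Condition 2: present in at least core_fraction of all isolates.
        if avg == 1 and n_isolates >= core_fraction * total_isolates:
            core.append(row["Gene"])
    return core


# Made-up rows for illustration only, not real Roary output.
SAMPLE = '''"Gene","No. isolates","No. sequences","Avg sequences per isolate"
"geneA","200","200","1"
"geneB","200","206","1.03"
"geneC","120","120","1"
"geneD","199","199","1"
'''

print(core_clusters(SAMPLE, total_isolates=200))
```

With these made-up rows, geneB is dropped because it averages more than one sequence per isolate, and geneC is dropped because it is present in too few isolates. A cluster that fails either check but still shows up in the core alignment would be the kind of discrepancy worth reporting.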
Hi Andrew,
Thank you for the quick and clear answer.
Stijn
Hi,
I have some quick questions about paralog splitting in Roary:
Thanks in advance.