The plot is a direct picture of what is in the roary gene_presence_absence.csv file. It's the number of ortholog clusters each isolate has members in. If there are gene duplications (paralogs), they often count as only one ortholog cluster, so the number on the right will always be LESS than the number of CDS in the sample. The number at the bottom is the total number of ortholog clusters in the roary output, i.e. the pan genome. Depending on roary settings, small proteins may also be excluded.
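To make the two plot numbers concrete, here is a minimal Python sketch that recomputes them from gene_presence_absence.csv: for each isolate, the number of clusters (rows) where that isolate's cell is non-empty, plus the total number of rows as the pan-genome size. It assumes the file starts with a fixed block of metadata columns followed by one column per isolate. The value N_META = 14 and the function name clusters_per_isolate are assumptions for illustration, so check the header of your own file before relying on it.

```python
import csv
import io

# Assumption: the first N_META columns of gene_presence_absence.csv are
# metadata ("Gene", "Annotation", ...), followed by one column per isolate.
# Verify this against the header of your own file.
N_META = 14


def clusters_per_isolate(csv_text, n_meta=N_META):
    """Return ({isolate: clusters it has members in}, total clusters)."""
    reader = csv.reader(io.StringIO(csv_text))
    isolates = next(reader)[n_meta:]
    counts = {name: 0 for name in isolates}
    total = 0
    for row in reader:
        total += 1  # each row is one ortholog cluster (pan genome size)
        for name, cell in zip(isolates, row[n_meta:]):
            if cell.strip():  # a non-empty cell means the isolate has a member
                counts[name] += 1
    return counts, total


# Usage (hypothetical path):
# with open("gene_presence_absence.csv", newline="") as fh:
#     per_isolate, pan_genome = clusters_per_isolate(fh.read())
```

Under these assumptions, the per-isolate values correspond to the numbers on the right of the plot and the total corresponds to the number at the bottom.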
Thank you for your response. I understand now, and the explanation makes sense. Nevertheless, I found a problem: the number of CDS in the assembly report (I attached it in my previous message) is smaller than the number of ortholog clusters shown in the pangenome graphic, which is the opposite of what you said...
The number of CDS reported by Prokka differs from the number of ortholog clusters because small CDS are not included in roary, and roary also collapses paralogs (duplicated genes) into a single cluster.
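One way to see the paralog effect for your own isolates is to count, per isolate, both the clusters it appears in and the individual gene IDs roary assigned to it. The Python sketch below does that. It assumes the same layout as before (N_META metadata columns, then one column per isolate) and additionally assumes that when one cluster holds several genes from the same isolate, the cell lists their IDs separated by whitespace (tabs). The name sequences_vs_clusters is hypothetical.

```python
import csv
import io

N_META = 14  # assumed number of metadata columns before the isolate columns


def sequences_vs_clusters(csv_text, n_meta=N_META):
    """For each isolate, return (clusters present, gene IDs assigned).

    Assumption: several genes from one isolate in the same cluster
    (e.g. collapsed paralogs) appear in one cell, separated by tabs.
    """
    reader = csv.reader(io.StringIO(csv_text))
    isolates = next(reader)[n_meta:]
    result = {name: [0, 0] for name in isolates}
    for row in reader:
        for name, cell in zip(isolates, row[n_meta:]):
            ids = cell.split()  # splits on any whitespace, including tabs
            if ids:
                result[name][0] += 1         # the cluster counts once
                result[name][1] += len(ids)  # every gene ID counts
    return {name: tuple(pair) for name, pair in result.items()}
```

If these assumptions hold, any gap between the two numbers reflects genes collapsed into shared clusters, and any gap between the second number and Prokka's CDS count reflects CDS that roary left out, for example short proteins.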
Hi, I am comparing 8 S. marcescens strains and ran Nullarbor on them. The CDS column in my assembly table report does not match the numbers that appear on the right of the pangenome graphic. Do both show the number of CDS in each strain, or are they different things? If they are different, what does the number in the pangenome graphic mean?
Thank you in advance!
Attachment: report.pdf