gtonkinhill / panaroo

An updated pipeline for pangenome investigation
MIT License

Discrepancy between final_graph.gml and gene_presence_absence.csv #276

Closed sydelstan closed 7 months ago

sydelstan commented 8 months ago

Hello,

The gene_presence_absence.csv file lists some genes as absent from strains in which the final_graph.gml file reports them as present. For example, gene_presence_absence.csv may indicate that only 350 strains carry a gene, while final_graph.gml lists 1120 members for the same gene. I checked this manually by aligning the gene sequence given in final_graph.gml to the contigs of a strain that gene_presence_absence.csv reports as lacking the gene, and the gene is indeed present.

I would greatly appreciate some clarity on this.
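
A comparison like the one described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the CSV has three leading metadata columns (`Gene`, `Non-unique Gene name`, `Annotation`) followed by one column per strain, and that per-gene member counts have already been extracted from final_graph.gml into a plain dictionary (for instance by loading the graph with `networkx.read_gml` and taking the length of each node's members list; the exact attribute layout is an assumption and should be checked against the actual file). The function names are hypothetical.

```python
import csv
import io

# Assumption: these leading columns are metadata, everything else is a strain.
METADATA_COLUMNS = ("Gene", "Non-unique Gene name", "Annotation")


def csv_presence_counts(csv_text):
    """Return {gene: number of strain columns with a non-empty cell}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {}
    for row in reader:
        counts[row["Gene"]] = sum(
            1
            for col, cell in row.items()
            # Skip metadata columns and any overflow fields (key None).
            if col is not None
            and col not in METADATA_COLUMNS
            and cell
            and cell.strip()
        )
    return counts


def find_discrepancies(csv_counts, graph_counts):
    """Return {gene: (csv_count, graph_count)} for genes where they differ."""
    return {
        gene: (csv_counts.get(gene, 0), graph_n)
        for gene, graph_n in graph_counts.items()
        if csv_counts.get(gene, 0) != graph_n
    }


# Toy input standing in for gene_presence_absence.csv (hypothetical data).
example_csv = (
    "Gene,Non-unique Gene name,Annotation,strainA,strainB,strainC\n"
    "geneX,,hypothetical protein,A_001,,C_007\n"
    "geneY,,kinase,A_002,B_004,C_009\n"
)
# Hypothetical member counts, as they might be derived from final_graph.gml.
example_graph_counts = {"geneX": 3, "geneY": 3}

print(find_discrepancies(csv_presence_counts(example_csv), example_graph_counts))
```

Genes reported by `find_discrepancies` are the ones where the two outputs disagree on how many strains carry the gene, which makes it easier to attach concrete examples to a report like this one.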

gtonkinhill commented 8 months ago

Hi,

This may be related to issue #275. Are you using the --refind_strict flag? We've just identified a bug in this recent addition to the code base and are working on releasing a fix.

gtonkinhill commented 7 months ago

This is hopefully resolved in the latest release. Please let me know if that is not the case.

sydelstan commented 7 months ago

I am not using the --refind_strict flag. I will try the newest release.

gtonkinhill commented 7 months ago

Thanks, let me know if it doesn't fix the issue.