marschall-lab / panacus

Panacus is a tool for computing statistics for GFA-formatted pangenome graphs
MIT License

how is panacus treating Ns #22

Closed: subwaystation closed this issue 5 months ago

subwaystation commented 5 months ago

Suppose I have 100 haplotypes and each brings in 1000 Ns. Would this lead to a steep growth curve, or does panacus ignore the Ns?

danydoerr commented 5 months ago

Panacus does not detect Ns, so it counts them as if they were resolved bases. 1000 Ns per haplotype in the human genome won't lead to a steep curve, even if you have 100 haplotypes. You probably won't see the effect at all. For comparison, 82 Mb of euchromatic sequence in the hprc-v1.0-pggb graph is unique, i.e., not shared by two or more genomes. The most accurate way to treat Ns would be to exclude them from the growth calculation, using Panacus' --exclude feature.
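
For scale, if the question means 1000 N bases per haplotype, 100 haplotypes contribute 100 kb in total, roughly 0.1% of the 82 Mb mentioned above. For anyone who still wants to exclude them, here is a minimal sketch of one way to locate N runs and write them as BED intervals. It assumes that --exclude accepts a 3-column BED file of path coordinates (check `panacus --help` for your version), that each path spans the whole input sequence so that sequence and path coordinates coincide, and that the BED name matches the path name in the GFA. The helper names and the path name `sample1#1#chr1` are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

Interval = Tuple[str, int, int]


def n_runs(name: str, seq: str, min_len: int = 1) -> Iterator[Interval]:
    """Yield (name, start, end) for each run of N/n in seq.

    Coordinates are 0-based and half-open, as in BED.
    Runs shorter than min_len are skipped.
    """
    for match in re.finditer(r"[Nn]+", seq):
        if match.end() - match.start() >= min_len:
            yield name, match.start(), match.end()


def to_bed(intervals: Iterable[Interval]) -> str:
    """Format intervals as tab-separated 3-column BED lines."""
    return "".join(f"{name}\t{start}\t{end}\n" for name, start, end in intervals)


# Toy input; the path name is a made-up example.
print(to_bed(n_runs("sample1#1#chr1", "ACGTNNNNACGTnnAC")), end="")
```

The printed lines could be saved to a file and tried as the exclusion list, provided the assumptions above hold for your graph.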

subwaystation commented 5 months ago

Ah, I need to take a closer look at the --exclude feature.

subwaystation commented 5 months ago

Thanks!

danydoerr commented 5 months ago

> Ah, I need to take a closer look at the --exclude feature.

That, or actively include regions that you're sure about with --subset.
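
A rough sketch of the inclusion approach: given the N-run intervals on a path (however they were obtained), the regions to keep are their complement within the path. This assumes --subset accepts path coordinates in BED form, which should be confirmed against `panacus --help`. The function name `complement` and the toy numbers are illustrative only.

```python
from typing import List, Tuple


def complement(length: int, gaps: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the 0-based half-open intervals of [0, length) not covered by gaps."""
    kept: List[Tuple[int, int]] = []
    pos = 0  # first position not yet known to be inside a gap
    for start, end in sorted(gaps):
        if start > pos:
            kept.append((pos, start))
        pos = max(pos, end)  # handles overlapping or nested gaps
    if pos < length:
        kept.append((pos, length))
    return kept


# Toy example: a 16 bp path with N runs at [4, 8) and [12, 14).
print(complement(16, [(4, 8), (12, 14)]))
```

Each kept interval would then be written as one BED line with the path name in the first column.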