parsnp calculates genome size wrongly

marbl / harvest

I noticed the same thing. I was reading through the parsnp.xmfa file to identify which contigs in my fasta files were aligning within each cluster, and I noticed that many of the coordinates it reports run past the total length of my assemblies.

As far as I can tell, some sort of padding is probably being placed between contigs during alignment. The offset appears to be about 1 kbp per contig, but I can tell it's inexact.

However, I only see this behaviour with fragmented assemblies. The offsets are shorter for complete reference genomes, which is consistent with the value @Ekie22 gave.

Any info on how to convert the inflated alignment coordinates within each cluster in the xmfa files into exact loci in each fasta file would be great!
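
For reference, the kind of conversion I have in mind could be sketched roughly like this. It is only an illustration of the padding hypothesis above, not parsnp's confirmed behaviour: it assumes contigs are concatenated in fasta order with a fixed pad after each one, and that positions are 0-based. `PAD`, `build_starts` and `to_contig_coord` are names I made up, and since the real offset looks inexact, a fixed `PAD` will not reproduce exact loci.

```python
from bisect import bisect_right

# ASSUMPTION: fixed padding (in bp) inserted after every contig.
# The ~1 kbp figure is only an observation, not a documented value.
PAD = 1000


def build_starts(contig_lengths, pad=PAD):
    """Start of each contig in the hypothetical concatenated, padded sequence."""
    starts = []
    pos = 0
    for length in contig_lengths:
        starts.append(pos)
        pos += length + pad
    return starts


def to_contig_coord(pos, contig_lengths, starts):
    """Map a 0-based concatenated position to (contig_index, offset).

    Returns None when the position falls inside padding or past the end.
    """
    i = bisect_right(starts, pos) - 1  # last contig starting at or before pos
    if i < 0:
        return None
    offset = pos - starts[i]
    if offset >= contig_lengths[i]:
        return None  # inside the assumed padding, or beyond the last contig
    return i, offset


if __name__ == "__main__":
    lengths = [5000, 3000, 2000]  # made-up contig lengths, in fasta order
    starts = build_starts(lengths)
    print(to_contig_coord(6500, lengths, starts))
```

If the true padding were known exactly, the same lookup would give the contig and the position within it for any coordinate in the xmfa file.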

parsnp calculates genome size wrongly #43