marbl / parsnp

Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. Input can be both draft assemblies and finished genomes, and output includes variant (SNP) calls, a core genome phylogeny and multi-alignments. Parsnp leverages contextual information provided by multi-alignments surrounding SNP sites for filtration/cleaning, in addition to existing tools for recombination detection/filtration and phylogenetic reconstruction.

Mapping coordinates of the MAF back to the reference #38

Closed plasmid02 closed 8 months ago

plasmid02 commented 6 years ago

Does anyone know if someone has solved how to map coordinates from the multi-alignment file output back to the original coordinates on the reference GenBank file?

If it hasn't been done, I may have a try at it.

bkille commented 8 months ago

The header for each record is the following

[fileidx]:[concat_start]-[concat_end] [strand] cluster[x] s[contig_idx]:p[contig_pos]

The concat_start and concat_end values are internal to parsnp. The sequence for this record comes from the file at index fileidx (the file indices are declared at the top of the XMFA), on contig number contig_idx, starting at position contig_pos.
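
For anyone who wants to script this, here is a minimal Python sketch of reading that header and walking an aligned sequence back to a contig position. It is not part of parsnp: the names parse_header, RecordHeader and column_to_contig_pos are made up for illustration. It assumes the header line may start with ">", that the cluster id is numeric, and that on the "+" strand contig_pos is the contig coordinate of the first non-gap base in the record. The "-" strand is left unhandled because the thread does not say how contig_pos should be read there.

```python
import re
from typing import NamedTuple, Optional

# Pattern for: [fileidx]:[concat_start]-[concat_end] [strand] cluster[x] s[contig_idx]:p[contig_pos]
# Assumption: an optional leading ">" and a numeric cluster id.
HEADER_RE = re.compile(
    r"^>?\s*(?P<fileidx>\d+):(?P<concat_start>\d+)-(?P<concat_end>\d+)\s+"
    r"(?P<strand>[+-])\s+cluster(?P<cluster>\d+)\s+"
    r"s(?P<contig_idx>\d+):p(?P<contig_pos>\d+)\s*$"
)


class RecordHeader(NamedTuple):
    fileidx: int
    concat_start: int   # internal to parsnp
    concat_end: int     # internal to parsnp
    strand: str
    cluster: int
    contig_idx: int
    contig_pos: int


def parse_header(line: str) -> RecordHeader:
    """Split one record header into its fields (hypothetical helper)."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised record header: {line!r}")
    d = m.groupdict()
    return RecordHeader(
        int(d["fileidx"]), int(d["concat_start"]), int(d["concat_end"]),
        d["strand"], int(d["cluster"]), int(d["contig_idx"]), int(d["contig_pos"]),
    )


def column_to_contig_pos(header: RecordHeader, aligned_seq: str,
                         column: int) -> Optional[int]:
    """Map a 0-based alignment column to a position on the contig.

    Assumption: forward strand only, with contig_pos pointing at the
    first non-gap base. Returns None if the column is a gap.
    """
    if header.strand != "+":
        raise NotImplementedError("reverse-strand records are not handled here")
    if aligned_seq[column] == "-":
        return None
    bases_before = sum(1 for c in aligned_seq[:column] if c != "-")
    return header.contig_pos + bases_before
```

To get a coordinate in the original file you would still need to look up which input file fileidx refers to (from the declarations at the top of the XMFA) and which contig contig_idx refers to in that file.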