marbl / harvest

Question #12

Open gotero opened 9 years ago

gotero commented 9 years ago

Hi-

Is there a way to create an alignment with parsnp/harvesttools that includes the unaligned sequences in addition to the core sequences? I have 99% coverage in my genome alignments but there is still about 21k bp omitted from the alignment when creating the xmfa or multi-fasta alignment file. For example, my reference genome shrinks from 4043846 to 4023750, as do the rest of the aligned genomes. Those missing bases throw off the annotation results from the gwas on the snps since I'm using the reference genome's genbank file.

Suggestions?

Thanks!

Glen

treangen commented 9 years ago

Glen,

> Is there a way to create an alignment with parsnp/harvesttools that includes the unaligned sequences in addition to the core sequences?

Good question. Creating a single output file of core sequences plus unaligned regions per genome is not supported, but '-u' will output all unaligned regions per genome. Together with the aligned regions contained in the XMFA file, this gives a full representation of the sequence contained in each genome.
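To illustrate how the aligned and unaligned parts add up to the whole genome, here is a minimal Python sketch that takes the aligned intervals from XMFA block headers and computes the stretches they leave uncovered. It assumes headers of the form `>1:101-250 + ...` with 1-based inclusive coordinates, which may not match your file exactly, so check a few header lines first. The function names and the toy data are hypothetical and not part of parsnp or harvesttools.

```python
import re
from collections import defaultdict

# Assumed header layout: ">SEQINDEX:START-END STRAND ..." (1-based, inclusive).
HEADER = re.compile(r"^>\s*(\d+):(\d+)-(\d+)\s+([+-])")

def parse_xmfa_intervals(lines):
    """Collect aligned (start, end) intervals per sequence index."""
    intervals = defaultdict(list)
    for line in lines:
        m = HEADER.match(line)
        if not m:
            continue  # sequence lines, '=' separators, '#' comments
        seq, start, end = int(m.group(1)), int(m.group(2)), int(m.group(3))
        if start == 0:
            continue  # assumption: 0-0 marks a genome absent from the block
        intervals[seq].append((start, end))
    return dict(intervals)

def unaligned_gaps(intervals, genome_length):
    """Return the 1-based inclusive intervals not covered by any aligned block."""
    gaps = []
    next_pos = 1
    for start, end in sorted(intervals):
        if start > next_pos:
            gaps.append((next_pos, start - 1))
        next_pos = max(next_pos, end + 1)
    if next_pos <= genome_length:
        gaps.append((next_pos, genome_length))
    return gaps

# Toy XMFA content; a real file would be read with open(path).
toy_xmfa = [
    ">1:1-100 + cluster1", "ACGT", ">2:5-104 + cluster1", "ACGT", "=",
    ">1:151-300 - cluster2", "ACGT", ">2:160-309 + cluster2", "ACGT", "=",
]
aligned = parse_xmfa_intervals(toy_xmfa)
reference_gaps = unaligned_gaps(aligned[1], genome_length=350)
missing_bp = sum(end - start + 1 for start, end in reference_gaps)
print(reference_gaps, missing_bp)
```

Run against your own reference (length 4043846), the total of the gaps should come out close to the roughly 20 kbp difference you reported, and the gap coordinates show where in the reference the unaligned sequence sits.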

> For example, my reference genome shrinks from 4043846 to 4023750, as do the rest of the aligned genomes. Those missing bases throw off the annotation results from the gwas on the snps since I'm using the reference genome's genbank file.

I'm not sure I fully understand. Assuming you are using an existing annotation for the reference, you will still be able to look up the gene annotation for each SNP despite the 21 kbp of unaligned sequence. The XMFA file provides genome-specific coordinates for all aligned regions, and the VCF file lists the reference-based coordinate of each SNP. Neither set of positions is affected by the unaligned sequence, unless you are trying to annotate the alignment and/or SNPs contained within the unaligned region(s).
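As a rough sketch of that lookup: because the VCF POS column is already in reference coordinates, each SNP can be matched directly against gene intervals taken from the reference GenBank file, with no need to go through alignment columns. The example below is plain Python with toy data. The gene list, the names `geneA`/`geneB`, and the function names are all hypothetical; in practice the intervals would come from whatever GenBank parser you already use.

```python
def read_vcf_positions(lines):
    """Return (chrom, pos) for each record; POS is 1-based in VCF."""
    positions = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        positions.append((fields[0], int(fields[1])))
    return positions

def genes_at(pos, genes):
    """Names of all genes whose 1-based inclusive interval contains pos."""
    return [name for start, end, name in genes if start <= pos <= end]

# Hypothetical gene intervals (start, end, name) in reference coordinates.
toy_genes = [(100, 400, "geneA"), (350, 900, "geneB")]

# Toy VCF content; a real file would be read with open(path).
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t120\t.\tA\tG\t.\tPASS\t.",
    "ref\t380\t.\tC\tT\t.\tPASS\t.",
    "ref\t950\t.\tG\tA\t.\tPASS\t.",
]

annotated = {pos: genes_at(pos, toy_genes)
             for _chrom, pos in read_vcf_positions(toy_vcf)}
print(annotated)
```

A linear scan over the genes is used here for clarity. For a few thousand genes and many SNPs, an interval index would be faster, but the coordinate logic stays the same.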