mummer4 / mummer

Mummer alignment tool
Artistic License 2.0

Feature request: vcf output #37

Open lfaller opened 7 years ago

lfaller commented 7 years ago

This would be very helpful to visualize in IGV or Geneious!

Thank you, ~Lina

gmarcais commented 7 years ago

Can you give more details on what you envision? It seems to me that there is no direct one-to-one transformation from an alignment file to the Variant Call Format.

lfaller commented 7 years ago

A lot of the folks in the lab at my institute use Geneious or IGV to look at alignments. These programs can read alignment files (BAM, SAM, etc.) and they can also read a VCF file that contains SNP and indel information.

Unfortunately Mummer outputs SNP data in an uncommon format that is hard to overlay onto an alignment file.

My specific use case is as follows: I have two related bacterial genomes. I want to find out how similar they are. I used Mummer to align the two genomes and generated the SNP output. Now I would like to load one of the genomes into Geneious and add a track of the SNP information on top so I can see where the two genomes differ. I could easily do this with a VCF file.

Cheers, ~Lina

lfaller commented 6 years ago

Hello! Do you have any further thoughts on this?

Thanks!

apredeus commented 6 years ago

I agree - this would be an extremely useful feature. Of the few programs capable of exporting the differences between two genomes, most work quite unpredictably. It would be extremely useful to streamline finding SNPs and small indels and annotating them with something like snpEff, to quickly assess whether a variant is missense/nonsense/silent, identify frameshifts, etc.

You are right of course that it's not a direct or easy transition. Large structural variations could be pretty hard to describe. But at least small indels and SNPs would be very useful.

gmarcais commented 6 years ago

I am OK with supporting an extra format, but VCF is designed to report variants, not alignments. That is, to return a VCF file, some decisions need to be made about what to report and how to report it; it is not just a matter of formatting the output differently.

So I have 2 thoughts about that proposal:

fritzsedlazeck commented 6 years ago

Hey, sorry to jump into this discussion. It would be great to have an option in show-snps to output a VCF file. This would make the results easier to use and easier to compare with, e.g., mapping-based methods.

Do you know of any converter script?

Thanks Fritz

Ellis-Anderson commented 5 years ago

Hey all, I'm also here to advocate for show-snps to have a VCF output option. Trying to parse the current output into a VCF, especially if you want to look at indels, is not easy.

It would make sense to me that show-snps could produce an output that is rather standard across the field.

Best, Ellis

apredeus commented 5 years ago

Minimap2 now comes with a tool named paftools.js that allows reporting SNPs in VCF format, based on whole-genome comparison. Maybe nucmer developers can implement VCF output in a similar fashion, so that results would be more easily comparable between the two tools.

shadiakiki1986 commented 4 years ago

For future readers, this might be useful: https://www.biostars.org/p/395210/ It's a utility script that takes the show-snps -T ... output and converts it to VCF. Also, check here for a few examples.
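
To illustrate the kind of conversion discussed in this thread, here is a minimal Python sketch. It is not the script linked above and not anything shipped with MUMmer. It assumes the tab-delimited show-snps -T layout in which the first three columns are the reference position, the reference base and the query base, and the second-to-last column is the reference sequence name; please check this against your own output, since the columns in between depend on the options used. The function name show_snps_to_vcf and the sample rows are made up for illustration. Indel rows (marked with "." in a base column) are skipped, because a VCF indel record needs the preceding reference base, which the show-snps table does not carry.

```python
BASES = set("ACGT")

def show_snps_to_vcf(lines):
    """Turn tab-delimited show-snps rows into a minimal VCF string (SNPs only).

    Assumed column layout (verify against your own file):
      first column        = position in the reference
      second column       = reference base ('.' for an insertion in the query)
      third column        = query base ('.' for a deletion in the query)
      second-to-last col  = reference sequence name
    """
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Header and blank lines do not start with an integer position.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        pos = int(fields[0])
        ref = fields[1].upper()
        alt = fields[2].upper()
        chrom = fields[-2]
        # Skip indels and ambiguous bases: only plain substitutions are kept.
        if ref not in BASES or alt not in BASES:
            continue
        records.append((chrom, pos, ref, alt))

    # VCF records are expected to be sorted by position within each sequence.
    records.sort()

    out = [
        "##fileformat=VCFv4.2",
        "##source=show-snps (converted by an example sketch)",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt in records:
        out.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(out) + "\n"


# Made-up rows in the assumed layout; 'chr1' and 'contig7' are hypothetical names.
sample = [
    "/data/ref.fa /data/qry.fa\n",
    "NUCMER\n",
    "\n",
    "[P1]\t[SUB]\t[SUB]\t[P2]\t[BUFF]\t[DIST]\t[FRM]\t[TAGS]\n",
    "120\tA\tG\t118\t5\t118\t1\t1\tchr1\tcontig7\n",
    "300\t.\tT\t299\t2\t299\t1\t1\tchr1\tcontig7\n",
    "45\tC\tT\t45\t10\t45\t1\t1\tchr1\tcontig7\n",
]

if __name__ == "__main__":
    print(show_snps_to_vcf(sample), end="")
```

On a real file, the same function could be called with an open file handle, for example show_snps_to_vcf(open("out.snps")), where out.snps is a placeholder name. SNPs on reverse-strand alignments and repeated regions may need extra care that this sketch does not attempt.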