vatlab / varianttools

software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis
https://vatlab.github.io/vat-docs/
GNU General Public License v3.0

Stop combining variants in `vtools export` #55

Open BoPeng opened 6 years ago

BoPeng commented 6 years ago

I have these two variants in my vtools variant database:

4       106156653       T       C                       Scan1,Scan2                                                ....,.,.,.,....
4       106156653       T       G                       Scan1,Scan2                                                ....,.,.,.,....

When I export them to VCF with the following command

vtools export variant --format $SCRIPTS/myvcf.fmt --header CHROM POS ID REF ALT QUAL FILTER INFO --var_info callers genotypes --output ./Variants_raw.vcf

these variants are combined into a single multi-allelic entry like this:

4    106156653 .    T    C,G  .    PASS callers=[u'Scan1|Scan2', u'Scan1|Scan2'];genotypes=[u'....|.|.|.|....', u'....|.|.|.|....']

This is a problem for two reasons: downstream processing is corrupted by the multi-allelic variant (MAV), and the bracketed [] arrays are difficult to parse. I would prefer one output line per variant, just as it would be done via vtools export.

Surely there is a simple workaround for this, but I have not been able to find it yet.

Could you help me with this once again?

BoPeng commented 6 years ago

Changing

export_by=chr,%(pos)s,%(ref)s

to

export_by=chr,%(pos)s,%(ref)s,%(alt)s

in vcf.fmt, whose relevant section currently reads

[format description]
description=Import vcf
variant=chr,%(pos)s,%(ref)s,%(alt)s
genotype=%(geno)s
variant_info=%(var_info)s
genotype_info=%(geno_info)s
# variants with identical chr,pos,ref will be collapsed.
export_by=chr,%(pos)s,%(ref)s

should solve the problem.
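
To see why adding `%(alt)s` to `export_by` matters, here is a minimal Python sketch of the grouping behavior. It is not the varianttools implementation; the `export_rows` helper and the record layout are hypothetical and only mimic the collapsing described in the comment in vcf.fmt, using the two variants from this issue.

```python
from itertools import groupby

# The two variants from this issue: same chr, pos and ref, different alt.
variants = [
    {"chr": "4", "pos": 106156653, "ref": "T", "alt": "C"},
    {"chr": "4", "pos": 106156653, "ref": "T", "alt": "G"},
]


def export_rows(records, export_by):
    """Group records by the export_by fields and join ALT alleles per group.

    Hypothetical helper for illustration only; it is not part of vtools.
    """
    def key(record):
        return tuple(record[field] for field in export_by)

    rows = []
    # groupby only merges adjacent items, so sort by the same key first.
    for _, group in groupby(sorted(records, key=key), key=key):
        group = list(group)
        first = group[0]
        alts = ",".join(record["alt"] for record in group)
        rows.append((first["chr"], first["pos"], first["ref"], alts))
    return rows


# Grouping by chr,pos,ref collapses both variants into one multi-allelic row.
collapsed = export_rows(variants, ["chr", "pos", "ref"])
# Adding alt to the key keeps one row per variant.
separate = export_rows(variants, ["chr", "pos", "ref", "alt"])

print(collapsed)
print(separate)
```

With `alt` in the grouping key, every distinct variant forms its own group, so no list-valued INFO fields need to be produced for the row.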