I was using the variants command to annotate SNPs with their mutation types. In the case where the ancestral allele maps to the ALT allele, mutyper is supposed to flip the genotypes (0->1, 1->0) for that SNP in the VCF file. However I noticed that missing genotypes (.) will always get converted to 0 during this process. To illustrate the problem, I have the following fasta file:
>foo:chr1
AAACCCgggTTT
And the following VCF file
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248387328>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency in genotypes">
##INFO=<ID=mutation_type,Number=1,Type=Character,Description="ancestral 3-mer mutation type">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2
chr1 2 . A T 30 PASS AN=3;AC=2;AF=0.67;mutation_type=AAG>ATG GT 1|. 1/0
chr1 3 . C A 30 PASS AN=3;AC=1;AF=0.33;mutation_type=ACT>AAT GT .|1 0/0
chr1 4 . T C 30 PASS AN=2;AC=1;AF=0.50;mutation_type=ATT>ACT GT .|. 1/0
This is the output (with header removed) after processing the VCF with mutyper variants --chrom_pos 1 $fasta_file $vcf_file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2
chr1 2 . A T 30 PASS AN=3;AC=2;AF=0.67;mutation_type=AAA>ATA GT 1|. 1/0
chr1 3 . A C 30 PASS AN=3;AC=2;AF=0.666667;mutation_type=AAC>ACC GT 0|0 1/1
chr1 4 . C T 30 PASS AN=2;AC=1;AF=0.5;mutation_type=ACC>ATC GT 0|0 0/1
As you can see, both variants at position 3 and 4 had their REF and ALT allele flipped, and the missing genotype . was converted to a 0. I suspect the culprit is this line:
variant.genotypes = [
[int(not gt[0]), int(not gt[1]), gt[2]] for gt in variant.genotypes
]
cyvcf2 represents the missing genotype with -1, and as such, int(not -1)=0 .
Hi,
I was using the
variants
command to annotate SNPs with their mutation types. In the case where the ancestral allele maps to theALT
allele,mutyper
is supposed to flip the genotypes (0
->1
,1
->0
) for that SNP in the VCF file. However I noticed that missing genotypes (.
) will always get converted to0
during this process. To illustrate the problem, I have the following fasta file:And the following VCF file
This is the output (with header removed) after processing the VCF with
mutyper variants --chrom_pos 1 $fasta_file $vcf_file
:As you can see, both variants at position 3 and 4 had their REF and ALT allele flipped, and the missing genotype
.
was converted to a0
. I suspect the culprit is this line:cyvcf2
represents the missing genotype with-1
, and as such,int(not -1)=0
.