mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

Empty 't_depth' 't_ref_count' 't_alt_count' from VCF with format GT:GL:GOF:GQ:NR:NV #351

Open ChristianRohde opened 9 months ago

ChristianRohde commented 9 months ago

Hi,

I have an issue similar to https://github.com/mskcc/vcf2maf/issues/332. I noticed that problem earlier and fixed the non-matching names in my previous data.

Now my problem is that vcf2maf does not seem to handle VCF files with the format GT:GL:GOF:GQ:NR:NV.

I should point out that I received VCFv4.0 files from a colleague that already include a very rich annotation. I am therefore running vcf2maf with the --inhibit-vep parameter and hoping that it will pick up the data from my file. These are the FORMAT definitions from my VCF header:

##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype log10-likelihoods for AA,AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites">
##FORMAT=<ID=GOF,Number=.,Type=Float,Description="Goodness of fit value">
##FORMAT=<ID=GQ,Number=.,Type=Integer,Description="Genotype quality as phred score">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Unphased genotypes">
##FORMAT=<ID=NR,Number=.,Type=Integer,Description="Number of reads covering variant location in this sample">
##FORMAT=<ID=NV,Number=.,Type=Integer,Description="Number of reads containing variant in this sample">

Here are the column header line and the first record:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P10
1       17539   .       C       A       1713    alleleBias      BRF=0.54;FR=0.5;HP=2;HapScore=1;MGOF=10;MMLQ=37;MQ=40.94;NF=28;NR=32;PP=1713;QD=29.1295;SC=TGTCTGATGCCCTGGGTCCCC;SbPval=0.4;Source=Platypus;TC=382;TCF=163;TCR=219;TR=60;WE=17547;WS=17528;GT_Classification=HETERO;SEGMENTAL_DUPLICATION;MAPABILITY=0.125;GOOD_MAP;VARIANT_CONFIDENCE=8;CSQ=A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000423562|unprocessed_pseudogene||4/9|ENST00000423562.1:n.488+67G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000438504|unprocessed_pseudogene||5/11|ENST00000438504.2:n.604+63G>T|||||||||-1||SNV|HGNC|38034|YES|||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|||||||||||3869|1||SNV|HGNC|37102||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|||||||||||3130|1||SNV|HGNC|37102|YES|||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||5/10|ENST00000488147.1:n.574+67G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000515242|transcribed_unprocessed_pseudogene|||||||||||3127|1||SNV|HGNC|37102||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|ENS
G00000223972|Transcript|ENST00000518655|transcribed_unprocessed_pseudogene|||||||||||3130|1||SNV|HGNC|37102||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000538476|unprocessed_pseudogene||5/12|ENST00000538476.1:n.815+63G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000541675|unprocessed_pseudogene||4/8|ENST00000541675.1:n.540-35G>T|||||||||-1||SNV|HGNC|38034||||||||||||||Ensembl||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|intron_variant&non_coding_transcript_variant|MODIFIER|WASH7P|653635|Transcript|NR_024540.1|transcribed_pseudogene||5/10|NR_024540.1:n.587+67G>T|||||||||-1||SNV|EntrezGene|38034|YES|||||||||||||RefSeq||C|C|OK||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|downstream_gene_variant|MODIFIER|DDX11L1|100287102|Transcript|NR_046018.2|transcribed_pseudogene|||||||||||3130|1||SNV|EntrezGene|37102|YES|||||||||||||RefSeq||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||,A|upstream_gene_variant|MODIFIER|MIR6859-1|102466751|Transcript|NR_106918.1|miRNA|||||||||||103|-1||SNV|EntrezGene||YES|||||||||||||RefSeq||C|C|||||||||||||||||||||||||||||||||||||||||||||0.163466|5.285|||||||||||     GT:GL:GOF:GQ:NR:NV      1/0:-58.46,0,-299.7:10:99:382:60
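
To make clear what I mean, here is a minimal Python sketch of how the sample column of this record could be mapped onto the three empty MAF columns. The function names are made up for illustration and are not part of vcf2maf. The mapping NR -> t_depth and NV -> t_alt_count follows the FORMAT descriptions above; t_ref_count = NR - NV is an assumption that only holds for bi-allelic sites where every non-variant read supports the reference.

```python
def parse_sample(format_str, sample_str):
    """Pair the colon-separated FORMAT keys with the sample's values."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def platypus_counts(format_str, sample_str):
    """Derive MAF-style read counts from the NR/NV fields (hypothetical helper)."""
    fields = parse_sample(format_str, sample_str)
    # NR/NV are declared with Number=., so they may hold comma-separated
    # values at multi-allelic sites; this sketch only uses the first one.
    depth = int(fields["NR"].split(",")[0])
    alt = int(fields["NV"].split(",")[0])
    return {
        "t_depth": depth,
        "t_alt_count": alt,
        # Assumption: all reads that do not carry the variant are reference reads.
        "t_ref_count": depth - alt,
    }


# FORMAT and sample columns copied from the record above
counts = platypus_counts("GT:GL:GOF:GQ:NR:NV", "1/0:-58.46,0,-299.7:10:99:382:60")
print(counts)
```

For the record above this gives a depth of 382 with 60 variant reads, leaving 322 reads counted as reference under the stated assumption.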

In https://github.com/mskcc/vcf2maf/issues/332 you mention that vcf2maf needs the AD field. Can I somehow change which FORMAT fields vcf2maf reads so that it works with my current VCF files? Unfortunately I did not spot any parameter for this in the help, but it should be possible, right?

Best, Christian

ChristianRohde commented 9 months ago

Hi,

In the end I ran vcf2maf with the --retain-fmt NR,NV parameter. This gave me t_NR and t_NV columns in my exported MAF files.

Next I read the files into R using maftools::read.maf(local_MAF_file) and combined them one after the other with maftools::merge_mafs(). I then exported the merged result with maftools::write.mafSummary() and loaded it with data.table::fread(). At that point I could rename the columns t_NR and t_NV to t_depth and t_alt_count. From these values I can easily calculate VAF (t_alt_count / t_depth) and t_ref_count (t_depth - t_alt_count). Finally I read the table back into MAF format using maftools::read.maf(table).

It sounds a bit complicated, but it works well.
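
For anyone who prefers to skip the R round trip, the same post-processing can be sketched in plain Python on the MAF text itself. This is only an illustration under assumptions: the function name fill_counts and the extra t_vaf column are hypothetical, the t_NR/t_NV columns are assumed to hold a single integer each, and t_ref_count is again taken as depth minus variant reads. Instead of renaming, the sketch fills the existing (empty) count columns from t_NR and t_NV.

```python
import csv
import io


def fill_counts(maf_text):
    """Fill t_depth, t_alt_count and t_ref_count from t_NR and t_NV (sketch)."""
    # Leading '#' comment lines (e.g. the version line) are dropped in this sketch.
    lines = [ln for ln in maf_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for row in reader:
        nr, nv = row.get("t_NR", ""), row.get("t_NV", "")
        if nr.isdigit() and nv.isdigit():
            depth, alt = int(nr), int(nv)
            row["t_depth"] = str(depth)
            row["t_alt_count"] = str(alt)
            # Assumption: reads without the variant are reference reads.
            row["t_ref_count"] = str(depth - alt)
            # t_vaf is a hypothetical extra column added for convenience.
            row["t_vaf"] = f"{alt / depth:.4f}" if depth else ""
        rows.append(row)

    # Keep the original column order and append any column that is new.
    fieldnames = list(reader.fieldnames or [])
    for name in ("t_depth", "t_ref_count", "t_alt_count", "t_vaf"):
        if name not in fieldnames:
            fieldnames.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Tiny made-up MAF excerpt using the counts from the record in my first post
example = (
    "#version 2.4\n"
    "Hugo_Symbol\tt_depth\tt_ref_count\tt_alt_count\tt_NR\tt_NV\n"
    "WASH7P\t\t\t\t382\t60\n"
)
result = fill_counts(example)
print(result)
```

Rows whose t_NR or t_NV is missing or not a plain integer are passed through unchanged, so they can be inspected separately.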

Thank you, Christian