mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

Error running vcf2maf #11

Closed AshiqMasood closed 9 years ago

AshiqMasood commented 9 years ago

Hi Cyriac,

I am trying to run vcf2maf on my exome data. Here is the error I keep getting:

Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 787, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 788, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 789, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 790, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 791, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 792, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 793, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 794, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 795, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 796, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 797, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 798, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 799, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 800, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 801, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 802, line 109.
Use of uninitialized value in string eq at vcf2maf.pl line 803, line 109.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 787, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 788, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 789, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 790, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 791, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 792, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 793, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 794, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 795, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 796, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 797, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 798, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 799, line 110.
Use of uninitialized value in string eq at vcf2maf.pl line 800, line 110.
Use of uninitialized value in pattern match (m//) at vcf2maf.pl line 801, line 110.

Can you please help me with this?

Best, Ashiq

ckandoth commented 9 years ago

Can you provide the command you are trying to run, and a sample line or two from your input file? My 2 best guesses are that your input file has oddities, or vcf2maf/VEP has not been installed properly.

AshiqMasood commented 9 years ago

Hi Cyriac, Thank you for your reply. I have already annotated my VCFs with snpEff. This is an example command:

perl /home/amasood/ashiq_tools/vcf2maf.pl --input-snpEff /local/projects-t3/TCGA/RCCEX/mutect_results/T1_RCC.Pass.snpeff.vcf --output-maf T1.maf --tumor-id T1_RCC --normal-id N1_RCC

Here are a few lines of my VCF:

##reference=file:///local/projects-t3/1000G/bcantarel_bams/human_g1k_v37_decoy.fasta
##SnpEffVersion="4.1b (build 2015-02-13), by Pablo Cingolani"
##SnpEffCmd="SnpEff GRCh38.78 /local/projects-t3/TCGA/RCCEX/mutect_results/T1_RCC.Pass.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGV
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_af
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT N1 T1

1 6473219 . G A . PASS SOMATIC;VT=SNP;ANN=A|upstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000487949|retained_intron||n.-1C>T|||||1636|,A|i
1 9791407 . T C . PASS SOMATIC;VT=SNP;ANN=C|intron_variant|MODIFIER|CLSTN1|ENSG00000171603|transcript|ENST00000377298|protein_coding|1/18|c.92-18013A>G||||||,C|intr
1 11115117 . G A . PASS SOMATIC;VT=SNP;ANN=A|upstream_gene_variant|MODIFIER|MTOR|ENSG00000198793|transcript|ENST00000473471|processed_transcript||n.-1C>T||||
1 17953831 . C A . PASS SOMATIC;VT=SNP;ANN=A|intergenic_region|MODIFIER|ACTL8-RP11-174G17.2|ENSG00000117148-ENSG00000261781|intergenic_region|ENSG00000117148
1 27216501 . C A . PASS SOMATIC;VT=SNP;ANN=A|intergenic_region|MODIFIER|SNRPEP7-RP11-40H20.4|ENSG00000225990-ENSG00000224311|intergenic_region|ENSG0000022599
1 78426070 . G T . PASS SOMATIC;VT=SNP;ANN=T|intron_variant|MODIFIER|PTGFR|ENSG00000122420|transcript|ENST00000370758|protein_coding|2/3|c.-73+27045G>T||||||
1 99380468 . G C . PASS SOMATIC;VT=SNP;ANN=C|intergenic_region|MODIFIER|LPPR4-RP4-735N21.1|ENSG00000117600-ENSG00000233983|intergenic_region|ENSG00000117600
1 153941198 . G A . PASS SOMATIC;VT=SNP;ANN=A|upstream_gene_variant|MODIFIER|DENND4B|ENSG00000198837|transcript|ENST00000477746|retained_intron||n.-1C>T|||||4
1 154492715 . A C . PASS SOMATIC;VT=SNP;ANN=C|upstream_gene_variant|MODIFIER|SHE|ENSG00000169291|transcript|ENST00000555188|protein_coding||c.-2T>G|||||3549|W
1 156202231 . G A . PASS SOMATIC;VT=SNP;ANN=A|downstream_gene_variant|MODIFIER|SLC25A44|ENSG00000160785|transcript|ENST00000482737|processed_transcript||n._45
1 156202232 . C A . PASS SOMATIC;VT=SNP;ANN=A|downstream_gene_variant|MODIFIER|SLC25A44|ENSG00000160785|transcript|ENST00000482737|processed_transcript||n._45
1 185269330 . A T . PASS SOMATIC;VT=SNP;ANN=T|upstream_gene_variant|MODIFIER|RP4-635A23.3|ENSG00000233583|transcript|ENST00000441261|processed_pseudogene||n.-
1 207200960 . C T . PASS SOMATIC;VT=SNP;ANN=T|intergenic_region|MODIFIER|RP11-164O23.7-C4BPAP2|ENSG00000243636-ENSG00000232621|intergenic_region|ENSG000002436
1 214491392 . T C . PASS SOMATIC;VT=SNP;ANN=C|intron_variant|MODIFIER|PTPN14|ENSG00000152104|transcript|ENST00000366956|protein_coding|1/18|c.-154-26435A>G|||
1 240255583 . C G . PASS SOMATIC;VT=SNP;ANN=G|intron_variant|MODIFIER|FMN2|ENSG00000155816|transcript|ENST00000319653|protein_coding|6/17|c.4066-2362C>G||||||
1 240370662 . A C . PASS SOMATIC;VT=SNP;ANN=C|intron_variant|MODIFIER|FMN2|ENSG00000155816|transcript|ENST00000319653|protein_coding|14/17|c.4858+14754A>C||||
2 11355209 . C G . PASS SOMATIC;VT=SNP;ANN=G|downstream_gene_variant|MODIFIER|AC018463.5|ENSG00000226961|transcript|ENST00000433548|processed_pseudogene||n.
2 17947999 . T A . PASS SOMATIC;VT=SNP;ANN=A|intron_variant|MODIFIER|KCNS3|ENSG00000170745|transcript|ENST00000465292|processed_transcript|2/4|n.305+30128T>A
2 27801343 . T C . PASS SOMATIC;VT=SNP;ANN=C|intron_variant|MODIFIER|RBKS|ENSG00000171174|transcript|ENST00000302188|protein_coding|7/7|c.796-19555A>G||||||,
2 40392123 . G A . PASS SOMATIC;VT=SNP;ANN=A|intron_variant|MODIFIER|SLC8A1|ENSG00000183023|transcript|ENST00000332839|protein_coding|1/9|c.1808+36350C>T||||
2 54856517 . T C . PASS SOMATIC;VT=SNP;ANN=C|intron_variant|MODIFIER|EML6|ENSG00000214595|transcript|ENST00000356458|protein_coding|10/40|c.1657+2662T>C|||||
2 68873364 . C G . PASS SOMATIC;VT=SNP;ANN=G|upstream_gene_variant|MODIFIER|BMP10|ENSG00000163217|transcript|ENST00000295379|protein_coding||c.-160G>C|||||18
2 70315951 . G T . PASS SOMATIC;VT=SNP;ANN=T|intergenic_region|MODIFIER|FAM136A-BRD7P6|ENSG00000035141-ENSG00000235289|intergenic_region|ENSG00000035141-ENSG
2 121981878 . G T . PASS SOMATIC;VT=SNP;ANN=T|intergenic_region|MODIFIER|AC018737.3-AC062020.2|ENSG00000224655-ENSG00000237856|intergenic_region|ENSG000002246
2 122202454 . G A . PASS SOMATIC;VT=SNP;ANN=A|intergenic_region|MODIFIER|AC018737.3-AC062020.2|ENSG00000224655-ENSG00000237856|intergenic_region|ENSG000002246
2 172693665 . A G . PASS SOMATIC;VT=SNP;ANN=G|downstream_gene_variant|MODIFIER|snoU13|ENSG00000239041|transcript|ENST00000458863|snoRNA||n.104A>G|||||2263|,G
2 235404481 . C T . PASS SOMATIC;VT=SNP;ANN=T|intergenic_region|MODIFIER|AC092576.1-AGAP1|ENSG00000216002-ENSG00000157985|intergenic_region|ENSG00000216002-EN
2 235405071 . G A . PASS SOMATIC;VT=SNP;ANN=A|intergenic_region|MODIFIER|AC092576.1-AGAP1|ENSG00000216002-ENSG00000157985|intergenic_region|ENSG00000216002-EN
3 5025062 . C T . PASS SOMATIC;VT=SNP;ANN=T|intergenic_region|MODIFIER|BHLHE40-RNF10P1|ENSG00000134107-ENSG00000230182|intergenic_region|ENSG00000134107-ENSG0000023
3 6903147 . C

ckandoth commented 9 years ago

Thanks. vcf2maf does not yet support the more recent version of snpEff that you are using. It may take a month till I can add that support. In the meantime, I would strongly recommend using VEP instead. It generates a ton more clinically relevant annotations.
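One way to spot this situation before running vcf2maf is to look for the `SnpEffVersion` header line that snpEff writes, like the one in the sample above. The following is only a minimal Python sketch under that assumption; the function name `snpeff_version` is hypothetical and is not part of vcf2maf or snpEff.

```python
import re

def snpeff_version(header_lines):
    """Return the snpEff version string found in VCF header lines, or None.

    Assumes snpEff wrote a header line of the form
    ##SnpEffVersion="4.1b (build ...), by ...".
    Leading '#' characters are optional so pasted headers also match.
    """
    pattern = re.compile(r'^#*SnpEffVersion="?([^\s"]+)')
    for line in header_lines:
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    return None

# Header lines taken from the sample VCF posted in this thread.
header = [
    '##reference=file:///local/projects-t3/1000G/bcantarel_bams/human_g1k_v37_decoy.fasta',
    '##SnpEffVersion="4.1b (build 2015-02-13), by Pablo Cingolani"',
]
print(snpeff_version(header))
```

If the function returns a version string, the file was annotated by snpEff, and the version can be compared against whatever the installed vcf2maf release documents as supported.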

AshiqMasood commented 9 years ago

Sounds good to me ... I will use VEP

Thanks, Ashiq

AshiqMasood commented 9 years ago

Hi Cyriac,

I tried snpEff_v3_6_core.zip and snpEff_v4_0_core.zip, and I am getting the same errors. I also tried a VCF from the Shimmer SNP caller.

I am trying to download VEP but I was wondering what could be the reason for the above error?

Many thanks, Ashiq

ckandoth commented 9 years ago

Hi Ashiq. Unfortunately, I abandoned support for snpEff in vcf2maf 1.6, because of all the major additions to VEP like --pick. The good news is that both VEP and snpEff authors wrote a spec sheet for VCF INFO:ANN. So I foresee a common ANN notation that is closer to VEP's VCF output, that vcf2maf should be able to handle.

ckandoth commented 3 years ago

Update Nov 2020: vcf2maf is currently compatible with VEP's INFO/ANN format, defined as follows in a VCF header:

##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|ALLELE_NUM|DISTANCE|STRAND|FLAGS|PICK|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|RefSeq|GENE_PHENO|SIFT|PolyPhen|DOMAINS|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS">

Whereas snpEff 5.0 INFO/ANN format looks like this:

##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">

Fields are named differently, there is whitespace that needs cleanup, and their values are possibly different too. snpEff VCFs will continue to be unsupported by vcf2maf. I strongly recommend using VEP with vcf2maf.
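To make the difference concrete, here is a rough Python sketch that pulls the sub-field names out of an `##INFO=<ID=ANN,...>` header line, strips the extra whitespace and quotes, and applies a simple heuristic to guess whether the line came from VEP. The helper names `ann_fields` and `looks_like_vep` are hypothetical, the heuristic is an assumption rather than how vcf2maf decides, and the VEP line is abbreviated to its first eight fields for brevity.

```python
import re

def ann_fields(info_line):
    """Extract ANN sub-field names from an ##INFO=<ID=ANN,...> header line."""
    match = re.search(r'Description="(.*)"', info_line)
    if match is None:
        return []
    # VEP writes "... Format: A|B|C"; snpEff writes "...: 'A | B | C' ".
    _, _, spec = match.group(1).partition(": ")
    spec = spec.strip().strip("'")
    return [name.strip() for name in spec.split("|") if name.strip()]

def looks_like_vep(fields):
    """Heuristic (an assumption): VEP names these three fields, snpEff does not."""
    return {"Consequence", "SYMBOL", "Feature"} <= set(fields)

# Abbreviated copy of the VEP header line quoted above (first eight fields only).
VEP_LINE = (
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|'
    'Feature_type|Feature|BIOTYPE">'
)

# The snpEff 5.0 header line quoted above.
SNPEFF_LINE = (
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: '
    "'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | "
    "Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | "
    "cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | "
    "Distance | ERRORS / WARNINGS / INFO' "
    '">'
)

for label, line in (("VEP", VEP_LINE), ("snpEff", SNPEFF_LINE)):
    fields = ann_fields(line)
    print(label, len(fields), looks_like_vep(fields), fields[:4])
```

Even after the whitespace cleanup, only `Allele` lines up by name between the two lists, which is why a simple rename is not enough to make snpEff output usable here.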