mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

uninitialized value $effect{"cDNA_position"} #174

Closed: sagarutturkar closed this issue 3 years ago

sagarutturkar commented 6 years ago

I am using a SnpEff-annotated VCF for canine data and trying to convert it to MAF format. As instructed in other issues, I renamed my SnpEff-annotated VCF to "test.vcf", created a copy of the same VCF named "test.vep.vcf", and ran vcf2maf as:

perl vcf2maf.pl \
  --input-vcf test.vcf \
  --output-maf test.maf \
  --ref-fasta Reference/genome_ref.fasta \
  --species canis_familiaris --ncbi-build CanFam3.1

However, I get the following warnings:

WARNING: No genotype column for TUMOR in VCF!
WARNING: No genotype column for NORMAL in VCF!
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in list assignment at ../vcf2maf.pl line 649, <GEN4> line 98.
Use of uninitialized value in split at ../vcf2maf.pl line 659, <GEN4> line 98.
Use of uninitialized value $effect{"cDNA_position"} in pattern match (m//) at ../vcf2maf.pl line 713, <GEN4> line 98.
...
<Truncated output>

A "test.maf" file is generated, but every Hugo_Symbol is unknown and most of the MAF fields are blank. Although this is test data, my goal is to generate a MAF file from multiple samples and use it with maftools to generate visualizations. Please help with this issue.

Please see the attached input and output files: test.maf.txt, test.vcf.txt, test.vep.vcf.txt

ckandoth commented 6 years ago

Thanks for sharing the files. Unfortunately the annotation field names generated by snpEff are very different from what VEP generates. This is what I see in the VCF header:

##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
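To see why the warning mentions `$effect{"cDNA_position"}`, a minimal Python sketch can pull the subfield names out of this ANN header and compare them with the names a VEP INFO/CSQ header declares after `Format:`. The VEP header below is a shortened, illustrative example (an assumption: real VEP output lists many more fields), and the helper names are hypothetical. Only `Allele` is shared by exact name; snpEff's `cDNA.pos / cDNA.length` has no field called `cDNA_position`, which matches the name in the warning.

```python
import re

# snpEff INFO/ANN header line, as quoted in the comment above.
SNPEFF_HEADER = (
    '##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: '
    "'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | "
    "Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | "
    "CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' \">"
)

# Shortened, illustrative VEP INFO/CSQ header; real VEP output lists more fields.
VEP_HEADER = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from '
    "Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|"
    'BIOTYPE|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position">'
)


def snpeff_field_names(header):
    """snpEff lists its subfields between single quotes, separated by ' | '."""
    match = re.search(r"'([^']*)'", header)
    if match is None:
        raise ValueError("no quoted field list found")
    return [name.strip() for name in match.group(1).split("|")]


def vep_field_names(header):
    """VEP lists its subfields after 'Format: ', separated by '|'."""
    match = re.search(r'Format: ([^"]*)"', header)
    if match is None:
        raise ValueError("no 'Format:' field list found")
    return match.group(1).split("|")


ann = snpeff_field_names(SNPEFF_HEADER)
csq = vep_field_names(VEP_HEADER)
# VEP field names that have no identically named snpEff counterpart.
missing = [name for name in csq if name not in ann]
print(missing)
```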

vcf2maf does not support these snpEff fields, and it would take some development time to translate them into the corresponding MAF fields. I have to put this on the backlog. In the meantime, I strongly recommend using VEP instead of snpEff. The newest VEP is significantly faster than older versions, and for some types of variants it is faster than snpEff. Click here for instructions on installing VEP v101.
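To give a sense of what translating these fields involves, here is a minimal Python sketch that splits an INFO/ANN value into per-annotation records and copies a few subfields onto MAF column names. The mapping in `ANN_TO_MAF`, the helper names, and the ANN value (with placeholder gene and transcript IDs) are all hypothetical; this is not vcf2maf's logic, it simply takes the first annotation for illustration, and it ignores columns such as Variant_Classification that would need more work.

```python
# snpEff ANN subfield order, taken from the header quoted above.
SNPEFF_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]

# Hypothetical mapping from a few snpEff subfields to MAF column names.
ANN_TO_MAF = {
    "Gene_Name": "Hugo_Symbol",
    "Feature_ID": "Transcript_ID",
    "Annotation": "Consequence",
    "HGVS.c": "HGVSc",
    "HGVS.p": "HGVSp",
}


def parse_ann(info_value):
    """Split an INFO/ANN value into one dict per comma-separated annotation."""
    records = []
    for entry in info_value.split(","):
        values = entry.split("|")
        # Pad short entries so every field name gets a (possibly empty) value.
        values += [""] * (len(SNPEFF_FIELDS) - len(values))
        records.append(dict(zip(SNPEFF_FIELDS, values)))
    return records


def to_maf_columns(record):
    """Copy the mapped subfields of one annotation onto MAF column names."""
    return {maf: record.get(ann, "") for ann, maf in ANN_TO_MAF.items()}


# Hypothetical ANN value with two annotations and placeholder IDs.
ann_value = (
    "A|missense_variant|MODERATE|GENE1|GENEID1|transcript|TRANSCRIPT1|"
    "protein_coding|3/10|c.100G>A|p.Gly34Ser|150/2000|100/1500|34/499||,"
    "A|upstream_gene_variant|MODIFIER|GENE2|GENEID2|transcript|TRANSCRIPT2|"
    "protein_coding||c.-500G>A|||||500|"
)

records = parse_ann(ann_value)
print(to_maf_columns(records[0]))
```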

Nov 2020 Update: vcf2maf now has an option called --inhibit-vep, which skips running VEP and instead tries its best to extract MAF fields from your given VCF. This can produce errors or warnings, but only if you ran VEP with different parameters than vcf2maf uses. snpEff will certainly produce different VCFs than VEP, but you will likely still end up with a usable MAF-like file, so --inhibit-vep is worth a try.

sagarutturkar commented 6 years ago

Thank you for the reply. As a possible workaround: if I annotate my VCF files with the online VEP and then follow the suggested route (name the file test.vcf and copy it as test.vep.vcf), should that work? Do you think there might be any issues with using the online VEP?

ckandoth commented 6 years ago

That might work. If you're using GRCh37, use this one: https://grch37.ensembl.org/Tools/VEP and check all the boxes so that it produces the most comprehensive annotations. You should be able to export the results in VCF format after the run completes. Some options in the command-line VEP are not available online, but any missing data will just be blank in the resulting MAF, which is usually OK.
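Before copying an exported file to test.vep.vcf, it can help to confirm which annotator produced it. A minimal Python sketch, assuming an uncompressed VCF and the default INFO keys (CSQ for VEP, ANN for snpEff; VEP can be configured to use a different key, so this is a heuristic). The helper names are hypothetical, and the `.vep.vcf` naming simply follows the workaround described in this thread.

```python
import re
import shutil


def detect_annotator(header_lines):
    """Guess the annotator from INFO header IDs: CSQ (VEP default) or ANN (snpEff)."""
    info_ids = set()
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM header line
        match = re.match(r"##INFO=<ID=([^,>]+)", line)
        if match:
            info_ids.add(match.group(1))
    if "CSQ" in info_ids:
        return "VEP"
    if "ANN" in info_ids:
        return "snpEff"
    return None


def detect_annotator_file(vcf_path):
    """Run the same check on an uncompressed VCF file on disk."""
    with open(vcf_path) as handle:
        return detect_annotator(handle)


def stage_vep_copy(vcf_path):
    """Copy a VEP-annotated test.vcf to test.vep.vcf; refuse anything else."""
    if detect_annotator_file(vcf_path) != "VEP":
        raise ValueError(f"{vcf_path} does not look VEP-annotated (no INFO/CSQ)")
    stem = vcf_path[: -len(".vcf")] if vcf_path.endswith(".vcf") else vcf_path
    vep_path = stem + ".vep.vcf"
    shutil.copyfile(vcf_path, vep_path)
    return vep_path
```

For example, `stage_vep_copy("test.vcf")` would create test.vep.vcf only if the header carries an INFO/CSQ line, and would raise on the snpEff-annotated file from the original report.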

sahuno commented 3 years ago

What values does --inhibit-vep accept? Thanks, S

ckandoth commented 3 years ago

@sahuno please open a new issue if you have an unrelated question. But quick answer: --inhibit-vep does not take any values. It just inhibits vcf2maf from running VEP.
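Because --inhibit-vep is a bare switch, a minimal Python sketch that assembles a vcf2maf command (the helper name `vcf2maf_command` is hypothetical) appends it with no value after it. The other options are the ones from the command in the original report.

```python
import shlex


def vcf2maf_command(input_vcf, output_maf, ref_fasta, species, ncbi_build,
                    inhibit_vep=False):
    """Build a vcf2maf.pl argument list using the options from this thread."""
    cmd = [
        "perl", "vcf2maf.pl",
        "--input-vcf", input_vcf,
        "--output-maf", output_maf,
        "--ref-fasta", ref_fasta,
        "--species", species,
        "--ncbi-build", ncbi_build,
    ]
    if inhibit_vep:
        # Boolean switch: appended on its own, with no value after it.
        cmd.append("--inhibit-vep")
    return cmd


cmd = vcf2maf_command("test.vcf", "test.maf", "Reference/genome_ref.fasta",
                      "canis_familiaris", "CanFam3.1", inhibit_vep=True)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run it, assuming perl and vcf2maf.pl are available in the working directory.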

ckandoth commented 3 years ago

snpEff-annotated VCFs will continue to be unsupported by vcf2maf. The INFO/ANN format generated by snpEff is very different from VEP's, and I do not have the bandwidth to handle and maintain support for both formats. vcf2maf has deep integration with VEP, which I strongly recommend over snpEff.