mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

ERROR: You're either using an outdated samtools, or --ref-fasta is not the same genome build as your --input-vcf #229

Closed milgri closed 5 years ago

milgri commented 5 years ago

Good morning, I tried converting my file from VCF to MAF with the following command:

perl vcf2maf.pl --input-vcf /media/milda/Elements/OneDrive/OneDrive/TXT_files/COADandREAD/VCF_files/COAD.vcf/COAD_CYT-High/all.vcf --ref-fasta /media/milda/Elements/OneDrive/vep/homo_sapiens_merged/97_GRCh38 --output-maf /media/milda/Elements/OneDrive/OneDrive/TXT_files/COADandREAD/VCF_files/COAD.vcf/COAD_CYT-High/all.maf

but it did not work:

ERROR: You're either using an outdated samtools, or --ref-fasta is not the same genome build as your --input-vcf. at vcf2maf.pl line 389.

Here are the other lines that appeared before the error:

Use of uninitialized value in split at vcf2maf.pl line 272, line 72989.
Use of uninitialized value $pos in subtraction (-) at vcf2maf.pl line 361, line 72989.
Use of uninitialized value in addition (+) at vcf2maf.pl line 361, line 72989.
Use of uninitialized value $pos in addition (+) at vcf2maf.pl line 361, line 72989.
Use of uninitialized value $pos in concatenation (.) or string at vcf2maf.pl line 365, line 72989.
Use of uninitialized value $pos in concatenation (.) or string at vcf2maf.pl line 365, line 72989.
[E::fai_build3_core] Failed to open the file /media/milda/Elements/OneDrive/vep/homo_sapiens_merged/97_GRCh38
[faidx] Could not load fai index of /media/milda/Elements/OneDrive/vep/homo_sapiens_merged/97_GRCh38
[the two lines above repeat many more times]

I downloaded the latest samtools, but the issue remains. Is there a way I could fix this? Thank you for your time and help.

Best Regards, Milda

wujianming604 commented 5 years ago

First, --ref-fasta must point to the FASTA file itself rather than to a folder, for example: --ref-fasta /media/milda/Elements/OneDrive/vep/homo_sapiens_merged/97_GRCh38/Homo_sapiens.GRCh38.75.dna.primary_assembly.fa

Second, the reference named in your input VCF's header must be the same as your --ref-fasta, for example:

##reference=/media/milda/Elements/OneDrive/vep/homo_sapiens_merged/97_GRCh38/Homo_sapiens.GRCh38.75.dna.primary_assembly.fa

Give that a try.
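
The "Failed to open the file" and "Could not load fai index" messages in the log are consistent with --ref-fasta pointing at a folder instead of a FASTA file. Here is a minimal sketch of a pre-flight check, assuming a POSIX shell. The function name check_ref_fasta is hypothetical and not part of vcf2maf. The sketch only assumes that samtools faidx keeps its index next to the FASTA with a .fai suffix.

```shell
# Hypothetical helper (not part of vcf2maf): classify a --ref-fasta value.
# Prints one of: directory, missing, unindexed, ok
check_ref_fasta() {
    ref="$1"
    if [ -d "$ref" ]; then
        echo "directory"    # a folder (e.g. a VEP cache directory), not a FASTA file
    elif [ ! -f "$ref" ]; then
        echo "missing"      # nothing exists at this path
    elif [ ! -f "$ref.fai" ]; then
        echo "unindexed"    # running: samtools faidx "$ref"  would normally create "$ref.fai"
    else
        echo "ok"
    fi
}

# Example call (the path is a placeholder):
# check_ref_fasta /path/to/reference.fa
```

Anything other than "ok" is worth resolving before running vcf2maf.pl. A bgzip-compressed FASTA also needs a .gzi index, which this sketch does not check.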

milgri commented 5 years ago

Good morning, there is no file named Homo_sapiens.GRCh38.75.dna.primary_assembly.fa in the 97_GRCh38 folder - it is just full of folders: https://gyazo.com/7f91ab729119e0c330800ce5ff0f8bd8

ckandoth commented 5 years ago

Start by running perl vcf2maf.pl --help to read the documentation. For example:

$ perl vcf2maf.pl --help
Usage:
     perl vcf2maf.pl --help
     perl vcf2maf.pl --input-vcf WD4086.vcf --output-maf WD4086.maf --tumor-id WD4086 --normal-id NB4086

Options:
     --input-vcf      Path to input file in VCF format
     --output-maf     Path to output MAF file
     --tmp-dir        Folder to retain intermediate VCFs after runtime [Default: Folder containing input VCF]
     --tumor-id       Tumor_Sample_Barcode to report in the MAF [TUMOR]
     --normal-id      Matched_Norm_Sample_Barcode to report in the MAF [NORMAL]
     --vcf-tumor-id   Tumor sample ID used in VCF's genotype columns [--tumor-id]
     --vcf-normal-id  Matched normal ID used in VCF's genotype columns [--normal-id]
     --custom-enst    List of custom ENST IDs that override canonical selection
     --vep-path       Folder containing the vep script [~/vep]
     --vep-data       VEP's base cache/plugin directory [~/.vep]
     --vep-forks      Number of forked processes to use when running VEP [4]
     --buffer-size    Number of variants VEP loads at a time; Reduce this for low memory systems [5000]
     --any-allele     When reporting co-located variants, allow mismatched variant alleles too
     --online         Use useastdb.ensembl.org instead of local cache (supports only GRCh38 VCFs listing <100 events)
     --ref-fasta      Reference FASTA file [~/.vep/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz]
     --filter-vcf     A VCF for FILTER tag common_variant. Set to 0 to disable [~/.vep/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz]
     --max-filter-ac  Use tag common_variant if the filter-vcf reports a subpopulation AC higher than this [10]
     --species        Ensembl-friendly name of species (e.g. mus_musculus for mouse) [homo_sapiens]
     --ncbi-build     NCBI reference assembly of variants MAF (e.g. GRCm38 for mouse) [GRCh37]
     --cache-version  Version of offline cache to use with VEP (e.g. 75, 84, 91) [Default: Installed version]
     --maf-center     Variant calling center to report in MAF [.]
     --retain-info    Comma-delimited names of INFO fields to retain as extra columns in MAF []
     --retain-fmt     Comma-delimited names of FORMAT fields to retain as extra columns in MAF []
     --min-hom-vaf    If GT undefined in VCF, minimum allele fraction to call a variant homozygous [0.7]
     --remap-chain    Chain file to remap variants to a different assembly before running VEP
     --help           Print a brief help message and quit
     --man            Print the detailed manual

From this output, you can see what value each argument expects. For example, --ref-fasta needs to point to a reference genome FASTA file. Always use a reference that matches the build your VCF is based on, whether that is GRCh37, hg19, GRCh38, etc.
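
One rough way to spot a mismatch between a VCF and a reference is to compare chromosome names, since UCSC-style references use names like chr1 while Ensembl-style ones use 1. Here is a minimal sketch, assuming a POSIX shell with awk, an uncompressed VCF, and a .fai index already built for the FASTA. The function name missing_contigs is hypothetical and not part of vcf2maf.

```shell
# Hypothetical helper (not part of vcf2maf): print each chromosome name that
# appears in the VCF but is absent from the FASTA index (.fai).
missing_contigs() {
    vcf="$1"
    fai="$2"
    awk -F'\t' -v fai="$fai" '
        BEGIN {
            # Column 1 of a .fai index is the sequence name
            while ((getline line < fai) > 0) {
                split(line, f, "\t")
                known[f[1]] = 1
            }
        }
        /^#/ || $1 == "" { next }                    # skip header and blank lines
        !($1 in known) && !seen[$1]++ { print $1 }   # report each unknown name once
    ' "$vcf"
}

# Example call (paths are placeholders):
# missing_contigs input.vcf reference.fa.fai
```

Empty output means every chromosome name in the VCF exists in the reference. That is necessary but not sufficient, because two different builds can share the same naming scheme.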

aditisk commented 4 years ago

I am getting the same samtools error as mentioned above. I am using the same reference file used to generate the VCF. I'm using v1.6.18 (installed from bioconda: https://bioconda.github.io/recipes/vcf2maf/README.html) and my command is pasted below:

vcf2maf.pl --input-vcf HN01.vcf --output-maf HN01.maf --inhibit-vep --ref-fasta /bgfs/genomics/refs/igenomes/Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.fasta

lentaing commented 3 years ago

I've been having this issue with v1.6.18 as well. I noticed that vcf2maf was passing nonsense regions like ":-1-1" to the samtools faidx command. I was able to resolve these errors by adding a simple check at around lines 362-370:

my $region = "$chr:" . ( $pos - 1 ) . "-" . ( $pos + length( $ref ));
if ($chr ne "") {
    $ref_bps{$region} = $ref;
    push( @ref_regions, $region );
    $uniq_regions{$region} = 1;
    $uniq_loci{"$chr:$pos-$pos"} = 1;
}

I'm not sure why parsing the VCF file generates bad regions like these, but I hope this workaround helps anyone else running into this issue.

Regards, Len
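
A region like ":-1-1" is what the concatenation produces when both the chromosome and the position are empty, which would likely happen for a blank or truncated line in the VCF. The "line 72989" in the Perl warnings at the top of this thread is the kind of input line number to look at. Here is a minimal sketch for locating such lines, assuming a POSIX shell with awk and an uncompressed, tab-delimited VCF. The function name find_bad_vcf_lines is hypothetical and not part of vcf2maf.

```shell
# Hypothetical helper (not part of vcf2maf): print the line numbers of
# non-header VCF lines that lack a usable CHROM (column 1) or POS (column 2).
find_bad_vcf_lines() {
    awk -F'\t' '
        /^#/ { next }                                 # skip header lines
        $1 == "" || $2 !~ /^[0-9]+$/ { print NR }     # empty CHROM or non-numeric POS
    ' "$1"
}

# Example call (the path is a placeholder):
# find_bad_vcf_lines input.vcf
```

If this prints any line numbers, inspecting or removing those lines in the input may avoid the bad regions without patching vcf2maf.pl.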

BrunoGrandePhD commented 2 years ago

In case it's useful for others, I was able to overcome this error by providing more memory to the job.