Closed ShixiangWang closed 6 years ago
Same errors no matter what genome build I use:
# use hg38 maf while ensembl v75 fa.gz
$ ../biotools/vcf2maf/maf2vcf.pl --input-maf ../data/TCGA.LUAD.mutect.hg38.somatic.maf --output-dir ./test_vcfs
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
# use hg38 maf while ensembl v91 GRCh38 fa.gz
$../biotools/vcf2maf/maf2vcf.pl --input-maf ../data/TCGA.LUAD.mutect.hg38.somatic.maf --output-dir ./test_vcfs --ref-fasta ~/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /home/diviner-wsx/.vep/homo_sapiens/91_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
# use hg19 maf while ensembl v75 fa.gz
$../biotools/vcf2maf/maf2vcf.pl --input-maf ~/Downloads/gsc_LUAD_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf --output-dir ./test_vcfs --ref-fasta ~/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
hg19 maf download from https://portal.gdc.cancer.gov/legacy-archive/files/b2e25bdf-f2b5-4a37-b330-05251ea09f2c
hg38 maf download from https://portal.gdc.cancer.gov/files/0458c57f-316c-4a7c-9294-ccd11c97c2f9
Please help me.
I try using UCSC hg19
and hg38
, and the outcome is
$ ../biotools/vcf2maf/maf2vcf.pl --input-maf ~/Downloads/gsc_LUAD_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf --output-dir ./test_vcfs --ref-fasta ~/Annotation/reference/hg19.fa
[W::fai_fetch] Reference 19:58420782-58420784 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 19:58420782-58420784
[W::fai_fetch] Reference 10:26581394-26581396 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 10:26581394-26581396
[W::fai_fetch] Reference 14:20528636-20528638 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 14:20528636-20528638
[W::fai_fetch] Reference 11:55657434-55657436 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 11:55657434-55657436
[W::fai_fetch] Reference 1:78269058-78269060 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:78269058-78269060
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /home/diviner-wsx/Annotation/reference/hg19.fa
$ ../biotools/vcf2maf/maf2vcf.pl --input-maf ../data/TCGA.LUAD.mutect.hg38.somatic.maf --output-dir ./test_vcfs --ref-fasta ~/Annotation/reference/hg38.fa
$ cd test_vcfs/
$ ls -lh
total 1.8G
-rw-rw-r-- 1 diviner-wsx diviner-wsx 33K 3月 28 18:14 TCGA.LUAD.mutect.hg38.somatic.pairs.tsv
-rw-rw-r-- 1 diviner-wsx diviner-wsx 1.8G 3月 28 18:17 TCGA.LUAD.mutect.hg38.somatic.vcf
Thank your tool, the hg38 works.
So the hg19 geome build TCGA used is neither equals to ensembl v75 or UCSC hg19?
But the conent of vcf is very weird. :cry: How to explain this???
Left is outcome and the right is normal vcf file.
I open it using notepad++,
It seems normal, I am not sure. What the right part means?
@ShixiangWang there are many different files and commands you're using here. There is only one rule for maf2vcf - the --ref-fasta
chromosome names and lengths must exactly match the chromosomes used in the --input-maf
.
Thanks, I will write a script to check the output.
发自我的 iPhone
在 2018年3月28日,下午10:13,Cyriac Kandoth notifications@github.com<mailto:notifications@github.com> 写道:
@ShixiangWanghttps://github.com/ShixiangWang there are many different files and commands you're using here. There is only one rule for maf2vcf - the --ref-fasta chromosome names and lengths must exactly match the chromosomes used in the --input-maf.
― You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/mskcc/vcf2maf/issues/162#issuecomment-376900796, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AX5Y5DOg3Xov5kkNApywqdrwm3in9rUoks5ti5nngaJpZM4S-GNY.
It is most likely that your hg19.fa
uses chr
-prefixes for the chromosome names, while the MAF file does not. For example, your hg19.fa
uses chr7
for chromosome 7, while the MAF uses just 7
. Read more about this issue here - https://www.biostars.org/p/119295/#119308
@ckandoth Yes, the hg19.fa
is different from GRCh37.fa
$ head Annotation/reference/hg19.fa
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
$ zcat ~/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz | head
>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
While Neither Ensembl v75
or hg19.fa
from UCSC can work with this hg19
maf file download from TCGA https://portal.gdc.cancer.gov/legacy-archive/files/b2e25bdf-f2b5-4a37-b330-05251ea09f2c.
$.maf2vcf.pl --input-maf ~/Downloads/gsc_LUAD_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf --output-dir ./test_vcfs --ref-fasta ~/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
Could not load fai index of /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /home/diviner-wsx/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
$ maf2vcf.pl --input-maf ~/Downloads/gsc_LUAD_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf --output-dir ./test_vcfs --ref-fasta ~/Annotation/reference/hg19.fa
[W::fai_fetch] Reference 19:58420782-58420784 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 19:58420782-58420784
[W::fai_fetch] Reference 10:26581394-26581396 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 10:26581394-26581396
[W::fai_fetch] Reference 14:20528636-20528638 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 14:20528636-20528638
[W::fai_fetch] Reference 11:55657434-55657436 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 11:55657434-55657436
[W::fai_fetch] Reference 1:78269058-78269060 not found in FASTA file, returning empty sequence
Failed to fetch sequence in 1:78269058-78269060
ERROR: Make sure that ref-fasta is the same genome build as your MAF: /home/diviner-wsx/Annotation/reference/hg19.fa
Should I use GRCh 37.p13
from ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/ instead of Ensembl v75
? I am haunted by the three different source and I don't know how to pick a correct assembly.fa
to work with TCGA hg19 maf file. I don't want to bother you more but can you give me a suggestion?
Thank you very much, Shixiang
The error [E::fai_build3] Cannot index files compressed with gzip, please use bgzip
is easy to fix. It is usually due to having an older version of Perl. If it is not easy for you to upgrade your Perl, then unzip the GRCh37
fasta using gzip -d
. VEP should work fine with unzipped FASTAs.
@ckandoth Thanks for your help.
Hi,
I want to transform hg38 maf to vcf,
I know how to download the file, but the info remind me using GRCh37 while maf file I use based on hg38. Can I use this command, or I should use hg19 maf file?
Best wishes, Shixiang