mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

Problem with running maf2vcf using GRCh38 as reference fasta #187

Open Frank-LSY opened 6 years ago

Frank-LSY commented 6 years ago

I tried to run maf2vcf with the GRCh38.p12 reference FASTA downloaded from NCBI, but I keep running into the same problem. On every run, the chromosome reported as "not found" is different.

lushuyudeMacBook-Pro:vcf2maf frank-lsy$ perl maf2vcf.pl --input-maf ../456.maf --output-dir ../2 --ref-fasta ../GCF_000001405.38_GRCh38.p12_genomic.fna
Can't exec "/usr/local/bin/samtools": Argument list too long at maf2vcf.pl line 106.
Use of uninitialized value in concatenation (.) or string at maf2vcf.pl line 106.
Can't exec "/usr/local/bin/samtools": Argument list too long at maf2vcf.pl line 106.
Use of uninitialized value in concatenation (.) or string at maf2vcf.pl line 106.
Can't exec "/usr/local/bin/samtools": Argument list too long at maf2vcf.pl line 106.
Use of uninitialized value in concatenation (.) or string at maf2vcf.pl line 106.
Can't exec "/usr/local/bin/samtools": Argument list too long at maf2vcf.pl line 106.
Use of uninitialized value in concatenation (.) or string at maf2vcf.pl line 106.
[W::fai_get_val] Reference 10:69103935-69103937 not found in file, returning empty sequence
[faidx] Failed to fetch sequence in 10:69103935-69103937
ERROR: Make sure that ref-fasta is the same genome build as your MAF: ../GCF_000001405.38_GRCh38.p12_genomic.fna

Can you help me understand what is going wrong? Thanks.

paradoxechen commented 6 years ago

It seems that maf2vcf cannot handle a very large MAF file? I used the tcga_esca MAF file (65 MB) and the tcga_read MAF file (~92 MB), and both reported the same error as yours. But it worked properly with the tcga_thym MAF file (~6.9 MB) and the tcga_meso MAF file (~5.6 MB). I used GRCh38.d1.vd1.fa.gz as the ref-fasta, as the TCGA collaboration did. I hope the maintainer can fix this problem.

paradoxechen commented 6 years ago

Sorry for my comment above. Today I ran maf2vcf on a Linux computer and it worked well. I guess the error is caused by an incompatibility between the script and macOS. I recommend using Linux.

Frank-LSY commented 6 years ago

Oh! I've figured out where the problem lies. It comes from the different maximum command-line argument length on macOS and Linux: macOS allows only 2^18 bytes, while Linux allows 2^21. The comments around lines 101-102 of maf2vcf.pl say that you need to adjust the number on line 105 to suit your own system. After changing that number to 2000, the script ran without any errors.
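
For anyone who hits the same "Argument list too long" error, here is a minimal Python sketch of the batching idea. It is not the Perl code from maf2vcf.pl. It assumes the number on line 105 caps how many regions go into a single samtools call, which is my reading of the fix above and not something quoted from the script. The function names (arg_max_bytes, batch_regions, build_faidx_commands) and the ref.fa path are hypothetical; only the 2000 cap and the `samtools faidx <fasta> <region>...` call form come from this thread and from samtools itself.

```python
import os


def arg_max_bytes():
    """Return the OS limit on argv + environment size in bytes, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is not available on every platform (e.g. Windows).
        return None


def batch_regions(regions, max_per_call=2000):
    """Split a list of region strings into chunks of at most max_per_call items."""
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    return [regions[i:i + max_per_call]
            for i in range(0, len(regions), max_per_call)]


def build_faidx_commands(ref_fasta, regions, max_per_call=2000):
    """Build one `samtools faidx` argument list per chunk; nothing is executed here."""
    return [["samtools", "faidx", ref_fasta, *chunk]
            for chunk in batch_regions(regions, max_per_call)]


if __name__ == "__main__":
    # Report the limit that differs between macOS and Linux.
    print("ARG_MAX:", arg_max_bytes())
    # Hypothetical regions, shaped like the 10:69103935-69103937 one in the log.
    demo = [f"10:{start}-{start + 2}" for start in range(1, 4502)]
    for cmd in build_faidx_commands("ref.fa", demo):
        print(len(cmd) - 3, "regions in this call")
```

A smaller cap means more samtools calls but shorter command lines, which would explain why small MAF files worked on macOS while large ones failed.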