mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

maf2vcf.pl doesn't recognise N bases #291

Open shadihames opened 3 years ago

shadihames commented 3 years ago

Hi, I'm working with MAF files that were generated by another person from Caveman/Pindel calls and I'm trying to convert them to VCF files to work with another tool. For some of the samples I get the error:

ERROR: MAF line 2040 (at 22:19178130) contains invalid alleles in Tumor_Seq_Allele or Match_Norm_Seq_Allele columns!

I've found this in a few files and it all seems to go back to the same position that has this Ref sequence:

AAAGATCACTGNTCACAGATCACCATACCATNTNNNGNNCN

I presume the issue is caused by the N bases in the sequence. Is there a way to work around this, or would I have to pull out the positions with N bases before converting with maf2vcf.pl?

Command:

perl maf2vcf.pl --input-maf sample.txt --output-dir sample --ref-fasta 37_decoy.fasta

Thanks!

drychkov commented 2 years ago

I've got the same error while using maf2vcf to convert maf to vcf.

@shadihames, did you figure out any workaround?

sssimonyang commented 2 years ago

I was trying to convert mc3.v0.2.8.PUBLIC.maf to a VCF file and ran into the same problem. I simply filtered out those variants with awk before converting:

sed 's/\r//' mc3.v0.2.8.PUBLIC.maf |awk 'NR==1 || ($13 ~ /^[AGCT\-]*$/ && $18 ~ /^[AGCT\-]*$/) {print $0}' > mc3.v0.2.8.PUBLIC.fixed.maf

perl ~/software/vcf2maf-1.6.21/maf2vcf.pl --input-maf mc3.v0.2.8.PUBLIC.fixed.maf --output-dir vcf --output-vcf mc3.v0.2.8.PUBLIC.vcf --ref-fasta ${hg19}

Hope this helps.
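A possible variation on the filter above, as a minimal sketch rather than a tested replacement. The awk one-liner splits on any whitespace, so an empty column shifts the field numbers, and it tests only fields 13 and 18 (Tumor_Seq_Allele2 and Match_Norm_Seq_Allele2 in the standard MAF column order). The sketch below splits strictly on tabs and looks up all four allele columns by header name. `filter_maf` is a hypothetical helper name, not part of vcf2maf, and the sketch assumes the first non-comment line of the MAF is the column header.

```shell
#!/bin/sh
# filter_maf (hypothetical helper): print the header plus every row whose four
# allele columns contain only A, C, G, T or "-".
filter_maf() {
    # Drop carriage returns, then split on tabs so empty fields keep their position.
    tr -d '\r' < "$1" | awk -F'\t' '
        /^#/ { print; next }                  # pass comment lines through unchanged
        !have_header {                        # assumed: first non-comment line is the header
            for (i = 1; i <= NF; i++) col[$i] = i
            have_header = 1
            print
            next
        }
        {
            ok = 1
            n = split("Tumor_Seq_Allele1 Tumor_Seq_Allele2 Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2", names, " ")
            for (j = 1; j <= n; j++) {
                c = col[names[j]]             # empty if the column is absent, so it is skipped
                if (c && $c !~ /^[ACGT-]*$/) ok = 0
            }
            if (ok) print
        }'
}
```

Usage would look like `filter_maf mc3.v0.2.8.PUBLIC.maf > mc3.v0.2.8.PUBLIC.fixed.maf`, followed by the same maf2vcf.pl command as above.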

zyllifeworld commented 1 month ago

For anyone who still has this problem and wants to change the behavior of maf2maf or maf2vcf, one possible solution is to change the following code in maf2maf.pl (or maf2vcf.pl):

    unless( $al1=~m/^[ACGT-]*$/ and $al2=~m/^[ACGT-]*$/ and $n_al1=~m/^[ACGT-]*$/ and $n_al2=~m/^[ACGT-]*$/ ) {
        die "ERROR: MAF line $line_count (at $chr:$pos) contains invalid alleles in Tumor_Seq_Allele or Match_Norm_Seq_Allele columns!\n";
    }

to

    unless( $al1=~m/^[ACGT-]*$/ and $al2=~m/^[ACGT-]*$/ and $n_al1=~m/^[ACGT-]*$/ and $n_al2=~m/^[ACGT-]*$/ ) {
        warn "WARNING: MAF line $line_count (at $chr:$pos) contains invalid alleles in Tumor_Seq_Allele or Match_Norm_Seq_Allele columns!\n";
        next;
    }

With this change, lines that don't pass the check are skipped with a warning instead of aborting the run. (Use at your own risk.)
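Since skipped lines silently drop variants from the output, it may be worth listing them first. The following is a rough sketch, not part of vcf2maf, that prints each row failing the same `^[ACGT-]*$` test together with its position and the offending columns. `report_bad_alleles` is a hypothetical helper name. The sketch assumes a tab-separated MAF whose header includes the standard `Chromosome` and `Start_Position` columns, and the line number it prints is the physical line in the file, which may not match the `$line_count` reported by the script.

```shell
#!/bin/sh
# report_bad_alleles (hypothetical helper): for each row with a non-ACGT- allele,
# print: file line number, Chromosome, Start_Position, offending column names.
report_bad_alleles() {
    tr -d '\r' < "$1" | awk -F'\t' '
        /^#/ { next }                         # ignore comment lines
        !have_header {                        # assumed: first non-comment line is the header
            for (i = 1; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        {
            bad = ""
            n = split("Tumor_Seq_Allele1 Tumor_Seq_Allele2 Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2", names, " ")
            for (j = 1; j <= n; j++) {
                c = col[names[j]]
                if (c && $c !~ /^[ACGT-]*$/) bad = bad (bad == "" ? "" : ",") names[j]
            }
            if (bad != "")
                printf "%d\t%s\t%s\t%s\n", NR, $(col["Chromosome"]), $(col["Start_Position"]), bad
        }'
}
```

Running `report_bad_alleles sample.txt | wc -l` would then give a count of the rows the patched script is expected to skip.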