Open skanwal opened 4 months ago
Hello,
The "Reference_Allele" will be updated to the new reference genome (hg38 in your case) after liftover. I checked, all the reference alleles in your "PAAD_atlas.liftover.maf" file are indeed matched to the sequence of hg38/GRCh38.
This probably indicates the somatic variants in your "PAAD_atlas.tmp.maf" file are NOT real mutations.
Hope this helps.
Liguo
Thanks for the response, Liguo.
> The "Reference_Allele" will be updated to the new reference genome (hg38 in your case) after liftover.

That part makes sense. What I am unsure about is why `Tumor_Seq_Allele2` has been updated. For example, in row one (Hugo_Symbol FAM231B), `Tumor_Seq_Allele2` has been updated to `C`, compared to `T` in my original file. Shouldn't `Tumor_Seq_Allele2` still be `T`, since it should not be changed by liftover?
I am finding it hard to understand how so many calls in my "PAAD_atlas.tmp.maf" are not real.
I have done MAF analysis on my v37 data and the SNV numbers are as below:
Carrying out the same analysis on the lifted-over (hg38) MAF, I get few to no SNV calls:
Does this suggest I am losing 98% of variants after liftover?
Oh, I know what happened!! It looks like a bug. I will update and release a new version soon. Thanks
After reading the MAF specification again, I think the original code is still correct. According to https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/, Reference_Allele is the 11th column, while your input file has it in the 10th column. Please double-check.
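The column-order problem described above can be checked before running a liftover. The sketch below is only an illustration and is not part of CrossMap: `misplaced_columns` is a hypothetical helper, and the expected positions are the 1-based column numbers listed in the GDC MAF specification linked above.

```python
# Expected 1-based column positions, taken from the GDC MAF specification.
GDC_POSITIONS = {
    "Chromosome": 5,
    "Start_Position": 6,
    "End_Position": 7,
    "Reference_Allele": 11,
    "Tumor_Seq_Allele2": 13,
}


def misplaced_columns(header_line, expected=GDC_POSITIONS):
    """Return {column: (expected_position, actual_position)} for mismatches.

    actual_position is None when the column is missing from the header.
    """
    names = header_line.rstrip("\r\n").split("\t")
    found = {name: index for index, name in enumerate(names, start=1)}
    problems = {}
    for name, want in expected.items():
        got = found.get(name)
        if got != want:
            problems[name] = (want, got)
    return problems


if __name__ == "__main__":
    # Header copied from the example rows posted in this thread.
    header = "\t".join([
        "sample_id", "Hugo_Symbol", "NCBI_Build", "Chromosome",
        "Start_Position", "End_Position", "Variant_Classification",
        "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele2",
        "Tumor_Sample_Barcode", "HGVSp_Short", "aa_mutation",
    ])
    for name, (want, got) in misplaced_columns(header).items():
        print(f"{name}: expected column {want}, found {got}")
```

Any column reported here would need to be moved (or the file re-exported in GDC order) before a tool that reads alleles by position can interpret it correctly.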
Many thanks @liguowang - updating order of columns in my original data has resolved the issue.
Can I ask another question: how does CrossMap handle the liftover of insertions in MAF? For example, an entry in my v37 MAF was:
sample_id Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode HGVSp_Short aa_mutation
KRAS-wt_Croagh_subset LINC00383 37 13 69791655 69791656 5'Flank INS - T MON3__PRJ180359_MON3-T-somatic.pcgr_acmg.grch37.vcf NA
After liftover to hg38 it becomes:
sample_id Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode HGVSp_Short aa_mutation
KRAS-wt_Croagh_subset LINC00383 hg38 chr13 69217523 69217524 5'Flank INS AT T MON3__PRJ180359_MON3-T-somatic.pcgr_acmg.grch37.vcf NA
Can you please explain the rationale behind updating `Reference_Allele` from `-` to `AT` in this case, or more generally behind changing `-` to the actual nucleotides at that position for an INS?
"-/T" is NOT the standard way to represent an insertion (at least according to VCF's specification; maybe MAF has its own rule, which I am not sure).
"AT/T" indicates an "A" was inserted into the REF genome.

> "-/T" is NOT the standard way to represent an insertion (at least according to VCF's specification; maybe MAF has its own rule, which I am not sure).

Indeed, MAF has this rule of representing an insertion using `-`. From https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/:

> 11 - Reference_Allele | The plus strand reference allele at this position. Includes the deleted sequence for a deletion or "-" for an insertion

> "AT/T" indicates an "A" was inserted into the REF genome.

According to the VCF spec, this would indicate a deletion, i.e. `A` from the reference was deleted in the alternate allele.
Based on this, would it make sense to stick to the MAF spec and indicate insertions using `-` in the `Reference_Allele` column in the output of the CrossMap maf module?
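To make the two conventions concrete, here is a minimal sketch of how a REF/ALT pair can be reduced to the MAF-style representation by trimming the bases both alleles share. It is not CrossMap's code; `to_maf_alleles` and the `NO_CHANGE` label are hypothetical names used only for illustration, and treating any unequal-length pair as INS or DEL is a simplification.

```python
def to_maf_alleles(ref, alt):
    """Trim bases shared by ref and alt and return (variant_type, ref, alt).

    Empty alleles are written as "-", following the MAF convention.
    """
    # Accept MAF-style "-" as an empty allele on input.
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    if ref == alt:
        # Identical alleles describe no variant at all.
        return ("NO_CHANGE", ref or "-", alt or "-")
    # Trim the shared suffix first, then the shared prefix.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    if len(ref) == len(alt):
        kind = {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")
    elif len(ref) < len(alt):
        kind = "INS"
    else:
        kind = "DEL"
    return (kind, ref or "-", alt or "-")


if __name__ == "__main__":
    for pair in [("-", "T"), ("AT", "T"), ("A", "AT"), ("C", "T")]:
        print(pair, "->", to_maf_alleles(*pair))
```

Under this trimming rule, `-`/`T` stays an insertion of `T`, while `AT`/`T` reduces to a deletion of `A`, which is the reading given in the comment above.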
Hello,
Thanks for this useful utility. I have data in MAF format (NCBI build 37). I am trying to lift it over to hg38 using the following CrossMap (v0.6.6) command:
Liftover file was downloaded from https://github.com/broadinstitute/gatk/blob/083aac832cb64515fd0456008bf847dd22f6c234/scripts/funcotator/data_sources/gnomAD/b37ToHg38.over.chain
The command runs successfully with the following output:
However, after inspecting the output, I have realised that `Reference` and `Tumor_Seq_Allele2` are the same in the lifted-over MAF file. For example, the head of the output looks like:
In comparison, the head of the original (genome build 37) file is:
It seems the program is updating both reference and alternate alleles. Can you please help me debug the issue? Thanks.
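One way to quantify the symptom described above is to count the rows in which the reference and tumor alleles are identical. The following is a rough sketch under stated assumptions: the file is tab-separated with a header row, the columns are named `Reference_Allele` and `Tumor_Seq_Allele2`, and `count_identical_alleles` is a hypothetical helper, not a CrossMap function.

```python
import csv
import io


def count_identical_alleles(handle, ref_col="Reference_Allele",
                            alt_col="Tumor_Seq_Allele2"):
    """Return (identical, total) over the data rows of a tab-separated MAF."""
    # Skip comment lines such as an optional "#version" line.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    identical = total = 0
    for row in reader:
        total += 1
        if row[ref_col] == row[alt_col]:
            identical += 1
    return identical, total


if __name__ == "__main__":
    # Toy rows invented for illustration; they are not from the files
    # discussed in this thread.
    toy = io.StringIO(
        "Hugo_Symbol\tReference_Allele\tTumor_Seq_Allele2\n"
        "GENE1\tC\tC\n"
        "GENE2\tG\tA\n"
    )
    print(count_identical_alleles(toy))
```

Running the same count on the input and the lifted-over file shows whether identical alleles were already present before liftover or only appear afterwards. Because the lookup is by column name, it does not depend on column order.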