Closed raylim closed 7 years ago
Hi Ray. Position 2:220439701 in GRCh37 contains a C, and the MAF specifies that an additional CT has been inserted immediately after it. In VCF format, this can be represented as a C->CCT at position 220439701. Why do you think the VCF should say 2:220439700 G/GCT?
the maf says that positions 2:220439701-220439702 are CT. 2:220439700 is a G, hence 2:220439700 G/GCT
No, the MAF says that CT was inserted between positions 2:220439701 and 2:220439702.
Maybe the confusion is that 2:220439701-220439702 is also CT in the GRCh37 reference, but that's just a coincidence.
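The coordinate convention being debated can be sketched in Python. This is only an illustration, not the maf2vcf implementation: the function name `maf_insertion_to_vcf` and the toy `REF_BASES` lookup are hypothetical, and the three bases are the ones quoted in this thread for GRCh37 chromosome 2.

```python
# Toy reference lookup holding only the bases quoted in this thread.
# A real conversion would read the anchor base from the reference FASTA.
REF_BASES = {220439700: "G", 220439701: "C", 220439702: "T"}


def maf_insertion_to_vcf(chrom, start, end, inserted, ref_bases=REF_BASES):
    """Convert a MAF insertion to a (chrom, pos, ref, alt) VCF-style tuple.

    In MAF, an insertion has Reference_Allele "-", and Start_Position and
    End_Position are the two bases flanking the inserted sequence. VCF anchors
    the insertion on the base before it, so POS equals Start_Position.
    """
    if end != start + 1:
        raise ValueError("insertion needs End_Position == Start_Position + 1")
    anchor = ref_bases[start]
    return (chrom, start, anchor, anchor + inserted)


# CT inserted between 220439701 and 220439702, as the input MAF states
print(maf_insertion_to_vcf("2", 220439701, 220439702, "CT"))
# CT inserted between 220439700 and 220439701, the placement argued for above
print(maf_insertion_to_vcf("2", 220439700, 220439701, "CT"))
```

Under these assumptions the first call gives C/CCT at 220439701 and the second gives G/GCT at 220439700, so the VCF output simply follows whichever Start_Position the MAF row carries.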
I've confirmed that the variant is G/GCT in a bam file for which it was called. I've attached the mpileup. The variant is actually at 2:220439700.
mpileup.txt
Sounds good. Then the input MAF format is incorrect. Where did you get it from?
the maf is straight from msk-impact
The DMP pipeline for MSK-IMPACT doesn't generate MAFs - they create a flattened VCF-like file, which maf2maf and maf2vcf are able to operate on. Can you point me to the raw pipeline results? Somewhere on Luna?
on saba2: /home/limr/share/data/grail_cfdna_tidy/validation_prostate/rawdata/msk-impact/msk-impact/data_mutations_extended.txt
my 2 cents, this file is generated by portal itself, so probably the conversion for the portal from dmp based vcf like format to maf format has issues.
Yup makes sense. I worked with @zheins in March to fix this - https://github.com/cBioPortal/genome-nexus-annotation-pipeline/pull/16 - you likely need to rerun the conversion.
@raylim @ckandoth The mskimpact maf has been corrected. The data was never refreshed after fixing the issue. I re-ran fetching/annotation on all samples with insertions.
INHA 0 MSKCC GRCh37 2 220439700 220439701 + Frame_Shift_Ins INS - - CT
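For readers unfamiliar with the column order, the corrected row above can be read as follows. This is a rough sketch that assumes the row holds the first 13 columns of the MAF specification and that the whitespace shown here stands in for the tabs of a real MAF file; `parse_maf_row` and `insertion_problems` are hypothetical helper names.

```python
# First 13 MAF columns in specification order (assumed to match the row above).
MAF_COLUMNS = [
    "Hugo_Symbol", "Entrez_Gene_Id", "Center", "NCBI_Build", "Chromosome",
    "Start_Position", "End_Position", "Strand", "Variant_Classification",
    "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
]


def parse_maf_row(line):
    """Split a whitespace-separated MAF row into a dict keyed by column name."""
    record = dict(zip(MAF_COLUMNS, line.split()))
    record["Start_Position"] = int(record["Start_Position"])
    record["End_Position"] = int(record["End_Position"])
    return record


def insertion_problems(record):
    """List the ways a record departs from the MAF insertion convention."""
    problems = []
    if record["Variant_Type"] != "INS":
        problems.append("Variant_Type is not INS")
    if record["Reference_Allele"] != "-":
        problems.append("Reference_Allele should be '-' for an insertion")
    if record["End_Position"] != record["Start_Position"] + 1:
        problems.append("End_Position should be Start_Position + 1")
    return problems


row = "INHA 0 MSKCC GRCh37 2 220439700 220439701 + Frame_Shift_Ins INS - - CT"
record = parse_maf_row(row)
print(record["Chromosome"], record["Start_Position"], record["Tumor_Seq_Allele2"])
print(insertion_problems(record))
```

A structural check like this would not have caught the off-by-one position discussed in this thread, because both placements are well-formed insertions; only comparison against the reads or the upstream DMP output could reveal it.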
Thanks @zheins and @ckandoth for helping resolve this.
For example, the following maf input gives the wrong vcf output
Resulting vcf entry:
The vcf entry should be 2:220439700 G/GCT