Closed PedalheadPHX closed 4 years ago
Similar issue in MMRF_2531_1_BM_CD138pos_T1 NOT (MMRF_1323_1_PB_Whole_C2_KAS5U-MMRF_1323_1_BM_CD138pos_T1_KAS5U -- JJK Error?)
VCFmerger:
chr8 127740906 rs1465173750 G A . . VTYPE=snv;CC=2;TPCE=MUTECT2;MUTECT2;STRELKA2;MUTECT2_CONTQ=93;MUTECT2_DP=564;MUTECT2_ECNT=1;MUTECT2_GERMQ=93;MUTECT2_MBQ=36,37;MUTECT2_MFRL=158,166;MUTECT2_MMQ=60,60;MUTECT2_MPOS=19;MUTECT2_NALOD=-1.749;MUTECT2_NLOD=72.41;MUTECT2_POPAF=6;MUTECT2_ROQ=93;MUTECT2_SEQQ=93;MUTECT2_STRANDQ=93;MUTECT2_TLOD=402.72;MUTECT2_OGT=0/0,0/1;STRELKA2_SOMATIC;STRELKA2_QSS=463;STRELKA2_TQSS=1;STRELKA2_NT=ref;STRELKA2_QSS_NT=3070;STRELKA2_TQSS_NT=1;STRELKA2_SGT=GG->AG;STRELKA2_DP=546;STRELKA2_MQ=60;STRELKA2_MQ0=0;STRELKA2_ReadPosRankSum=0.68;STRELKA2_SNVSB=0;STRELKA2_SomaticEVS=19.98;STRELKA2_S1_GT=0/0;STRELKA2_S1_DP=279;STRELKA2_S1_FDP=0;STRELKA2_S1_SDP=0;STRELKA2_S1_SUBDP=0;STRELKA2_S1_AU=2,2;STRELKA2_S1_CU=0,0;STRELKA2_S1_GU=277,278;STRELKA2_S1_TU=0,0;STRELKA2_S1_AR=0.0072;STRELKA2_S1_AD=277,2;STRELKA2_S2_GT=0/1;STRELKA2_S2_DP=266;STRELKA2_S2_FDP=1;STRELKA2_S2_SDP=0;STRELKA2_S2_SUBDP=0;STRELKA2_S2_AU=127,128;STRELKA2_S2_CU=0,0;STRELKA2_S2_GU=138,138;STRELKA2_S2_TU=0,0;STRELKA2_S2_AR=0.4792;STRELKA2_S2_AD=138,127;STRELKA2_ID=.;STRELKA2_QUAL=.;STRELKA2_FILTER=PASS;RNA_REF_COUNT=563;RNA_ALT_FREQ=0.520443;RNA_ALT_COUNT=611;DBSNPv152;ANN=A|missense_variant|MODERATE|MYC|ENSG00000136997|transcript|ENST00000613283.2|protein_coding|3/3|c.1313G>A|p.Arg438Gln|1313/1365|1313/1365|438/454|| GT:DP:AR:AD 0/0:271:0.0074:269,2 0/1:262:0.4809:136,126
Lancet_All:
chr8 127740906 . G A 377.486 HighVafNormal;HighAltCntNormal SHARED;FETS=377.486;TYPE=snv;LEN=1;KMERSIZE=15;SB=10.4134 GT:AD:SR:SA:DP 0/1:221,2:111,110:0,2:223 0/1:105,106:51,54:47,59:211
VarDict_All:
chr8 127740906 . G A 248 PASS STATUS=LikelySomatic;SAMPLE=MMRF_2531_1_BM_CD138pos_T1;TYPE=SNV;DP=208;VD=103;AF=0.4952;SHIFT3=0;MSI=2;MSILEN=3;SSF=0;SOR=104.596;LSEQ=AGAGGACTTGTTGCGGAAAC;RSEQ=ACGAGAACAGTTGAAACACA GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/1:208:103:48,55:49,56:105,103:0.4952:2,2:19.9:1:37.2:1:1:1.00259:60:102:0.4976:0.0433:1.1 0/0:217:2:0,2:109,106:215,2:0.0092:2,0:24.5:1:39:1:0.24654:0:60:4:0.0096:0:1
Octopus_All:
chr8 127740906 . G A 3485.73 NC AC=1;AN=5;DP=544;MP=37.75;MQ=60;MQ0=0;NS=2;PP=3485.73;SOMATIC GT:GQ:DP:MQ:PS:PQ:MAP_VAF:VAF_CR:AD:AF:ADP:SB:FT 0|0|1:999:267:60:127740906:99:0.48:0.43,0.53:145:0.483:300:0.000:NC 0|0:999:277:60:127740906:99:.:.,.:.:.:310:.:PASS
NC
in the filter flag.NC
stands for Normal
Contamination
Here the definition of the NC flag from Octopus Website:
FilterName | description | Sample_Specific
NC | Number of reads supporting a called somatic haplotype in non-SOMATIC samples | Yes
And the definition within the VCF header:
##FILTER=<ID=NC,Description="Somatic haplotypes detected in a called normal sample">
Like you wrote: due to contamination in normal.
Because the AD
does not follow specs (it has only one value, not 2), we cannot see in Normal AD
the presence of ALTs in the field and therefore NC
tells us that ALTs have also been found in the Normal Sample.
Here are the default thresholds I have found in Octopus Website
default_value("QUAL < 2 | GQ < 20 | MQ < 30 | SMQ < 40 | SB > 0.9 | SD > 0.9 | BQ < 20 | DP < 3 | MF > 0.2 | NC > 1 | FRF > 0.5")
The Threshold is pretty stringent; If there are 2 or more ALTs in Normal, the NC
flag is raised.
You will also have to check the realigned BAM to see if there are more or less T in the realigned BAMs.
Her the issue is the same as in Octopus; The two filters tell us that the ALT Variant is also found in the Normal sample and has a higher VAF than the hardcoded threshold [HighVafNormal
] along with a higher number of expected read counts having the ALT than expected [HighAltCntNormal
]
Yes, the call was filtered due to line --include 'FILTER == "PASS" & INFO/STATUS == "StrongSomatic"'
added to the filtering step.
I was not aware of that additional filter and I guess it was added later to the pipeline to further filter the VarDict's calls.
I can see that at least 3 ALTs are present in the normal AD flag; If we had a flag that says, NORMAL_AD_ALT<=3 or NORMAL_AD_ALT<=2, that could could be filtered out as well and not present in the vcfMerger2.
Two variants are telling us that we might have some small fraction of tumor contamination in the normal sample; Maybe these two callers have too stringent thresholds and we might have to modify them?
With Octopus
, We may add NC>5
replacing the default NC>1
,
or
Proceed as I did with AF
, ADP
and AD
flags, by selecting values that will not filter anything and add the flags in the FORMAT field. We will then be able to filter manually at the filter step and decide ourselves the threshold value.
Note: we could proceed the same way for all the filters currently present in the line
--somatic-filter-expression "QUAL < 2 | GQ < 20 | MQ < 30 | SMQ < 40 | SD > 0.9 | BQ < 20 | DP < 3 | MF > 0.2 | NC > 1 | FRF > 0.5 | AD < -1 | AF < -1 | ADP > 100000000 | SB < -1 "
About Lancet: We could run Lancet with the option that creates the DeBruijn graph for visual inspection for borderline variants or modify some of the default filtering options, such as:
Lancet Filters
--min-alt-count-tumor, -a <int> : minimum alternative count in the tumor [default: 3]
--max-alt-count-normal, -m <int> : maximum alternative count in the normal [default: 0]
--min-vaf-tumor, -e <float> : minimum variant allele frequency (AlleleCov/TotCov) in the tumor [default: 0.04]
--max-vaf-normal, -i <float> : maximum variant allele frequency (AlleleCov/TotCov) in the normal [default: 0]
--min-coverage-tumor, -o <int> : minimum coverage in the tumor [default: 4]
--max-coverage-tumor, -y <int> : maximum coverage in the tumor [default: 1000000]
--min-coverage-normal, -z <int> : minimum coverage in the normal [default: 10]
--max-coverage-normal, -j <int> : maximum coverage in the normal [default: 1000000]
--min-phred-fisher, -s <float> : minimum fisher exact test score [default: 5]
--min-phred-fisher-str, -E <float> : minimum fisher exact test score for STR mutations [default: 25]
--min-strand-bias, -f <float> : minimum strand bias threshold [default: 1]
I really thought we added the --max-alt-count-normal
option to the lancet command and set it at 2; Currently the default is zero and I think it is too stringent.
Just to be clear: this is not a problem with vcfMerger2
.
The current phoenix's vcfMerger2
module does not perform any variant filtering whatsoever. It only preps each variant caller VCFs and merges them losslessly.
The variant filtering step takes place in each variant caller module (last step) before running the vcfMerger2
module.
So, based on the filtering_steps, everything works ok to me (or am I missing smthg?) I can see the calls you copied may be borderline, but based on the filtering setups, none of them can pass.
Pinging @bryce-turner to prioritize this issue
Vardict solution might be to keep the "LikelySomatic" but that might massively inflate the callset
Review of Medusa versus Phoenix calls in PCL patients by @sskerget identified a MYC mutation ~35% AF that was lost in the new pipeline. It is called by 2 callers (Mutect2 and Strelka2). It is called by VarDict but does not make it into the mergedVCF, @ChristopheLegendre can you please review to define why it drops in prep steps, but I suspect it is because it is flagged as INFO/STATUS=LikelySomatic, not "StrongSomatic" that we require? In the case of Octopus and Lancet it is not a pass variant due to contamination of the normal with the same variant though it was not high at 1.3-1.5% AF. So clearly these callers are not passing variants that would pass our <2% AF in normal filters so leveraging their PASS flags has over filtered the variant list.
Sample: MMRF_1912_1_PB_CD3pos_C4_KBS5U-MMRF_1912_1_BM_CD138pos_T3_KBS5U
LANCET_All:
chr8 127738842 . G T 242.969 HighVafNormal;HighAltCntNormal SHARED;FETS=242.969;TYPE=snv;LEN=1;KMERSIZE=13;SB=9.50464 GT:AD:SR:SA:DP 0/1:207,3:123,84:2,1:210/1:111,74:67,44:47,27:185
for Lancet clearly we need to change the command as it is using defaults for the following two parameters as they are not set.
--max-vaf-normal, -i <float> : maximum variant allele frequency (AlleleCov/TotCov) in the normal [default: 0] --max-alt-count-normal, -m <int> : maximum alternative count in the normal [default: 0]
VARDICT_All:
chr8 127738842 . G T 215 PASS STATUS=LikelySomatic;SAMPLE=MMRF_1912_1_BM_CD138pos_T3;TYPE=SNV;DP=187;VD=70;AF=0.3743;SHIFT3=0;MSI=2;MSILEN=3;SSF=0;SOR=45.715;LSEQ=AGTGCATCGACCCCTCGGTG;RSEQ=TCTTCCCCTACCCTCTCAAC GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/1:187:70:51,19:72,45:117,70:0.3743:2,2:22.9:1:35.1:1:0.15153:1.67:60:140:0.3763:0.0053:1.1 0/0:234:3:2,1:144,86:230,3:0.0128:2,2:18:1:38.3:1:1:1.19:60:6:0.0133:0:1
OCTOPUS_All
chr8 127738842 . G T 2013.87 NC AC=1;AN=5;DP=495;MP=21.38;MQ=60;MQ0=0;NS=2;PP=2013.87;SOMATIC GT:GQ:DP:MQ:PS:PQ:MAP_VAF:VAF_CR:AD:AF:ADP:SB:FT 0|0|1:999:224:60:127738842:99:0.38:0.33,0.43:173:0.428:404:0.000:NC 0|0:999:271:60:127738842:99:.:.,.:.:.:422:.:PASS
@ChristopheLegendre Do you know why it is set as NC in the Filter flag
MERGED with MUTECT2 AND STRELKA2 Calls:
chr8 127738842 . G T . . VTYPE=snv;CC=2;TPCE=MUTECT2;MUTECT2;STRELKA2;MUTECT2_CONTQ=93;MUTECT2_DP=502;MUTECT2_ECNT=1;MUTECT2_GERMQ=93;MUTECT2_MBQ=35,35;MUTECT2_MFRL=171,163;MUTECT2_MMQ=60,60;MUTECT2_MPOS=20;MUTECT2_NALOD=-4.027;MUTECT2_NLOD=66.29;MUTECT2_POPAF=6;MUTECT2_ROQ=93;MUTECT2_SEQQ=93;MUTECT2_STRANDQ=93;MUTECT2_TLOD=244.14;MUTECT2_OGT=0/0,0/1;STRELKA2_SOMATIC;STRELKA2_QSS=317;STRELKA2_TQSS=2;STRELKA2_NT=ref;STRELKA2_QSS_NT=3070;STRELKA2_TQSS_NT=1;STRELKA2_SGT=GG->GT;STRELKA2_DP=496;STRELKA2_MQ=60;STRELKA2_MQ0=0;STRELKA2_ReadPosRankSum=1.01;STRELKA2_SNVSB=0;STRELKA2_SomaticEVS=19.98;STRELKA2_S1_GT=0/0;STRELKA2_S1_DP=273;STRELKA2_S1_FDP=0;STRELKA2_S1_SDP=0;STRELKA2_S1_SUBDP=0;STRELKA2_S1_AU=1,1;STRELKA2_S1_CU=0,0;STRELKA2_S1_GU=269,270;STRELKA2_S1_TU=3,3;STRELKA2_S1_AR=0.011;STRELKA2_S1_AD=269,3;STRELKA2_S2_GT=0/1;STRELKA2_S2_DP=219;STRELKA2_S2_FDP=0;STRELKA2_S2_SDP=0;STRELKA2_S2_SUBDP=0;STRELKA2_S2_AU=0,0;STRELKA2_S2_CU=0,0;STRELKA2_S2_GU=132,135;STRELKA2_S2_TU=87,87;STRELKA2_S2_AR=0.3973;STRELKA2_S2_AD=132,87;STRELKA2_ID=.;STRELKA2_QUAL=.;STRELKA2_FILTER=PASS GT:DP:AR:AD 0/0:264:0.0114:261,3 0/1:220:0.3955:133,87