tgen / phoenix

Jetstream compatible workflow template supporting comprehensive analysis of human sequencing data against GRCh38
MIT License
17 stars 6 forks source link

Review of contamination of normal with tumor on various callers #305

Closed PedalheadPHX closed 4 years ago

PedalheadPHX commented 4 years ago

Review of Medusa versus Phoenix calls in PCL patients by @sskerget identified a MYC mutation ~35% AF that was lost in the new pipeline. It is called by 2 callers (Mutect2 and Strelka2). It is called by VarDict but does not make it into the mergedVCF, @ChristopheLegendre can you please review to define why it drops in prep steps, but I suspect it is because it is flagged as INFO/STATUS=LikelySomatic, not "StrongSomatic" that we require? In the case of Octopus and Lancet it is not a pass variant due to contamination of the normal with the same variant though it was not high at 1.3-1.5% AF. So clearly these callers are not passing variants that would pass our <2% AF in normal filters so leveraging their PASS flags has over filtered the variant list.

Sample: MMRF_1912_1_PB_CD3pos_C4_KBS5U-MMRF_1912_1_BM_CD138pos_T3_KBS5U

LANCET_All: chr8 127738842 . G T 242.969 HighVafNormal;HighAltCntNormal SHARED;FETS=242.969;TYPE=snv;LEN=1;KMERSIZE=13;SB=9.50464 GT:AD:SR:SA:DP 0/1:207,3:123,84:2,1:210/1:111,74:67,44:47,27:185

for Lancet clearly we need to change the command as it is using defaults for the following two parameters as they are not set. --max-vaf-normal, -i <float> : maximum variant allele frequency (AlleleCov/TotCov) in the normal [default: 0] --max-alt-count-normal, -m <int> : maximum alternative count in the normal [default: 0]

VARDICT_All: chr8 127738842 . G T 215 PASS STATUS=LikelySomatic;SAMPLE=MMRF_1912_1_BM_CD138pos_T3;TYPE=SNV;DP=187;VD=70;AF=0.3743;SHIFT3=0;MSI=2;MSILEN=3;SSF=0;SOR=45.715;LSEQ=AGTGCATCGACCCCTCGGTG;RSEQ=TCTTCCCCTACCCTCTCAAC GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/1:187:70:51,19:72,45:117,70:0.3743:2,2:22.9:1:35.1:1:0.15153:1.67:60:140:0.3763:0.0053:1.1 0/0:234:3:2,1:144,86:230,3:0.0128:2,2:18:1:38.3:1:1:1.19:60:6:0.0133:0:1

OCTOPUS_All chr8 127738842 . G T 2013.87 NC AC=1;AN=5;DP=495;MP=21.38;MQ=60;MQ0=0;NS=2;PP=2013.87;SOMATIC GT:GQ:DP:MQ:PS:PQ:MAP_VAF:VAF_CR:AD:AF:ADP:SB:FT 0|0|1:999:224:60:127738842:99:0.38:0.33,0.43:173:0.428:404:0.000:NC 0|0:999:271:60:127738842:99:.:.,.:.:.:422:.:PASS

@ChristopheLegendre Do you know why it is set as NC in the Filter flag

MERGED with MUTECT2 AND STRELKA2 Calls: chr8 127738842 . G T . . VTYPE=snv;CC=2;TPCE=MUTECT2;MUTECT2;STRELKA2;MUTECT2_CONTQ=93;MUTECT2_DP=502;MUTECT2_ECNT=1;MUTECT2_GERMQ=93;MUTECT2_MBQ=35,35;MUTECT2_MFRL=171,163;MUTECT2_MMQ=60,60;MUTECT2_MPOS=20;MUTECT2_NALOD=-4.027;MUTECT2_NLOD=66.29;MUTECT2_POPAF=6;MUTECT2_ROQ=93;MUTECT2_SEQQ=93;MUTECT2_STRANDQ=93;MUTECT2_TLOD=244.14;MUTECT2_OGT=0/0,0/1;STRELKA2_SOMATIC;STRELKA2_QSS=317;STRELKA2_TQSS=2;STRELKA2_NT=ref;STRELKA2_QSS_NT=3070;STRELKA2_TQSS_NT=1;STRELKA2_SGT=GG->GT;STRELKA2_DP=496;STRELKA2_MQ=60;STRELKA2_MQ0=0;STRELKA2_ReadPosRankSum=1.01;STRELKA2_SNVSB=0;STRELKA2_SomaticEVS=19.98;STRELKA2_S1_GT=0/0;STRELKA2_S1_DP=273;STRELKA2_S1_FDP=0;STRELKA2_S1_SDP=0;STRELKA2_S1_SUBDP=0;STRELKA2_S1_AU=1,1;STRELKA2_S1_CU=0,0;STRELKA2_S1_GU=269,270;STRELKA2_S1_TU=3,3;STRELKA2_S1_AR=0.011;STRELKA2_S1_AD=269,3;STRELKA2_S2_GT=0/1;STRELKA2_S2_DP=219;STRELKA2_S2_FDP=0;STRELKA2_S2_SDP=0;STRELKA2_S2_SUBDP=0;STRELKA2_S2_AU=0,0;STRELKA2_S2_CU=0,0;STRELKA2_S2_GU=132,135;STRELKA2_S2_TU=87,87;STRELKA2_S2_AR=0.3973;STRELKA2_S2_AD=132,87;STRELKA2_ID=.;STRELKA2_QUAL=.;STRELKA2_FILTER=PASS GT:DP:AR:AD 0/0:264:0.0114:261,3 0/1:220:0.3955:133,87

LostMyc

PedalheadPHX commented 4 years ago

Similar issue in MMRF_2531_1_BM_CD138pos_T1 NOT (MMRF_1323_1_PB_Whole_C2_KAS5U-MMRF_1323_1_BM_CD138pos_T1_KAS5U -- JJK Error?)

VCFmerger: chr8 127740906 rs1465173750 G A . . VTYPE=snv;CC=2;TPCE=MUTECT2;MUTECT2;STRELKA2;MUTECT2_CONTQ=93;MUTECT2_DP=564;MUTECT2_ECNT=1;MUTECT2_GERMQ=93;MUTECT2_MBQ=36,37;MUTECT2_MFRL=158,166;MUTECT2_MMQ=60,60;MUTECT2_MPOS=19;MUTECT2_NALOD=-1.749;MUTECT2_NLOD=72.41;MUTECT2_POPAF=6;MUTECT2_ROQ=93;MUTECT2_SEQQ=93;MUTECT2_STRANDQ=93;MUTECT2_TLOD=402.72;MUTECT2_OGT=0/0,0/1;STRELKA2_SOMATIC;STRELKA2_QSS=463;STRELKA2_TQSS=1;STRELKA2_NT=ref;STRELKA2_QSS_NT=3070;STRELKA2_TQSS_NT=1;STRELKA2_SGT=GG->AG;STRELKA2_DP=546;STRELKA2_MQ=60;STRELKA2_MQ0=0;STRELKA2_ReadPosRankSum=0.68;STRELKA2_SNVSB=0;STRELKA2_SomaticEVS=19.98;STRELKA2_S1_GT=0/0;STRELKA2_S1_DP=279;STRELKA2_S1_FDP=0;STRELKA2_S1_SDP=0;STRELKA2_S1_SUBDP=0;STRELKA2_S1_AU=2,2;STRELKA2_S1_CU=0,0;STRELKA2_S1_GU=277,278;STRELKA2_S1_TU=0,0;STRELKA2_S1_AR=0.0072;STRELKA2_S1_AD=277,2;STRELKA2_S2_GT=0/1;STRELKA2_S2_DP=266;STRELKA2_S2_FDP=1;STRELKA2_S2_SDP=0;STRELKA2_S2_SUBDP=0;STRELKA2_S2_AU=127,128;STRELKA2_S2_CU=0,0;STRELKA2_S2_GU=138,138;STRELKA2_S2_TU=0,0;STRELKA2_S2_AR=0.4792;STRELKA2_S2_AD=138,127;STRELKA2_ID=.;STRELKA2_QUAL=.;STRELKA2_FILTER=PASS;RNA_REF_COUNT=563;RNA_ALT_FREQ=0.520443;RNA_ALT_COUNT=611;DBSNPv152;ANN=A|missense_variant|MODERATE|MYC|ENSG00000136997|transcript|ENST00000613283.2|protein_coding|3/3|c.1313G>A|p.Arg438Gln|1313/1365|1313/1365|438/454|| GT:DP:AR:AD 0/0:271:0.0074:269,2 0/1:262:0.4809:136,126

Lancet_All: chr8 127740906 . G A 377.486 HighVafNormal;HighAltCntNormal SHARED;FETS=377.486;TYPE=snv;LEN=1;KMERSIZE=15;SB=10.4134 GT:AD:SR:SA:DP 0/1:221,2:111,110:0,2:223 0/1:105,106:51,54:47,59:211

VarDict_All: chr8 127740906 . G A 248 PASS STATUS=LikelySomatic;SAMPLE=MMRF_2531_1_BM_CD138pos_T1;TYPE=SNV;DP=208;VD=103;AF=0.4952;SHIFT3=0;MSI=2;MSILEN=3;SSF=0;SOR=104.596;LSEQ=AGAGGACTTGTTGCGGAAAC;RSEQ=ACGAGAACAGTTGAAACACA GT:DP:VD:ALD:RD:AD:AF:BIAS:PMEAN:PSTD:QUAL:QSTD:SBF:ODDRATIO:MQ:SN:HIAF:ADJAF:NM 0/1:208:103:48,55:49,56:105,103:0.4952:2,2:19.9:1:37.2:1:1:1.00259:60:102:0.4976:0.0433:1.1 0/0:217:2:0,2:109,106:215,2:0.0092:2,0:24.5:1:39:1:0.24654:0:60:4:0.0096:0:1

Octopus_All: chr8 127740906 . G A 3485.73 NC AC=1;AN=5;DP=544;MP=37.75;MQ=60;MQ0=0;NS=2;PP=3485.73;SOMATIC GT:GQ:DP:MQ:PS:PQ:MAP_VAF:VAF_CR:AD:AF:ADP:SB:FT 0|0|1:999:267:60:127740906:99:0.48:0.43,0.53:145:0.483:300:0.000:NC 0|0:999:277:60:127740906:99:.:.,.:.:.:310:.:PASS

ChristopheLegendre commented 4 years ago

Answer to the question: Do you know why it is set as NC in the filter flag.

NC stands for NormalContamination Here the definition of the NC flag from Octopus Website:

FilterName | description | Sample_Specific
NC | Number of reads supporting a called somatic haplotype in non-SOMATIC samples | Yes

And the definition within the VCF header: ##FILTER=<ID=NC,Description="Somatic haplotypes detected in a called normal sample"> Like you wrote: due to contamination in normal. Because the AD does not follow specs (it has only one value, not 2), we cannot see in Normal AD the presence of ALTs in the field and therefore NC tells us that ALTs have also been found in the Normal Sample.

Here are the default thresholds I have found in Octopus Website default_value("QUAL < 2 | GQ < 20 | MQ < 30 | SMQ < 40 | SB > 0.9 | SD > 0.9 | BQ < 20 | DP < 3 | MF > 0.2 | NC > 1 | FRF > 0.5") The Threshold is pretty stringent; If there are 2 or more ALTs in Normal, the NC flag is raised.

You will also have to check the realigned BAM to see if there are more or less T in the realigned BAMs.

About Lancet call

Her the issue is the same as in Octopus; The two filters tell us that the ALT Variant is also found in the Normal sample and has a higher VAF than the hardcoded threshold [HighVafNormal] along with a higher number of expected read counts having the ALT than expected [HighAltCntNormal]

About VarDict call filtering:

Yes, the call was filtered due to line --include 'FILTER == "PASS" & INFO/STATUS == "StrongSomatic"' added to the filtering step. I was not aware of that additional filter and I guess it was added later to the pipeline to further filter the VarDict's calls.

vcfMerger2

I can see that at least 3 ALTs are present in the normal AD flag; If we had a flag that says, NORMAL_AD_ALT<=3 or NORMAL_AD_ALT<=2, that could could be filtered out as well and not present in the vcfMerger2.

New questions/suggestions:

ChristopheLegendre commented 4 years ago

Just to be clear: this is not a problem with vcfMerger2. The current phoenix's vcfMerger2 module does not perform any variant filtering whatsoever. It only preps each variant caller VCFs and merges them losslessly.

The variant filtering step takes place in each variant caller module (last step) before running the vcfMerger2 module.

So, based on the filtering_steps, everything works ok to me (or am I missing smthg?) I can see the calls you copied may be borderline, but based on the filtering setups, none of them can pass.

PedalheadPHX commented 4 years ago

Pinging @bryce-turner to prioritize this issue

PedalheadPHX commented 4 years ago

Vardict solution might be to keep the "LikelySomatic" but that might massively inflate the callset