mskcc / tempo

CCS research pipeline to process WES and WGS TN pairs
https://cmotempo.netlify.com/

Some OncoKB variants are missing in the TEMPO filtered maf #776

Open ShwetaCh opened 4 years ago

ShwetaCh commented 4 years ago

While looking into the WES-recapture project, I tried to find evidence for IMPACT variants in WES and found that some were missing in spite of sufficient coverage/quality.

Details below: Of the 326 variants, 102 with VAF > 5% and 70 with VAF > 10% are missing. I tracked each of these variants to see whether it was present in the unfiltered maf; if it was not, I traced it to the VCF; and if it was still not found, I tried to work out the reason myself from the exome_config file, which dictates the read depth and other filter thresholds. I've documented the reasons for each missing variant, based on my investigation, in a Google Sheet and am happy to share it.
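A minimal sketch of the tracing described above, assuming variants are matched on the standard maf columns `Chromosome`, `Start_Position`, `Reference_Allele` and `Tumor_Seq_Allele2`. The function names and the toy data are hypothetical and not part of TEMPO; VCF records would first have to be converted to the same key form, which is not shown here.

```python
import csv
import io

# Standard maf columns used here as the variant key (an assumption of this sketch).
KEY_COLUMNS = ("Chromosome", "Start_Position", "Reference_Allele", "Tumor_Seq_Allele2")


def read_maf_keys(handle):
    """Collect (chrom, start, ref, alt) keys from a tab-delimited maf, skipping '#' lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return {tuple(row[col] for col in KEY_COLUMNS) for row in reader}


def first_stage_found(variant, stages):
    """Return the label of the first stage containing the variant.

    `stages` is an ordered list of (label, key_set) pairs, most filtered first.
    A variant found in no stage is reported as "unexplained".
    """
    for label, keys in stages:
        if variant in keys:
            return label
    return "unexplained"


# Toy data, made up for illustration only.
header = "#version 2.4\n" + "\t".join(KEY_COLUMNS) + "\n"
filtered_maf = io.StringIO(header + "7\t100\tA\tT\n")
unfiltered_maf = io.StringIO(header + "7\t100\tA\tT\n" + "12\t200\tG\tA\n")

stages = [
    ("filtered maf", read_maf_keys(filtered_maf)),
    ("unfiltered maf", read_maf_keys(unfiltered_maf)),
]
```

Calling `first_stage_found(variant, stages)` for each IMPACT variant gives the earliest file in which it still appears; a further `("vcf", keys)` stage could be appended to the list in the same way.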

In summary, of those 326 variants, 80 missing ones were not explained by any FILTER tag in the maf or vcf, or by the filter criteria in the exome_config. This is a very high FN rate.

On spot checking, these variants were present in Roslin, and when the samples were rerun with a new version of TEMPO, they were still not found (based on spot checking only). So the issue does not seem to be version-specific.

gongyixiao commented 4 years ago

The issue was not complete yet, but I did talk to Shweta. I think that in order to fix this, we need to disable scratch=true in juno.config so that all the temp files are kept in the work directory. Then, starting from mutect2 and strelka2, we can look for these missing variants at every step to see where they were excluded, and then find a way to rescue them.

These are very important variants, so we need to at least figure out why they were excluded.

gongyixiao commented 3 years ago

Any updates on this, @ShwetaCh @md09?

ShwetaCh commented 3 years ago

@gongyixiao @md09

I annotated both IMPACT and WES with the most recent OncoKB and moved on with the WES recapture project, since the IMPACT and WES mafs were not comparable for various reasons: they were not annotated at the same time, so the OncoKB data/API versions were not the same; the data version could not be fixed; and TEMPO did not run with cancer type info at the time. See the Slack channel #tempo_oncokb for more details.

I believe it was concluded that OncoKB annotations would be the analyst's responsibility, especially for recapture projects, since IMPACT will always have the most up-to-date OncoKB annotations, which won't necessarily be comparable to the version used inside TEMPO, unless you have now found a way to make the two mafs more comparable (?)

gongyixiao commented 3 years ago

@ShwetaCh

Thank you for the update. I agree that the OncoKB annotations in the IMPACT and WES mafs are not necessarily comparable, for the reasons you stated above.

However, I just want to make sure that a significant number of OncoKB variants are not missing from the WES mafs, regardless of their OncoKB annotation info. What do you think?
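As a rough sketch of such a check, variants could be compared on coordinates and alleles only, so that differences in OncoKB annotation columns do not matter. The function name and the tuple layout are illustrative assumptions, not TEMPO code, and the example keys are made up.

```python
def missing_fraction(expected_keys, observed_keys):
    """Return (fraction missing, sorted list of missing keys).

    Keys are (chrom, pos, ref, alt) tuples; annotation columns are never
    consulted, so the comparison is independent of the OncoKB version.
    """
    expected = set(expected_keys)
    if not expected:
        raise ValueError("expected_keys must not be empty")
    missing = expected - set(observed_keys)
    return len(missing) / len(expected), sorted(missing)


# Toy keys, made up for illustration only.
impact_keys = [("7", 100, "A", "T"), ("12", 200, "G", "A"),
               ("17", 300, "C", "T"), ("3", 400, "T", "C")]
wes_keys = [("7", 100, "A", "T"), ("17", 300, "C", "T")]

fraction, missing = missing_fraction(impact_keys, wes_keys)
```

Here `impact_keys` would come from the IMPACT maf and `wes_keys` from the TEMPO WES maf for the same sample.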

ShwetaCh commented 3 years ago

@gongyixiao @md09

True, I think we will need to have this tested specifically with a set of recently TEMPO-ed recapture samples, so that they match what is seen at the IMPACT end. (I have not worked on any new WES set recently, so I cannot confidently say that we are OK.) Alternatively, the quickest way would be to check with someone in CCS to whom TEMPO has most recently delivered WES. Could we find this from the TEMPO delivery log?

gongyixiao commented 1 year ago

Is this the same issue as https://github.com/mskcc/tempo/issues/922?