Closed anoronh4 closed 2 years ago
Refining Filters
I had a few findings when comparing our calls to PCAWG using the filters we have been trying out:
No Delly calls had any alt split reads in the normal, and only a small number of Manta calls had supporting reads in the normal, so I think this part of the filter is not necessary. For discordant reads supporting the alt allele in the normal, however, I found the following:
table(dat.delly$in_pcawg, dat.delly$n_delly_DV > 0)
    FALSE TRUE
  0  4956  112
  1  7149  515
table(dat.manta$in_pcawg, dat.manta$n_manta_PR_alt > 0)
    FALSE TRUE
  0  3682  658
  1  7343    4
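This filter looks more meaningful in Manta than in Delly: in Manta it flags 658 calls that are not in PCAWG and only 4 that are, whereas in Delly it flags 112 calls that are not in PCAWG but 515 that are.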
For the tumor support filters, the split-read filter does not look meaningful for Delly, since it flags a similar share of calls whether or not they are in PCAWG (about 31% versus 29%):
table(dat.delly$in_pcawg, dat.delly$t_delly_RV < 2)
    FALSE TRUE
  0  3478 1590
  1  5409 2255
For Manta, the same filter also removes a good number of true-positive calls (485 calls that are in PCAWG):
table(dat.manta$in_pcawg, dat.manta$t_manta_SR_alt < 2)
    FALSE TRUE
  0  3885  455
  1  6862  485
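To compare the four tables above on a common scale, the sketch below computes, for each filter, the share of non-PCAWG calls and the share of PCAWG calls that the filter would flag. The counts are copied from the tables; the `flag_rates` helper and the dictionary layout are illustrative names, and treating `in_pcawg` as a proxy for truth is an assumption.

```python
# Counts copied from the table() output above, in the order:
# (not in PCAWG, kept), (not in PCAWG, flagged), (in PCAWG, kept), (in PCAWG, flagged)
TABLES = {
    "delly normal DV > 0": (4956, 112, 7149, 515),
    "manta normal PR_alt > 0": (3682, 658, 7343, 4),
    "delly tumor RV < 2": (3478, 1590, 5409, 2255),
    "manta tumor SR_alt < 2": (3885, 455, 6862, 485),
}


def flag_rates(non_pcawg_kept, non_pcawg_flagged, pcawg_kept, pcawg_flagged):
    """Return (share of non-PCAWG calls flagged, share of PCAWG calls flagged)."""
    non_pcawg_rate = non_pcawg_flagged / (non_pcawg_kept + non_pcawg_flagged)
    pcawg_rate = pcawg_flagged / (pcawg_kept + pcawg_flagged)
    return non_pcawg_rate, pcawg_rate


for name, counts in TABLES.items():
    non_pcawg_rate, pcawg_rate = flag_rates(*counts)
    print(f"{name}: flags {non_pcawg_rate:.1%} of non-PCAWG calls, "
          f"{pcawg_rate:.1%} of PCAWG calls")
```

A filter is doing useful work when the first share is much larger than the second; when the two shares are close, the filter is discarding calls without separating likely false positives from likely true positives.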
If we remove the normal read support filter in Delly (but keep the equivalent in Manta), and also remove the tumor split-read support filter in Delly, we end up with a recall of 90%, a precision of 72% and an F-score of 79%. This is an overall loss in accuracy compared to the filters in this PR, but still a large improvement over the accuracy before this PR.
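For reference, precision, recall and F-score relate to each other as in the sketch below. It assumes PCAWG calls are the truth set and that the F-score quoted above is the standard F1 (harmonic mean of precision and recall); the comment does not say how the figures were computed. The function name and the counts in the example are hypothetical and are not pipeline results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion counts.

    tp: calls also present in the truth set
    fp: calls absent from the truth set
    fn: truth-set calls that were missed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts chosen only to illustrate the arithmetic.
p, r, f = precision_recall_f1(tp=90, fp=35, fn=10)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so dropping a filter that costs precision lowers the F-score even when recall improves.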
Based on an analysis of precision and recall compared to other datasets, we are incorporating a few filters into Delly and Manta somatic structural variant outputs:
In this PR the variants will not be removed from the file entirely; instead, the FILTER value will be updated with normal_read_supp and tumor_read_supp, respectively.

Based on a prior analysis where we compared to PCAWG calls, the estimated accuracy of the pipeline including the aforementioned filters:

Also, the containers for Strelka and Manta were updated to cmopipeline/strelka2-manta-bcftools-vt:2.0.0. The previous container, cmopipeline/strelka2_manta:latest, is much older and does not contain bcftools or vt, even though it seems to have the same version of Manta and both have python2. The container folder for cmopipeline/strelka2_manta:latest was removed as it is no longer necessary.
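The soft-filtering approach described above, where the FILTER value is updated with normal_read_supp or tumor_read_supp rather than dropping the record, can be sketched on a single tab-separated VCF data line as follows. This is not the PR's implementation (the updated container suggests bcftools is involved). The helper names, the way read counts are passed in, and the thresholds (more than 0 supporting reads in the normal, fewer than 2 in the tumor) are assumptions borrowed from the comparisons in the discussion.

```python
def add_filter(record_line, label):
    """Append `label` to the FILTER column (7th field) of one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    current = fields[6]
    if current in ("PASS", "."):
        # No filter has failed yet, so the label replaces the placeholder.
        fields[6] = label
    elif label not in current.split(";"):
        # VCF stores multiple failed filters as a semicolon-separated list.
        fields[6] = current + ";" + label
    return "\t".join(fields)


def soft_filter(record_line, normal_alt_reads, tumor_alt_reads, min_tumor_reads=2):
    """Annotate a record instead of removing it (thresholds are assumptions)."""
    if normal_alt_reads > 0:
        record_line = add_filter(record_line, "normal_read_supp")
    if tumor_alt_reads < min_tumor_reads:
        record_line = add_filter(record_line, "tumor_read_supp")
    return record_line


# Hypothetical record, for illustration only.
line = "1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL"
print(soft_filter(line, normal_alt_reads=1, tumor_alt_reads=0))
```

Keeping flagged records in the file means downstream users can still choose to keep PASS calls only, or revisit the thresholds later without rerunning the callers.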