mbhall88 closed this issue 3 years ago.
I have run the FRS filter with and without a number of other filters. Here are the results on the 7 validation samples:
X-axis label key:

- `none`: no filters
- `0ea6325`: the commit of pandora where I implemented the cluster size expectation fix (https://github.com/rmcolq/pandora/issues/262). The filters for this label are min. covg 3, min. GT conf 15 and strand bias 0.01.
- `K<float>`: minimum FRS threshold of `<float>` and the same filters as `0ea6325` (i.e. the same filters we were using before the FRS filter was added)
- `K<float>-only`: minimum FRS threshold of `<float>` and no other filters
- `K<float>s1`: min. FRS `<float>` and strand bias 0.01 (i.e. 1%)
- `K<float>s1d3`: as above, with min. covg 3
- `K<float>s1d3g5`: as above, with min. GT_CONF of 5
- `nodenovo`: pandora with no de novo variant calling (note: this is prior to the cluster size fix, but with the same filters as `0ea6325`)

Note: compass and bcftools only call SNPs.
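To make the label key concrete, here is a minimal sketch of how the filter combinations could be applied to a single call. The `Call` record, its field names and the `passes`/`label_filters` helpers are hypothetical, not pandora's API. The strand bias check is an assumption: it fails a call when the fraction of called-allele reads on the weaker strand is below the threshold. `nodenovo` is not a filter set, and the cluster size fix in `0ea6325` is a code change, so neither is modelled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Call:
    """Hypothetical per-site record; field names are illustrative only."""
    covg: int       # total read depth at the site
    gt_conf: float  # genotype confidence
    fwd: int        # called-allele reads on the forward strand
    rev: int        # called-allele reads on the reverse strand
    frs: float      # fraction of read support for the called allele


@dataclass
class Filters:
    """A filter is skipped when its threshold is None."""
    min_frs: Optional[float] = None
    min_covg: Optional[int] = None
    min_gt_conf: Optional[float] = None
    strand_bias: Optional[float] = None  # ASSUMED: min fraction of reads on the weaker strand


def passes(call: Call, f: Filters) -> bool:
    """Return True if the call survives every enabled filter."""
    if f.min_frs is not None and call.frs < f.min_frs:
        return False
    if f.min_covg is not None and call.covg < f.min_covg:
        return False
    if f.min_gt_conf is not None and call.gt_conf < f.min_gt_conf:
        return False
    if f.strand_bias is not None:
        total = call.fwd + call.rev
        # No supporting reads at all: treat as failing the strand check.
        if total == 0 or min(call.fwd, call.rev) / total < f.strand_bias:
            return False
    return True


def label_filters(frs: float) -> Dict[str, Filters]:
    """Filter sets mirroring the x-axis labels for a given FRS threshold."""
    old = dict(min_covg=3, min_gt_conf=15, strand_bias=0.01)  # 0ea6325 filters
    return {
        "none": Filters(),
        "0ea6325": Filters(**old),
        f"K{frs}": Filters(min_frs=frs, **old),
        f"K{frs}-only": Filters(min_frs=frs),
        f"K{frs}s1": Filters(min_frs=frs, strand_bias=0.01),
        f"K{frs}s1d3": Filters(min_frs=frs, strand_bias=0.01, min_covg=3),
        f"K{frs}s1d3g5": Filters(min_frs=frs, strand_bias=0.01, min_covg=3, min_gt_conf=5),
    }
```

For example, `passes(Call(covg=10, gt_conf=8, fwd=4, rev=4, frs=0.8), label_filters(0.75)["K0.75s1d3g5"])` keeps the call, while the `0ea6325` set rejects it on GT conf.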
I am in the process of rerunning bcftools with the min. FRS filter also.
Pandora Filters:
Looks great. Very glad to see the tight distribution of orange near zero. Hope this results in pretty congruent clusters.
I was actually wondering whether it might make sense to use different filters for compare. The current ones could be a little harsh?
No harm in looking (what actually are precision and recall for pandora compare?), but, to be a bit brutal, all we need is that in the range "compass distance <= 10" the pandora distance is below some small X, and that in the range "pandora distance <= X" there are ideally no extra dots. I.e. we just need to deliver the same, or very similar, clusters.
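One way to score the "same clusters" criterion is to treat compass-close pairs as the truth and count disagreements. This is a sketch under assumptions: the distance dictionaries keyed by sample pair, the helper `compare_close_pairs` and the choice of `<= x` (the comment mixes `< X` and `<= X`) are all illustrative, and this framing of precision/recall is one possibility, not an agreed definition.

```python
from typing import Dict, FrozenSet

Pair = FrozenSet[str]  # unordered pair of sample names


def compare_close_pairs(
    compass: Dict[Pair, int],
    pandora: Dict[Pair, int],
    x: int,
    compass_threshold: int = 10,
) -> Dict[str, float]:
    """Score pandora's close pairs against compass's close pairs.

    missed: compass distance <= compass_threshold, but pandora distance > x
    extra:  pandora distance <= x, but compass distance > compass_threshold
    Only pairs present in both distance tables are considered.
    """
    shared = compass.keys() & pandora.keys()
    truth = {p for p in shared if compass[p] <= compass_threshold}
    called = {p for p in shared if pandora[p] <= x}
    tp = len(truth & called)
    return {
        "missed": len(truth - called),
        "extra": len(called - truth),
        "precision": tp / len(called) if called else 1.0,
        "recall": tp / len(truth) if truth else 1.0,
    }
```

Ideally both `missed` and `extra` are zero for the chosen X; an extra pair is exactly an "extra dot" in the `pandora distance <= X` region of the dotplot.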
Pandora Filters:
After re-running bcftools with the new FRS filter (thresholds 0.9 and 0.85), we get the following results. Note: I realised halfway through that I had also changed the varifier version in the meantime, so I included the intermediate results to show the effect of the new varifier version.
The (close) dotplot before and after the FRS filter for bcftools (this is FRS 0.9):
Something weird with your x-axis tick marks on the latest recall and precision plots. The second tick mark along has no label. Does that mean the second box plot has no label?
No, they're correct. The labels are at a 45-degree angle, so you trace up from the middle of the label text to find its tick.
Off the back of the analysis in #61, we need to add a filter for both pandora and bcftools based on the minimum fraction of read support (FRS) for the called allele. This is because, at least for pandora (see #61), a large number of our false-positive calls are at positions that Illumina/compass filters out because of HET calls or low support for the called base.
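A minimal sketch of the proposed filter, assuming FRS is computed as the reads supporting the called allele divided by all reads at the site, from per-allele depths such as a VCF `FORMAT/AD` field. The helper names are hypothetical and this is not how pandora or bcftools compute it internally; pandora, for instance, may derive support from its own coverage fields.

```python
from typing import Sequence


def fraction_read_support(allele_depths: Sequence[int], called: int) -> float:
    """FRS = depth of the called allele / total depth across all alleles.

    allele_depths: per-allele read counts (REF first, then ALTs), as in VCF AD.
    called: index of the called allele (0 = REF).
    """
    total = sum(allele_depths)
    if total == 0:
        return 0.0  # no reads at all: treat as zero support
    return allele_depths[called] / total


def passes_min_frs(allele_depths: Sequence[int], called: int, min_frs: float = 0.9) -> bool:
    """Keep the call only if its FRS reaches the threshold (e.g. 0.9 or 0.85)."""
    return fraction_read_support(allele_depths, called) >= min_frs
```

A HET-like site with depths `[12, 10]` and the ALT called has FRS 10/22 (about 0.45), so it is removed at both 0.9 and 0.85, while a clean site with depths `[1, 29]` (FRS about 0.97) is kept.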