mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

Add minimum FRS filter #62

Closed mbhall88 closed 3 years ago

mbhall88 commented 3 years ago

Off the back of the analysis in #61, we need to add a filter for both pandora and bcftools based on the minimum fraction of read support (FRS) on the called allele. This is because, at least for pandora (see #61), a large number of our FP calls are at positions that Illumina/compass filters out due to HET calls or low support for the called base.
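To make the idea concrete, here is a minimal sketch of a minimum-FRS check. It assumes FRS is the depth on the called allele divided by the total depth across all alleles at the position; the function names, the depth layout, and the default threshold are hypothetical and are not taken from the pipeline's code.

```python
from typing import Optional, Sequence


def fraction_read_support(
    allele_depths: Sequence[int], called_allele: int
) -> Optional[float]:
    """Depth on the called allele divided by total depth over all alleles.

    Returns None when there is no coverage at the position.
    (Assumed definition of FRS, for illustration only.)
    """
    total = sum(allele_depths)
    if total == 0:
        return None
    return allele_depths[called_allele] / total


def passes_min_frs(
    allele_depths: Sequence[int], called_allele: int, min_frs: float = 0.9
) -> bool:
    """True if the call has coverage and its FRS is at least min_frs."""
    frs = fraction_read_support(allele_depths, called_allele)
    return frs is not None and frs >= min_frs
```

For example, a call with depths `[1, 19]` on alleles 0 and 1 and allele 1 called has an FRS of 0.95 and would pass a 0.9 threshold, whereas depths of `[4, 16]` give 0.8 and would be filtered.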

mbhall88 commented 3 years ago

I have run the FRS filter with and without a bunch of other filters, and here are the results on the 7 validation samples.

X-axis label key

SNP recall

image

SNP precision

image

Indel recall

image

Indel precision

image

All variants recall

Note: compass and bcftools only call SNPs

image

All variants precision

image


I am in the process of rerunning bcftools with the min. FRS filter also.

mbhall88 commented 3 years ago

Pandora Filters:

Close dotplot

close_dotplot_K0 85s1d3g5

Full dotplot

dotplot_K0 85s1d3g5

iqbal-lab commented 3 years ago

Looks great. V glad to see tight distribution of orange near zero. Hope this results in pretty congruent clusters.

mbhall88 commented 3 years ago

I was actually thinking it might make sense to use different filters for compare? The current ones could be a little harsh?

iqbal-lab commented 3 years ago

No harm in looking (what actually are precision and recall for Pandora compare?), but being a bit brutal, all we need is that, in the range "compass distance <=10", Pandora distance is < some small X, and in the range "Pandora distance <=X" there are ideally no extra dots. I.e. we just need to deliver the same or v similar clusters.
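The criterion above can be sketched as a check over pairwise distances. This is only an illustration: the tuple layout, the function name, and the use of `<=` for the Pandora threshold are assumptions, and X is left as a required argument because no value is given here.

```python
from typing import Iterable, List, Tuple

# (sample_a, sample_b, compass_distance, pandora_distance) -- hypothetical layout
Pair = Tuple[str, str, int, int]


def discordant_pairs(
    pairs: Iterable[Pair], pandora_max: int, compass_max: int = 10
) -> Tuple[List[Pair], List[Pair]]:
    """Split out the pairs that break the clustering criterion.

    missed: close according to compass but not according to pandora.
    extra:  close according to pandora but not according to compass
            (the "extra dots").
    """
    missed: List[Pair] = []
    extra: List[Pair] = []
    for pair in pairs:
        _, _, compass_dist, pandora_dist = pair
        compass_close = compass_dist <= compass_max
        pandora_close = pandora_dist <= pandora_max
        if compass_close and not pandora_close:
            missed.append(pair)
        elif pandora_close and not compass_close:
            extra.append(pair)
    return missed, extra
```

If both lists come back empty, the two callers connect exactly the same sample pairs at their respective thresholds, so clusters built by linking pairs under the threshold would be identical.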

mbhall88 commented 3 years ago

Pandora Filters:

Close dotplot

close_dotplot_K0 9s1d3g5

Full dotplot

dotplot_K0 9s1d3g5

mbhall88 commented 3 years ago

After re-running bcftools with the new FRS filter (0.9 and 0.85 thresholds) we get the following results. Note: I realised halfway through that I had since changed the varifier version too, so I included the intermediate results to show the effect of the new varifier version.

Recall

image

Precision

image

mbhall88 commented 3 years ago

The (close) dotplot before and after FRS for bcftools (this is FRS 0.9)

Before FRS - bcftools

bcftools_dotplot_old

With FRS 0.9 - bcftools

bcftools_dotplot_frs0 85

All callers with FRS

close_dotplot_frs0 85

iqbal-lab commented 3 years ago

Something weird with your x-axis tick marks on the latest recall and precision plots. Second tick mark along has no label. Means second box plot has no label?

mbhall88 commented 3 years ago

No, they're correct. The labels are at a 45-degree angle, so you kind of go up from the middle of the label text.