mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

Filter multi-sample VCF #59

Closed mbhall88 closed 3 years ago

mbhall88 commented 3 years ago

The script for applying filters has been successfully adapted to work seamlessly on single- or multi-sample VCFs.

A stat from the initial run that may be handy in the future: 48% of records in the sparse VCF have all alleles the same length. This is a crude way of estimating how many SNPs there are in the pandora compare VCF, in case we want to try SNP distance.
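For reference, here is a minimal sketch of how a stat like this could be computed. It is not the script used for the run above; the function names and the toy records are hypothetical, and it assumes a plain-text, tab-delimited VCF.

```python
def all_alleles_same_length(ref: str, alt: str) -> bool:
    """True if REF and every ALT allele have the same length."""
    # A "." in the ALT column means there is no alternate allele.
    alleles = [ref] + [a for a in alt.split(",") if a != "."]
    return len({len(a) for a in alleles}) == 1


def proportion_same_length(vcf_lines) -> float:
    """Fraction of VCF records whose alleles all share one length."""
    total = same = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        total += 1
        same += all_alleles_same_length(ref, alt)
    return same / total if total else 0.0


# Hypothetical toy records, not taken from the pandora VCF.
example = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "locus1\t10\t.\tA\tG\t.\t.\t.",
    "locus1\t25\t.\tAC\tA\t.\t.\t.",
    "locus2\t7\t.\tGT\tCA,TT\t.\t.\t.",
    "locus2\t40\t.\tT\tTA,C\t.\t.\t.",
]
print(proportion_same_length(example))
```

Note that records with equal-length multi-base alleles (such as `GT` vs `CA`) are counted too, which is part of why this is only a crude proxy for the number of SNPs.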

mbhall88 commented 3 years ago

If we left-align indels with bcftools norm and trim alleles that have no call in the samples, the proportion of records in which all alleles have the same length actually increases to 59%. Doing this also shows that 51% of the records in the pandora VCF have no ALT call.
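A rough sketch of how the "no ALT call" proportion could be counted is below. It does not perform the normalisation or allele trimming itself; it only inspects genotypes. The function names and toy records are hypothetical, and it assumes every record has a `GT` key in its FORMAT column.

```python
import re


def has_alt_call(format_field: str, sample_fields) -> bool:
    """True if any sample genotype contains a non-reference allele index."""
    gt_idx = format_field.split(":").index("GT")
    for sample in sample_fields:
        gt = sample.split(":")[gt_idx]
        # Split on "/" or "|" so haploid and diploid genotypes both work.
        for allele in re.split(r"[/|]", gt):
            if allele not in (".", "0"):
                return True
    return False


def proportion_without_alt_call(vcf_lines) -> float:
    """Fraction of records where no sample is called as an ALT allele."""
    total = no_alt = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        if not has_alt_call(fields[8], fields[9:]):
            no_alt += 1
    return no_alt / total if total else 0.0


# Hypothetical two-sample records with haploid genotypes.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "locus1\t10\t.\tA\tG\t.\t.\t.\tGT\t0\t0",
    "locus1\t25\t.\tAC\tA\t.\t.\t.\tGT\t1\t.",
    "locus2\t7\t.\tGT\tCA,TT\t.\t.\t.\tGT\t.\t.",
    "locus2\t40\t.\tT\tTA,C\t.\t.\t.\tGT\t0\t2",
]
print(proportion_without_alt_call(records))
```

Records where every sample is null (`.`) are counted as having no ALT call here; whether that matches the 51% figure above is an assumption.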

mbhall88 commented 3 years ago

The results from #62 look pretty good, but it does seem like it might be wise to try different filters for compare VCFs than for map.

mbhall88 commented 3 years ago

I tried changing the multi-sample FRS filter to 0.75 but it doesn't seem to be as good as 0.9 (as in https://github.com/mbhall88/head_to_head_pipeline/issues/62#issuecomment-783122783)

[Image: compareK0 75]

I will leave the filters the same for both pandoras for now and revisit this if required.
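To make the threshold comparison concrete, here is a small sketch of what an FRS cut-off does to a single call. This thread does not define FRS, so the sketch assumes it is the fraction of read depth supporting the called allele; the function names and the depths are hypothetical and this is not the pipeline's filtering script.

```python
def fraction_read_support(allele_depths, called_allele: int) -> float:
    """Depth on the called allele divided by total depth across alleles."""
    total = sum(allele_depths)
    if total == 0:
        return 0.0  # no coverage, so no support for any allele
    return allele_depths[called_allele] / total


def passes_frs(allele_depths, called_allele: int, min_frs: float) -> bool:
    """True if the call meets the minimum FRS threshold."""
    return fraction_read_support(allele_depths, called_allele) >= min_frs


# Hypothetical depths: 2 reads on REF (index 0), 16 reads on ALT (index 1).
depths = [2, 16]
called = 1

for threshold in (0.75, 0.9):
    print(threshold, passes_frs(depths, called, threshold))
```

With these made-up depths the FRS is 16/18 (about 0.89), so the call is kept at 0.75 but filtered out at 0.9.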