harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Add SNP density visualization to QC module #109

Open tsackton opened 1 year ago

tsackton commented 1 year ago

I believe a large part of why it took so long to discover #106 is that we don't have any spatial visualization of SNP density in the QC output. In part, that is because such a plot would really only make sense on the full (non-downsampled) dataset, which is a bit of a pain to work with due to its size.

However, it might be useful to compute something like a) the intersection between the callable sites BED and the DB-VCF intervals, and b) the number of SNPs per DB interval. We could then make a simple scatter plot of those two variables, which should generally be roughly linearly correlated. This would also be a feasible way to quickly visualize things like sex chromosomes (usually intervals with low SNP density for their depth, unless the sample has a skewed sex ratio).
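To make the idea concrete, here is a rough sketch of the two per-interval quantities in plain Python. It is only an illustration, not snpArcher code: the function names, the tuple-based inputs, and the toy coordinates are all hypothetical, and it assumes the callable-sites BED is already merged (non-overlapping), BED coordinates are 0-based half-open, and SNP positions are 1-based as in a VCF.

```python
from bisect import bisect_left
from collections import defaultdict


def callable_bp_per_interval(intervals, callable_sites):
    """Sum callable bases overlapping each interval.

    Both inputs are lists of (chrom, start, end) in BED coordinates.
    Assumes callable_sites contains no overlapping records.
    """
    sites_by_chrom = defaultdict(list)
    for chrom, start, end in callable_sites:
        sites_by_chrom[chrom].append((start, end))

    totals = {}
    for chrom, start, end in intervals:
        total = 0
        for c_start, c_end in sites_by_chrom.get(chrom, []):
            overlap = min(end, c_end) - max(start, c_start)
            if overlap > 0:
                total += overlap
        totals[(chrom, start, end)] = total
    return totals


def snps_per_interval(intervals, snp_positions):
    """Count SNPs falling in each interval.

    snp_positions is a list of (chrom, pos) with 1-based positions,
    converted here to 0-based to match the BED intervals.
    """
    pos_by_chrom = defaultdict(list)
    for chrom, pos in snp_positions:
        pos_by_chrom[chrom].append(pos - 1)
    for positions in pos_by_chrom.values():
        positions.sort()

    counts = {}
    for chrom, start, end in intervals:
        positions = pos_by_chrom.get(chrom, [])
        # Number of sorted positions p with start <= p < end.
        counts[(chrom, start, end)] = (
            bisect_left(positions, end) - bisect_left(positions, start)
        )
    return counts


# Toy, made-up data purely to exercise the functions.
intervals = [("chr1", 0, 1000), ("chr1", 1000, 2000), ("chrX", 0, 1000)]
callable_sites = [("chr1", 100, 600), ("chr1", 900, 1200), ("chrX", 0, 800)]
snps = [("chr1", 150), ("chr1", 500), ("chr1", 950),
        ("chr1", 1001), ("chr1", 1100), ("chrX", 10)]

callable_bp = callable_bp_per_interval(intervals, callable_sites)
snp_counts = snps_per_interval(intervals, snps)

# One (x, y) point per interval for the scatter plot.
points = [(callable_bp[iv], snp_counts[iv]) for iv in intervals]
```

The `points` list could then be handed to any plotting tool, with callable bases on the x-axis and SNP count on the y-axis. On real data the inner loop over callable sites would likely be too slow, so something like an interval-intersection tool would probably be a better fit; the sketch is only meant to pin down what gets computed.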

Tagging this as low priority and enhancement, since it does not feel critical to address ASAP, but keeping the issue open to remind us later.