harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Add scaffolds_to_exclude param in QC #128

Closed by aewebb80 9 months ago

aewebb80 commented 9 months ago

The QC rule subsample_snps currently removes mtDNA rather than the chromosomes specified by scaffolds_to_exclude. I've also updated the QC test to account for the scaffolds_to_exclude param.
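As a rough illustration of the behaviour described above, here is a minimal Python sketch of filtering contigs by an exclusion list instead of hard-coding mtDNA. It is not the code from this PR: the function name filter_scaffolds, the example contig names, and the assumption that scaffolds_to_exclude is a comma-separated string are all hypothetical.

```python
def filter_scaffolds(contigs, scaffolds_to_exclude=""):
    """Return the contigs that are not named in the exclusion string.

    Assumption: scaffolds_to_exclude is a comma-separated string of
    scaffold names; an empty string means nothing is excluded.
    """
    # Split on commas, trim whitespace, and ignore empty entries.
    excluded = {
        name.strip()
        for name in scaffolds_to_exclude.split(",")
        if name.strip()
    }
    # Preserve the original contig order.
    return [contig for contig in contigs if contig not in excluded]


# Hypothetical contig names, for illustration only.
contigs = ["chr1", "chr2", "chrZ", "mtDNA"]
print(filter_scaffolds(contigs, "chrZ, mtDNA"))  # ['chr1', 'chr2']
```

With an empty exclusion string the function returns every contig unchanged, so in this sketch nothing is dropped unless the user asks for it.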

erikenbody commented 9 months ago

Thanks for looking into this! The scaffolds_to_exclude parameter was originally written as part of the postprocessing module, where it is correctly implemented. The QC code was written as a quick-and-dirty way to get to an analysis-ready dataset ASAP. Still, I think this change is fine, and it seems a little tidier. Curious what @tsackton and @cademirch think.

tsackton commented 9 months ago

I think it is probably better to be consistent here. I can certainly imagine scenarios where including a scaffold known to be problematic in some way could distort the QC plots.

erikenbody commented 9 months ago

Yeah, that's true. It seems sensible, for example, to exclude sex chromosomes from the QC plots, since they will bias them, so I agree we should incorporate this.

cademirch commented 9 months ago

This looks good to me and makes sense. Will merge after tests pass. Thank you for your contributions!!