jackhump opened this issue 4 years ago (status: Open)
To be done by Jack:
remove multi-cohort functionality - it overcomplicates the pipeline
simplify string matching - assume the user has symlinked a set of gzipped VCFs, one per chromosome for the cohort
hardcode blacklist filtering - there is no point in making it optional
create a sample-filter rule - e.g. to remove samples flagged by the PCA results
add a final MAF filter - the pipeline should output all QC-passed variants plus a common-variants file
look into chunking - can the chunk size be increased? Ideally chunk at the beginning and run every step per chunk
go over cluster.yaml and snakejob - ensure optimal execution across the maximum number of nodes while keeping resource requests low
combine sample filtering and SNP filtering into a single rule
combine all chunk-level filters into a single rule
work with the HPC team to optimise I/O when writing temp files during chunk filtering
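Merging sample filtering and SNP filtering into one rule might look roughly like this Snakemake sketch. The rule name, file paths, and the bcftools filter expression are all hypothetical placeholders, not the pipeline's actual values:

```snakemake
# Hypothetical combined rule: drop PCA-flagged samples and apply site
# filters in one bcftools pass, avoiding an intermediate temp VCF.
rule filter_samples_and_snps:
    input:
        vcf="input/{cohort}_chr{chrom}.vcf.gz",
        remove="qc/samples_to_remove.txt"   # e.g. PCA outliers
    output:
        "filtered/{cohort}_chr{chrom}.vcf.gz"
    shell:
        "bcftools view -S ^{input.remove} {input.vcf} "
        "| bcftools view -i 'F_MISSING<0.1' -Oz -o {output}"
```

Piping the two `bcftools view` calls keeps everything in one rule and one job, which also reduces the temp-file I/O mentioned in the last item.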
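The simplified string matching could be sketched as below. The filename convention (`<cohort>_chr<chrom>.vcf.gz`) is an assumption for illustration, not the pipeline's actual pattern:

```python
import re

# Assumed convention: one gzipped VCF per chromosome, symlinked into the
# input directory as e.g. "mycohort_chr1.vcf.gz" (hypothetical names).
VCF_PATTERN = re.compile(r"^(?P<cohort>\w+)_chr(?P<chrom>\w+)\.vcf\.gz$")

def chrom_of(filename):
    """Return the chromosome encoded in a per-chromosome VCF filename,
    or None if the name does not match the assumed convention."""
    m = VCF_PATTERN.match(filename)
    return m.group("chrom") if m else None
```

With this convention, `chrom_of("mycohort_chr1.vcf.gz")` returns `"1"`, and any file that does not match is simply skipped rather than triggering multi-cohort logic.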
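The final MAF filter splits QC-passed variants into an all-variants output and a common-variants output. A minimal sketch of the underlying MAF computation, with an assumed (not confirmed) 1% common-variant threshold:

```python
def minor_allele_frequency(alt_count, total_alleles):
    """MAF from the ALT allele count and the total number of called
    alleles (2N for diploid samples): frequency of the rarer allele."""
    af = alt_count / total_alleles
    return min(af, 1 - af)

def is_common(alt_count, total_alleles, maf_threshold=0.01):
    """True if the variant passes the assumed common-variant cutoff."""
    return minor_allele_frequency(alt_count, total_alleles) >= maf_threshold
```

In practice this threshold would be applied with a tool such as bcftools rather than in Python; the sketch only pins down what "common" means here.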
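Chunking at the beginning and running every step per chunk requires splitting each chromosome into regions up front. A sketch, where the 1 Mb default chunk size and the `chrom:start-end` region format are illustrative assumptions:

```python
def make_chunks(chrom, chrom_length, chunk_size=1_000_000):
    """Split a chromosome into consecutive 1-based [start, end] windows
    of at most chunk_size bases, as bcftools-style region strings."""
    chunks = []
    start = 1
    while start <= chrom_length:
        end = min(start + chunk_size - 1, chrom_length)
        chunks.append(f"{chrom}:{start}-{end}")
        start = end + 1
    return chunks
```

Each region string can then drive one job per chunk, so increasing the chunk size trades scheduler overhead against per-job memory.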