jackhump opened this issue 4 years ago (status: Open)
To be done by Jack:
remove multi-cohort functionality - it overcomplicates the pipeline
simplify string matching - assume the user has symlinked a set of gzipped VCFs, one per chromosome for the cohort
hardcode blacklist filtering - there is no point in making it optional
create a sample-filter rule - e.g. to remove samples flagged by the PCA results
add a final MAF filter - the pipeline should output all QC-passed variants plus a common-variants file
look into chunking - can the chunk size be increased? Ideally chunk at the beginning and run every step per chunk
go over cluster.yaml and snakejob - ensure optimal execution across the maximum number of nodes while keeping resource requests low
combine sample filtering and SNP filtering into a single rule
combine all chunk-level filters into a single rule
work with the HPC team to optimise I/O when writing temp files during chunk filtering
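Merging sample filtering and SNP filtering into one rule might look roughly like this Snakemake sketch. The rule name, file paths, and the bcftools filter expression are all hypothetical placeholders, not the pipeline's actual values:

```snakemake
# Hypothetical combined rule: drop PCA-flagged samples and apply site
# filters in one bcftools pass, avoiding an intermediate temp VCF.
rule filter_samples_and_snps:
    input:
        vcf="input/{cohort}_chr{chrom}.vcf.gz",
        remove="qc/samples_to_remove.txt"   # e.g. PCA outliers
    output:
        "filtered/{cohort}_chr{chrom}.vcf.gz"
    shell:
        "bcftools view -S ^{input.remove} {input.vcf} "
        "| bcftools view -i 'F_MISSING<0.1' -Oz -o {output}"
```

Piping the two `bcftools view` calls keeps everything in one rule and one job, which also reduces the temp-file I/O mentioned in the last item.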
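The simplified string matching could be sketched as below. The filename convention (`<cohort>_chr<chrom>.vcf.gz`) is an assumption for illustration, not the pipeline's actual pattern:

```python
import re

# Assumed convention: one gzipped VCF per chromosome, symlinked into the
# input directory as e.g. "mycohort_chr1.vcf.gz" (hypothetical names).
VCF_PATTERN = re.compile(r"^(?P<cohort>\w+)_chr(?P<chrom>\w+)\.vcf\.gz$")

def chrom_of(filename):
    """Return the chromosome encoded in a per-chromosome VCF filename,
    or None if the name does not match the assumed convention."""
    m = VCF_PATTERN.match(filename)
    return m.group("chrom") if m else None
```

With this convention, `chrom_of("mycohort_chr1.vcf.gz")` returns `"1"`, and any file that does not match is simply skipped rather than triggering multi-cohort logic.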
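The final MAF filter splits QC-passed variants into an all-variants output and a common-variants output. A minimal sketch of the underlying MAF computation, with an assumed (not confirmed) 1% common-variant threshold:

```python
def minor_allele_frequency(alt_count, total_alleles):
    """MAF from the ALT allele count and the total number of called
    alleles (2N for diploid samples): frequency of the rarer allele."""
    af = alt_count / total_alleles
    return min(af, 1 - af)

def is_common(alt_count, total_alleles, maf_threshold=0.01):
    """True if the variant passes the assumed common-variant cutoff."""
    return minor_allele_frequency(alt_count, total_alleles) >= maf_threshold
```

In practice this threshold would be applied with a tool such as bcftools rather than in Python; the sketch only pins down what "common" means here.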
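Chunking at the beginning and running every step per chunk requires splitting each chromosome into regions up front. A sketch, where the 1 Mb default chunk size and the `chrom:start-end` region format are illustrative assumptions:

```python
def make_chunks(chrom, chrom_length, chunk_size=1_000_000):
    """Split a chromosome into consecutive 1-based [start, end] windows
    of at most chunk_size bases, as bcftools-style region strings."""
    chunks = []
    start = 1
    while start <= chrom_length:
        end = min(start + chunk_size - 1, chrom_length)
        chunks.append(f"{chrom}:{start}-{end}")
        start = end + 1
    return chunks
```

Each region string can then drive one job per chunk, so increasing the chunk size trades scheduler overhead against per-job memory.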