This project repository contains a Snakemake workflow that produces whole-genome Verkko assemblies and extracts the contigs that most likely represent the Y chromosome. The workflow requires HiFi and ONT reads to run; Illumina short reads are additionally needed for certain assembly evaluation tasks.
The input sample sheet is a simple tab-separated table listing the sample name (sample) and the (file system) location of the read sets (hifi, ont, and short if available).
The path to the sample sheet must be passed to Snakemake via the samples config option as follows:
$ snakemake --config samples=PATH_TO_SAMPLE_SHEET [...]
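The sample sheet layout described above can be illustrated with a short Python sketch. It is not part of the workflow: the function name read_sample_sheet, the sample names and all paths are hypothetical placeholders, and leaving the short column empty when no Illumina reads exist is an assumption, not confirmed behavior of the workflow. Only the column names (sample, hifi, ont, short) are taken from the description above.

```python
import csv
import io

# Column names as listed in the sample sheet description.
REQUIRED_COLUMNS = {"sample", "hifi", "ont"}
# "short" is treated as optional because short reads are only needed "if available".


def read_sample_sheet(handle):
    """Parse a tab-separated sample sheet and return its rows as dicts.

    Hypothetical helper for illustration; the workflow may parse the
    sample sheet differently.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - header
    if missing:
        raise ValueError(f"sample sheet lacks required columns: {sorted(missing)}")
    rows = []
    for row in reader:
        if not row["sample"]:
            raise ValueError("every row needs a sample name")
        rows.append(row)
    return rows


# Placeholder content: sample names and paths are made up.
# Assumption: an empty "short" field means no short reads are available.
EXAMPLE_SHEET = (
    "sample\thifi\tont\tshort\n"
    "SAMPLE_A\t/path/to/hifi\t/path/to/ont\t/path/to/short\n"
    "SAMPLE_B\t/path/to/hifi\t/path/to/ont\t\n"
)

if __name__ == "__main__":
    for row in read_sample_sheet(io.StringIO(EXAMPLE_SHEET)):
        short = row.get("short") or "(no short reads)"
        print(row["sample"], row["hifi"], row["ont"], short)
```

Checking the header before starting a run is a cheap way to catch a sheet that was saved with the wrong delimiter or a misspelled column name.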
The entire workflow uses Conda environments wherever possible to deploy software dependencies.
A base environment containing Snakemake itself is defined in workflow/envs/run_env.yaml.
For software in a development/prototype stage (Verkko and VerityMap), manual steps are required:
Verkko must be adapted to the local infrastructure, and a specific bugfix version of VerityMap
must be built (see module workflow/envs/80_est_assm_errors.smk). The Verkko adaptation cannot be automated.
The folder notebooks/ contains Jupyter notebooks used to plot various summary statistics
of the generated assemblies. The notebooks contain a brief description documenting the necessary
input files (produced by the Snakemake workflow).
In preparation