monagrland / MB_Pipeline

Metabarcoding Pipeline for Illumina Sequencing Data
GNU Affero General Public License v3.0

Follow Snakemake recommended folder structure #17

Closed kbseah closed 8 months ago

kbseah commented 9 months ago

Snakemake recommended folder structure: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#distribution-and-reproducibility

I think that the workflow (rules etc.) should be in a dedicated subfolder. The pipeline should then simply be forked and modified for each new dataset (alternatively, it could be uploaded to WorkflowHub in the future).
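To make the target layout concrete, here is a minimal sketch that scaffolds the directory skeleton described in the linked Snakemake documentation. The directory names follow that documentation; the helper name `scaffold` is hypothetical and not part of this pipeline or of Snakemake.

```python
from pathlib import Path

# Directories from the layout recommended in the Snakemake docs
# ("Distribution and Reproducibility" section).
RECOMMENDED_DIRS = [
    "workflow/rules",      # modular rule files (*.smk)
    "workflow/envs",       # conda environment definitions
    "workflow/scripts",    # scripts called by rules
    "workflow/notebooks",
    "workflow/report",
    "config",              # per-dataset configuration
    "results",             # workflow outputs
    "resources",           # downloaded or retrieved inputs
]


def scaffold(root):
    """Create the recommended skeleton under `root` (hypothetical helper)."""
    root = Path(root)
    for rel in RECOMMENDED_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    # Entry point and default config file expected by the layout.
    (root / "workflow" / "Snakefile").touch()
    (root / "config" / "config.yaml").touch()
    return root
```

With this layout, the rules live under `workflow/`, and each fork of the repository carries its own `config/` and `results/`.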

Detailed explanation:

Current usage scenario of the pipeline: the user clones a single copy of the workflow and uses it to analyze multiple datasets by writing an individual config file for each. The input and output paths for each dataset are specified in its config file and are independent of the workflow, i.e. the input/output folders are not necessarily subfolders of the workflow. To accommodate this usage pattern, the workdir is specified manually and is not necessarily the path at which the Snakemake command is run.
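The single-clone, many-datasets pattern above can be sketched as follows. This is only an illustration: all paths and dataset names are hypothetical, and it passes the working directory via Snakemake's `--directory` command-line option, which is an assumption on my part rather than how the pipeline currently sets its workdir.

```python
from pathlib import Path


def snakemake_argv(snakefile, configfile, workdir, cores=1):
    """Build a Snakemake command line for one dataset.

    All paths are hypothetical. --snakefile, --configfile, --directory
    and --cores are standard Snakemake command-line options.
    """
    return [
        "snakemake",
        "--snakefile", str(Path(snakefile)),
        "--configfile", str(Path(configfile)),
        "--directory", str(Path(workdir)),
        "--cores", str(cores),
    ]


# One shared workflow clone, one config file and working directory per dataset.
for name in ["dataset_a", "dataset_b"]:
    argv = snakemake_argv(
        "MB_Pipeline/Snakefile",
        f"projects/{name}/config.yaml",
        f"projects/{name}",
    )
    print(" ".join(argv))
```

Under the forked layout proposed in this issue, the per-dataset config would instead live inside each fork, so the extra path arguments would mostly become unnecessary.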

The original motivation AFAIK was:

However,

kbseah commented 9 months ago

Rules for inclusion in the Snakemake workflow catalog: https://snakemake.github.io/snakemake-workflow-catalog/?rules=true

kbseah commented 8 months ago

Observed with Snakemake 8:

Environment variables are dumped in a "wall of text". It is unclear what triggers this, but it seems to be what is reported here: https://github.com/snakemake/snakemake/issues/2624