metagenomics / metagenomics-tk

GNU Affero General Public License v3.0

feat(pipeline): initial implementation for easy/preset pipeline execution mode added #345

Closed pbelmann closed 8 months ago

pbelmann commented 10 months ago

This PR adds an initial implementation of the "preset" mode, which lets you run the toolkit without specifying a configuration file (see the example below). Initial documentation will follow once the documentation as a whole has been revised.

ubuntu@bibigrid-master-wiowhtqpqjglbpr:/vol/spool/peter/meta-omics-toolkit$ ./nextflow run main.nf -work-dir work_wFullPipeline -profile standard -resume -entry wFullPipeline --preset --help
N E X T F L O W  ~  version 23.04.1
###################################################################
#############            Metagenomics-TK             ##############
###################################################################
Version: 0.3.0
Mode: preset
###################################################################
###################################################################
The following modules will be executed:

qc
qcONT
assembly
assemblyONT
binning
binningONT
magAttributes
fragmentRecruitment
dereplication
readMapping
cooccurrence
plasmid
###################################################################
The following job flavors are defined:

Flavor:highmemLarge, CPUs:28, Memory:230
Flavor:highmemMedium, CPUs:14, Memory:113
Flavor:large, CPUs:28, Memory:58
Flavor:medium, CPUs:18, Memory:29
Flavor:small, CPUs:7, Memory:14
Flavor:tiny, CPUs:1, Memory:1
###################################################################
Help Page:

Mandatory Parameters:
--databases:            Path to a folder where databases are downloaded and extracted. 
                        If you are using slurm then the path should point to a folder which is local to the worker host and not shared by all workers.
--scratch:              Scratch directory which is used for storing intermediate results.
--output:               Output directory path or S3 url.
--input.ont.path:       Path to a samplesheet containing the two columns: SAMPLE and PATH.
                        SAMPLE contains the id of the dataset and the PATH column contains the path or url that points to the nanopore datasets.
--input.paired.path:    Path that points to a samplesheet with the required columns SAMPLE, READS1 and READS2.
                        READS1 and READS2 point to paths or urls of the input datasets.

Optional Parameters:
Possible resource settings with the values cpus and ram.
Example: --tiny 1,4 means 1 cpu and 4 GB RAM.
--highmemLarge
--highmemMedium
--large
--medium
--small
--tiny

You can disable modules via the following parameters:
--no-qc
--no-qcONT
--no-assembly
--no-assemblyONT
--no-binning
--no-binningONT
--no-magAttributes
--no-fragmentRecruitment
--no-dereplication
--no-readMapping
--no-cooccurrence
--no-plasmid
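
To make the help page above more concrete, here is a rough Python sketch of how the listed options fit together. It is not part of the toolkit: the helper names (`parse_flavor`, `missing_columns`, `preset_arguments`) are hypothetical, the tab delimiter for samplesheets is an assumption, and treating each of the two input samplesheets as optional is also an assumption. Only the flag names, the `cpus,ram` format and the column names are taken from the help output.

```python
import csv
import io

# Column names come from the help text; the tab delimiter is an assumption.
PAIRED_COLUMNS = ("SAMPLE", "READS1", "READS2")
ONT_COLUMNS = ("SAMPLE", "PATH")


def parse_flavor(value):
    """Parse a resource override such as "1,4" into (cpus, ram_gb)."""
    cpus, ram = value.split(",")
    return int(cpus), int(ram)


def missing_columns(samplesheet_text, required, delimiter="\t"):
    """Return the required columns that are absent from the header row."""
    reader = csv.reader(io.StringIO(samplesheet_text), delimiter=delimiter)
    header = next(reader, [])
    return [column for column in required if column not in header]


def preset_arguments(databases, scratch, output, paired=None, ont=None,
                     flavors=None, disabled=()):
    """Assemble preset-mode arguments using the flags from the help page."""
    args = ["--preset", "--databases", databases,
            "--scratch", scratch, "--output", output]
    if paired is not None:
        args += ["--input.paired.path", paired]
    if ont is not None:
        args += ["--input.ont.path", ont]
    # Resource overrides, e.g. {"tiny": (1, 4)} becomes "--tiny 1,4".
    for name, (cpus, ram) in (flavors or {}).items():
        args += [f"--{name}", f"{cpus},{ram}"]
    # Disabled modules, e.g. "plasmid" becomes "--no-plasmid".
    args += [f"--no-{module}" for module in disabled]
    return args


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    print(" ".join(preset_arguments(
        "/path/to/databases", "/path/to/scratch", "output",
        paired="paired_samples.tsv",
        flavors={"tiny": parse_flavor("1,4")},
        disabled=["plasmid"],
    )))
```

The resulting arguments would be appended to a `./nextflow run main.nf ... -entry wFullPipeline` call like the one shown at the top of the example; all paths in the sketch are placeholders.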


PR review guidelines

Thank you for submitting this PR.

Before merge: