nrminor / oneroof

Base-, Variant-, and Consensus-calling under One Proverbial Roof. Work in progress!
MIT License

Add a Python executable entrypoint that can be used in place of `nextflow run` in the pixi environment or Docker container #17

Closed: nrminor closed this issue 2 months ago.

nrminor commented 2 months ago

We'll add more features to the CLI as time goes on, but this particular enhancement is finished.

Users can now run `pixi shell --frozen` or `uv sync` to get access to the following CLI:

usage: oneroof [-h] {env,validate,resume,run} ...

options:
  -h, --help            show this help message and exit

Subcommands:
  {env,validate,resume,run}
    env                 Check that all dependencies are available in the environment.
    validate            Validate provided inputs.
    resume              Resume the previous run.
    run                 Run the full pipeline.
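For readers curious how a subcommand layout like the one above is typically wired up, here is a rough sketch using Python's standard `argparse` module. It is not oneroof's actual implementation: `REQUIRED_TOOLS`, `check_env`, `build_parser`, and `main` are hypothetical names, only one `run` option is shown, and the `env` check is assumed to be a simple "is this executable on `PATH`" test via `shutil.which`.

```python
import argparse
import shutil
import sys
from typing import List, Optional

# Hypothetical dependency list; the real set of tools oneroof checks is not given here.
REQUIRED_TOOLS = ["nextflow", "java"]


def check_env(tools: List[str]) -> List[str]:
    """Return the names of any tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="oneroof")
    # title="Subcommands" produces the "Subcommands:" heading seen in the help output.
    subparsers = parser.add_subparsers(title="Subcommands", dest="command", required=True)
    subparsers.add_parser("env", help="Check that all dependencies are available in the environment.")
    subparsers.add_parser("validate", help="Validate provided inputs.")
    subparsers.add_parser("resume", help="Resume the previous run.")
    run = subparsers.add_parser("run", help="Run the full pipeline.")
    # Only one of the many `run` options is reproduced in this sketch.
    run.add_argument("--refseq", required=True, help="The reference sequence to be used for mapping in FASTA format.")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "env":
        missing = check_env(REQUIRED_TOOLS)
        for tool in missing:
            print(f"missing dependency: {tool}", file=sys.stderr)
        return 1 if missing else 0
    # Hypothetical: validate/resume/run handlers would be dispatched here.
    return 0
```

A function like `main` would usually be exposed as a console-script entry point so that `oneroof` is on `PATH` once the environment is synced.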

And for the `run` subcommand:

usage: oneroof run [-h] [--primer_bed PRIMER_BED] [--fwd_suffix FWD_SUFFIX] [--rev_suffix REV_SUFFIX] --refseq REFSEQ [--ref_gbk REF_GBK]
                   [--remote_pod5_location REMOTE_POD5_LOCATION] [--file_watcher_config FILE_WATCHER_CONFIG] [--pod5_staging POD5_STAGING]
                   [--pod5_dir POD5_DIR] [--precalled_staging PRECALLED_STAGING] [--prepped_data PREPPED_DATA]
                   [--illumina_fastq_dir ILLUMINA_FASTQ_DIR] [--model MODEL] [--model_cache MODEL_CACHE] [--kit KIT]
                   [--pod5_batch_size POD5_BATCH_SIZE] [--basecall_max BASECALL_MAX] [--max_len MAX_LEN] [--min_len MIN_LEN]
                   [--min_qual MIN_QUAL] [--secondary] [--max_mismatch MAX_MISMATCH] [--downsample_to DOWNSAMPLE_TO]
                   [--min_consensus_freq MIN_CONSENSUS_FREQ] [--min_haplo_reads MIN_HAPLO_READS] [--snpeff_cache SNPEFF_CACHE]
                   [--min_depth_coverage MIN_DEPTH_COVERAGE] [--nextclade_dataset NEXTCLADE_DATASET] [--nextclade_cache NEXTCLADE_CACHE]
                   [--results RESULTS] [--cleanup] [--resume] [--snpEff_config SNPEFF_CONFIG]
                   [-profile {standard,docker,singularity,apptainer,containerless} [{standard,docker,singularity,apptainer,containerless} ...]]

options:
  -h, --help            show this help message and exit
  --primer_bed PRIMER_BED
                        A BED file of primer coordinates relative to the reference provided with the parameters `refseq` and `ref_gbk`.
  --fwd_suffix FWD_SUFFIX
                        Suffix in the primer BED file denoting forward primers.
  --rev_suffix REV_SUFFIX
                        Suffix in the primer BED file denoting reverse primers.
  --refseq REFSEQ       The reference sequence to be used for mapping in FASTA format.
  --ref_gbk REF_GBK     The reference sequence to be used for variant annotation in GenBank format.
  --remote_pod5_location REMOTE_POD5_LOCATION
                        A remote location to use with an SSH client to watch for pod5 files in real time as they are generated by the
                        sequencing instrument.
  --file_watcher_config FILE_WATCHER_CONFIG
                        Configuration file for remote file monitoring.
  --pod5_staging POD5_STAGING
                        Where to cache pod5s as they arrive from the remote location.
  --pod5_dir POD5_DIR   A local, on-device directory where pod5 files have been manually transferred.
  --precalled_staging PRECALLED_STAGING
                        A local directory to watch for Nanopore FASTQs or BAMs as they become available.
  --prepped_data PREPPED_DATA
                        Location of already basecalled and demultiplexed pod5 files.
  --illumina_fastq_dir ILLUMINA_FASTQ_DIR
                        Location of Illumina paired-end FASTQ files.
  --model MODEL         The Nanopore basecalling model to apply to the provided pod5 data.
  --model_cache MODEL_CACHE
                        Where to cache the models locally.
  --kit KIT             The Nanopore barcoding kit used to prepare sequencing libraries.
  --pod5_batch_size POD5_BATCH_SIZE
                        How many pod5 files to basecall at once.
  --basecall_max BASECALL_MAX
                        How many parallel instances of the basecaller to run at once.
  --max_len MAX_LEN     The maximum acceptable length for a given read.
  --min_len MIN_LEN     The minimum acceptable length for a given read.
  --min_qual MIN_QUAL   The minimum acceptable average quality for a given read.
  --secondary           Whether to turn on secondary alignments for each amplicon.
  --max_mismatch MAX_MISMATCH
                        The maximum number of mismatches to allow when finding primers.
  --downsample_to DOWNSAMPLE_TO
                        Desired coverage to downsample to, with 0 indicating no downsampling.
  --min_consensus_freq MIN_CONSENSUS_FREQ
                        The minimum required frequency of a variant base to be included in a consensus sequence.
  --min_haplo_reads MIN_HAPLO_READS
                        The minimum required read support to report an amplicon-haplotype.
  --snpeff_cache SNPEFF_CACHE
                        Where to cache a custom snpEff database.
  --min_depth_coverage MIN_DEPTH_COVERAGE
                        Minimum depth of coverage [default: 20].
  --nextclade_dataset NEXTCLADE_DATASET
                        Nextclade dataset.
  --nextclade_cache NEXTCLADE_CACHE
                        Nextclade dataset cache.
  --results RESULTS     Where to place results.
  --cleanup             Whether to clean up the work directory after a successful run.
  --resume              Whether to resume from a previous run.
  --snpEff_config SNPEFF_CONFIG
                        snpEff config file.
  -profile {standard,docker,singularity,apptainer,containerless} [{standard,docker,singularity,apptainer,containerless} ...]
                        The run configuration profile to use.
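Since `oneroof run` is meant to stand in for `nextflow run`, one plausible approach is to translate the parsed options back into a Nextflow command line. The sketch below is hypothetical and not taken from oneroof's source. It assumes each `--name value` option maps one-to-one onto a pipeline parameter of the same name, that enabled boolean flags are passed bare (Nextflow reads a valueless `--flag` as `true`), that `--resume` becomes Nextflow's own `-resume`, and that multiple profiles are joined with commas for `-profile`. The pipeline path `"."` is a placeholder.

```python
import shlex
from typing import Any, Dict, List

# Hypothetical: options handled by Nextflow itself (single dash) rather than
# passed through as pipeline parameters (double dash).
NEXTFLOW_LEVEL = {"profile", "resume"}


def build_nextflow_command(params: Dict[str, Any], pipeline: str = ".") -> List[str]:
    """Translate parsed `oneroof run` options into a `nextflow run` argv list."""
    cmd = ["nextflow", "run", pipeline]
    profiles = params.get("profile")
    if profiles:
        # Nextflow accepts several profiles as one comma-separated value.
        cmd += ["-profile", ",".join(profiles)]
    if params.get("resume"):
        cmd.append("-resume")
    for name, value in params.items():
        # Skip Nextflow-level options, unset options, and disabled flags.
        # Note that 0 is kept, since `is False` only matches the bool itself.
        if name in NEXTFLOW_LEVEL or value is None or value is False:
            continue
        if value is True:
            cmd.append(f"--{name}")
        else:
            cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    example = {
        "refseq": "reference.fasta",
        "primer_bed": "primers.bed",
        "min_depth_coverage": 20,
        "secondary": True,
        "cleanup": False,
        "downsample_to": None,
        "resume": True,
        "profile": ["docker"],
    }
    print(shlex.join(build_nextflow_command(example)))
```

With `argparse`, `vars(args)` yields a dict in this shape, and the resulting list could then be handed to `subprocess.run`.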
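The `--fwd_suffix` and `--rev_suffix` options imply that forward and reverse primers are distinguished by the suffix of each primer's name in the BED file. Below is a minimal sketch of that kind of split. It assumes the standard BED layout (chrom, start, end, name, ...) with the primer name in the fourth column, and uses ARTIC-style `_LEFT`/`_RIGHT` suffixes as hypothetical defaults; oneroof's real defaults and primer handling may differ. The `Primer` type and `split_primers` function are illustrative names.

```python
import io
from typing import List, NamedTuple, TextIO, Tuple


class Primer(NamedTuple):
    chrom: str
    start: int  # 0-based start (BED convention)
    end: int    # end coordinate, exclusive (BED convention)
    name: str


def split_primers(
    bed: TextIO, fwd_suffix: str = "_LEFT", rev_suffix: str = "_RIGHT"
) -> Tuple[List[Primer], List[Primer]]:
    """Sort BED primer records into forward and reverse lists by name suffix."""
    forward: List[Primer] = []
    reverse: List[Primer] = []
    for line in bed:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        primer = Primer(fields[0], int(fields[1]), int(fields[2]), fields[3])
        if primer.name.endswith(fwd_suffix):
            forward.append(primer)
        elif primer.name.endswith(rev_suffix):
            reverse.append(primer)
        else:
            raise ValueError(f"primer {primer.name!r} matches neither suffix")
    return forward, reverse


if __name__ == "__main__":
    # Toy records for illustration only.
    toy_bed = io.StringIO(
        "ref\t30\t54\tamp1_LEFT\t1\t+\n"
        "ref\t385\t410\tamp1_RIGHT\t1\t-\n"
    )
    fwd, rev = split_primers(toy_bed)
    print([p.name for p in fwd], [p.name for p in rev])
```

Raising on names that match neither suffix is a design choice in this sketch; a real pipeline might instead warn or skip such records.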
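`--min_len`, `--max_len`, and `--min_qual` together describe a per-read filter. The sketch below is a hypothetical illustration, not the filter oneroof actually runs. It keeps a read when its length falls within the bounds (treated as inclusive, an assumption) and its average quality meets the threshold. Average quality is computed here by averaging per-base error probabilities and converting back to the Phred scale, a convention some long-read QC tools follow; a plain arithmetic mean of Phred scores is never lower, so which definition the pipeline uses is also an assumption.

```python
import math
from typing import List


def mean_read_quality(phred: List[int]) -> float:
    """Average quality computed via the mean per-base error probability."""
    if not phred:
        return 0.0
    mean_error = sum(10 ** (-q / 10) for q in phred) / len(phred)
    return -10 * math.log10(mean_error)


def passes_filter(phred: List[int], min_len: int, max_len: int, min_qual: float) -> bool:
    """Hypothetical read filter mirroring --min_len/--max_len/--min_qual."""
    length = len(phred)  # one quality score per base
    # Length bounds are treated as inclusive in this sketch.
    return min_len <= length <= max_len and mean_read_quality(phred) >= min_qual


if __name__ == "__main__":
    read = [20] * 500
    print(passes_filter(read, min_len=300, max_len=1200, min_qual=10))
```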
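`--downsample_to` is described as a target coverage, with 0 meaning no downsampling. One simple way to realize that, sketched here under stated assumptions, is to keep each read with probability `target / observed` whenever observed depth exceeds the target. The `downsample_reads` function, the `observed_depth` input, and the fixed seed are hypothetical; amplicon pipelines often downsample per amplicon instead, and oneroof's actual strategy is not described in this issue.

```python
import random
from typing import List


def downsample_reads(
    reads: List[str], observed_depth: float, target_depth: float, seed: int = 0
) -> List[str]:
    """Randomly keep roughly target/observed of the reads; a target of 0 disables downsampling."""
    if target_depth <= 0 or observed_depth <= target_depth:
        # Nothing to do: downsampling is off, or coverage is already at or below target.
        return list(reads)
    keep_fraction = target_depth / observed_depth
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    return [read for read in reads if rng.random() < keep_fraction]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(1000)]
    kept = downsample_reads(reads, observed_depth=1000, target_depth=100)
    print(len(kept))
```

Per-read Bernoulli sampling only hits the target approximately; drawing an exact count with `random.sample` is an alternative when a precise number of reads is required.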