Add a Python executable entrypoint that can be used in place of `nextflow run` in the pixi environment or Docker container

We'll add more features to the CLI as time goes on, but this particular enhancement is finished.

Users can now run `pixi shell --frozen` or `uv sync` to get access to the following CLI:

usage: oneroof [-h] {env,validate,resume,run} ...

options:
  -h, --help            show this help message and exit

Subcommands:
  {env,validate,resume,run}
    env                 Check that all dependencies are available in the environment
    validate            Validate provided inputs.
    resume              Resume the previous run.
    run                 Run the full pipeline.
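
For readers unfamiliar with how a help screen like this is produced, here is a minimal `argparse` sketch that yields the same subcommand layout. It is an illustration only, not the project's actual code: the function name `build_parser` is hypothetical, and only two of the `run` options are reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose subcommands mirror the help text above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="oneroof")
    subparsers = parser.add_subparsers(
        title="Subcommands", dest="command", required=True
    )
    subparsers.add_parser(
        "env", help="Check that all dependencies are available in the environment"
    )
    subparsers.add_parser("validate", help="Validate provided inputs.")
    subparsers.add_parser("resume", help="Resume the previous run.")
    run = subparsers.add_parser("run", help="Run the full pipeline.")
    # Only two of the `run` options are shown; the real CLI defines many more.
    run.add_argument(
        "--refseq",
        required=True,
        help="The reference sequence to be used for mapping in FASTA format.",
    )
    run.add_argument(
        "--secondary",
        action="store_true",
        help="Whether to turn on secondary alignments for each amplicon.",
    )
    return parser
```

Calling `build_parser().parse_args(["run", "--refseq", "ref.fasta"])` returns a namespace with `command == "run"`, which a dispatcher could use to pick the matching handler.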

And for the `run` subcommand:

usage: oneroof run [-h] [--primer_bed PRIMER_BED] [--fwd_suffix FWD_SUFFIX] [--rev_suffix REV_SUFFIX] --refseq REFSEQ [--ref_gbk REF_GBK]
                   [--remote_pod5_location REMOTE_POD5_LOCATION] [--file_watcher_config FILE_WATCHER_CONFIG] [--pod5_staging POD5_STAGING]
                   [--pod5_dir POD5_DIR] [--precalled_staging PRECALLED_STAGING] [--prepped_data PREPPED_DATA]
                   [--illumina_fastq_dir ILLUMINA_FASTQ_DIR] [--model MODEL] [--model_cache MODEL_CACHE] [--kit KIT]
                   [--pod5_batch_size POD5_BATCH_SIZE] [--basecall_max BASECALL_MAX] [--max_len MAX_LEN] [--min_len MIN_LEN]
                   [--min_qual MIN_QUAL] [--secondary] [--max_mismatch MAX_MISMATCH] [--downsample_to DOWNSAMPLE_TO]
                   [--min_consensus_freq MIN_CONSENSUS_FREQ] [--min_haplo_reads MIN_HAPLO_READS] [--snpeff_cache SNPEFF_CACHE]
                   [--min_depth_coverage MIN_DEPTH_COVERAGE] [--nextclade_dataset NEXTCLADE_DATASET] [--nextclade_cache NEXTCLADE_CACHE]
                   [--results RESULTS] [--cleanup] [--resume] [--snpEff_config SNPEFF_CONFIG]
                   [-profile {standard,docker,singularity,apptainer,containerless} [{standard,docker,singularity,apptainer,containerless} ...]]

options:
  -h, --help            show this help message and exit
  --primer_bed PRIMER_BED
                        A bed file of primer coordinates relative to the reference provided with the parameters `refseq` and `ref_gbk`.
  --fwd_suffix FWD_SUFFIX
                        Suffix in the primer bed file denoting forward primer
  --rev_suffix REV_SUFFIX
                        Suffix in the primer bed file denoting reverse primer
  --refseq REFSEQ       The reference sequence to be used for mapping in FASTA format.
  --ref_gbk REF_GBK     The reference sequence to be used for variant annotation in Genbank format.
  --remote_pod5_location REMOTE_POD5_LOCATION
                        A remote location to use with a ssh client to watch for pod5 files in realtime as they are generated by the
                        sequencing instrument.
  --file_watcher_config FILE_WATCHER_CONFIG
                        Configuration file for remote file monitoring.
  --pod5_staging POD5_STAGING
                        Where to cache pod5s as they arrive from the remote location
  --pod5_dir POD5_DIR   A local, on-device directory where pod5 files have been manually transferred.
  --precalled_staging PRECALLED_STAGING
                        A local directory to watch for Nanopore FASTQs or BAMs as they become available.
  --prepped_data PREPPED_DATA
                        Location of already basecalled and demultiplexed pod5 files.
  --illumina_fastq_dir ILLUMINA_FASTQ_DIR
                        Location of Illumina paired-end FASTQ files.
  --model MODEL         The Nanopore basecalling model to apply to the provided pod5 data.
  --model_cache MODEL_CACHE
                        Where to cache the models locally.
  --kit KIT             The Nanopore barcoding kit used to prepare sequencing libraries.
  --pod5_batch_size POD5_BATCH_SIZE
                        How many pod5 files to basecall at once.
  --basecall_max BASECALL_MAX
                        How many parallel instances of the basecaller to run at once.
  --max_len MAX_LEN     The maximum acceptable length for a given read.
  --min_len MIN_LEN     The minimum acceptable length for a given read.
  --min_qual MIN_QUAL   The minimum acceptable average quality for a given read.
  --secondary           Whether to turn on secondary alignments for each amplicon.
  --max_mismatch MAX_MISMATCH
                        The maximum number of mismatches to allow when finding primers.
  --downsample_to DOWNSAMPLE_TO
                        Desired coverage to downsample to, with 0 indicating no downsampling.
  --min_consensus_freq MIN_CONSENSUS_FREQ
                        The minimum required frequency of a variant base to be included in a consensus sequence.
  --min_haplo_reads MIN_HAPLO_READS
                        The minimum required read support to report an amplicon-haplotype.
  --snpeff_cache SNPEFF_CACHE
                        Where to cache a custom snpEff database.
  --min_depth_coverage MIN_DEPTH_COVERAGE
                        Minimum depth of coverage [default: 20].
  --nextclade_dataset NEXTCLADE_DATASET
                        Nextclade dataset.
  --nextclade_cache NEXTCLADE_CACHE
                        Nextclade dataset cache.
  --results RESULTS     Where to place results.
  --cleanup             Whether to cleanup work directory after a successful run.
  --resume              Whether to resume from a previous run.
  --snpEff_config SNPEFF_CONFIG
                        snpEff config file.
  -profile {standard,docker,singularity,apptainer,containerless} [{standard,docker,singularity,apptainer,containerless} ...]
                        The run configuration profile to use.
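
Since the entrypoint stands in for `nextflow run`, the parsed options presumably end up on a Nextflow command line. The sketch below shows one way that translation could work; it is an assumption about the approach, not the project's implementation. The function name `build_nextflow_command` and the default pipeline path `"."` are hypothetical. Pipeline parameters are passed with a double dash, while `-profile` and `-resume` are Nextflow's own single-dash options.

```python
from typing import Mapping, Sequence


def build_nextflow_command(
    params: Mapping[str, object],
    profiles: Sequence[str] = (),
    resume: bool = False,
    pipeline: str = ".",  # hypothetical: path or name of the pipeline to run
) -> list[str]:
    """Translate parsed CLI options into a `nextflow run` argument list."""
    command = ["nextflow", "run", pipeline]
    for name, value in params.items():
        if value is None or value is False:
            continue  # unset options and disabled switches are omitted
        if value is True:
            command.append(f"--{name}")  # a bare --flag is read as true by Nextflow
        else:
            command.extend([f"--{name}", str(value)])
    if profiles:
        # Nextflow accepts several profiles as one comma-separated value.
        command.extend(["-profile", ",".join(profiles)])
    if resume:
        command.append("-resume")
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.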

nrminor/oneroof #17