umccr / dracarys

:dragon_face: DRAGEN workflow tidying :fire:
https://umccr.github.io/dracarys/

🔥 dracarys - UMCCR Workflow Tidying

🏆 Aim

Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys} will grab files of interest and transform them into 'tidier' structures for output into TSV/Parquet/RDS format for downstream ingestion into a database/data lake. See supported workflows, running examples, and CLI options in the sections below.

🍕 Installation

R

``` r
remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag
```

Conda

- Linux & MacOS (non-M1)

``` bash
mamba create \
  -n dracarys_env \
  -c umccr -c bioconda -c conda-forge \
  r-dracarys==X.X.X

conda activate dracarys_env
```

- MacOS M1

``` bash
CONDA_SUBDIR=osx-64 \
  mamba create \
  -n dracarys_env \
  -c umccr -c bioconda -c conda-forge \
  r-dracarys==X.X.X

conda activate dracarys_env
```

Docker

``` bash
docker pull --platform linux/amd64 ghcr.io/umccr/dracarys:X.X.X
```

✨ Supported Workflows

{dracarys} supports most outputs from the following DRAGEN/UMCCR workflows:

| Workflow             | Description                     |
|----------------------|---------------------------------|
| bcl_convert          | BCLConvert workflow             |
| tso_ctdna_tumor_only | ctDNA TSO500 workflow           |
| wgs_alignment_qc     | DRAGEN DNA (alignment) workflow |
| wts_alignment_qc     | DRAGEN RNA (alignment) workflow |
| wts_tumor_only       | DRAGEN RNA workflow             |
| wgs_tumor_normal     | DRAGEN Tumor/Normal workflow    |
| umccrise             | umccrise workflow               |
| rnasum               | RNAsum workflow                 |
| sash                 | sash workflow                   |
| oncoanalyser         | oncoanalyser workflow           |

See which output files from these workflows are supported in Supported Files.

🌀 CLI

A dracarys.R command line interface is available for convenience.

``` bash
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
dracarys.R --version
dracarys.R 0.16.0

#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...

🐉 DRAGEN Output Post-Processing 🔥

positional arguments:
  {tidy}         sub-command help
    tidy         Tidy UMCCR Workflow Outputs

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
                       [-l LOCAL_DIR] [-f FORMAT] [-n] [-q]

options:
  -h, --help            show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                        ⛄️ Directory with untidy UMCCR workflow results. Can
                        be GDS, S3 or local.
  -o OUT_DIR, --out_dir OUT_DIR
                        🔥 Directory to output tidy results.
  -p PREFIX, --prefix PREFIX
                        🎻 Prefix string used for all results.
  -t TOKEN, --token TOKEN
                        🙈 ICA access token. Default: ICA_ACCESS_TOKEN env var.
  -l LOCAL_DIR, --local_dir LOCAL_DIR
                        📥 If input is a GDS/S3 directory, download the
                        recognisable files to this directory. Default:
                        '<out_dir>/dracarys_<gds|s3>_sync'.
  -f FORMAT, --format FORMAT
                        🎨 Format of output. Default: tsv.
  -n, --dryrun          🐫 Dry run - just show files to be tidied.
  -q, --quiet           😴 Shush all the logs.
```
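
Because the token defaults to the `ICA_ACCESS_TOKEN` environment variable, a wrapper script can check for it before pointing `dracarys.R` at a GDS path. The following is a minimal sketch and not part of dracarys: the `needs_ica_token` helper and the placeholder `in_dir` are hypothetical, and it assumes that only `gds://` inputs need the ICA token.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the dracarys CLI.

# Return success (0) when the input path is a GDS URI.
# Assumption: only gds:// inputs need an ICA access token.
needs_ica_token() {
  case "$1" in
    gds://*) return 0 ;;
    *) return 1 ;;
  esac
}

in_dir="gds://path/to/subjectX_multiqc_data/"  # placeholder path

# Warn early instead of failing partway through a run.
if needs_ica_token "$in_dir" && [ -z "${ICA_ACCESS_TOKEN:-}" ]; then
  echo "ICA_ACCESS_TOKEN is not set; export it or pass --token" >&2
fi
```

The check only prints a warning, so it can be dropped in front of an existing `dracarys.R tidy` call without changing that call's behaviour.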

🚕 Running

{dracarys} takes as input (--in_dir) a directory with results from one of the UMCCR workflows. It recursively scans that directory for supported files and, if the input is a GDS or S3 directory, downloads them into a local directory (--local_dir). It then parses, transforms and writes the tidied versions into the specified output directory (--out_dir). A prefix (--prefix) is prepended to each of the tidied files. The output file format (--format) can be tsv, parquet, or both. To get just a list of supported files within the specified input directory, use the -n (--dryrun) option.

R

``` r
library(dracarys)
# help(umccr_tidy)
in_dir <- "gds://path/to/subjectX_multiqc_data/"
out_dir <- tempdir()
prefix <- "subjectX"
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)
```

Mac/Linux

From within an activated conda environment or a shell with the `dracarys.R` CLI available:

``` bash
dracarys.R tidy \
  -i gds://path/to/subjectX_multiqc_data/ \
  -o local_output_dir \
  -p subjectX_prefix
```

Docker

``` bash
# Mount the current directory and pass ICA_ACCESS_TOKEN through from the host.
docker container run \
  -v "$(pwd)":/mount1 \
  --platform=linux/amd64 \
  --env "ICA_ACCESS_TOKEN" \
  --rm -it \
  ghcr.io/umccr/dracarys:X.X.X \
    dracarys.R tidy \
      -i gds://path/to/subjectX_multiqc_data/ \
      -o /mount1/output_dir \
      -p subjectX_prefix
```
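
The dry-run option described above lends itself to a two-step pattern: list the supported files first, then tidy them. Below is a minimal sketch, assuming `dracarys.R` is on the `PATH` as set up in the CLI section. The paths and prefix are placeholders, and `tidy_args` is just an illustrative variable name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values; replace with a real input directory and prefix.
in_dir="gds://path/to/subjectX_multiqc_data/"
out_dir="local_output_dir"
prefix="subjectX_prefix"

# Shared arguments, so the dry run and the real run see the same inputs.
# "-f both" requests both tsv and parquet output.
tidy_args=(tidy -i "$in_dir" -o "$out_dir" -p "$prefix" -f both)

if command -v dracarys.R >/dev/null 2>&1; then
  # 1. List the supported files without tidying anything.
  dracarys.R "${tidy_args[@]}" --dryrun
  # 2. Once the listing looks right, uncomment to run for real:
  # dracarys.R "${tidy_args[@]}"
else
  echo "dracarys.R not found on PATH; see the CLI section" >&2
fi
```

Keeping the arguments in one array avoids the dry run and the real run drifting apart when the input path or prefix changes.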