sigven / cacao

Callable Cancer Loci - assessment of sequencing coverage for actionable and pathogenic loci in cancer
MIT License

cacao - callable cancer loci

Overview

cacao is a computational workflow that provides software and data to assess sequencing depth at clinically actionable/pathogenic loci in cancer for a given sequence alignment (BAM/CRAM). Most importantly, the software pinpoints genomic loci of clinical relevance in cancer that have sufficient sequencing coverage for reliable variant calling. In combination with the variants that have actually been identified, it may thus serve to confirm negative findings, a matter of significant clinical value that is underappreciated in current cancer sequencing analysis. The specific requirements for denoting loci as callable (i.e. depth and alignment quality) can be configured by the user, and should reflect how the input is used for variant calling (RNA/DNA, germline/somatic calling).

Technically, cacao combines the speed of mosdepth with the powerful R markdown framework for interactive data reporting. It currently uses Docker for software encapsulation to ease the installation process (a Conda package is in the making).
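To make the notion of configurable callability concrete, here is a minimal sketch of how a colon-separated threshold string (such as the germline default `0:10:100` documented under Usage) could map a mean depth onto the four levels NO_COVERAGE, LOW_COVERAGE, CALLABLE and HIGH_COVERAGE. The function names `parse_levels` and `classify_depth` are hypothetical and are not part of cacao; the treatment of fractional depths between 0 and 1 as LOW_COVERAGE is an assumption.

```python
from typing import List


def parse_levels(spec: str) -> List[int]:
    """Parse a threshold string such as '0:10:100' into three integers."""
    thresholds = [int(value) for value in spec.split(":")]
    if len(thresholds) != 3 or thresholds[0] != 0:
        raise ValueError("expected three thresholds starting with 0, e.g. '0:10:100'")
    if not thresholds[0] < thresholds[1] < thresholds[2]:
        raise ValueError("thresholds must be strictly increasing")
    return thresholds


def classify_depth(mean_depth: float, spec: str = "0:10:100") -> str:
    """Assign one of the four callability levels to a mean depth.

    Assumption: any depth above 0 but below the second threshold
    counts as LOW_COVERAGE, including fractional values.
    """
    _, callable_min, high_min = parse_levels(spec)
    if mean_depth <= 0:
        return "NO_COVERAGE"
    if mean_depth < callable_min:
        return "LOW_COVERAGE"
    if mean_depth < high_min:
        return "CALLABLE"
    return "HIGH_COVERAGE"
```

With the somatic default `0:30:200`, the same function would label a locus at depth 29 as LOW_COVERAGE and one at depth 200 as HIGH_COVERAGE.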

News

Annotation resources (v0.3.1)

Three clinical genomic tracks in BED format have been created:

IMPORTANT: For each variant identified from the three sources above, we use a surrounding sequence window of approximately 10 bp; the mean depth across this window represents the coverage of the locus.
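The windowed mean can be illustrated with a small sketch. It assumes per-base depths are already available as a plain Python sequence and that the window spans 5 bp on either side of the variant; both the helper name `window_mean_depth` and the exact window boundaries are assumptions for illustration. cacao itself obtains depth through mosdepth, and this is not its implementation.

```python
from typing import Sequence


def window_mean_depth(depths: Sequence[int], pos: int, flank: int = 5) -> float:
    """Mean depth over the half-open window [pos - flank, pos + flank).

    `depths` holds one depth value per base (0-based positions); the
    window is clipped at the ends of the sequence.
    """
    start = max(0, pos - flank)
    end = min(len(depths), pos + flank)
    window = depths[start:end]
    if not window:
        raise ValueError("window does not overlap the depth array")
    return sum(window) / len(window)
```

Averaging over a short window rather than a single base makes the reported locus coverage less sensitive to a depth dip at one position.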

All three tracks (hereditary, somatic_actionable, and somatic_hotspot) are available for GRCh37 and GRCh38, and there are also tab-separated files that link each locus to its associated

Example reports

Getting started

Installation

Usage

Run the CACAO workflow with the cacao_wflow.py Python script, which takes the following required and optional arguments:

usage:
cacao_wflow.py -h [options]
--query_aln BAM/CRAM
--track_dir TRACK_DIR
--output_dir OUTPUT_DIR
--genome_assembly grch37|grch38
--sample_id SAMPLE_ID
--mode hereditary|somatic|any

cacao - assessment of sequencing coverage at pathogenic and actionable loci in
cancer

Required arguments:
  --query_aln QUERY_ALN
                        Query alignment file (BAM/CRAM)
  --track_dir TRACK_DIR
                        Directory with BED tracks of pathogenic/actionable cancer loci for grch37/grch38
  --output_dir OUTPUT_DIR
                        Output directory
  --genome_assembly {grch37,grch38}
                        Human genome assembly build: grch37 or grch38
  --mode {hereditary,somatic,any}
                        Choice of loci and clinical cancer context (cancer predisposition/tumor sequencing)
  --sample_id SAMPLE_ID
                        Sample identifier - prefix for output files

Optional arguments:
  -h, --help            show this help message and exit
  --mapq MAPQ           mapping quality threshold (default: 0)
  --threads THREADS     Number of mosdepth BAM decompression threads. (use 4
                        or fewer) (default: 0)
  --callability_levels_germline CALLABILITY_LEVELS_GERMLINE
                        Simple colon-separated string that defines four levels
                        of variant callability: NO_COVERAGE (0), LOW_COVERAGE
                        (1-9), CALLABLE (10-99), HIGH_COVERAGE (>= 100).
                        Initial value must be 0. (default: 0:10:100)
  --callability_levels_somatic CALLABILITY_LEVELS_SOMATIC
                        Simple colon-separated string that defines four levels
                        of variant callability: NO_COVERAGE (0), LOW_COVERAGE
                        (1-29), CALLABLE (30-199), HIGH_COVERAGE (>= 200).
                        Initial value must be 0. (default: 0:30:200)
  --query_target QUERY_TARGET
                        BED file with genome target regions subject to
                        sequencing/analysis (default: None)
  --force_overwrite     By default, the script will fail with an error if any
                        output file already exists. You can force the
                        overwrite of existing result files by using this flag
                        (default: False)
  --version             show program's version number and exit
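As a sketch of how the arguments above fit together, the following helper assembles a cacao_wflow.py command line from Python. Only flags listed in the help text are used; the helper name `build_cacao_command` and all file paths and the sample identifier in the example are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def build_cacao_command(
    query_aln: str,
    track_dir: str,
    output_dir: str,
    genome_assembly: str,
    mode: str,
    sample_id: str,
    mapq: Optional[int] = None,
    threads: Optional[int] = None,
    force_overwrite: bool = False,
) -> List[str]:
    """Return the argument list for a cacao_wflow.py invocation."""
    if genome_assembly not in ("grch37", "grch38"):
        raise ValueError("genome_assembly must be 'grch37' or 'grch38'")
    if mode not in ("hereditary", "somatic", "any"):
        raise ValueError("mode must be 'hereditary', 'somatic' or 'any'")
    cmd = [
        "cacao_wflow.py",
        "--query_aln", query_aln,
        "--track_dir", track_dir,
        "--output_dir", output_dir,
        "--genome_assembly", genome_assembly,
        "--mode", mode,
        "--sample_id", sample_id,
    ]
    # Optional arguments are appended only when explicitly set.
    if mapq is not None:
        cmd += ["--mapq", str(mapq)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if force_overwrite:
        cmd.append("--force_overwrite")
    return cmd


# Hypothetical paths and sample name, for illustration only.
example = build_cacao_command(
    "sample.bam", "cacao_tracks", "results", "grch37", "hereditary",
    "sample01", mapq=20, threads=2,
)
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the workflow, provided cacao_wflow.py is on the PATH and Docker is available.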

Documentation

Coming

Contact

sigven AT ifi.uio.no