cacao is a computational workflow that provides software and data to assess sequencing depth at clinically actionable/pathogenic loci in cancer for a given sequence alignment (BAM/CRAM). Most importantly, the software pinpoints genomic loci of clinical relevance in cancer that have sufficient sequencing coverage for reliable variant calling. In combination with the variants that have actually been identified, it may thus serve to confirm negative findings, a matter of significant clinical value that is underappreciated in current cancer sequencing analysis. The specific requirements for denoting loci as callable (i.e. depth and alignment quality) can be configured by the user, and should reflect how the input is used for variant calling (RNA/DNA, germline/somatic calling).
Technically, cacao combines the speed of mosdepth with the powerful R Markdown framework for interactive data reporting. It currently uses Docker for software encapsulation to ease installation (a Conda package is in the making).
Three clinical genomic tracks in BED format have been created:
IMPORTANT: For each variant identified from the three sources above, we use a surrounding sequence window of approximately 10 bp; the mean depth across this window represents the coverage of the locus.
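The windowed mean depth described above can be illustrated with a minimal Python sketch. It is not taken from the cacao code base: the function name `window_mean_depth`, the `flank` parameter, and the symmetric window layout (5 bases on each side of the variant, clipped at the contig ends) are assumptions made for illustration only.

```python
from statistics import mean


def window_mean_depth(depths, pos, flank=5):
    """Return the mean depth in a window around 0-based position `pos`.

    `depths` holds one depth value per base of a contig (hypothetical input).
    The window spans `flank` bases on each side of `pos`, which gives the
    "approximately 10 bp" window; it is clipped at the contig boundaries.
    """
    start = max(0, pos - flank)
    end = min(len(depths), pos + flank + 1)
    return mean(depths[start:end])


# Toy per-base depths around a variant at index 6 (made-up numbers).
depths = [0, 0, 12, 14, 15, 15, 16, 18, 20, 22, 25, 30, 31]
print(window_mean_depth(depths, pos=6))
```

Averaging over a small window rather than reading the depth at a single base makes the coverage estimate less sensitive to a local dip or spike at the variant position itself.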
All three tracks (hereditary, somatic_actionable, and somatic_hotspot) are available for GRCh37 and GRCh38, and there are also tab-separated files that link each locus to its associated
cacao_wflow.py requires that Python 3 is installed.
docker pull sigven/cacao:0.3.1
Run the CACAO workflow with the cacao_wflow.py Python script, which takes the following required and optional arguments:
usage:
cacao_wflow.py -h [options]
--query_aln BAM/CRAM
--track_dir TRACK_DIR
--output_dir OUTPUT_DIR
--genome_assembly grch37|grch38
--sample_id SAMPLE_ID
--mode hereditary|somatic|any
cacao - assessment of sequencing coverage at pathogenic and actionable loci in
cancer
Required arguments:
--query_aln QUERY_ALN
Query alignment file (BAM/CRAM)
--track_dir TRACK_DIR
Directory with BED tracks of pathogenic/actionable cancer loci for grch37/grch38
--output_dir OUTPUT_DIR
Output directory
--genome_assembly {grch37,grch38}
Human genome assembly build: grch37 or grch38
--mode {hereditary,somatic,any}
Choice of loci and clinical cancer context (cancer predisposition/tumor sequencing)
--sample_id SAMPLE_ID
Sample identifier - prefix for output files
Optional arguments:
-h, --help show this help message and exit
--mapq MAPQ mapping quality threshold (default: 0)
--threads THREADS Number of mosdepth BAM decompression threads. (use 4
or fewer) (default: 0)
--callability_levels_germline CALLABILITY_LEVELS_GERMLINE
Simple colon-separated string that defines four levels
of variant callability: NO_COVERAGE (0), LOW_COVERAGE
(1-9), CALLABLE (10-99), HIGH_COVERAGE (>= 100).
Initial value must be 0. (default: 0:10:100)
--callability_levels_somatic CALLABILITY_LEVELS_SOMATIC
Simple colon-separated string that defines four levels
of variant callability: NO_COVERAGE (0), LOW_COVERAGE
(1-29), CALLABLE (30-199), HIGH_COVERAGE (>= 200).
Initial value must be 0. (default: 0:30:200)
--query_target QUERY_TARGET
BED file with genome target regions subject to
sequencing/analysis (default: None)
--force_overwrite By default, the script will fail with an error if any
output file already exists. You can force the
overwrite of existing result files by using this flag
(default: False)
--version show program's version number and exit
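To make the --callability_levels_germline and --callability_levels_somatic strings concrete, here is a small Python sketch of how a colon-separated value such as 0:30:200 could map a depth to one of the four levels listed above. The helper names `parse_levels` and `classify_depth` are hypothetical and do not come from cacao itself. Two further assumptions: the thresholds must be strictly increasing, and any depth above zero but below the second value counts as LOW_COVERAGE (the help text does not say how fractional mean depths are handled).

```python
def parse_levels(spec):
    """Parse a string like '0:30:200' into (callable_min, high_min).

    The first value must be 0, as stated in the option description.
    Requiring strictly increasing thresholds is an assumption of this sketch.
    """
    parts = [int(p) for p in spec.split(":")]
    if len(parts) != 3 or parts[0] != 0 or not (parts[0] < parts[1] < parts[2]):
        raise ValueError(f"expected '0:<callable>:<high>', got {spec!r}")
    return parts[1], parts[2]


def classify_depth(depth, spec="0:30:200"):
    """Assign one of the four callability levels to a depth value."""
    callable_min, high_min = parse_levels(spec)
    if depth <= 0:
        return "NO_COVERAGE"
    if depth < callable_min:
        return "LOW_COVERAGE"
    if depth < high_min:
        return "CALLABLE"
    return "HIGH_COVERAGE"


# Somatic defaults (0:30:200) applied to a few example depths.
for d in (0, 12, 45, 250):
    print(d, classify_depth(d, "0:30:200"))
```

With the germline default 0:10:100, the same function would place a depth of 12 in CALLABLE rather than LOW_COVERAGE, which shows why the thresholds should match the intended calling context.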
Coming
sigven AT ifi.uio.no