pllittle / UNMASC

Tumor-only variant calling
GNU General Public License v3.0

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

What is this?

One goal of cancer genomics is to identify DNA variants specific to the cancer tissue within an individual. For example, a researcher may want to identify mutated genes and design a treatment or therapy specific to that individual's cancer. These cancer-specific variants are called somatic variants: they are acquired rather than inherited. Normal tissue, in contrast, harbors inherited DNA variants called germline variants, which are present and identical across all normal tissue.

If one sequences an individual's matched normal DNA (e.g. from blood or adjacent tissue) and tumor DNA, one can identify both germline and somatic mutations and, more importantly, distinguish between them. However, without the matched normal DNA serving as a control, the performance of somatic mutation callers (MuTect2, Seurat, Indelocator, VarScan, Strelka, Strelka2, etc.) drops off in terms of recall (sensitivity) and precision (positive predictive value). Perhaps the tumor sample:

A third, unavoidable set of detected variants consists of false positives, or artifacts, which can arise from several sources, including poor sequencing quality, sample storage, and read misalignment to the reference genome. UNMASC attempts to identify somatic variants from tumor samples without an adequate matched normal.
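To make the recall and precision terms used above concrete, here is a minimal sketch in R. The counts and the function name `calc_metrics` are hypothetical and chosen only for illustration; they are not UNMASC results.

```R
# Hypothetical counts from comparing tumor-only calls against a truth set
# built from matched tumor/normal sequencing
calc_metrics = function(tp, fp, fn){
  c(recall = tp / (tp + fn),     # sensitivity: true somatic variants recovered
    precision = tp / (tp + fp))  # positive predictive value of the calls
}

# tp = somatic variants correctly called
# fp = germline variants or artifacts reported as somatic
# fn = true somatic variants that were missed
metrics = calc_metrics(tp = 80, fp = 40, fn = 20)
print(round(metrics, 3))
```

Without a matched normal, germline variants and artifacts inflate `fp`, which lowers precision even when recall is unchanged.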

UNMASC workflow for a single tumor sample against Z unmatched normal controls. SB = strand bias, SEG = segmentation, OXOG = oxoG artifacts, FFPE = paraffin artifacts.

Description

This package is designed to filter and annotate tumor-only variant calls through the integration of public database annotations, clustering, and segmentation to provide the user with a clear characterization of each variant when called against a set of unmatched normal controls.
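As a rough illustration of how database annotations and unmatched normal controls can help characterize a variant, the toy sketch below labels variants with simple thresholds. This is not UNMASC's algorithm, which also relies on clustering and segmentation; the column names, thresholds, and the function `label_variant` are all assumptions made for this example.

```R
# Toy variant table; every column name and value here is hypothetical
vars = data.frame(
  id = c("v1","v2","v3","v4"),
  pop_af = c(0.25, 0, 0.0001, 0),  # population allele frequency
  n_normals = c(5, 0, 0, 4),       # unmatched normals carrying the variant
  stringsAsFactors = FALSE)

# Assumed thresholds, for illustration only
label_variant = function(pop_af, n_normals,
  max_af = 0.001, max_normals = 1){

  ifelse(pop_af > max_af, "likely_germline",
    ifelse(n_normals > max_normals,
      "recurrent_in_normals", "candidate_somatic"))
}

vars$label = label_variant(vars$pop_af, vars$n_normals)
print(vars)
```

A variant common in the population is more plausibly germline, while one that recurs across unrelated normal samples but is absent from population databases is more plausibly an artifact.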

Citation

Little, P., Jo, H., Hoyle, A., Mazul, A., Zhao, X., Salazar, A.H., Farquhar, D., Sheth, S., Masood, M., Hayward, M.C., Parker, J.S., Hoadley, K.A., Zevallos, J. and Hayes, D.N. (2021). UNMASC: tumor-only variant calling with unmatched normal controls. NAR Cancer, 3(4), zcab040.

Installation

R/RStudio code to check, install, and load libraries.

```R
# Build vignettes only if RStudio's pandoc is available
pandoc = Sys.getenv("RSTUDIO_PANDOC")
build_vign = nzchar(pandoc) && file.exists(pandoc)

cran_packs = c("devtools","Rcpp","RcppArmadillo","emdbook",
  "scales","BiocManager","parallel","doParallel",
  "data.table","grDevices","foreach")
bioc_packs = c("seqTools","Rsamtools","GenomicRanges",
  "IRanges")
github_packs = c("smarter","UNMASC")
req_packs = c(cran_packs,bioc_packs,github_packs)

for(pack in req_packs){
  # Load the package if it is already installed
  chk_pack = tryCatch(find.package(pack),
    error = function(ee){NULL})
  if( !is.null(chk_pack) ){
    library(pack,character.only = TRUE)
    next
  }

  # Otherwise install it from the appropriate source
  if( pack %in% cran_packs ){
    install.packages(pack,dependencies = TRUE)
  } else if( pack %in% bioc_packs ){
    BiocManager::install(pkgs = pack,dependencies = TRUE)
  } else if( pack %in% github_packs ){
    devtools::install_github(sprintf("pllittle/%s",pack),
      build_vignettes = build_vign,
      dependencies = TRUE)
  }
}
```

Inputs

Workflow

UNMASC's benchmark samples were run with Strelka. Assuming the required tools (including Strelka and VEP) are installed along with their dependencies (Perl, HTSlib, etc.), Linux commands are provided for running variant calling and annotation. Running our customized VEP annotation requires downloading a COSMIC database VCF. For example, the latest release of CosmicCodingMuts.vcf.gz for GRCh37 is available from the COSMIC website. We configured VEP to annotate variants with 1000 Genomes population allele frequencies, ExAC/gnomAD population allele frequencies, variant transcripts, impacts/consequences, and COSMIC counts with stable and legacy IDs.
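VEP, when writing VCF output, stores its annotations in a `CSQ` entry of the INFO column: one comma-separated record per transcript, with pipe-delimited fields whose order is declared in the VCF header. The base R sketch below shows one way to unpack such an entry. The INFO string, the field order in `csq_fields`, and the function `parse_csq` are assumptions for illustration and are not part of UNMASC.

```R
# Assumed field order; in a real file, read it from the CSQ header line
csq_fields = c("Allele","Consequence","IMPACT","SYMBOL","gnomAD_AF")

# Made-up INFO string with two transcript-level annotations
info = paste0("DP=100;CSQ=T|missense_variant|MODERATE|TP53|0.0001,",
  "T|downstream_gene_variant|MODIFIER|WRAP53|0.0001")

parse_csq = function(info, fields){
  # Pull the CSQ entry out of the semicolon-delimited INFO string
  entries = strsplit(info, ";", fixed = TRUE)[[1]]
  csq = sub("^CSQ=", "", entries[grepl("^CSQ=", entries)])

  # Split into per-transcript records, then into fields; the appended
  # "|" keeps a trailing empty field from being dropped by strsplit()
  recs = strsplit(csq, ",", fixed = TRUE)[[1]]
  recs = strsplit(paste0(recs, "|"), "|", fixed = TRUE)

  out = as.data.frame(do.call(rbind, recs),
    stringsAsFactors = FALSE)
  names(out) = fields
  out
}

res = parse_csq(info, csq_fields)
res$gnomAD_AF = as.numeric(res$gnomAD_AF)
print(res)
```

Each row of `res` corresponds to one transcript, so a single variant can carry several consequences and impacts.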

Refer to our comprehensive documentation for setup, inputs, and execution.

Future directions

FAQs