tgen / lumosVar2

MIT License

LumosVar2

Calls somatic SNVs, indels, and allelic copy number jointly across multiple samples from the same patient. These can be standard tumor/normal pair, longitudinal samples, primary/met, etc. Can also be used for tumor only calling with one or more tumor samples, ideally with a high tumor content and a low tumor content sample.

Citation

Joint analysis of matched tumor samples with varying tumor contents improves somatic variant calling in the absence of a germline sample. bioRxiv 364943; doi: https://doi.org/10.1101/364943

Prerequisites

System Requirements

Dependencies

Note to TGen dback users: dependencies may be loaded on dback by sourcing setup.sh.

Bam preparation

VCFs

Overview

Running lumosVar involves two main steps:

  1. normalMetrics: analyzes a set of unmatched controls to find average read depths and position quality metrics.
    IMPORTANT - unmatched controls must be generated using the same exome capture as the tumors.
  2. lumosVarMain: calls somatic, germline, and copy number variants.

Example dataset

http://tools.tgen.org/Files/lumosVar

Notes on pileup engine

Normal Metrics

user inputs

    sexList: SEXLIST ###comma-delimited list of sex for each bam in the bamList, in the same order
    gvmPath: GVMPATH ###path to folder containing gvm executable
    outfile: OUTFILE ###path and name of output

- The bamList file should contain absolute paths to the unmatched control bams with one per line.

- To handle the sex chromosomes correctly, the sex of each individual in the bamList must be given as input.  We have provided a helper [script](scripts/guessSex.py) to determine sex from the bams if it is not known.  This script takes the same yaml file as input as the normal metrics, but only the following fields in the "input files" section are needed.
>python guessSex.py controlsConfig.yaml

The output will be written to \<BAMLIST\>.guessSex.txt.  The last line of the output contains the sexList for the yaml file.
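If the last line of the guessSex output holds the sexList as described above, it can be copied into the yaml with a small helper. This is only a sketch: the function name `read_sex_list` is hypothetical, and nothing is assumed about the format of the earlier lines in the file.

```python
from pathlib import Path


def read_sex_list(guess_sex_output):
    """Return the last non-empty line of a guessSex output file.

    Assumption: that line is the comma-delimited sexList, as the
    README states. Earlier lines are ignored.
    """
    lines = Path(guess_sex_output).read_text().splitlines()
    non_empty = [line.strip() for line in lines if line.strip()]
    if not non_empty:
        raise ValueError(f"{guess_sex_output} contains no text")
    return non_empty[-1]
```

The returned string would then be pasted after `sexList:` in the controls yaml.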

- Normal metrics is run using [runNormalMetrics.py](scripts/runNormalMetrics.py).  It takes two input arguments: the yaml file and the chromosome.  It must be run separately on each chromosome.
>python runNormalMetrics.py controlsConfig.yaml 21
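Because the script has to be invoked once per chromosome, a small driver can generate the calls. The sketch below makes some assumptions: chromosome names follow the `21` style of the example command (no `chr` prefix), all of 1-22, X and Y are wanted, and `build_commands` and `run_all` are hypothetical helper names, not part of lumosVar.

```python
import subprocess

# Assumption: names match the "21" style used in the example command.
# Adjust the list to the chromosomes present in your reference genome.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def build_commands(config, chromosomes=CHROMOSOMES, script="runNormalMetrics.py"):
    """Build one command per chromosome, mirroring the example invocation."""
    return [["python", script, config, chrom] for chrom in chromosomes]


def run_all(config, chromosomes=CHROMOSOMES):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in build_commands(config, chromosomes):
        subprocess.run(cmd, check=True)
```

Calling `run_all("controlsConfig.yaml")` from the scripts directory would run the chromosomes sequentially. On a cluster, the command lists from `build_commands` could instead be submitted as separate jobs.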

## LumosVar Main
- Inputs to lumosVar are defined in a yaml file, see [template](configTemplates/lumosVarMainConfigTemplate.yaml).  You will need to edit:

Input Files

    bamList: BAMLIST ###path to file containing paths to bams
    regionsFile: BED ###path to bed file defining regions targeted in exome
    snpVCFpath: VCFPATH ###path to vcfs containing population frequencies (one for each chromosome), including part of filename before chromosome number
    snpVCFname: VCFNAME ###filename/extension of population vcf following chromosome number
    NormalBase: NMETRICS ###path and filename of output from NormalMetrics step before chr number
    cosmicVCF: COSMICVCF ###path to cancer mutation count VCF
    refGenome: REFGENOME ###path to reference genome that was used to align bams

User Inputs

    outName: OUTNAME ###path and filename base for output files
    outMat: OUTMAT ###path and filename for "mat" file (matlab data) export
    gvmPath: GVMPATH ###path to folder containing gvm executable
    workingDirectory: WORK ###path to folder containing files in the "work" directory in the github repo
    NormalSample: NINDEX ###position in bamList of normal sample, 0 indicates tumor only
    priorF: PRIORF ###vector of expected tumor fractions with one value per bam, for example [0.1;0.7] for a pair of bams with low and high expected tumor content
    numCPU: CORES ###number of parallel processors

- The bamList file should contain absolute paths to the tumor bams with one per line.  All of the bams should come from the same patient.

- It is important that the length of priorF matches the number of bams in the bam list.  We recommend the following values for priorF:
  - solid tumor: 0.7
  - purified tumor or cell line: 0.99
  - tumor adjacent normal tissue: 0.1
  - normal tissue unlikely to have tumor contamination: 0.01 (you may also set NormalSample to the position of your normal sample in your bam list file to run in matched normal mode)
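A mismatch between priorF and the bam list is easy to introduce by hand, so it can help to generate the vector from the bam list itself. The following is a minimal sketch using the recommended values above. The sample-type labels and the function names are hypothetical, and the output follows the `[0.1;0.7]` style shown in the config template.

```python
import os

# Recommended priorF values from the list above. The dictionary keys are
# hypothetical labels chosen for this example, not lumosVar options.
RECOMMENDED_PRIOR_F = {
    "solid_tumor": 0.7,
    "purified_tumor": 0.99,
    "adjacent_normal": 0.1,
    "normal": 0.01,
}


def read_bam_list(path):
    """Return the bam paths listed one per line, skipping blank lines."""
    with open(path) as handle:
        bams = [line.strip() for line in handle if line.strip()]
    for bam in bams:
        if not os.path.isabs(bam):
            raise ValueError(f"bamList entry is not an absolute path: {bam}")
    return bams


def format_prior_f(sample_types, bams):
    """Build a priorF string with exactly one value per bam."""
    if len(sample_types) != len(bams):
        raise ValueError(
            f"{len(sample_types)} sample types given for {len(bams)} bams"
        )
    values = [RECOMMENDED_PRIOR_F[kind] for kind in sample_types]
    return "[" + ";".join(str(value) for value in values) + "]"
```

For a bamList holding an adjacent-normal bam followed by a solid-tumor bam, `format_prior_f(["adjacent_normal", "solid_tumor"], bams)` gives a string that can be pasted after `priorF:` in the yaml.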

To run lumosVar:
>./lumosVarMain lumosVarConfig.yaml



## LumosVar Output
- .lumosVarSNV.vcf - somatic and germline SNV/indel calls
- .lumosVarSeg.vcf - copy number calls by segment
- .somaticPass.txt - table of somatic variant calls
- .lumosVarParam.txt - parameters used in the lumosVar run
- .exonData.tsv - copy number calls by regions in bed file
- .cloneSummary.tsv - summary of clonal variant groups
- .cloneSummary.pdf - graphical summary of clonal variant groups
- .groupLinePlots.pdf - line plots by clonal variant group
- .vafPlot.pdf - plots of variant allele fractions
- .cnaPlot.pdf - plot of copy number states
- \<SAMPLENAME\>.qualMetrics.tsv - quality metrics for candidate variant positions
- .mat - matlab workspace export