Calls somatic SNVs, indels, and allelic copy number jointly across multiple samples from the same patient. The samples can be a standard tumor/normal pair, longitudinal samples, primary/metastasis samples, etc. lumosVar can also be used for tumor-only calling with one or more tumor samples, ideally with one high tumor content sample and one low tumor content sample.
Joint analysis of matched tumor samples with varying tumor contents improves somatic variant calling in the absence of a germline sample. bioRxiv 364943; doi: https://doi.org/10.1101/364943
A note to TGen dback users: dependencies may be loaded on dback by sourcing setup.sh.
Running lumosVar involves two main steps
http://tools.tgen.org/Files/lumosVar
###input files
bamList: BAMLIST ###path to file containing paths to bams
regionsFile: BEDFILE ###path to bed file defining regions targeted in exome
refGenome: REFGENOME ###path to reference genome that was used to align bams
snpVCFpath: VCFPATH ###path to vcfs containing population frequencies (one for each chromosome),
###including part of filename before chromosome number
snpVCFname: VCFNAME ###filename/extension of population vcf following chromosome number
sexList: SEXLIST ###comma delimited list of sex for each bam in the bamList, in the same order
gvmPath: GVMPATH ###path to folder containing gvm executable
outfile: OUTFILE ###path and name of output
- The bamList file should contain absolute paths to the unmatched control bams with one per line.
- In order to correctly handle the sex chromosomes, the sex of each individual in the bamList must be given as input. We have provided a helper [script](scripts/guessSex.py) to determine the sex from the bams if it is not known. This script takes the same yaml file as the normal metrics step, but only the following fields in the "input files" section are needed.
>python guessSex.py controlsConfig.yaml
The output will be written to \<BAMLIST\>.guessSex.txt. The last line of the output contains the sexList for the yaml file.
- The normal metrics step is run using [runNormalMetrics.py](scripts/runNormalMetrics.py). It takes two input arguments, the yaml file and the chromosome, and needs to be run separately on each chromosome.
>python runNormalMetrics.py controlsConfig.yaml 21
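Since the normal metrics step has to be launched once per chromosome, a small wrapper can generate the commands. The following is a minimal sketch, not part of lumosVar: the function names `build_commands` and `run_all` are hypothetical, and the chromosome names (1-22, X, Y) are an assumption based on the `21` in the example command above. Adjust them if your reference uses a different naming scheme.

```python
import subprocess
import sys

# Assumption: chromosomes are named 1-22, X and Y, as suggested by the "21"
# in the example command. Change this list if your reference differs.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def build_commands(config_path, chromosomes=CHROMOSOMES):
    """Return one runNormalMetrics.py command (argument list) per chromosome."""
    # sys.executable is the Python interpreter running this wrapper.
    return [
        [sys.executable, "runNormalMetrics.py", config_path, chrom]
        for chrom in chromosomes
    ]


def run_all(config_path):
    """Run the chromosomes one after another, stopping at the first failure."""
    for cmd in build_commands(config_path):
        subprocess.run(cmd, check=True)
```

Calling `run_all("controlsConfig.yaml")` from the directory containing runNormalMetrics.py runs the chromosomes sequentially; on a cluster, the lists from `build_commands` could instead be submitted as separate jobs.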
## LumosVar Main
- Inputs to lumosVar are defined in a yaml file, see [template](configTemplates/lumosVarMainConfigTemplate.yaml). You will need to edit:
bamList: BAMLIST ###path to file containing paths to bams
regionsFile: BED ###path to bed file defining regions targeted in exome
snpVCFpath: VCFPATH ###path to vcfs containing population frequencies (one for each chromosome),
###including part of filename before chromosome number
snpVCFname: VCFNAME ###filename/extension of population vcf following chromosome number
NormalBase: NMETRICS ###path and filename of output from NormalMetrics step before chr number
cosmicVCF: COSMICVCF ###path to cancer mutation count VCF
refGenome: REFGENOME ###path to reference genome that was used to align bams
outName: OUTNAME ###path and filename base for output files
outMat: OUTMAT ###path and filename for "mat" file (matlab data) export
gvmPath: GVMPATH ###path to folder containing gvm executable
workingDirectory: WORK ###path to folder containing files in the "work" directory in the github repo
NormalSample: NINDEX ###position in bamList of normal sample, 0 indicates tumor only
priorF: PRIORF ###vector of expected tumor fractions with one value per bam
numCPU: CORES ### number of parallel processors
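Before launching a run, it can help to confirm that no template placeholder was left unedited. The sketch below is an illustration only and is not part of lumosVar: `unedited_fields` is a hypothetical helper, the placeholder names are copied from the template lines above, and the parsing assumes the simple `key: value ###comment` layout shown there rather than full yaml.

```python
# Placeholder tokens as they appear in the template lines above.
PLACEHOLDERS = {
    "BAMLIST", "BED", "VCFPATH", "VCFNAME", "NMETRICS", "COSMICVCF",
    "REFGENOME", "OUTNAME", "OUTMAT", "GVMPATH", "WORK", "NINDEX",
    "PRIORF", "CORES",
}


def unedited_fields(config_text):
    """Return the keys whose value is still a template placeholder."""
    leftover = []
    for line in config_text.splitlines():
        # Drop the trailing "###" comment, then split "key: value".
        body = line.split("###", 1)[0]
        key, sep, value = body.partition(":")
        # Lines without a colon (for example pure comments) are skipped.
        if sep and value.strip() in PLACEHOLDERS:
            leftover.append(key.strip())
    return leftover
```

For example, `unedited_fields(open("lumosVarConfig.yaml").read())` returns an empty list once every field listed above has been filled in.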
- The bamList file should contain absolute paths to the tumor bams with one per line. All of the bams should come from the same patient.
- It is important that the length of priorF matches the number of bams in the bam list. We recommend the following values for priorF:
- solid tumor: 0.7
- purified tumor or cell line: 0.99
- tumor adjacent normal tissue: 0.1
- normal tissue unlikely to have tumor contamination: 0.01 (you may also set NormalSample to indicate the position of your normal sample in your bamList file to run in matched normal mode)
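The requirement that priorF has one value per bam can be checked before a run. The following is a minimal sketch under stated assumptions: the helper names and the sample-type labels used as dictionary keys are hypothetical, and only the numeric values come from the recommendations above. How the vector is written in the yaml file is not covered here, so follow the config template for that.

```python
# Recommended priorF values quoted from the list above. The dictionary keys
# are hypothetical labels chosen for this sketch.
RECOMMENDED_PRIOR_F = {
    "solid_tumor": 0.7,
    "purified_or_cell_line": 0.99,
    "adjacent_normal": 0.1,
    "normal": 0.01,
}


def read_bam_list(path):
    """Return the bam paths in a bamList file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def prior_f_for(bam_paths, sample_types):
    """Return one priorF value per bam, in bamList order."""
    if len(bam_paths) != len(sample_types):
        raise ValueError(
            f"{len(bam_paths)} bams but {len(sample_types)} sample types"
        )
    return [RECOMMENDED_PRIOR_F[kind] for kind in sample_types]
```

With a two-line bamList holding a solid tumor followed by a clean normal, `prior_f_for(read_bam_list("bams.txt"), ["solid_tumor", "normal"])` gives `[0.7, 0.01]`; a mismatch in length raises an error instead of silently producing a short vector.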
To run lumosVar:
./lumosVarMain lumosVarConfig.yaml
## LumosVar Output
- .lumosVarSNV.vcf - somatic and germline SNV/indel calls
- .lumosVarSeg.vcf - copy number calls by segment
- .somaticPass.txt - table of somatic variant calls
- .lumosVarParam.txt - parameters used in the lumosVar run
- .exonData.tsv - copy number calls by regions in bed file
- .cloneSummary.tsv - summary of clonal variant groups
- .cloneSummary.pdf - graphical summary of clonal variant groups
- .groupLinePlots.pdf - line plots by clonal variant group
- .vafPlot.pdf - plots of variant allele fractions
- .cnaPlot.pdf - plot of copy number states
- \<SAMPLENAME\>.qualMetrics.tsv - quality metrics for candidate variant positions
- .mat - matlab workspace export
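After a run, the presence of the files listed above can be checked with a short script. This is a sketch only: `expected_outputs` and `missing_outputs` are hypothetical helpers, and the code assumes that each suffix is appended directly to the outName value from the config. The per-sample qualMetrics files and the .mat export (whose path is set by outMat) are left out.

```python
import os

# Suffixes copied from the output list above. The per-sample
# <SAMPLENAME>.qualMetrics.tsv files and the .mat export are not included.
OUTPUT_SUFFIXES = [
    ".lumosVarSNV.vcf",
    ".lumosVarSeg.vcf",
    ".somaticPass.txt",
    ".lumosVarParam.txt",
    ".exonData.tsv",
    ".cloneSummary.tsv",
    ".cloneSummary.pdf",
    ".groupLinePlots.pdf",
    ".vafPlot.pdf",
    ".cnaPlot.pdf",
]


def expected_outputs(out_name):
    """Return the output paths expected for a given outName.

    Assumption: each suffix is appended directly to outName.
    """
    return [out_name + suffix for suffix in OUTPUT_SUFFIXES]


def missing_outputs(out_name):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(out_name) if not os.path.exists(p)]
```

An empty list from `missing_outputs` suggests the run wrote all of the listed files; it does not check their contents.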