openvax / neoantigen-vaccine-pipeline

Bioinformatics pipeline for selecting patient-specific cancer neoantigen vaccines
Apache License 2.0

Add Manta for identifying candidate larger indels #142

Open iskandr opened 5 years ago

iskandr commented 5 years ago

Installation:

conda install -c bioconda manta

Usage (--callRegions expects a bgzip-compressed, tabix-indexed BED file):

configManta.py \
--normalBam normal.cram \
--tumorBam tumor.cram \
--referenceFasta genome.fa \
--runDir ${MY_MANTA_WORKDIR} \
--callRegions canonicalChromosomes.bed.gz \
--exome

Followed by:

${MY_MANTA_WORKDIR}/runWorkflow.py -j ${NUM_CORES}
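
Once the workflow finishes, it may be worth confirming that the expected VCFs were actually written before wiring them into later steps. The helper below is only a sketch: `check_manta_outputs` is a hypothetical name, and it assumes Manta's default layout, where variant calls land under `results/variants/` inside the run directory.

```shell
# Report which of Manta's four VCF outputs exist and are non-empty.
# Assumption: outputs live under <runDir>/results/variants/ (Manta's default layout).
# Returns 0 only if all four files are present.
check_manta_outputs() {
    variants_dir="$1/results/variants"
    missing=0
    for name in diploidSV somaticSV candidateSV candidateSmallIndels; do
        if [ -s "$variants_dir/$name.vcf.gz" ]; then
            echo "ok       $name.vcf.gz"
        else
            echo "missing  $name.vcf.gz"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Calling `check_manta_outputs "${MY_MANTA_WORKDIR}"` prints one status line per file. A missing somaticSV.vcf.gz is expected if no tumor sample was given during configuration.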

Notes:

iskandr commented 5 years ago

One possible wrinkle: Manta requires Python 2.6 or 2.7. I'm running it inside a Python 3 conda env, but I think it's picking up the base-installed Python.

The configManta.py script starts with:

#!/usr/bin/env python2
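
To see which interpreter a shebang like this resolves to, one option is to read the first line of the script and look the interpreter up on the current PATH. This is a rough sketch; `shebang_interpreter` is a hypothetical helper, not part of Manta.

```shell
# Print the interpreter named on a script's shebang line.
# "#!/usr/bin/env python2" yields "python2";
# "#!/usr/bin/python2.7" yields "/usr/bin/python2.7".
shebang_interpreter() {
    script_path=$1
    # Read only the first line (tolerate a file with no trailing newline).
    IFS= read -r first_line < "$script_path" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*) ;;
        *) return 1 ;;   # no shebang present
    esac
    rest=${first_line#'#!'}
    # Split the remainder into words without glob expansion.
    set -f
    set -- $rest
    set +f
    [ $# -ge 1 ] || return 1
    if [ "${1##*/}" = "env" ] && [ $# -ge 2 ]; then
        printf '%s\n' "$2"   # interpreter is looked up via env
    else
        printf '%s\n' "$1"   # interpreter given directly
    fi
}
```

For example, `command -v "$(shebang_interpreter "$(command -v configManta.py)")"` prints the path of the Python that would run Manta in the active environment, which shows whether it comes from the conda env or from the base install.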

iskandr commented 5 years ago

Files generated by Manta:

*diploidSV.vcf.gz*
SVs and indels scored and genotyped under a diploid model for the set of samples in a joint diploid sample analysis or for the normal sample in a tumor/normal subtraction analysis. In the case of a tumor/normal subtraction, the scores in this file do not reflect any information from the tumor sample.

*somaticSV.vcf.gz*
SVs and indels scored under a somatic variant model. This file will only be produced if a tumor sample alignment file is supplied during configuration.

*candidateSV.vcf.gz*
Unscored SV and indel candidates. Only a minimal amount of supporting evidence is required for an SV to be entered as a candidate in this file. An SV or indel must be a candidate to be considered for scoring, so an SV cannot appear in the other VCF outputs if it is not present in this file. Note that by default this file includes indels of size 8 and larger. The smallest indels in this set are intended to be passed on to a small variant caller without scoring by Manta itself (by default Manta scoring starts at size 50).

*candidateSmallIndels.vcf.gz*
Subset of the candidateSV.vcf.gz file containing only simple insertion and deletion variants less than the minimum scored variant size (50 by default). Passing this file to a small variant caller will provide continuous coverage over all indel sizes when the small variant caller and Manta outputs are evaluated together. Alternate small indel candidate sets can be parsed out of the candidateSV.vcf.gz file if this candidate set is not appropriate.
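
If a different size cutoff is wanted, an alternate candidate set could be pulled out of candidateSV.vcf.gz with something like the following. This is a sketch under assumptions: `small_indel_candidates` is a hypothetical helper, and it relies on each insertion or deletion record carrying `SVTYPE` and `SVLEN` in the INFO column. Records without `SVLEN` (for example, insertions that were not fully assembled) are skipped.

```shell
# Print VCF header lines plus INS/DEL records whose |SVLEN| is below max_len.
# Usage: small_indel_candidates candidateSV.vcf.gz 30 > custom_candidates.vcf
small_indel_candidates() {
    vcf_gz=$1
    max_len=$2
    gzip -dc "$vcf_gz" | awk -F '\t' -v max_len="$max_len" '
        /^#/ { print; next }              # keep all header lines
        {
            svtype = ""; svlen = ""
            n = split($8, fields, ";")    # INFO is column 8
            for (i = 1; i <= n; i++) {
                if (fields[i] ~ /^SVTYPE=/) svtype = substr(fields[i], 8)
                if (fields[i] ~ /^SVLEN=/)  svlen  = substr(fields[i], 7)
            }
            if (svtype != "INS" && svtype != "DEL") next
            if (svlen == "") next         # size unknown, skip
            len = svlen + 0
            if (len < 0) len = -len       # deletions have negative SVLEN
            if (len < max_len + 0) print
        }'
}
```

The output is uncompressed, so it would need to be bgzip-compressed and indexed again before being handed to a tool that expects an indexed VCF.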

The passing somatic structural variants are in somaticSV.vcf.gz. Indels below the minimum scored size (50 by default) are written to candidateSmallIndels.vcf.gz, which should be passed to Strelka2 as its candidate indel input.
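
For the Strelka2 hand-off, the configuration step might look roughly like the command this helper prints. It is a sketch only: `strelka_config_cmd` is a hypothetical function that prints the command rather than running it, the input file names are carried over from the Manta example above, and the `results/variants/` path assumes Manta's default output layout.

```shell
# Print (do not run) a Strelka2 somatic configuration command that reads
# Manta's small indel candidates via --indelCandidates.
# $1 = Manta run directory, $2 = Strelka2 run directory
strelka_config_cmd() {
    manta_dir=$1
    strelka_dir=$2
    cat <<EOF
configureStrelkaSomaticWorkflow.py \\
--normalBam normal.cram \\
--tumorBam tumor.cram \\
--referenceFasta genome.fa \\
--indelCandidates ${manta_dir}/results/variants/candidateSmallIndels.vcf.gz \\
--runDir ${strelka_dir} \\
--exome
EOF
}
```

Printing the command first makes it easy to review the paths before running it. After configuration, Strelka2 is started the same way as Manta, through the runWorkflow.py script it writes into its run directory.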