Developed by: JG Reiter, AP Makohon-Moore, JM Gerold, I Bozic, K Chatterjee, C Iacobuzio-Donahue, B Vogelstein, MA Nowak.
========
Treeomics is a computational tool to reconstruct the phylogeny of metastases with commonly available sequencing technologies. The tool detects putative artifacts in noisy sequencing data and infers robust evolutionary trees across a variety of evaluated scenarios. For more details, see our publication Reconstructing metastatic seeding patterns of human cancers (Nature Communications, 8, 14114, http://dx.doi.org/10.1038/ncomms14114).
- Uses `pyensembl` and `varcode` to infer the gene names where variants occurred as well as their mutation effects.
- Outputs `<subject>_variants.csv` with information about the individual variants and how they were classified in the inferred phylogeny.
- Solved issues with subclone detection and the solution pool.
- Added `--purities <SAMPLE NAMES>`.
- Added the `--verbose` option to run Treeomics at the DEBUG logging level.
- Fixed a VCF parsing error thanks to Frank's bug report.
- Added options in `settings.py` to better configure the appearance of the PDF report.

Create a new conda environment with `conda create --name treeomics python=3.6` and activate it with `conda activate treeomics`.
Clone the repository with `git clone https://github.com/reiterlab/treeomics.git`.
Change to the Treeomics directory with `cd <TREEOMICS_DIRECTORY>`, run `python setup.py clean sdist bdist_wheel`, and install Treeomics into your Python environment with `pip install -e <TREEOMICS_DIRECTORY>`.
Change to the CPLEX Python directory with `cd /Applications/CPLEX_Studio1210/cplex/python/3.6/x86-64_osx/` and run `python setup.py install`. Test your installation with `python -c 'import cplex'`.
You may also need to add CPLEX to your `PYTHONPATH` with `export PYTHONPATH="/Applications/CPLEX_Studio1210/cplex/python/3.6/x86-64_osx/:$PYTHONPATH"`.
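Install `ete3` by running `conda install python=3.6 qt=5` and then `conda install -c etetoolkit ete3`. You can test your installation with `python -c 'from ete3 import TreeStyle'`.
Install `varcode` with `pip install varcode` and download the Ensembl annotation data with `pyensembl install --release 75 76` (Ensembl releases 75 and 76 correspond to the GRCh37 and GRCh38 reference genomes).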
Make sure `pdflatex` is in your `PATH` environment variable (https://www.tug.org/texlive/quickinstall.html).
Make sure `circos` is in your `PATH` environment variable (http://circos.ca/software/installation).

To test the installation, run `cd <TREEOMICS_DIRECTORY>` followed by `python setup.py test` (or `pytest tests/`), and `python -c 'import treeomics'`.
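The dependencies installed above can also be checked in one go with a small Python script. This is a hypothetical helper, not part of Treeomics: it only checks that the listed modules can be located and that `pdflatex` and `circos` are on `PATH`, so it does not replace the import tests above (for example, `from ete3 import TreeStyle` can still fail if Qt is missing).

```python
import importlib.util
import shutil

# Python packages named in the installation steps (cplex, ete3, varcode and
# pyensembl are installed separately from treeomics itself)
PYTHON_MODULES = ("treeomics", "cplex", "ete3", "varcode", "pyensembl")
# Executables that should be reachable through the PATH environment variable
EXECUTABLES = ("pdflatex", "circos")


def check_environment(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return a dict mapping each module/executable name to True if found."""
    status = {}
    for name in modules:
        # find_spec() locates a top-level module without importing it
        status[name] = importlib.util.find_spec(name) is not None
    for name in executables:
        # shutil.which() returns None if the executable is not on PATH
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print("{:<10} {}".format(name, "found" if found else "MISSING"))
```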
To uninstall, run `pip uninstall treeomics` or `conda remove treeomics`, and delete the conda environment with `conda env remove --name treeomics`.
The input to Treeomics (`__main__.py`) is either a pair of tab-separated tables with mutant reads and coverage, a VCF file, or a directory of VCF files. See `input/Makohon2017/Pam03_mutant_reads.txt` and `input/Makohon2017/Pam03_phredcoverage.txt` included in this repository for examples.

Run Treeomics on a pair of tables with `treeomics -r <mut-reads table> -s <coverage table>`, where `<mut-reads table>` is the path to a tab-separated-value file with the number of reads reporting a variant (row) in each sample (column), and `<coverage table>` is the path to a tab-separated-value file with the sequencing depth at the position of this variant in each sample.

`$ treeomics -r <mut-reads table> -s <coverage table> | -v <vcf file> | -d <vcf file directory>`
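When Treeomics is driven from a Python script, the synopsis can be mirrored by assembling the argument list programmatically. The sketch below is a hypothetical helper (`treeomics_command` is not part of Treeomics); it uses only flags documented in this README and enforces two rules stated here: exactly one input mode (`-r`/`-s`, `-v`, or `-d`), and `--purities` only together with an `--include` list of the same length.

```python
import shlex


def treeomics_command(mut_reads=None, coverage=None, vcf=None, vcf_dir=None,
                      normals=(), excluded=(), output_dir=None,
                      include=(), purities=(), extra=()):
    """Assemble an argument list for the treeomics command line.

    Exactly one input mode is accepted: a mutant-reads/coverage table pair
    (-r/-s), a single VCF file (-v), or a directory of VCF files (-d).
    """
    use_tables = mut_reads is not None or coverage is not None
    if use_tables + (vcf is not None) + (vcf_dir is not None) != 1:
        raise ValueError("use exactly one of -r/-s, -v or -d")
    if use_tables and (mut_reads is None or coverage is None):
        raise ValueError("-r and -s must be given together")
    if purities and len(purities) != len(include):
        # --purities requires --include with the same sample ordering
        raise ValueError("give one --include sample name per purity value")

    cmd = ["treeomics"]
    if use_tables:
        cmd += ["-r", mut_reads, "-s", coverage]
    elif vcf is not None:
        cmd += ["-v", vcf]
    else:
        cmd += ["-d", vcf_dir]
    if normals:
        cmd += ["-n", *normals]
    if excluded:
        cmd += ["-x", *excluded]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if include:
        cmd += ["--include", *include]
    if purities:
        cmd += ["--purities", *(str(p) for p in purities)]
    cmd += list(extra)  # any further options, e.g. ["-e", "0.005"]
    return cmd


# Same arguments as Example 1; print them as a copy-pasteable shell line
args = treeomics_command(
    mut_reads="input/Makohon2017/Pam03_1-10_mutant_reads.txt",
    coverage="input/Makohon2017/Pam03_1-10_phredcoverage.txt",
    normals=["Pam03N3"],
    extra=["-e", "0.005"],
)
print(" ".join(shlex.quote(a) for a in args))
```

To actually run Treeomics, the list can be passed to `subprocess.run(args, check=True)`; a list avoids shell-quoting problems with unusual sample names or paths.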
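- `-e`
- `-a`
- `-z`
- `-o`: Provide a different output directory (default `output`)
- `-n`: Names of normal samples, e.g., `-n FIRSTNORMALSAMPLE SECONDNORMALSAMPLE`
- `-x`: Names of samples to exclude, e.g., `-x FIRSTEXCLUDEDSAMPLE SECONDEXCLUDEDSAMPLE`
- `--pool_size`
- `-b`
- `-u`: Enables subclone detection (default `False`)
- `-c`
- `-f`
- `-p`
- `-i`
- `-y`
- `-g`: Reference genome, e.g., `-g grch38`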
- `-t`: Maximum running time for CPLEX to solve the MILP (in seconds, default `None`). If not `None`, the obtained solution is no longer guaranteed to be optimal.
- `--threads=<N>`: Maximal number of parallel threads that will be invoked by CPLEX (`0`: default, let CPLEX decide; `1`: single-threaded; `N`: uses up to N threads)
- `-l <max no MPS>`: Maximum number of considered mutation patterns per variant (default `None`). If not `None`, the obtained solution is no longer guaranteed to be optimal.
- `--driver_genes=<path to file>`: Path to a CSV file with names of putative driver genes that are highlighted in the inferred phylogeny (default `--driver_genes=../input/Tokheim_drivers_union.csv`)
- `--wes_filtering`: Removes intronic and intergenic variants in WES data (default `False`)
- `--common_vars_file`: Path to a file with variants that are common in normal samples and are therefore removed from the analysis (default `None`)
- `--no_plots`: Disables generation of X11-dependent plots (useful for benchmarking; by default, plots are generated)
- `--no_tikztrees`: Disables generation of LaTeX trees, which do not depend on X11 (by default, LaTeX trees are generated)
- `--benchmarking`: Generates mutation matrix and mutation pattern files that can be used for automatic benchmarking of in silico data (default `False`)
- `--include`: Provide a list of sample names that should be analyzed (e.g., `--include PT1 PT2 PT3 PT4`)
- `--purities`: Provide a list of externally estimated sample purities (e.g., `--purities 0.7 0.3 0.9 0.8`). Requires the `--include` argument with the same ordering of samples.
- `--min_var_reads <>` and/or `--min_vaf <>`: Minimum VAF that a variant must reach, together with a minimum number of variant reads, in at least one of the provided samples
- `--min_var_cov <>`: Minimum coverage of a variant across all samples; otherwise, the variant is excluded
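The read, VAF, and coverage thresholds above can be read as a simple per-variant filter. The sketch below is a hypothetical re-implementation for illustration only, not Treeomics' code; it assumes that the read and VAF thresholds must both be met in the same sample and that `--min_var_cov` applies to every sample. The function name `passes_filters` is made up.

```python
def passes_filters(var_reads, coverage, min_var_reads=0, min_vaf=0.0,
                   min_var_cov=0):
    """Hypothetical reading of --min_var_reads, --min_vaf and --min_var_cov.

    var_reads and coverage hold one count per sample for a single variant,
    in the same sample order as the mutant-reads and coverage tables.
    """
    if len(var_reads) != len(coverage):
        raise ValueError("need one read count and one coverage value per sample")
    # --min_var_cov (assumption): every sample must reach this coverage
    if any(cov < min_var_cov for cov in coverage):
        return False
    # --min_var_reads / --min_vaf: both must hold in at least one sample
    for reads, cov in zip(var_reads, coverage):
        vaf = reads / cov if cov > 0 else 0.0
        if reads >= min_var_reads and vaf >= min_vaf:
            return True
    return False


# Variant with 12 of 80 reads (VAF 0.15) in the second of two samples
print(passes_filters([0, 12], [100, 80], min_var_reads=5, min_vaf=0.1))
```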
Default parameter values as well as the output directory can be changed in `treeomics/settings.py`.
Moreover, `settings.py` provides more options for the annotation of driver genes and the configuration of plot output names.
All plots, analysis and logging files, and the HTML report will be written to the output directory.
The list of putative driver genes can be set with `DRIVER_PATH` in `treeomics/settings.py`. As the default list, the union of driver genes reported by 20/20+, TUSON, and MutSigCV from Tokheim et al. (PNAS, 2016) is used (see `input/Tokheim_drivers_union.csv`). Any CSV file can be used as long as it has a column named 'Gene_Symbol'. Variants in these genes will be highlighted in the HTML report as well as in the inferred phylogeny.
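Only the 'Gene_Symbol' column of the driver gene CSV is required. As a minimal sketch of that requirement (not Treeomics' own parser), the function below collects the gene symbols with Python's `csv` module; the `(variant_id, gene)` pairs used for flagging are a hypothetical representation of annotated variants, and both function names are made up.

```python
import csv


def load_driver_genes(lines):
    """Return the gene symbols listed in the 'Gene_Symbol' column of a CSV.

    `lines` can be an open file (opened with newline="") or any iterable of
    CSV lines; all other columns are ignored.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or "Gene_Symbol" not in reader.fieldnames:
        raise ValueError("driver gene CSV needs a 'Gene_Symbol' column")
    genes = set()
    for row in reader:
        # Short rows yield None for missing fields; skip empty symbols
        symbol = (row["Gene_Symbol"] or "").strip()
        if symbol:
            genes.add(symbol)
    return genes


def flag_driver_variants(variants, driver_genes):
    """Map each hypothetical (variant_id, gene) pair to True if the gene is a driver."""
    return {variant_id: gene in driver_genes for variant_id, gene in variants}
```

To inspect the default list, open `input/Tokheim_drivers_union.csv` with `newline=""` and pass the file object to `load_driver_genes`.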
Variants in putative driver genes (`DRIVER_PATH`) will be checked for whether they occurred in the reported region of the given CSV file (default `input/cancer_gene_census_grch37_v80.csv`; CGC version 80, reference genome hg19).

Example 1:
`$ treeomics -r input/Makohon2017/Pam03_1-10_mutant_reads.txt -s input/Makohon2017/Pam03_1-10_phredcoverage.txt -n Pam03N3 -e 0.005`
Reconstructs the phylogeny of pancreatic cancer patient Pam03 based on targeted sequencing data of 5 distinct liver metastases, 3 distinct lung metastases, and 2 samples of the primary tumor.
Example 2:
`$ treeomics -r input/Bashashati2013/Case5_mutant_reads.txt -s input/Bashashati2013/Case5_coverage.txt -e 0.01`
Reconstructs the phylogeny of the high-grade serous ovarian cancer of Case 5 in Bashashati et al. (2013).
Example 3:
`$ treeomics -v input/example.vcf`
Reconstructs the phylogeny of a simulated cancer with 6 metastases from a given VCF file (see input/example.vcf).
Regarding the VCF input format, Treeomics expects the standard columns #CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, and FORMAT, as well as an additional column for each considered sample.
At a minimum, AD (allelic depth) has to be listed in the FORMAT column, and the observed numbers of reference and alternate reads in each sample then have to be given in the corresponding sample columns.
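To illustrate how the AD field relates to the `-r`/`-s` tables, the sketch below turns one VCF data line into per-sample (mutant reads, coverage) pairs. It is not Treeomics' VCF parser; it assumes biallelic sites, an AD value in every sample column, and coverage taken as the sum of reference and alternate reads. The function name is hypothetical.

```python
def read_counts_from_vcf_line(line):
    """Return [(alt_reads, coverage), ...] per sample for one VCF data line.

    Assumes a biallelic site whose FORMAT column lists AD and whose sample
    columns all carry an AD value "ref_depth,alt_depth".
    """
    fields = line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")   # FORMAT is the 9th column
    ad_index = format_keys.index("AD")   # ValueError if AD is missing
    counts = []
    for sample_field in fields[9:]:      # one column per sample
        ad_value = sample_field.split(":")[ad_index]
        ref_depth, alt_depth = (int(x) for x in ad_value.split(",")[:2])
        # Coverage taken as ref + alt reads (assumption, see lead-in)
        counts.append((alt_depth, ref_depth + alt_depth))
    return counts


line = "1\t100\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/1:30,10\t0/0:50,0"
print(read_counts_from_vcf_line(line))  # [(10, 40), (0, 50)]
```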
The generated output can be found in `output/example_output` and the corresponding Treeomics report at `output/example_output/example_6_e=0_01_c0=0_5_af=0_05_report.pdf`.
========
If you have any questions, you can contact us (https://reiterlab.stanford.edu) and we will try to help.
Copyright (C) 2017 Johannes Reiter
Treeomics is licensed under the GNU General Public License, Version 3. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 of the License. There is no warranty for this free software.
========
Author: Johannes Reiter, Stanford University, https://reiterlab.stanford.edu