matsengrp / ecgtheow

Ancestral lineage reconstruction using BEAST or RevBayes
2 stars 2 forks source link

This repository contains software components used for the manuscript Rapid development of an infant-derived HIV-1 broadly neutralizing antibody lineage by Simonich, Doepker, et al. It is not configured for use on data besides that used in the paper, although it could be adapted for other data with some work.

Requirements

Installation

Clone the repository using the following command:

git clone --recursive https://github.com/matsengrp/ecgtheow.git

Then, create the ecgtheow conda environment and compile RevBayes by running ./INSTALL in the ecgtheow root directory.

Example usage

Running scons executes the entire ancestral sequence reconstruction pipeline. An example is given below:

source activate ecgtheow
scons --simulate-data --T=50,105,150 --n=5,90,5 --lambda=2 --lambda0=0.365 --target-dist=15 --target-count=30 --carry-cap=1000 --skip-update=100 --nsims=1 --run-beast --run-revbayes

Remember that the conda environment must be activated before running scons. The most important command line arguments are described in the following table:

Command Description
--simulate-data Should we generate GC-simulated data?
--nsims The number of GC simulation runs (defaults to 1).
--cft-data Should we use CFT data?
--data-dir For CFT, what data directory should we use?
--sample For CFT, what data sample should we use?
--seed For CFT, what seed should we use?
--nprune For CFT, how many sequences should we keep from the clonal family (defaults to 100)?
--run-beast Should we run BEAST inference?
--run-revbayes Should we run RevBayes inference?
--mcmc-iter How many RevBayes MCMC iterations (100x for BEAST) should we use (defaults to 10000)?
--mcmc-thin What RevBayes MCMC thinning frequency (100x for BEAST) should we use (defaults to 10)?
--mcmc-burnin What amount of MCMC burnin should we use (defaults to 100)?
--asr-nfilters The comma-separated list of thresholds used to filter out infrequent ancestral sequence transitions in the MCMC samples (defaults to 50,100).

The entire set of command line arguments can be found by running scons --help.