petrelharp / context

Context-dependent mutation rate inference machinery.

Inference with context-dependent models of nucleotide substitution

This method is described in the paper "Enabling Inference for Context-Dependent Models of Mutation by Bounding the Propagation of Dependency" by Erick Matsen and Peter Ralph (2022), which is also available as a preprint.

Description of the method

Words and math describing the method are in the subdirectory writeups/.

R packages

The method is implemented efficiently in pure R using sparse matrices: once certain structures are set up, the values of the relevant matrices can be computed with fast linear algebra. R functions implementing various aspects of the method are in three packages: contextual, simcontext, and contextutils.

The code structure and key functions are described in this file.
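
To illustrate the general idea of "set up the structure once, then fill in values with linear algebra", here is a minimal sketch using the Matrix package. It is not the packages' actual code: the toy single-site model, the two rate classes, and the names P and set_rates are all assumptions made for this example.

```r
library(Matrix)

bases <- c("A", "C", "G", "T")
# All off-diagonal (from, to) pairs of a toy single-site model (an assumption).
pairs <- subset(expand.grid(from = 1:4, to = 1:4), from != to)
# Hypothetical rate classes: 1 = transition (A<->G, C<->T), 2 = transversion.
rate_class <- ifelse(abs(pairs$from - pairs$to) == 2, 1L, 2L)

# Step 1 (done once): fix the sparsity pattern. Storing each entry's index as
# its value lets us read off the order in which Matrix keeps the nonzeros.
G <- sparseMatrix(i = pairs$from, j = pairs$to,
                  x = as.numeric(seq_len(nrow(pairs))),
                  dims = c(4, 4), dimnames = list(bases, bases))
stored_order <- G@x

# Step 2 (done once): a sparse 0/1 matrix P mapping rate parameters to the
# stored nonzero values, so that G@x equals P %*% rates.
P <- sparseMatrix(i = seq_along(stored_order), j = rate_class[stored_order],
                  x = 1, dims = c(length(stored_order), 2))

# Step 3 (repeated for each new parameter value): one sparse matrix-vector
# product refills the values without rebuilding the structure.
set_rates <- function(G, P, rates) {
  G@x <- as.vector(P %*% rates)
  G
}

G <- set_rates(G, P, rates = c(transition = 2.0, transversion = 0.5))
```

During optimization or MCMC only step 3 is repeated, which is why the one-time setup cost pays off.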

To install these, run

library(devtools)
install_github("petrelharp/context/contextual")
install_github("petrelharp/context/simcontext")
install_github("petrelharp/context/contextutils")

or else to do it from a local copy:

git clone https://github.com/petrelharp/context.git
cd context
Rscript -e 'library(devtools); install("contextual"); install("simcontext"); install("contextutils")'

Simulation requires some Bioconductor packages; see below for how to install those.

Command-line scripts

The general strategy for inference and visualization is:

  1. Configuration in json files
  2. Computation with R scripts: run Rscript (scriptname) --help for options
  3. Visualization using templated Rmarkdown files
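
As a rough sketch of how steps 1 and 2 fit together, a script following this pattern might parse its options with optparse and read its configuration with jsonlite (both are among the prerequisites listed below). The option names and the JSON fields here are hypothetical and do not describe the actual scripts or configuration format.

```r
library(optparse)
library(jsonlite)

# Hypothetical options; the real scripts list theirs via --help.
option_list <- list(
  make_option(c("-c", "--config"), type = "character", default = NULL,
              help = "Path to a JSON configuration file."),
  make_option(c("-o", "--outfile"), type = "character", default = "result.RData",
              help = "Where to save results [default: %default].")
)
parser <- OptionParser(option_list = option_list,
                       description = "Toy example of a JSON-configured script.")

# Write a made-up configuration to a temporary file so the example is runnable.
config_file <- tempfile(fileext = ".json")
writeLines('{"bases": ["A", "C", "G", "T"], "mutrates": [0.1, 0.2]}', config_file)

# In a real script, omit `args` so that the command line is parsed instead.
opt <- parse_args(parser, args = c("--config", config_file))
config <- fromJSON(opt$config)
```

Keeping the model description in JSON means the same file can be passed to the computation and visualization steps.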

These are in the scripts/ directory. The most useful ones are:

Computation:

Visualization:

Compilation and comparison of different models:

Example models

Full analysis pipelines, from simulation to inference and visualization, are implemented for several example models (see the writeups for descriptions). The first few are motivated by statistical physics, not DNA, but serve as good examples. Each directory contains shell scripts (usually workflow.sh) that demonstrate the workflow.

Prerequisites:

To install the prerequisites separately:

install.packages(c("expm", "mcmc", "stringdist", "optparse", "jsonlite", "ape", "rmarkdown", "ggplot2", "pander"))
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Biostrings", "IRanges", "S4Vectors"))
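
To see which prerequisites are still missing before installing anything, a small check along these lines may help. The helper name missing_packages is made up for this example; it only reports packages and does not install them.

```r
# Return the subset of `pkgs` that cannot be loaded in this R library.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

cran_pkgs <- c("expm", "mcmc", "stringdist", "optparse", "jsonlite",
               "ape", "rmarkdown", "ggplot2", "pander")
bioc_pkgs <- c("Biostrings", "IRanges", "S4Vectors")

still_needed <- missing_packages(c(cran_pkgs, bioc_pkgs))
print(still_needed)
```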