mptouzel / bayes_diffexpr

Code associated with the publication: Inferring the immune response from repertoire sequencing. Puelma Touzel M, PLoS Comp Bio 2020

bayes_diffexpr

Bayesian Inference of Differential Expression

This branch contains a compact version of the code used in Puelma Touzel M, Walczak AM, Mora T (2020) Inferring the immune response from repertoire sequencing. PLOS Computational Biology 16(4): e1007873. https://doi.org/10.1371/journal.pcbi.1007873

For the actual codebase associated with the paper, please refer to the depaper branch.

infer_diffexpr_main.py takes 2 pairs of .txt files as input.

Each file is a table in which each row refers to one observed clone. From each file the script extracts the nucleotide sequence, the amino acid sequence and the observed count of each clone (n.b. the file header information, e.g. column order, and the paths to these files have been hard-coded to work with a particular dataset; applying the code to another dataset will require changing this information). Each pair of files is merged into a single dataset of pair counts, with one pair of counts for each observed clone.
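The reading and merging step can be sketched in plain Python as below. This is a minimal illustration, not the code in infer_diffexpr_main.py: because the real column layout is hard-coded for a particular dataset, the column names nt, aa and count, the tab separator, both function names and the toy rows are all hypothetical. Clones are keyed by their (nucleotide, amino acid) sequence, and a clone missing from one file gets a count of 0 in that file.

```python
from collections import defaultdict


def read_clone_counts(lines, nt_col, aa_col, count_col, sep="\t"):
    """Sum counts per (nucleotide, amino acid) sequence from a table with a header row.

    Column names are hypothetical; the repository hard-codes its own layout.
    """
    rows = iter(lines)
    header = next(rows).rstrip("\n").split(sep)
    i_nt, i_aa, i_n = header.index(nt_col), header.index(aa_col), header.index(count_col)
    counts = defaultdict(int)
    for line in rows:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split(sep)
        counts[(fields[i_nt], fields[i_aa])] += int(fields[i_n])
    return dict(counts)


def merge_pair_counts(counts_1, counts_2):
    """Outer-join two count dicts into one (n1, n2) pair per clone seen in either sample."""
    clones = sorted(set(counts_1) | set(counts_2))
    return {c: (counts_1.get(c, 0), counts_2.get(c, 0)) for c in clones}


# Toy tables (invented) standing in for the two files of one pair.
sample_1 = ["nt\taa\tcount\n", "TGTGCC\tCA\t5\n", "TGTAGC\tCS\t2\n"]
sample_2 = ["nt\taa\tcount\n", "TGTGCC\tCA\t3\n", "TGTTTT\tCF\t1\n"]
pair_counts = merge_pair_counts(
    read_clone_counts(sample_1, "nt", "aa", "count"),
    read_clone_counts(sample_2, "nt", "aa", "count"),
)
```

Because only observed clones appear in the input files, the merged data never contains a (0, 0) pair.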

Using functions in infer_diffexpr_lib.py, infer_diffexpr_main.py first learns a null model of variability from the first pair (e.g. two replicates) and then, using the second pair, learns the parameters of a distribution of log fold change. Finally, the script computes the posterior of log fold change for each clone and writes a CSV table of summary statistics of these posteriors.
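The final step, a per-clone posterior over log fold change followed by a CSV of summary statistics, can be illustrated on a grid. The sketch below uses a deliberately simplified null model that is an assumption, not the one in infer_diffexpr_lib.py: Poisson sampling with mean N*f in each sample (with the second sample's frequency scaled by e^s), a power-law prior on the baseline frequency f with a hypothetical exponent nu, and a hypothetical sparse P(s) with mass 1 - alpha at s = 0 and the remaining alpha spread as exp(-|s|/s_bar). All function names, the summary columns (s_mean, s_map, p_s_pos), the sample sizes and the toy counts are hypothetical.

```python
import csv
import io
import math


def poisson_logpmf(n, lam):
    """log P(n | lam) for a Poisson count with mean lam > 0."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_grid(lo, hi, num):
    """num points spaced evenly in log10 between lo and hi (inclusive)."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + k * (b - a) / (num - 1)) for k in range(num)]


def sparse_exp_prior(s_grid, alpha, s_bar):
    """Assumed P(s) on a grid containing 0: mass 1 - alpha at s = 0, alpha spread
    as exp(-|s|/s_bar) over the other points. Requires 0 < alpha < 1."""
    w = [0.0 if s == 0 else math.exp(-abs(s) / s_bar) for s in s_grid]
    z = sum(w)
    return [1 - alpha if s == 0 else alpha * wi / z for s, wi in zip(s_grid, w)]


def posterior_s(n1, n2, N1, N2, s_grid, prior_s, f_grid, nu=2.0):
    """Posterior over s for one clone, marginalizing the baseline frequency f on f_grid.

    Toy null model (assumption): n1 ~ Poisson(N1 f), n2 ~ Poisson(N2 f e^s), and
    rho(f) ~ f^-nu. On a log-spaced grid the Jacobian adds a factor f, so the
    weights are f^(1 - nu); their normalization cancels when the posterior is normalized.
    """
    log_w = [(1 - nu) * math.log(f) for f in f_grid]
    log_post = []
    for s, ps in zip(s_grid, prior_s):
        terms = [lw + poisson_logpmf(n1, N1 * f) + poisson_logpmf(n2, N2 * f * math.exp(s))
                 for lw, f in zip(log_w, f_grid)]
        log_post.append(math.log(ps) + logsumexp(terms))
    z = logsumexp(log_post)
    return [math.exp(lp - z) for lp in log_post]


def summarize(clone_id, s_grid, post):
    """Posterior mean, MAP value and P(s > 0) for one clone."""
    mean = sum(s * p for s, p in zip(s_grid, post))
    s_map = s_grid[max(range(len(post)), key=post.__getitem__)]
    p_up = sum(p for s, p in zip(s_grid, post) if s > 0)
    return {"clone": clone_id, "s_mean": mean, "s_map": s_map, "p_s_pos": p_up}


def write_summary(rows, fh):
    """Write one CSV row of summary statistics per clone."""
    writer = csv.DictWriter(fh, fieldnames=["clone", "s_mean", "s_map", "p_s_pos"])
    writer.writeheader()
    writer.writerows(rows)


# Usage on invented counts: clone_a rises between samples, clone_b falls.
s_grid = [k * 0.1 for k in range(-50, 51)]  # natural-log fold change, includes 0.0
prior = sparse_exp_prior(s_grid, alpha=0.01, s_bar=1.0)
f_grid = log_grid(1e-7, 1e-2, 200)
toy_pairs = {"clone_a": (2, 50), "clone_b": (50, 2)}
rows = [summarize(c, s_grid, posterior_s(n1, n2, 1e5, 1e5, s_grid, prior, f_grid))
        for c, (n1, n2) in toy_pairs.items()]
buf = io.StringIO()
write_summary(rows, buf)
```

Working on a fixed grid keeps every step explicit: the posterior is just the prior times a marginal likelihood at each grid value of s, normalized with a log-sum-exp to avoid underflow.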

See the top of infer_diffexpr_lib.py and infer_diffexpr_main.py for required libraries. All are standard.

plot_output.ipynb reads in the output data and plots the likelihood surface over the computed grid of parameter values of the particular P(s) (the log fold change distribution) used.
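A likelihood surface of this kind can be sketched as follows. To keep the block short, it swaps in a simplification that is an assumption rather than the repository's model: under Poisson sampling, conditioning on the total n1 + n2 removes the unknown baseline frequency, so n2 follows a binomial distribution with p(s) = N2 e^s / (N1 + N2 e^s). The per-clone, per-s log-likelihoods are computed once and then combined with a hypothetical sparse P(s) (mass 1 - alpha at s = 0, the rest spread as exp(-|s|/s_bar)) at each grid point (alpha, s_bar). Function names, grids, sample sizes and toy counts are invented for illustration.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def binom_logpmf(k, n, p):
    """log P(k | n, p) for a binomial distribution with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def sparse_exp_prior(s_grid, alpha, s_bar):
    """Assumed P(s): mass 1 - alpha at s = 0, alpha spread as exp(-|s|/s_bar) elsewhere."""
    w = [0.0 if s == 0 else math.exp(-abs(s) / s_bar) for s in s_grid]
    z = sum(w)
    return [1 - alpha if s == 0 else alpha * wi / z for s, wi in zip(s_grid, w)]


def per_clone_loglik(pairs, N1, N2, s_grid):
    """log P(n2 | n1 + n2, s) for every clone and every s; independent of alpha and s_bar."""
    table = []
    for n1, n2 in pairs:
        row = []
        for s in s_grid:
            p = N2 * math.exp(s) / (N1 + N2 * math.exp(s))
            row.append(binom_logpmf(n2, n1 + n2, p))
        table.append(row)
    return table


def log_likelihood_surface(table, s_grid, alphas, s_bars):
    """Total log marginal likelihood, summed over clones, at each (alpha, s_bar) point."""
    surface = []
    for alpha in alphas:
        row = []
        for s_bar in s_bars:
            log_prior = [math.log(p) for p in sparse_exp_prior(s_grid, alpha, s_bar)]
            row.append(sum(logsumexp([lp + ll for lp, ll in zip(log_prior, clone)])
                           for clone in table))
        surface.append(row)
    return surface


# Toy pair counts (invented): four roughly unchanged clones, two strongly expanded ones.
pairs = [(10, 11), (5, 4), (20, 19), (7, 8), (3, 60), (1, 40)]
s_grid = [k * 0.1 for k in range(-50, 51)]
alphas = [0.01, 0.05, 0.1, 0.3, 0.5]
s_bars = [0.5, 1.0, 2.0]
table = per_clone_loglik(pairs, 1000, 1000, s_grid)
surface = log_likelihood_surface(table, s_grid, alphas, s_bars)
best_ll, best_alpha, best_s_bar = max(
    (surface[i][j], a, b) for i, a in enumerate(alphas) for j, b in enumerate(s_bars))
```

The resulting surface (rows indexed by alpha, columns by s_bar) could be handed to a contour or heatmap routine. Precomputing the per-clone table means each grid point costs only one prior evaluation and one log-sum-exp per clone.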