Welcome to the GitHub repository for the following publication: Mapping the energetic and allosteric landscapes of protein binding domains (Faure AJ, Domingo J & Schmiedel JM et al., 2022)
Here you'll find an R package with all scripts to reproduce the figures and results from the computational analyses described in the paper.
To run the doubledeepms pipeline you will need the following software and associated packages:
The following software is optional:
Open R and enter:
```r
# Install
if (!require(devtools)) install.packages("devtools")
devtools::install_github("lehner-lab/doubledeepms")
# Load
library(doubledeepms)
# Help
?doubledeepms
```
Fitness scores, thermodynamic models, pre-processed data and required miscellaneous files should be downloaded from here and unzipped into your project directory (see the '_basedir' option), i.e. the directory where output files will be written.
There are a number of options available for running the doubledeepms pipeline depending on user requirements.
Default pipeline functionality ('startStage' = 1) uses prefit thermodynamic models and fitness scores from DMS experiments (already processed with MoCHI and DiMSum respectively; see Required Data) to reproduce all figures in the publication.
Pipeline stage 0 ('doubledeepms_fit_thermo_model') fits thermodynamic models to DMS data for the specified domains ('_tmodelprotein'), using either all available data or subsets of phenotypes/variants ('_tmodelsubset'). Parallel computing using job arrays is recommended when running the Monte Carlo simulations used to determine confidence intervals of model-inferred free energies ('_tmodel_jobnumber'). Note: this stage can be resource intensive (up to 48 h with 30 GB of RAM for GB1).
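On a cluster, each array task can pick up its index from the scheduler and use it as the job number. The sketch below assumes a SLURM scheduler (`SLURM_ARRAY_TASK_ID` is SLURM's array index variable); the helpers `get_job_number()` and `mc_interval()` are hypothetical and not part of doubledeepms. It also shows one common way (a percentile interval) to summarise pooled Monte Carlo free energy estimates; the pipeline's own summary statistic may differ.

```r
# Hypothetical helper (not part of doubledeepms): read the array index that
# SLURM exports to each task, falling back to a default when run locally.
get_job_number <- function(default = 1L) {
  task_id <- Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "")
  if (nzchar(task_id)) as.integer(task_id) else as.integer(default)
}

# Hypothetical helper: percentile interval of pooled Monte Carlo estimates
# of a free energy change (one value per simulation).
mc_interval <- function(dg_samples, level = 0.95) {
  alpha <- (1 - level) / 2
  stats::quantile(dg_samples, probs = c(alpha, 1 - alpha), names = FALSE)
}

job_number <- get_job_number()
message("Running Monte Carlo job ", job_number)
```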
Raw read processing is not handled by the doubledeepms pipeline. FastQ files (GSE184042) from paired-end sequencing of replicate deep mutational scanning (DMS) libraries before ('input') and after ('output') selection were processed using DiMSum (Faure and Schmiedel et al., 2020).
DiMSum command-line arguments and experimental design files required to obtain variant counts from FastQ files are available here.
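For orientation, DiMSum reports fitness as the natural log ratio of a variant's output and input counts relative to the same ratio for the wild type. The sketch below reproduces only that single-replicate point estimate; DiMSum's error model, count filtering and replicate merging are omitted, and `dms_fitness()` is a hypothetical name.

```r
# Hypothetical helper: single-replicate fitness point estimate,
# ln(output/input) of the variant minus ln(output/input) of the wild type.
dms_fitness <- function(input_count, output_count, wt_input, wt_output) {
  log(output_count / input_count) - log(wt_output / wt_input)
}

# Toy counts (made up for illustration): the variant is depleted 2-fold
# relative to the wild type after selection, giving log(0.5).
dms_fitness(input_count = 200, output_count = 100, wt_input = 1000, wt_output = 1000)
```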
The top-level function doubledeepms() is the recommended entry point to the pipeline and by default reproduces the figures and results from the computational analyses described in the following publication: Mapping the energetic and allosteric landscapes of protein binding domains (Faure AJ, Domingo J & Schmiedel JM et al., 2022). See Required Data for instructions on how to obtain all required data and miscellaneous files before running the pipeline.
This stage ('doubledeepms_fit_thermo_model') fits thermodynamic models to variant fitness data from doubledeepPCA (ddPCA) DMS experiments.
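As a rough guide to what such models encode, a three-state equilibrium (unfolded, folded, folded and bound) links a variant's folding and binding free energies to the fractions of molecules that are folded and bound. The sketch assumes free energies in kcal/mol, with negative values meaning more stable folding or tighter binding, and RT evaluated at an assumed 30 °C. The function names are hypothetical, and fitted models typically also map these fractions onto fitness with scaling parameters that are not shown here.

```r
# RT in kcal/mol at an assumed 30 degrees C (R = 1.987e-3 kcal/mol/K).
RT <- 1.987e-3 * (273.15 + 30)

# Fraction of molecules folded, given the folding free energy dg_f.
fraction_folded <- function(dg_f, rt = RT) {
  1 / (1 + exp(dg_f / rt))
}

# Fraction bound in the three-state model U <-> F <-> B, given dg_f and the
# binding free energy dg_b (ligand concentration absorbed into dg_b).
fraction_bound <- function(dg_f, dg_b, rt = RT) {
  1 / (1 + exp(dg_b / rt) * (1 + exp(dg_f / rt)))
}

# A destabilising change (dg_f raised from -2 to -1 kcal/mol) lowers both.
fraction_folded(c(-2, -1))
fraction_bound(c(-2, -1), dg_b = -1)
```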
This stage ('doubledeepms_thermo_model_results') evaluates thermodynamic model results and performance including comparing to literature in vitro measurements (related to Figure 2).
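Agreement with literature in vitro measurements is commonly summarised by a correlation over the mutations measured in both. A minimal sketch, assuming two numeric vectors of free energy changes aligned by mutation, with missing values as `NA`; `compare_in_vitro()` is hypothetical and the statistics reported by the pipeline may differ.

```r
# Hypothetical helper: Pearson correlation and R^2 between model-inferred and
# in vitro free energy changes, using only mutations present in both.
compare_in_vitro <- function(inferred, in_vitro) {
  keep <- stats::complete.cases(inferred, in_vitro)
  r <- stats::cor(inferred[keep], in_vitro[keep], method = "pearson")
  list(n = sum(keep), r = r, r_squared = r^2)
}
```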
This stage ('doubledeepms_structure_metrics') annotates single mutant inferred free energies with PDB structure-derived metrics.
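Typical structure-derived annotations are a residue's distance to the binding partner and its relative solvent accessibility (RSA). The sketch below computes a minimum inter-atomic distance from coordinate matrices and assigns a coarse structural class. The helper names and the cut-offs (5 Å to the ligand, RSA below 0.25 for core) are assumptions for illustration, not necessarily the values used by the pipeline.

```r
# Hypothetical helper: minimum Euclidean distance (coordinate units, e.g.
# Angstrom) between any atom of a residue and any ligand atom. Both arguments
# are numeric matrices with columns x, y, z (one row per atom).
min_distance_to_ligand <- function(residue_xyz, ligand_xyz) {
  # Squared distances for all atom pairs via |a|^2 + |b|^2 - 2 a.b
  d2 <- outer(rowSums(residue_xyz^2), rowSums(ligand_xyz^2), "+") -
    2 * residue_xyz %*% t(ligand_xyz)
  sqrt(max(0, min(d2)))
}

# Hypothetical classification with assumed cut-offs: binding interface if
# within dist_cutoff of the ligand, otherwise core if RSA < rsa_cutoff,
# otherwise surface.
structural_class <- function(min_dist, rsa, dist_cutoff = 5, rsa_cutoff = 0.25) {
  ifelse(min_dist < dist_cutoff, "binding_interface",
         ifelse(rsa < rsa_cutoff, "core", "surface"))
}
```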
This stage ('doubledeepms_fitness_plots') plots fitness distributions and scatterplots (related to Figure 1).
This stage ('doubledeepms_fitness_heatmaps') plots single mutant fitness heatmaps (related to Figure 1).
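A single mutant heatmap is essentially a matrix with one row per mutant amino acid and one column per position. A minimal base R sketch of that reshaping step, assuming a long table with position, mutant amino acid and fitness; `single_mutant_matrix()` and the amino acid ordering are hypothetical choices, and mutants outside the 20 standard amino acids are simply dropped.

```r
# Hypothetical helper: reshape single mutants (one row per position and mutant
# amino acid) into an amino acid x position matrix; unmeasured cells stay NA.
single_mutant_matrix <- function(position, mut_aa, fitness,
                                 aa_order = strsplit("GAVLMIFYWKRHDESTCNQP", "")[[1]]) {
  positions <- sort(unique(position))
  m <- matrix(NA_real_, nrow = length(aa_order), ncol = length(positions),
              dimnames = list(aa_order, as.character(positions)))
  idx <- cbind(match(mut_aa, aa_order), match(position, positions))
  ok <- stats::complete.cases(idx)  # drop e.g. stop codons not in aa_order
  m[idx[ok, , drop = FALSE]] <- fitness[ok]
  m
}

hm <- single_mutant_matrix(position = c(1, 1, 2), mut_aa = c("A", "P", "W"),
                           fitness = c(-0.2, -1.5, 0.1))
```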
This stage ('doubledeepms_free_energy_scatterplots') plots single mutant free energy scatterplots (related to Figure 3).
This stage ('doubledeepms_free_energy_heatmaps') plots single mutant free energy heatmaps (related to Figure 3).
This stage ('doubledeepms_protein_stability_plots') produces protein stability plots (related to Figure 4).
This stage ('doubledeepms_interface_mechanisms') produces binding free energy heatmaps for selected GRB2-SH3 residues (related to Figure 5).
This stage ('doubledeepms_allostery_plots') produces allostery plots (related to Figure 5 and Figure 6).
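Allosteric sites are often ranked per residue by a summary of the binding free energy changes of all mutations at that position. The sketch below uses an inverse-variance weighted mean of absolute ΔΔG values; the helper name is hypothetical, and whether the allostery plots use exactly this summary is an assumption.

```r
# Hypothetical helper: per-position weighted mean of |ddG|, weighting each
# mutation by the inverse of its variance (1 / se^2).
weighted_mean_abs_ddg <- function(position, ddg, se) {
  w <- 1 / se^2
  num <- tapply(abs(ddg) * w, position, sum)
  den <- tapply(w, position, sum)
  num / den
}
```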
This stage ('doubledeepms_allostery_scatterplots') produces free energy scatterplots of major allosteric sites and mutations (related to Figure 5 and Figure 6).
This stage ('doubledeepms_downsampling_analysis') evaluates thermodynamic model results and performance after downsampling (related to Figure 2).
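Downsampling asks how model performance degrades with less data. A minimal sketch of the sampling step, assuming variants are stored one per row in a data frame; `downsample_variants()` is hypothetical, and which variants the pipeline actually downsamples (for example, single versus double mutants) is not shown here.

```r
# Hypothetical helper: keep a random fraction of variants (rows). The seed is
# set so that repeated calls return the same subset.
downsample_variants <- function(variants, fraction, seed = 1) {
  stopifnot(fraction > 0, fraction <= 1)
  set.seed(seed)
  n_keep <- round(nrow(variants) * fraction)
  variants[sort(sample.int(nrow(variants), n_keep)), , drop = FALSE]
}
```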
This stage ('doubledeepms_foldx_comparisons') compares inferred folding free energy changes to those predicted by FoldX.
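Because predicted and inferred free energy changes need not be linearly related, a rank correlation is a natural summary for this kind of comparison. A minimal sketch with the hypothetical helper `compare_foldx()`, assuming both vectors are aligned by mutation and use the same sign convention (positive values meaning destabilising).

```r
# Hypothetical helper: Spearman rank correlation between inferred folding ddG
# and FoldX-predicted ddG, ignoring mutations missing from either vector.
compare_foldx <- function(inferred_ddg, foldx_ddg) {
  stats::cor(inferred_ddg, foldx_ddg, method = "spearman", use = "complete.obs")
}
```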
This stage ('doubledeepms_polyphen2_comparisons') compares inferred folding free energy changes to PolyPhen2 predictions of functional effects.
This stage ('doubledeepms_3did_comparisons') tests the enrichment of allosteric mutations at interaction interfaces as annotated by the database of three-dimensional interacting domains (3did).
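An enrichment test of this kind can be framed as a 2x2 contingency table (allosteric or not, at an annotated interface or not) analysed with Fisher's exact test. A minimal sketch, assuming one logical value per mutation for each property; `interface_enrichment()` is hypothetical and the one-sided alternative is an assumption.

```r
# Hypothetical helper: one-sided Fisher's exact test for enrichment of
# allosteric mutations among mutations at 3did-annotated interface residues.
interface_enrichment <- function(is_allosteric, at_interface) {
  tab <- table(factor(is_allosteric, levels = c(TRUE, FALSE)),
               factor(at_interface, levels = c(TRUE, FALSE)))
  stats::fisher.test(tab, alternative = "greater")
}
```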
This stage ('doubledeepms_eve_comparisons') compares inferred folding free energy changes to EVE predictions of functional effects.