labstructbioinf / rossmann-toolbox

Prediction and re-engineering of the cofactor specificity of Rossmann-fold proteins
MIT License

Rossmann Toolbox

The Rossmann Toolbox provides two deep learning models for predicting the cofactor specificity of Rossmann enzymes based on either the sequence or the structure of the beta-alpha-beta cofactor binding motif.

Installation

Create a conda environment:

conda create --name rtb python=3.7
conda activate rtb

Install pip in the environment:

conda install pip

Install from PyPI:

pip install rossmann-toolbox

Alternatively, to get the most recent changes, install directly from the repository:

pip install git+https://github.com/labstructbioinf/rossmann-toolbox.git

Usage

Sequence-based approach

The input is a full-length sequence. The algorithm first detects Rossmann cores (i.e. the β-α-β motifs that interact with the cofactor) in the sequence and then evaluates their cofactor specificity:

import matplotlib.pyplot as plt
from rossmann_toolbox import RossmannToolbox
rtb = RossmannToolbox(use_gpu=True)

# Example 1
# The b-a-b core is predicted in the full-length sequence

data = {'3m6i_A': 'MASSASKTNIGVFTNPQHDLWISEASPSLESVQKGEELKEGEVTVAVRSTGICGSDVHFWKHGCIGPMIVECDHVLGHESAGEVIAVHPSVKSIKVGDRVAIEPQVICNACEPCLTGRYNGCERVDFLSTPPVPGLLRRYVNHPAVWCHKIGNMSYENGAMLEPLSVALAGLQRAGVRLGDPVLICGAGPIGLITMLCAKAAGACPLVITDIDEGRLKFAKEICPEVVTHKVERLSAEESAKKIVESFGGIEPAVALECTGVESSIAAAIWAVKFGGKVFVIGVGKNEIQIPFMRASVREVDLQFQYRYCNTWPRAIRLVENGLVDLTRLVTHRFPLEDALKAFETASDPKTGAIKVQIQSLE'}

preds = rtb.predict(data, mode='seq', core_detect_mode='dl', importance=False)

# Example 2
# The b-a-b cores are provided by the user (WT vs mutant)

data = {'seq_wt': 'AGVRLGDPVLICGAGPIGLITMLCAKAAGACPLVITDIDEGR', # WT, binds NAD
        'seq_mut': 'AGVRLGDPVLICGAGPIGLITMLCAKAAGACPLVITSRDEGR'} # D211S, I212R mutant, binds NADP

preds, imps = rtb.predict(data, mode='core', importance=True)

# Example 3
# Which residues contributed most to the prediction of WT as NAD-binding?
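# In imps['seq_wt']['NAD'], element [0] is plotted as the per-residue values
# and element [1] as their error bars
seq_len = len(data['seq_wt'])
plt.errorbar(list(range(1, seq_len+1)),
             imps['seq_wt']['NAD'][0], yerr=imps['seq_wt']['NAD'][1], ecolor='grey')
plt.xlabel('Position in core')
plt.ylabel('Importance')
plt.show()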
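
The mutant core in Example 2 was typed by hand. As a minimal sketch, the same core can be derived from the WT core and a list of mutation labels. The helper `apply_mutations` and its `core_start` parameter are hypothetical and not part of rossmann-toolbox. The value 175 assumes that the first residue of the 3m6i_A sequence from Example 1 is counted as position 1, which places the first residue of the WT core at position 175.

```python
import re


def apply_mutations(core, mutations, core_start):
    """Apply point mutations such as 'D211S' to a core sequence.

    core_start is the 1-based position of the first core residue in the
    full-length sequence, so mutation labels can use full-length numbering.
    (Hypothetical helper, not part of rossmann-toolbox.)
    """
    residues = list(core)
    for label in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"Cannot parse mutation {label!r}")
        wt_res, new_res = match.group(1), match.group(3)
        index = int(match.group(2)) - core_start
        if not 0 <= index < len(residues):
            raise ValueError(f"{label} lies outside the core")
        # Guard against numbering mistakes: the stated WT residue must match
        if residues[index] != wt_res:
            raise ValueError(f"{label}: expected {wt_res}, found {residues[index]}")
        residues[index] = new_res
    return "".join(residues)


seq_wt = 'AGVRLGDPVLICGAGPIGLITMLCAKAAGACPLVITDIDEGR'
# Assumption: the core starts at residue 175 of the full-length sequence
seq_mut = apply_mutations(seq_wt, ['D211S', 'I212R'], core_start=175)
data = {'seq_wt': seq_wt, 'seq_mut': seq_mut}
```

The resulting `data` dictionary has the same shape as the one passed to `rtb.predict(..., mode='core')` in Example 2. The wild-type check makes an off-by-one error in `core_start` fail loudly instead of silently mutating the wrong residue.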

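Besides plotting, the importances from Example 3 can be ranked to list the positions that contribute most. The sketch below assumes only what Example 3 shows: one sequence of per-residue values and one sequence of matching errors. It further assumes that larger values mean a larger contribution. The helper `top_positions` is hypothetical, and the numbers are placeholders for illustration, not model output.

```python
def top_positions(means, errors, k=5):
    """Return the k highest-scoring positions as (position, mean, error).

    Positions are 1-based and relative to the start of the core.
    (Hypothetical helper, not part of rossmann-toolbox.)
    """
    if len(means) != len(errors):
        raise ValueError("means and errors must have the same length")
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    return [(i + 1, means[i], errors[i]) for i in ranked[:k]]


# Placeholder values for illustration only, not real predictions
toy_means = [0.1, 0.7, 0.3, 0.9]
toy_errors = [0.01, 0.05, 0.02, 0.04]
top = top_positions(toy_means, toy_errors, k=2)
```

With real output, the call would presumably be `top_positions(imps['seq_wt']['NAD'][0], imps['seq_wt']['NAD'][1])`, mirroring the arguments passed to `plt.errorbar` in Example 3.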
Structure-based approach

Structure-based predictions are not currently available. We are working on a new version that will not only provide predictions but also support the design of specificity-shifting mutations.

EGATConv layer

The structure-based predictor includes an EGAT layer, a graph attention layer that supports edge features in addition to node features. The layer is available in DGL as EGATConv; see the DGL documentation for details. For a detailed description of the EGAT layer and its usage, please refer to the supplementary materials of the Rossmann Toolbox paper.

Remarks

How to cite?

If you find the rossmann-toolbox useful, please cite the paper:

Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in Rossmann fold proteins
Kamil Kamiński, Jan Ludwiczak, Maciej Jasiński, Adriana Bukala, Rafal Madaj, Krzysztof Szczepaniak, Stanisław Dunin-Horkawicz
Briefings in Bioinformatics, Volume 23, Issue 1, January 2022, bbab371

Contact

If you have any questions, problems or suggestions, please contact us.

Funding

This work was supported by the First TEAM program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund.