yassermb / DLA-Mutation

Overview

Deep Local Analysis (DLA)-Mutation contrasts the patterns observed in two local cubes encapsulating the physico-chemical and geometrical environments around the wild-type and the mutant amino acids. The underlying self-supervised model (ssDLA) takes advantage of a large-scale exploration of non-redundant experimental protein complex structures in the Protein Data Bank (PDB) to learn the fundamental properties of protein-protein interfaces. Evolutionary constraints and conformational heterogeneity improve the performance of DLA-Mutation.

Features:

Requirements

Packages:

DLA-Ranker can be run on Linux, macOS, and Windows. We recommend running it on a machine with a GPU. It requires the following packages:

All-in-one: Run conda create --name dla --file dla.yml

Tutorial

Representation learning with self-supervised Deep Local Analysis (ssDLA)

ssDLA is a structure-based, general-purpose model that generates informative representations of the local environments (masked or unmasked) around interfacial residues for use in downstream tasks.

Finding residue-specific patterns

We can use the pre-trained ssDLA model to predict the type of amino acid given a masked cube.

Generating masked locally oriented cubes
Example
|___complex_list.txt
|
|___complex_directory
    |
    |___complex 1
    |___complex 2
    |
    ..........

The output will be a directory 'map_dir' with the following structure:

Example
|___map_dir
    |___complex 1
    |___complex 2
    ..........

Each output represents the interface of a complex and contains a set of local environments (e.g. atomic density maps, structure classes (S,C,R), ...).

An atomic density map is a 4-dimensional tensor: a voxelized 3D grid of size 24*24*24 in which each voxel encodes some characteristics of the protein atoms. Namely, the first 167 channels of each voxel correspond to the atom types that can be found in amino acids (hydrogens excluded). These channels can be reduced to 4 element symbols (C,N,O,S) by running python generate_cubes_reduce_channels_multiproc.py (ATTENTION: this script overwrites the existing files).
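To make the channel reduction concrete, here is a minimal sketch for a single voxel. It is not the code of generate_cubes_reduce_channels_multiproc.py: the `RES_ATOM` labels, the channel order, and the choice to sum densities per element are all assumptions made for illustration. The only fact relied on is that, for heavy atoms of the standard amino acids, the element symbol is the first letter of the PDB atom name.

```python
# Hypothetical sketch: collapse per-atom-type channels of one voxel into
# four element channels (C, N, O, S). Labels and summation are assumptions.
ELEMENTS = ("C", "N", "O", "S")


def element_of(atom_type):
    """Return the element symbol for a label such as 'CYS_SG' -> 'S'."""
    atom_name = atom_type.split("_", 1)[1]
    return atom_name[0]


def reduce_voxel(values, atom_types):
    """Sum the densities of one voxel by element, in ELEMENTS order."""
    reduced = [0.0] * len(ELEMENTS)
    for value, atom_type in zip(values, atom_types):
        reduced[ELEMENTS.index(element_of(atom_type))] += value
    return reduced


# A shortened, made-up channel list (the real maps have 167 atom types).
atom_types = ["ALA_N", "ALA_CA", "ALA_C", "ALA_O", "CYS_SG", "LYS_NZ"]
voxel = [0.1, 0.5, 0.0, 0.2, 0.3, 0.4]
reduced = reduce_voxel(voxel, atom_types)
print(dict(zip(ELEMENTS, reduced)))
```

Applying the same function to every voxel of the grid would turn a 24*24*24 map with one channel per atom type into a 24*24*24 map with 4 channels.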

Predicting the type of masked residue

From directory 'Evaluation' run python test_xray.py or python test_xray_4channels.py depending on the number of channels.

The script processes all the target complexes and writes two csv files: 'output_xray_wt_mask' ('output_xray_wt_mask_4channels' for the 4-channel version), which holds the predictions, and 'intermediate_xray_wt_mask_200' ('intermediate_xray_wt_mask_200_4channels'), which holds the embedding vectors. Each row of the output file corresponds to an interfacial residue of a target complex and has 10 tab-separated columns:

Name of the complex (complex)
Residue name (resname)
Structural region of the residue (resregion)
Residue number (resnumber; according to PDB)
Residue coordinate position (respos)
Receptor or ligand (partner)
The predicted vector of size 20 (prediction)
The one-hot encoding of the target residue (target)
Entropy of the predicted vector (entropy)
Cross-entropy between the predicted and target vectors (crossentropy)
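The columns above can be read with the standard csv module, and the last two can be recomputed from a probability vector. The sketch below is an illustration only: it assumes the file has no header row, treats the vector fields as opaque strings because their serialization is not described here, uses the natural logarithm, and runs on a made-up row rather than real output.

```python
import csv
import io
import math

# Short column names as given in the list above.
COLUMNS = ["complex", "resname", "resregion", "resnumber", "respos",
           "partner", "prediction", "target", "entropy", "crossentropy"]


def read_rows(handle):
    """Yield one dict per interfacial residue from a tab-separated file."""
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) == len(COLUMNS):
            yield dict(zip(COLUMNS, fields))


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between a predicted vector and a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


# Made-up row: every value here is a placeholder, not real output.
sample = "\t".join(["1abc", "ALA", "C", "42", "respos_placeholder", "L",
                    "prediction_placeholder", "target_placeholder",
                    "2.9957", "2.9957"]) + "\n"
rows = list(read_rows(io.StringIO(sample)))

# A uniform prediction over the 20 amino acids against a one-hot target.
uniform = [1.0 / 20] * 20
one_hot = [1.0] + [0.0] * 19
print(rows[0]["resname"], entropy(uniform), cross_entropy(uniform, one_hot))
```

For a uniform prediction both quantities equal ln(20), the least informative case; a confident and correct prediction drives both towards zero.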

Each row of the embedding file also corresponds to an interfacial residue. Besides the information mentioned above, it contains the feature vector of size 200 extracted from each cube. This file serves as input for the downstream tasks (transfer learning with frozen weights).

A similar analysis can be performed on backrub models by running python test_backrub.py or python test_backrub_4channels.py depending on the number of channels.

Predicting mutation-induced changes of binding affinity

Example
|___backrub_directory
    |
    |___complex-mutation 1
    |   |   model 1
    |   |   model 2
    |   |   ...
    |
    |___complex-mutation 2
    |   |   model 1
    |   |   model 2
    |   |   ...
    |
    ..........
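As a quick sanity check of the layout above, the following sketch lists the model files found under each complex-mutation directory. The function name, the directory names, and the `.pdb` file names in the demo are hypothetical; only the two-level structure (one subdirectory per complex-mutation, containing its models) comes from the tree shown here.

```python
import tempfile
from pathlib import Path


def list_backrub_models(backrub_directory):
    """Map each complex-mutation subdirectory to its sorted model file names."""
    root = Path(backrub_directory)
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Demo on a throwaway directory that mimics the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("complex-mutation_1", "complex-mutation_2"):
        sub = Path(tmp) / name
        sub.mkdir()
        for i in (1, 2):
            (sub / f"model_{i}.pdb").write_text("")
    models = list_backrub_models(tmp)

for name, files in models.items():
    print(name, files)
```

A directory that maps to an empty list would indicate a complex-mutation entry with no backrub models to process.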

Downstream tasks

Predicting the physico-chemical class of interfacial residues

Predicting the function of the protein complex

Acknowledgement

We would like to thank Dr. Sergei Grudinin and his team for helping us with the initial source code of maps_generator and load_data.py. See Ornate.