zhanglabNKU / DAttProt


DAttProt: A Double Attention Model for Enzyme Protein Classification Based on Transformer Encoders and Multi-Scale Convolutions

Code for our paper "DAttProt: A Double Attention Model for Enzyme Protein Classification Based on Transformer Encoders and Multi-Scale Convolutions"

Requirements

The code has been tested under Python 3.6.4 and 3.7.4, with the following packages (and their dependencies) installed:

| Package | Python 3.6.4 | Python 3.7.4 | Comment |
| ------- | ------------ | ------------ | ------- |
| numpy   | 1.19.4       | 1.16.5       |         |
| torch   | 1.7.1        | 1.3.1        | For torch < 1.3 we provide our own Transformer encoder module; it is, however, not compatible with nn.TransformerEncoder. |
| pandas  | 1.1.5        | 0.25.1       |         |
| lxml    | 4.2.5        | 4.4.1        |         |
| tqdm    | 4.51.0       | 4.36.1       | Optional: remove `from tqdm import tqdm` and simply change `for element in tqdm(iterable)` to `for element in iterable`. |
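If you prefer to keep the tqdm calls in the code but run without the package, one common pattern (a sketch, not part of the repository) is a no-op fallback:

```python
# Make tqdm optional: if it is not installed, fall back to an
# identity wrapper so `for element in tqdm(iterable)` still works.
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable

# Behaves the same with or without tqdm installed.
for element in tqdm(range(3)):
    pass
```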

Datasets

The dataset files used for pre-training and fine-tuning are large and are not included in this repository. The following dataset files are required:

Usage

Dataset Pre-processing

Generate the .npy dataset files under /dataset/SwissProt/, /dataset/DEEPre/, and /dataset/ECPred/:

git clone https://github.com/zhanglabNKU/DAttProt.git
cd DAttProt/utils/dataset_utils/
python SwissProt.py
python DEEPre.py
python ECPred.py
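The scripts above write .npy files. As a generic sanity check (a minimal numpy round-trip sketch, not tied to the repository's actual file names), you can verify that such a file saves and loads correctly:

```python
import numpy as np

# Round-trip a toy array with np.save / np.load, the same mechanism
# the pre-processing scripts use to store the generated datasets.
arr = np.arange(6).reshape(2, 3)
np.save("example.npy", arr)          # writes example.npy to the working directory
loaded = np.load("example.npy")
assert (loaded == arr).all()
```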

Pre-training

To modify hyper-parameters, edit the variables at the beginning of pretrain.py; please refer to the comment on each parameter.

# continue from bash commands above
cd ../../
python pretrain.py

Fine-tuning

To modify hyper-parameters, edit the variables at the beginning of deepre.py and ecpred.py; please refer to the comment on each parameter.

# continue from bash commands above
python deepre.py
python ecpred.py

Here is an example of the mainclasses variable setting in deepre.py:
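As a purely hypothetical illustration (the variable name comes from the script, but the values and format here are assumptions; check the comments in deepre.py for the actual accepted format), restricting training to particular EC main classes might look like:

```python
# Hypothetical setting: train on EC main classes 1 (oxidoreductases)
# and 2 (transferases) only. The real format is documented in deepre.py.
mainclasses = [1, 2]
```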

Interpretability Analysis

/utils/motif_utils/generate_motif_test_data.py selects sequences from the DEEPre and ECPred datasets whose motif features are annotated in the Swiss-Prot database, and generates /utils/motif_utils/dataset/motif.json and motif.pkl.

# continue from bash commands above
cp saved_models/{model_name}.pkl utils/motif_utils/saved_models/{model_name}.pkl
cd utils/motif_utils/
python generate_motif_test_data.py
python double_attention.py