realHXiao / GraphKM


GraphKM: machine and deep learning for KM prediction of wildtype and mutant enzymes


Introduction

The GraphKM toolbox is a Python package for predicting Michaelis constants (KM) of wildtype and mutant enzymes.

Requirements

These instructions assume that you use Miniconda or Anaconda. In a terminal, run:

conda create -n GraphKM python=3.8
conda activate GraphKM

Required packages:

paddlehelix==1.0.1
pgl==2.2.4
paddlepaddle-gpu==2.3.2
matplotlib
scikit-learn
rdkit
PubChemPy
xgboost==1.7.5
hyperopt==0.2.7
ESM

Note: paddlepaddle-gpu==2.3.2 is installed from the command line with:

conda install paddlepaddle-gpu==2.3.2 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge

Please refer to the ESM GitHub repository for ESM installation instructions.
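After installation, it can help to confirm that the active environment matches the pinned versions. The following is a minimal sketch, not part of GraphKM: the helper name check_pins is hypothetical, and it assumes each package is registered under the distribution name used in the list above (a conda-installed package may be registered under a different name, in which case it is reported as missing). Unpinned packages and ESM are not checked.

```python
from importlib import metadata  # standard library since Python 3.8

# Version pins copied from the requirements list above.
PINNED = {
    "paddlehelix": "1.0.1",
    "pgl": "2.2.4",
    "paddlepaddle-gpu": "2.3.2",
    "xgboost": "1.7.5",
    "hyperopt": "0.2.7",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every missing or mismatched pin."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed under this distribution name
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If the script prints nothing, every pinned package was found at the expected version.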

Input files

Before data preprocessing, a JSON file and a CSV file should be ready. The CSV file of sequence embeddings is generated by KM_data_clean/generate_esm_vector_gpu.py, which takes the JSON file as input (-i) and writes the CSV file (-o). Run the following command:

python generate_esm_vector_gpu.py -i my_data.json -o my_protein_sequences_embeddings.csv

Train

Preprocess

python data_preprocess.py -i my_data.json -l KM -input_seq my_protein_sequences_embeddings.csv -o my_dataset.npz
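The preprocessing step writes a NumPy .npz archive. The sketch below shows one way to inspect such a file before training. It makes no assumption about which arrays data_preprocess.py stores; the helper name summarize_npz is hypothetical, and allow_pickle=True is only an assumption in case the archive holds object arrays.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive."""
    summary = {}
    # allow_pickle=True lets object arrays load; enable it only for
    # archives you created yourself, since unpickling runs arbitrary code.
    with np.load(path, allow_pickle=True) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

For example, `for name, info in summarize_npz("my_dataset.npz").items(): print(name, info)` lists each stored array with its shape and dtype.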

Training

Training requires a large amount of memory if you use a GPU for acceleration. A GPU with 24 GB of memory is recommended.

python train.py -d path_to/my_dataset.npz --model_config path_to/gin_config.json -l KM --model_dir path_to/ --results_dir path_to/

python train_xgb.py -i path_to/my_data.json -l KM -input_seq path_to/my_protein_sequences_embeddings.csv -m path_to/best_model_gin_-1_lr0.0005.pdparams --model_config path_to/gin_config.json
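The commands in this README form a chain in which each step consumes files written by an earlier one. The sketch below assembles them as argument lists using only the flags shown above. The function names are hypothetical, all paths are placeholders, and it assumes each script is reachable from the working directory, which this README does not specify. By default it only prints the commands.

```python
import subprocess
import sys


def build_pipeline(data_json, embeddings_csv, dataset_npz, config_json, out_dir, checkpoint):
    """Return the README commands as argv lists, in execution order."""
    py = sys.executable  # interpreter of the active environment
    return [
        # 1. ESM embeddings: JSON in, CSV out.
        [py, "generate_esm_vector_gpu.py", "-i", data_json, "-o", embeddings_csv],
        # 2. Preprocessing: JSON + CSV in, .npz out.
        [py, "data_preprocess.py", "-i", data_json, "-l", "KM",
         "-input_seq", embeddings_csv, "-o", dataset_npz],
        # 3. Graph model training on the .npz dataset.
        [py, "train.py", "-d", dataset_npz, "--model_config", config_json,
         "-l", "KM", "--model_dir", out_dir, "--results_dir", out_dir],
        # 4. XGBoost training, which takes the trained graph model via -m.
        [py, "train_xgb.py", "-i", data_json, "-l", "KM",
         "-input_seq", embeddings_csv, "-m", checkpoint,
         "--model_config", config_json],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
```

Calling `run_pipeline(build_pipeline(...))` with your own paths prints the four commands for review; pass `dry_run=False` to execute them in order.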

Training results

Method          MSE     RMSE    R2
GIN-based       0.639   0.799   0.614
GAT-based       0.709   0.842   0.572
GCN-based       0.671   0.819   0.595
GAT_GCN-based   0.627   0.792   0.622
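For reference, the three metrics in the table above are related as sketched below. This is an illustration rather than the project's evaluation code: the function name regression_metrics is hypothetical, and it assumes R2 is the usual coefficient of determination, which this README does not state. The final loop checks that each reported r.m.s.e. equals the square root of the reported MSE to within rounding.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R2) for two equal-length sequences of numbers."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return mse, math.sqrt(mse), 1.0 - ss_res / ss_tot


# (MSE, r.m.s.e.) pairs copied from the table above.
REPORTED = {
    "GIN": (0.639, 0.799),
    "GAT": (0.709, 0.842),
    "GCN": (0.671, 0.819),
    "GAT_GCN": (0.627, 0.792),
}

for method, (mse, rmse) in REPORTED.items():
    # Three-decimal rounding keeps the difference below 1e-3.
    assert abs(math.sqrt(mse) - rmse) < 1e-3, method
```

Lower MSE and r.m.s.e. and higher R2 indicate a better fit, so the GAT_GCN-based model has the best reported values on all three metrics.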

Note: The trained models are available in the Figshare database with DOI: 10.6084/m9.figshare.25335049.

Prediction

The input for prediction.py:

Tip

Run any script with the -h flag for more help:

python data_preprocess.py -h
python train.py -h
python train_xgb.py -h
python prediction.py -h

Citation

He, X., Yan, M. GraphKM: machine and deep learning for KM prediction of wildtype and mutant enzymes. BMC Bioinformatics 25, 135 (2024). https://doi.org/10.1186/s12859-024-05746-1