This algorithm is based on deep learning and a classical scoring function (Vina score) and is designed to optimize ligand conformations.
pytorch >= 1.10
conda install -c conda-forge spyrmsd
conda install numpy pandas
Liangzhen Zheng, Shanghai Zelixir Biotech Company Ltd, astrozheng@gmail.com
Zechen Wang, Shandong University, wangzch97@163.com
If you find our scripts useful, please consider citing the following paper:
@article{wang2023fully,
title={A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function},
author={Wang, Zechen and Zheng, Liangzhen and Wang, Sheng and Lin, Mingzhi and Wang, Zhihao and Kong, Adams Wai-Kin and Mu, Yuguang and Wei, Yanjie and Li, Weifeng},
journal={Briefings in Bioinformatics},
volume={24},
number={1},
pages={bbac520},
year={2023},
publisher={Oxford University Press}
}
The algorithm simultaneously optimizes multiple poses of a ligand, which must be generated by the same docking program and placed in the same directory in PDBQT format. The PDBQT files for proteins and ligands can be generated by MGLTools. The detailed process is as follows.
pythonsh prepare_receptor4.py -r protein.pdb -U lps -o protein.pdbqt
pythonsh prepare_ligand4.py -l ligand.mol2 -U lps -o ligand.pdbqt
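The input file (inputs.dat) lists one complex per line: the complex ID, the path to the protein PDBQT file, and the directory containing the ligand poses. For example: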
1gpn ./samples/1gpn/1gpn_protein_atom_noHETATM.pdbqt samples/1gpn/decoys
1syi ./samples/1syi/1syi_protein_atom_noHETATM.pdbqt samples/1syi/decoys
bash run_pose_optimization.sh inputs.dat
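As a rough sketch, an input file with the layout shown above could also be written from Python. The helper names `format_input_line` and `write_inputs` are hypothetical and not part of the repository, and the single-space separator is an assumption taken from the sample lines.

```python
from pathlib import Path


def format_input_line(complex_id, receptor_pdbqt, poses_dir):
    """Return one input line: ID, receptor PDBQT path, poses directory."""
    fields = [str(complex_id), str(receptor_pdbqt), str(poses_dir)]
    for field in fields:
        # Whitespace inside a field would break the space-separated layout.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError(f"field must be non-empty without whitespace: {field!r}")
    return " ".join(fields)


def write_inputs(entries, out_path):
    """Write one line per (ID, receptor, poses_dir) tuple (hypothetical helper)."""
    lines = [format_input_line(*entry) for entry in entries]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines


# Reproduces the first sample line shown above.
print(format_input_line(
    "1gpn",
    "./samples/1gpn/1gpn_protein_atom_noHETATM.pdbqt",
    "samples/1gpn/decoys",
))
```

Calling `write_inputs(entries, "inputs.dat")` with a list of such tuples would produce a file that can be passed to `run_pose_optimization.sh`.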
Finally, the program outputs the optimized ligand conformation ("final_optimized_cnfr.pdb") and the final score. In addition, the conformation changes and scores during optimization are recorded in the "optimized_traj.pdb" and "opt_data.csv" files, respectively.
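The recorded scores can be inspected with a few lines of Python. The sketch below assumes that "opt_data.csv" has a header row and that its last row corresponds to the final optimization step; the column names are not documented here, so the function returns whatever columns the file contains. `read_final_record` is a hypothetical helper name.

```python
import csv


def read_final_record(csv_path):
    """Return the last data row of a CSV file as a dict keyed by its header."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"no data rows found in {csv_path}")
    # Assumption: rows are written in optimization order, so the last is final.
    return rows[-1]
```

For example, `read_final_record("opt_data.csv")` would return the last row as a dictionary of strings, which can then be converted to numbers as needed.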
python scripts/run.py \
-rec_fpath $rec_fpath \
-pose_fpath $pose_fpath \
-mean_std_file ../models/r6-r1_0.3-2.0nm_train_mean_std.csv \
-model ../models/bestmodel_cpu.pth \
-out_fpath $out_fpath
where rec_fpath, pose_fpath, and out_fpath represent the paths for the input protein pdbqt file, ligand pdbqt file, and the file where the scores will be stored, respectively. You can also directly run the "run_scoring.sh" file as follows:
bash run_scoring.sh $rec_fpath $pose_fpath $out_fpath
The following example tests this process with the provided sample files:
bash run_scoring.sh samples/1bcu_protein_noHETATM.pdbqt samples/1bcu_decoys.pdbqt out.csv
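To score many complexes from Python, the `scripts/run.py` call shown above can be wrapped in a small helper. This is a minimal sketch: the function names are hypothetical, the flags and default model paths are copied from the command above, and the relative paths are assumed to resolve from the directory in which the script is launched.

```python
import subprocess
import sys


def build_scoring_command(rec_fpath, pose_fpath, out_fpath,
                          mean_std_file="../models/r6-r1_0.3-2.0nm_train_mean_std.csv",
                          model="../models/bestmodel_cpu.pth",
                          script="scripts/run.py"):
    """Build the argument list for one scoring run (hypothetical helper)."""
    return [
        sys.executable, script,
        "-rec_fpath", str(rec_fpath),
        "-pose_fpath", str(pose_fpath),
        "-mean_std_file", str(mean_std_file),
        "-model", str(model),
        "-out_fpath", str(out_fpath),
    ]


def score_poses(rec_fpath, pose_fpath, out_fpath):
    """Run the scoring script and raise CalledProcessError if it fails."""
    cmd = build_scoring_command(rec_fpath, pose_fpath, out_fpath)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain unusual characters.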
Before training, generate the ".pkl" file containing the features and labels. We provide the "generate_features.py" script in the "retrain" directory for creating the required features and labels for DeepRMSD. You can run:
python generate_features.py -inp inputs.dat -out data_label.pkl
Each line in the "inputs.dat" file represents a protein-ligand pair and specifies, in order, the protein-ligand ID, the protein file, the poses file, and the crystal ligand file.
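Before running feature generation, it can help to check that every line has the four expected fields. The sketch below assumes whitespace-separated fields, which is not stated explicitly for this file; `FeatureInput` and `parse_feature_inputs` are hypothetical names, and the file names in the usage example are placeholders.

```python
from typing import NamedTuple


class FeatureInput(NamedTuple):
    pair_id: str
    protein_file: str
    poses_file: str
    crystal_ligand_file: str


def parse_feature_inputs(text):
    """Parse the contents of inputs.dat into a list of FeatureInput records."""
    records = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        records.append(FeatureInput(*fields))
    return records


# Placeholder file names, for illustration only.
example = "1bcu protein.pdbqt poses.pdbqt crystal_ligand.mol2\n"
for record in parse_feature_inputs(example):
    print(record.pair_id, record.crystal_ligand_file)
```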
We provide the "train.py" script in the "retrain" directory. You can run the following command to retrain DeepRMSD:
python train.py \
-train_file $train_file \
-valid_file $valid_file \
-device cuda:0
"train_file" and "valid_file" represent the training set and validation set, respectively, generated in the previous step.