Jacek Karolczak, Anna Przybyłowska, Konrad Szewczyk, Witold Taisner, John M. Heumann, Michael H.B. Stowell, Michał Nowicki, Dariusz Brzezinski
Accurately identifying ligands plays a crucial role in structure-guided drug design. Based on density maps from X-ray diffraction or cryogenic-sample electron microscopy (cryoEM), scientists verify whether small-molecule ligands bind to active sites. However, the interpretation of density maps is challenging, and cognitive bias can sometimes mislead investigators into modeling fictitious compounds. Ligand identification can be aided by automatic methods, but existing approaches are available only for X-ray diffraction. Here, we propose to identify ligands using a deep learning approach that treats density maps as 3D point clouds. We show that the proposed model is on par with existing methods for X-ray crystallography while also being applicable to cryoEM density maps. Our study demonstrates that electron density map fragments can be used to train models that can be applied to cryoEM structures, but also highlights challenges associated with the standardization of electron microscopy maps and the quality assessment of cryoEM ligands.
In the repository, we provide the code for the experiments conducted in the paper, including model implementations and transformations for generating datasets.
To reproduce the results, use scripts from the scripts directory.
Configuration files for the experiments are available in the cfg directory.
We provide weights of the model trained on cryoEM and X-ray crystallography as model.pt (link).
Presented below are schematics of deep learning architectures used to predict ligands:
All the architectures were modified to take as input the same sample of 2000 voxels (or fewer, when a ligand is described by a smaller number of voxels by default) and to output probability scores for all 219 studied ligand groups.
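The input cap described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the repository's preprocessing: the function name cap_voxels, the (x, y, z, density) tuple layout, and the use of uniform random sampling without replacement are all hypothetical choices made for the example.

```python
import random
from typing import List, Sequence, Tuple

# Assumed layout for illustration only: x, y, z, density value.
Voxel = Tuple[float, float, float, float]

MAX_VOXELS = 2000  # input size stated in the text


def cap_voxels(voxels: Sequence[Voxel], limit: int = MAX_VOXELS,
               seed: int = 0) -> List[Voxel]:
    """Return at most `limit` voxels.

    Blobs at or below the limit are returned unchanged; larger blobs are
    reduced by uniform sampling without replacement (an assumption).
    """
    if len(voxels) <= limit:
        return list(voxels)
    rng = random.Random(seed)  # seeded so the example is reproducible
    return rng.sample(list(voxels), limit)


if __name__ == "__main__":
    big_blob = [(float(i), 0.0, 0.0, 1.0) for i in range(5000)]
    small_blob = big_blob[:150]
    print(len(cap_voxels(big_blob)), len(cap_voxels(small_blob)))
```

Small ligands pass through untouched, which matches the "or fewer" case in the text; how the actual models batch inputs of different sizes is not covered here.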
Here are some snapshots of ligand identifications made by the proposed MinkLoc3Dv2 model.
Each ligand is labeled by its Chemical Component Dictionary ID, structure resolution, and (in parentheses) the PDB ID, chain, and residue number. X-ray diffraction ligands are shown in green mesh based on Fo-Fc maps contoured at 2.8σ, calculated after removal of solvent and other small molecules (including the ligand) from the model. CryoEM ligands are depicted in pink mesh based on difference maps contoured according to the proposed automatic density thresholding method (13.642, 3.385, 17.997, 7.850, and 5.613 V for panels F–J, respectively). The white mesh in panel J shows a manually selected contour threshold of 11.000 V. Atomic coordinates were taken from the PDB deposits.
The model trained on blobs from cryoEM and X-ray crystallography can be tested without installing anything: it is deployed as a Streamlit app at ligands.cs.put.poznan.pl.
The Ligand Classification API provides endpoints for classifying ligands from 3D point cloud data using a model trained on all the data mentioned in the paper, including blobs from cryoEM and X-ray crystallography. The API supports various file formats for point cloud input and returns the top 10 predicted ligand classes along with their probabilities.
Each user is limited to one request per second.
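A client that sends several requests in a row may want to respect the one-request-per-second limit on its own side. The sketch below shows one way to do that; the class name RequestThrottle and its interface are hypothetical and not part of the API, and how the server reacts to faster requests is not specified here.

```python
import time
from typing import Callable, Optional


class RequestThrottle:
    """Keeps consecutive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = RequestThrottle()
    for i in range(3):
        throttle.wait()  # call this right before each API request
        print("request", i, "may be sent now")
```

Calling wait() immediately before each request is enough; the clock and sleep arguments exist only so the behaviour can be checked without real delays.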
Base URL: http://ligands.cs.put.poznan.pl
[GET] http://ligands.cs.put.poznan.pl/api
Checks if the Ligand Classification API is operational.
200: Success
500: Server Error
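A health check against this endpoint could look like the following sketch, which uses only the Python standard library. The function name api_is_operational and the injectable opener parameter are hypothetical helpers for the example; the only facts taken from the description above are the URL and the 200/500 status codes.

```python
import urllib.request

API_URL = "http://ligands.cs.put.poznan.pl/api"


def api_is_operational(url: str = API_URL, timeout: float = 10.0,
                       opener=urllib.request.urlopen) -> bool:
    """Return True if a GET on `url` answers with status 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib raises HTTPError for error statuses such as 500 and
        # URLError for connection problems; both derive from OSError.
        return False


if __name__ == "__main__":
    print("API operational:", api_is_operational())
```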
[POST] http://ligands.cs.put.poznan.pl/api/predict
Classifies the uploaded 3D point cloud data and returns the top 10 most likely ligand classes along with their respective probabilities.
file (string, binary, required): Supported formats: .npy, .npz, .pts, .xyz, .txt, .csv
rescale_cryoem (string, optional): Indicates whether to rescale the cryoEM data. Accepts "true" or "false". Example: "false"
resolution (number, optional): The resolution value for cryoEM data rescaling. Required if rescale_cryoem is "true". Example: 1.5
200: Successfully classified ligand
400: Bad Request (Multiple Possible Errors)
500: Internal Server Error
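The request parameters above can be assembled into an upload as sketched below, using only the Python standard library. Several things here are assumptions rather than documented behaviour: that the endpoint expects multipart/form-data, the helper name build_predict_request, and the example file name blob.xyz. The format of the response body is not described in this document, so the sketch leaves it as raw bytes.

```python
import uuid
import urllib.request
from typing import Optional

PREDICT_URL = "http://ligands.cs.put.poznan.pl/api/predict"


def build_predict_request(file_bytes: bytes, filename: str,
                          rescale_cryoem: bool = False,
                          resolution: Optional[float] = None,
                          url: str = PREDICT_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the predict endpoint."""
    # The parameter description says resolution is required when rescaling.
    if rescale_cryoem and resolution is None:
        raise ValueError("resolution is required when rescale_cryoem is true")

    # rescale_cryoem is documented as a string accepting "true" or "false".
    fields = {"rescale_cryoem": "true" if rescale_cryoem else "false"}
    if resolution is not None:
        fields["resolution"] = str(resolution)

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The point cloud file goes into the "file" form field.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    with open("blob.xyz", "rb") as handle:  # hypothetical input file
        request = build_predict_request(handle.read(), "blob.xyz",
                                        rescale_cryoem=True, resolution=1.5)
    with urllib.request.urlopen(request, timeout=30) as response:
        print(response.status, response.read())
```

Building the request separately from sending it makes the parameter rule (resolution is required when rescale_cryoem is "true") easy to check before any network traffic happens.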
To simplify the setup and ensure consistency, we provide a Docker configuration that includes all necessary dependencies.
Ensure you have the following installed:
Make the start and stop scripts executable:
sudo chmod 744 ./start.sh ./stop.sh
In the docker/.env file, adjust the PYTORCH, CUDA, and CUDNN settings if needed (for GPU use), and set DATA_PATH to point to your data directory. The default is ../../data/
Start the environment with one of:
./start.sh
./start.sh cpu
Stop it with the matching command:
./stop.sh
./stop.sh cpu
All the data necessary to reproduce results is available at Zenodo.
The repository with code for extracting ligands from cryoEM difference maps is a submodule of this repository, but can also be found here.
Additionally, the preprocessed data (uniformly sampled and max pooled 2000 points per ligand) that were used to train the final model are available here.
@article {Karolczak2024.08.27.610022,
author = {Karolczak, Jacek and Przyby{\l}owska, Anna and Szewczyk, Konrad and Taisner, Witold and Heumann, John M. and Stowell, Michael H.B. and Nowicki, Micha{\l} and Brzezinski, Dariusz},
title = {Ligand Identification using Deep Learning},
elocation-id = {2024.08.27.610022},
year = {2024},
doi = {10.1101/2024.08.27.610022},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Motivation: Accurately identifying ligands plays a crucial role in the process of structure-guided drug design. Based on density maps from X-ray diffraction or cryogenic-sample electron microscopy (cryoEM), scientists verify whether small-molecule ligands bind to active sites of interest. However, the interpretation of density maps is challenging, and cognitive bias can sometimes mislead investigators into modeling fictitious compounds. Ligand identification can be aided by automatic methods, but existing approaches are available only for X-ray diffraction and are based on iterative fitting or feature-engineered machine learning rather than end-to-end deep learning. Results: Here, we propose to identify ligands using a deep learning approach that treats density maps as 3D point clouds. We show that the proposed model is on par with existing machine learning methods for X-ray crystallography while also being applicable to cryoEM density maps. Our study demonstrates that electron density map fragments can be used to train models that can be applied to cryoEM structures, but also highlights challenges associated with the standardization of electron microscopy maps and the quality assessment of cryoEM ligands. Availability: Code and model weights are available on GitHub at https://github.com/jkarolczak/ligands-classification. Datasets used for training and testing are hosted at Zenodo: 10.5281/zenodo.10908325. Contact: dariusz.brzezinski{at}cs.put.poznan.pl. Competing Interest Statement: The authors have declared no competing interest.},
URL = {https://www.biorxiv.org/content/early/2024/08/28/2024.08.27.610022},
eprint = {https://www.biorxiv.org/content/early/2024/08/28/2024.08.27.610022.full.pdf},
journal = {bioRxiv}
}