We are a drug discovery community project :hugs:
Let's crack the 100+ ns MD, 30,000+ protein-ligand structures and a whole new world of AI models for drug discovery together.
Want to get hands-on with AI for drug discovery?
Check out our Hugging Face spaces to run and visualize the adaptability model and to perform QM property predictions.
In this repository, we show how to download the MISATO database and use it with AI models. You can access the calculated properties of the protein-ligand structures and use them for training with PyTorch-based dataloaders. A small sample of the dataset is included in the repo.
You can freely download the FULL MISATO dataset from Zenodo using the links below:
wget -O data/MD/h5_files/MD.hdf5 https://zenodo.org/record/7711953/files/MD.hdf5
wget -O data/QM/h5_files/QM.hdf5 https://zenodo.org/record/7711953/files/QM.hdf5
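Since the full files are large, it can help to confirm that both downloads landed where the rest of the repository expects them. The following is a minimal sketch, not part of the MISATO code base: the helper name `check_downloads` is hypothetical, and the only assumption taken from the text is the pair of destination paths used in the wget commands above.

```python
from pathlib import Path

# Destination paths copied from the wget commands above.
EXPECTED_FILES = {
    "MD": Path("data/MD/h5_files/MD.hdf5"),
    "QM": Path("data/QM/h5_files/QM.hdf5"),
}


def check_downloads(root="."):
    """Return {name: size in bytes}, with None for a file that is missing.

    `root` is the repository root; this helper is illustrative only.
    """
    report = {}
    for name, rel_path in EXPECTED_FILES.items():
        path = Path(root) / rel_path
        report[name] = path.stat().st_size if path.is_file() else None
    return report


if __name__ == "__main__":
    for name, size in check_downloads().items():
        status = "missing" if size is None else f"{size / 1e9:.2f} GB"
        print(f"{name}: {status}")
```

Run it from the repository root. A file reported as missing usually means the target directory did not exist when wget was called, or the download was interrupted.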
Start with the notebook src/getting_started.ipynb to learn how to load the data and interact with it.
We recommend pulling our MISATO image from DockerHub or building your own image (see docker/). The images use CUDA 11.8. To ensure that the drivers work correctly, we recommend installing CUDA 11.8 or newer on your own system.
# clone project
git clone https://github.com/t7morgen/misato-dataset.git
cd misato-dataset
For Singularity, use:
# get the container image (saved as misato.sif so that the shell command below finds it)
singularity pull misato.sif docker://sab148/misato-dataset
singularity shell misato.sif
For Docker, use:
sudo docker pull sab148/misato-dataset:latest
bash docker/run_bash_in_container.sh
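The CUDA recommendation above (at least 11.8) can be checked with a small version comparison. This is a sketch under stated assumptions: the function names are hypothetical, and the version string is something you supply yourself, for example from the output of `nvcc --version` or from `torch.version.cuda` if PyTorch is already installed.

```python
MIN_CUDA = "11.8"  # minimum version recommended in the text above


def parse_version(text):
    """Turn a string such as '11.8' or '12.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_is_recent_enough(installed, minimum=MIN_CUDA):
    """Compare versions component by component (hypothetical helper)."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ["11.7", "11.8", "11.10", "12.1"]:
        print(candidate, cuda_is_recent_enough(candidate))
```

The versions are compared as integer tuples rather than floats, so that "11.10" is correctly treated as newer than "11.8". Note that `torch.version.cuda` is `None` on CPU-only builds of PyTorch, so guard against that before passing it in.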
├── data                        <- Project data
│   ├── MD
│   │   ├── h5_files            <- storage of dataset
│   │   └── splits              <- train, val, test splits
│   └── QM
│       ├── h5_files            <- storage of dataset
│       └── splits              <- train, val, test splits
│
├── src                         <- Source code
│   ├── data
│   │   ├── components          <- Datasets and transforms
│   │   ├── md_datamodule.py    <- MD Lightning data module
│   │   ├── qm_datamodule.py    <- QM Lightning data module
│   │   │
│   │   └── processing          <- Scripts for preprocessing, inference and conversion
│   │       └── ...
│   ├── getting_started.ipynb   <- notebook: how to load data and interact with it
│   └── inference.ipynb         <- notebook: how to run inference
│
├── docker                      <- Dockerfile and execution script
└── README.md
If you prefer to use conda for your own installation, create a new misato environment.
To install PyTorch Geometric, we recommend using pip (within conda) and following the official installation instructions: pytorch-geometric/install
The instructions vary with your CUDA version. We show an example for CUDA 11.8.
conda create --name misato python=3
conda activate misato
conda install -c anaconda pandas pip h5py
pip3 install torch --index-url https://download.pytorch.org/whl/cu118 --no-cache
pip install joblib matplotlib
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install pytorch-lightning==1.8.3
pip install torch-geometric
pip install ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
conda install -c conda-forge nb_conda_kernels
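After the installation, a quick way to see whether the environment is complete is to ask Python which of the packages can be found. The sketch below is illustrative only: the helper `missing_modules` is hypothetical, and the module list is an assumption derived from the install commands above, using import names, which can differ from the pip or conda package names (for example `pytorch-lightning` is imported as `pytorch_lightning`).

```python
from importlib.util import find_spec

# Import names assumed from the install commands above.
MODULES = [
    "torch",
    "torch_geometric",
    "pytorch_lightning",
    "h5py",
    "pandas",
    "joblib",
    "matplotlib",
    "nglview",
]


def missing_modules(names=MODULES):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so this check is fast but does not catch problems that appear at import time, such as a mismatch between the installed PyTorch build and the CUDA-specific extension wheels.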
To run inference for MD, you need to install AmberTools. We recommend installing it in a separate conda environment.
conda create --name ambertools python=3
conda activate ambertools
conda install -c conda-forge ambertools nb_conda_kernels
pip install h5py jupyter ipykernel==5.5.5 ipywidgets==7.6.3 nglview==2.7.7
If you find this work useful, please consider citing the article.
@article{siebenmorgen2024misato,
title={MISATO: machine learning dataset of protein--ligand complexes for structure-based drug discovery},
author={Siebenmorgen, Till and Menezes, Filipe and Benassou, Sabrina and Merdivan, Erinc and Didi, Kieran and Mour{\~a}o, Andr{\'e} Santos Dias and Kitel, Rados{\l}aw and Li{\`o}, Pietro and Kesselheim, Stefan and Piraud, Marie and Theis, Fabian J. and Sattler, Michael and Popowicz, Grzegorz M.},
journal={Nature Computational Science},
pages={1--12},
year={2024},
publisher={Nature Publishing Group US New York}
}