PLINDER, short for protein-ligand interactions dataset and evaluation resource, is a comprehensive, annotated, high-quality dataset and resource for training and evaluating protein-ligand docking algorithms.
The PLINDER project is a community effort launched by the University of Basel, SIB Swiss Institute of Bioinformatics, VantAI, NVIDIA, and MIT CSAIL. It will be regularly updated.
To accelerate community adoption, PLINDER will be used as the field’s new protein-ligand interaction dataset standard as part of an exciting competition at the upcoming 2024 Machine Learning in Structural Biology (MLSB) Workshop at NeurIPS, one of the field’s premier academic gatherings. More details about the competition will be announced shortly.
We version the plinder dataset with two controls:
PLINDER_RELEASE: the month stamp of the last RCSB sync
PLINDER_ITERATION: value that enables iterative development within a release
We version the plinder application using an automated semantic versioning scheme based on the git commit history.
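The two dataset controls can be combined into a single versioned location. The sketch below is illustrative only: `resolve_dataset_dir` is a hypothetical helper, not part of the plinder package, and the format checks (a `YYYY-MM` month stamp and a `v<number>` iteration) are assumptions inferred from the release names in this README. The default root mirrors the `~/.local/share/plinder` directory used in the download instructions.

```python
import os
import re
from pathlib import Path

# Defaults match the release marked as current in this README.
DEFAULT_RELEASE = "2024-06"
DEFAULT_ITERATION = "v2"


def resolve_dataset_dir(env=None, root=None):
    """Return <root>/<release>/<iteration> for the configured dataset version.

    Hypothetical helper for illustration; not part of the plinder package.
    """
    env = os.environ if env is None else env
    if root is None:
        root = Path.home() / ".local" / "share" / "plinder"
    release = env.get("PLINDER_RELEASE", DEFAULT_RELEASE)
    iteration = env.get("PLINDER_ITERATION", DEFAULT_ITERATION)

    # Assumed formats: a YYYY-MM month stamp and a "v<number>" iteration.
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", release):
        raise ValueError(f"unexpected PLINDER_RELEASE: {release!r}")
    if not re.fullmatch(r"v\d+", iteration):
        raise ValueError(f"unexpected PLINDER_ITERATION: {iteration!r}")

    return Path(root) / release / iteration
```

Passing `env` and `root` explicitly keeps the helper easy to test without touching the real environment or home directory.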
The plinder.data package is responsible for generating a dataset release, and the plinder.core package makes it easy to interact with the dataset.
2024-06/v2 (Current):
2024-04/v1: Version described in the preprint, with updated redundancy removal by protein pocket and ligand similarity.
2024-04/v0: Version used to re-train DiffDock in the paper, with redundancy removal based on <pdbid>_<ligand ccd codes>.
As part of the PLINDER resource, we provide train, validation, and test splits that are
curated to minimize information leakage based on protein-ligand interaction
similarity.
In addition, we have prioritized systems that have a linked experimental apo
structure or matched molecular series, to support realistic inference scenarios for hit
discovery and optimization.
Particular care is taken with the test set, which is further prioritized to contain
high-quality structures that provide unambiguous ground truths for performance
benchmarking.
Finally, as we anticipate this resource being used to benchmark a wide range of methods, including those that simultaneously predict protein structure (a.k.a. co-folding) or generate novel ligand structures, we further stratified the test set (by novel ligand, pocket, protein, or all) to cover a wide range of tasks.
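To make the idea of a leakage-aware split concrete, here is a toy sketch. It is a generic connected-components illustration and not the procedure PLINDER uses: the function names, the binary "similar pair" input, and the smallest-groups-first assignment rule are all assumptions made for the example. The only point it demonstrates is that systems linked by similarity are kept in the same split.

```python
from collections import defaultdict


def connected_components(items, similar_pairs):
    """Group items so that any two linked by a similar pair share a group."""
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for item in items:
        groups[find(item)].append(item)
    return list(groups.values())


def leakage_aware_split(items, similar_pairs, test_fraction=0.2):
    """Assign whole similarity groups to 'test' or 'train' (toy rule)."""
    groups = sorted(connected_components(items, similar_pairs), key=len)
    target = test_fraction * len(items)
    assignment, n_test = {}, 0
    for group in groups:
        # Smallest groups go to test until the target size is reached.
        split = "test" if n_test + len(group) <= target else "train"
        if split == "test":
            n_test += len(group)
        for item in group:
            assignment[item] = split
    return assignment
```

Because groups are never divided, no similar pair can straddle the train/test boundary, at the cost of only approximately hitting the requested test fraction.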
The PLINDER dataset is provided in two ways:
As files that can be downloaded directly from the bucket.
Through the plinder Python package for interfacing the data.
The dataset can be downloaded from the bucket with gsutil.
$ export PLINDER_RELEASE=2024-06 # Current release
$ export PLINDER_ITERATION=v2 # Current iteration
$ mkdir -p ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/
$ gsutil -m cp -r "gs://plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/*" ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/
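The same download can be scripted. The following is a minimal sketch that assembles the gsutil invocation shown above as an argument list; `build_download_command` and `download` are hypothetical helpers, not part of the plinder package, and actually running the command assumes gsutil is installed and on the PATH. By default the sketch only returns the command without executing it.

```python
import subprocess
from pathlib import Path


def build_download_command(release, iteration, dest_root):
    """Return the gsutil argument list for one release/iteration."""
    # gsutil expands the trailing wildcard itself, so no shell is needed.
    source = f"gs://plinder/{release}/{iteration}/*"
    dest = Path(dest_root).expanduser() / release / iteration
    return ["gsutil", "-m", "cp", "-r", source, f"{dest}/"]


def download(release, iteration, dest_root="~/.local/share/plinder", dry_run=True):
    """Build the command and, unless dry_run is set, create the target and run it."""
    cmd = build_download_command(release, iteration, dest_root)
    if dry_run:
        return cmd
    # Equivalent of `mkdir -p` for the destination directory.
    Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `download("2024-06", "v2")` returns the command for inspection; pass `dry_run=False` to perform the copy.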
For details on the sub-directories, see Documentation.
plinder is available on PyPI.
pip install plinder
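After installing, it can be useful to confirm which version of the application is present in the current environment. The sketch below uses only the Python standard library (`importlib.metadata`); `installed_version` is a hypothetical helper name, and the snippet makes no assumptions about the plinder package's own API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="plinder"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"plinder version: {found}" if found else "plinder is not installed")
```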
A more detailed description is available on the documentation website.
Durairaj, Janani, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladas Oleinikovas, Thomas Duignan, Zachary McClure, Xavier Robin, Gabriel Studer, Daniel Kovtun, Emanuele Rossi, Guoqing Zhou, Srimukh Prasad Veccham, Clemens Isert, Yuxing Peng, Prabindh Sundareson, Mehmet Akdel, Gabriele Corso, Hannes Stärk, Gerardo Tauriello, Zachary Wayne Carpenter, Michael M. Bronstein, Emine Kucukbenli, Torsten Schwede, Luca Naef. 2024. “PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource.” bioRxiv ICML'24 ML4LMS
Please see the citation file for details.