@danielzeng-gt, thanks a lot for your interest! Please find our comments below. If this answers your questions, please feel free to close the issue.
- Handling Multiple Ligands:
We would like to let developers come up with their own strategy for the problem of multiple-ligand pose prediction rather than prescribe a particular flow or task: one could predict ligand poses one at a time or simultaneously.
Thus, if the trained algorithm cannot predict more than one ligand at once, one could predict poses sequentially or in parallel. Alternatively, one could choose to limit training and prediction to single-ligand systems, using the annotations table to filter for the appropriate systems (a sketch of such a query is given at the end of this point). We plan to provide an example for such queries soon in our example notebooks.
In terms of the PLINDER system definition, we only consider a system holo if it has at least one "proper" ligand (i.e. a ligand that is neither ligand_is_ion nor ligand_is_artifact); otherwise the system is classified as apo. Thus, when dealing with multiple ligands, one could choose not to consider such ligand entries as part of the prediction.
Keeping the ions or cofactors fixed is also an option, but this would limit the general applicability of the tool, since it requires these components to be specified in the receptor (ideally apo, for realistic inference).
However, it may be useful for certain tasks, e.g. ligand optimization scenarios, where the binding structure of a similar ligand (e.g. a matched molecular series) is known.
It is important to note, though, that for multi-ligand system prediction, if the final structures of all ligands are part of the evaluation, using fixed ligands would not provide an accurate assessment of the method's performance.
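For illustration only, here is a minimal pandas sketch of such a filter. It assumes the annotations table has been loaded as a DataFrame containing the system_id, ligand_is_ion, and ligand_is_artifact columns referenced above; the parquet path and the derived ligand_is_proper column are placeholders, and the exact loading call and column set may differ between releases.

```python
# Minimal sketch (not the official PLINDER API): keep holo systems with exactly
# one "proper" ligand, using the ligand flags referenced in this thread.
import pandas as pd

annotations = pd.read_parquet("annotation_table.parquet")  # placeholder path

# A ligand is "proper" if it is neither an ion nor an artifact.
annotations["ligand_is_proper"] = ~(
    annotations["ligand_is_ion"] | annotations["ligand_is_artifact"]
)

# Count proper ligands per system.
proper_counts = annotations.groupby("system_id")["ligand_is_proper"].sum()

# Holo systems have at least one proper ligand; keep those with exactly one
# for single-ligand training and evaluation.
holo_systems = proper_counts[proper_counts >= 1].index
single_ligand_systems = proper_counts[proper_counts == 1].index
```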
- Clarification on DiffDock Split:
We used slightly different criteria (as specified in the paper) than system_pass_validation_criteria=True when performing the PLINDER splits, which explains the observed differences for the pl50 split.
We would like to highlight that we have now introduced the 2024-06/v2 split, which further refines the quality curation, notably by filtering for ligand crystal contacts. We found that in our earlier version as many as ~17% of the systems in the test set had at least one ligand atom in a crystal contact (within 5 Å heavy-atom distance of a lattice neighbour); a rough sketch of such a check is given at the end of this point.
We strongly encourage using 2024-06/v2 for further method development. We plan to use this split for the MLSB benchmark to compare the performance of different methods.
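For illustration only (this is not the PLINDER curation pipeline), a rough gemmi-based sketch of flagging whether any ligand heavy atom lies within 5 Å of a symmetry-related (lattice-neighbour) atom. The file path and ligand residue name are placeholders, and it assumes the input file carries the crystal cell and space group so that symmetry images are available.

```python
# Rough sketch (not the PLINDER pipeline): detect crystal contacts for a ligand,
# i.e. any ligand heavy atom within 5 A of an atom from a lattice neighbour
# (a symmetry-generated copy). Path and residue name are placeholders.
import gemmi

st = gemmi.read_structure("entry.cif")  # placeholder; file must define cell/symmetry
model = st[0]

# Neighbor search over the model, including symmetry images from the unit cell.
ns = gemmi.NeighborSearch(model, st.cell, 5.0).populate()

LIGAND_RESNAME = "LIG"  # placeholder residue name of the ligand of interest

def has_crystal_contact(model, ns, cutoff=5.0):
    for chain in model:
        for residue in chain:
            if residue.name != LIGAND_RESNAME:
                continue
            for atom in residue:
                if atom.is_hydrogen():
                    continue
                for mark in ns.find_atoms(atom.pos, '\0', radius=cutoff):
                    # image_idx > 0 means the neighbour belongs to a
                    # symmetry-generated copy, i.e. a lattice neighbour.
                    if mark.image_idx != 0:
                        return True
    return False

print(has_crystal_contact(model, ns))
```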
- suitable_for_ml_training tag:
Thanks for noting this. We included this tag in an early iteration, and since then we have significantly improved the way our systems are processed and saved. This tag will be retired in future iterations to avoid confusion, as our intention is for the entire dataset to be considered "suitable", albeit of varying quality.
Thank you for the detailed response and for addressing my questions!
I have two follow-ups. Regarding 1): for the MLSB benchmark, will test-set performance be assessed on single-ligand predictions, multi-ligand system predictions, or both? If both tracks are included, will models trained on multi-ligand systems be eligible for the single-ligand prediction track as well?
Regarding 2): based on this, is the PLINDER team planning to retrain and evaluate DiffDock on the 2024-06/v2 dataset for comparison? If so, will it be on a subset of the 2024-06/v2 splits restricted to single-ligand systems?
Thanks for the feedback.
will test set performance be assessed on single-ligand predictions, multi-ligand system predictions, or both?
The MLSB challenge will focus on single-protein vs. single-ligand flexible docking. However, methods can also train on multi-ligand systems if their architecture allows it. We, along with the MLSB team, are working to put together all the details of the challenge, to be released shortly.
Retraining DiffDock on 2024-06/v2:
We understand that re-training DiffDock on the latest version of PLINDER will provide a useful basis for method comparison, and it is in our plans. Also, as part of the public leaderboard, we envision that multiple methods (including flexible docking and co-folding methods) will be evaluated on the same split, providing a standardized benchmark for comparing performance in publications and other research applications.
Hi Plinder Team,
Thanks once again for the great effort. A few questions have come up as I've been looking into the data more:
1. Handling Multiple Ligands: For the test set, how should we approach rigid docking when multiple ligands are present? I've observed that in cases with multiple ligands, some have the ligand_is_ion flag set to True, and others have ligand_is_cofactor as True. Should we treat cofactors and ions as rigid and flexibly dock the remaining ligands simultaneously? Is it correct to assume that if a ligand is neither a cofactor nor an ion, it would be treated flexibly as the ligand of interest for docking in the test scenario?
2. Clarification on DiffDock Split: I'm trying to understand the exact split used for the DiffDock evaluation. The repository mentions that "a non-redundant and single-ligand smaller subset of this version was used to train DiffDock in the paper and is available at 2024-04/v0." However, my analysis shows that only about 80% of the test set has system_pass_validation_criteria=True, which seems to differ from what was reported in the paper, e.g. in Algorithm 1 and Table 2. Could you clarify which exact split was used for the DiffDock evaluation, and whether it has changed since publication? For reference, here's the analysis notebook: https://colab.research.google.com/drive/1AC0QlrlNRFoJjgTfZJXVWT1UKUROEwAN
3. suitable_for_ml_training tag: I noticed that the suitable_for_ml_training tag is included in the metadata but doesn't seem to be used. Are there any plans to filter datasets by this tag in the future?
Thanks for taking the time!