plinder-org / plinder

Protein Ligand INteraction Dataset and Evaluation Resource
https://plinder.sh
Apache License 2.0
140 stars, 8 forks

Annotation updates #14

Closed: Ninjani closed this 1 month ago

Ninjani commented 2 months ago

TODO:

github-actions[bot] commented 2 months ago

Coverage report

File: lines missing

  src/plinder/core/__init__.py
  src/plinder/data/__init__.py
  src/plinder/data/get_system_annotations.py
  src/plinder/data/splits.py: 256-257, 661
  src/plinder/data/pipeline/config.py
  src/plinder/data/pipeline/io.py: 154-161, 190
  src/plinder/data/pipeline/utils.py
  src/plinder/data/utils/annotations/aggregate_annotations.py: 167, 339, 349, 1212-1213
  src/plinder/data/utils/annotations/get_ligand_validation.py
  src/plinder/data/utils/annotations/interaction_utils.py: 461
  src/plinder/data/utils/annotations/ligand_utils.py: 330, 426, 1099, 1164, 1214
  src/plinder/data/utils/annotations/rdkit_utils.py: 397, 399, 414
  src/plinder/data/utils/annotations/save_utils.py
  src/plinder/eval/docking/utils.py

This report was generated by python-coverage-comment-action

OleinikovasV commented 2 months ago

TODO:

  • Crystal contacts

    • [ ] add fraction of crystal contacts as part of validation criteria
    • [ ] add number of atoms to symmetry_mate_contacts instead of residues
    • [ ] add tests
  • Binding affinity

    • [ ] drop IC50
    • [ ] add to split as prioritization
  • Split criteria

    • [ ] add min/max number of pocket residues and interactions for test

Ninjani commented 2 months ago

* [ ] remove `system.cif`
* [ ] add unique plip counter

I was trying these out, and neither is immediately trivial:

Removing `system.cif` itself is easy, but it needs final_structure_qc to be changed so that it loads the receptor and ligand separately. @yusuf1759 this would also need changing all the complex_paths.

For similarity scoring of unique PLI, what's the strategy: do we count each type of interaction once per residue, or just the residues themselves? E.g.:

system_1: {res1: [hbond, hbond, saltbridge], res2: [hydrophobic, hbond]}
system_2: {res1: [hbond, saltbridge, saltbridge, hydrophobic], res2: [hbond], res3: [hydrophobic]}

Do we want the similarity to be 2/2 for system_1 vs system_2, since they both have res1 and res2 as interacting residues (irrespective of the actual interactions)? Or do we want to compare system_1_res1: {hbond, saltbridge} vs system_2_res1: {hbond, saltbridge, hydrophobic} (i.e. taking the set and ignoring the count)? Or both, with the former being pocket_interacting_qcov and the latter being pli_unique_qcov?
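To make the two options concrete, here is a minimal sketch in plain Python run on the example above. It assumes that residues are already aligned between the two systems (so "res1" means the same position in both) and that "qcov" means the shared count divided by the query's count. The function names are hypothetical and are not taken from the plinder codebase.

```python
# Example systems from the comment: residue -> list of interaction types.
system_1 = {"res1": ["hbond", "hbond", "saltbridge"], "res2": ["hydrophobic", "hbond"]}
system_2 = {
    "res1": ["hbond", "saltbridge", "saltbridge", "hydrophobic"],
    "res2": ["hbond"],
    "res3": ["hydrophobic"],
}


def residue_level_qcov(query, target):
    """Option 1 (hypothetical): fraction of the query's interacting residues
    that also interact in the target, ignoring interaction types."""
    if not query:
        return 0.0
    return len(set(query) & set(target)) / len(query)


def unique_pair_qcov(query, target):
    """Option 2 (hypothetical): fraction of the query's unique
    (residue, interaction type) pairs that also occur in the target."""
    query_pairs = {(res, kind) for res, kinds in query.items() for kind in kinds}
    target_pairs = {(res, kind) for res, kinds in target.items() for kind in kinds}
    if not query_pairs:
        return 0.0
    return len(query_pairs & target_pairs) / len(query_pairs)


print(residue_level_qcov(system_1, system_2))  # 2 of 2 residues shared -> 1.0
print(unique_pair_qcov(system_1, system_2))    # 3 of 4 unique pairs shared -> 0.75
```

Under these assumptions the residue-level score cannot distinguish the two systems from system_1's point of view, while the pair-level score drops because system_2 has no hydrophobic contact at res2.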

OleinikovasV commented 1 month ago

@Ninjani, this is a good question. I like pocket_interacting_qcov for matched residues that are interacting, but only as long as there is at least one matched interaction. E.g. if one residue has only a 'hydrophobic' interaction and the other only a 'salt bridge', I do not think it would be reasonable to match them. Such residues would already be matched by the "neighbouring" residues metric, pocket_qcov, so requiring a matching interaction type seems more reasonable to me.
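One way to read this suggestion, as a rough sketch only: a query residue counts as matched when the same residue in the target shares at least one interaction type with it. The function name and the dictionary layout are hypothetical, and residues are assumed to be aligned between the two systems already.

```python
def interacting_residue_qcov(query, target):
    """Fraction of query residues whose interaction types overlap with the
    types seen on the same residue in the target (hypothetical definition)."""
    if not query:
        return 0.0
    matched = sum(
        1
        for res, kinds in query.items()
        if set(kinds) & set(target.get(res, ()))
    )
    return matched / len(query)


# Illustrative input: res1 is 'hydrophobic' in one system and 'saltbridge' in
# the other, so it is not counted as a match; res2 shares 'hbond'.
query = {"res1": ["hydrophobic"], "res2": ["hbond", "hydrophobic"]}
target = {"res1": ["saltbridge"], "res2": ["hbond"]}
print(interacting_residue_qcov(query, target))  # 1 of 2 residues matched -> 0.5
```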

The pli_unique_qcov would be the same as pli_qcov but only counting each unique match once.
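The difference between counting every match and counting each unique match once can be sketched with `collections.Counter`. This is an illustration under assumptions, not the plinder implementation: it assumes the counted variant takes the multiset overlap of (residue, interaction type) pairs, and both function names are hypothetical.

```python
from collections import Counter


def pair_counts(system):
    """Count how often each (residue, interaction type) pair occurs."""
    return Counter((res, kind) for res, kinds in system.items() for kind in kinds)


def counted_qcov(query, target):
    """Multiset overlap: repeated interactions of the same type count separately."""
    q, t = pair_counts(query), pair_counts(target)
    total = sum(q.values())
    if total == 0:
        return 0.0
    return sum((q & t).values()) / total  # Counter & Counter keeps the minimum counts


def unique_qcov(query, target):
    """Set overlap: each unique (residue, interaction type) pair counts once."""
    q, t = set(pair_counts(query)), set(pair_counts(target))
    if not q:
        return 0.0
    return len(q & t) / len(q)


system_1 = {"res1": ["hbond", "hbond", "saltbridge"], "res2": ["hydrophobic", "hbond"]}
system_2 = {
    "res1": ["hbond", "saltbridge", "saltbridge", "hydrophobic"],
    "res2": ["hbond"],
    "res3": ["hydrophobic"],
}
print(counted_qcov(system_1, system_2))  # 3 of 5 counted interactions -> 0.6
print(unique_qcov(system_1, system_2))   # 3 of 4 unique pairs -> 0.75
```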

Ninjani commented 1 month ago

@OleinikovasV I've already implemented pli_unique_qcov. I'd suggest deferring pocket_interacting_qcov and the removal of `system.cif` to a later stage, so that we can do the rerun and have the new test set ASAP.