Closed Ninjani closed 1 month ago
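| File | Lines missing |
| --- | --- |
| `src/plinder/core/__init__.py` | |
| `src/plinder/data/__init__.py` | |
| `src/plinder/data/get_system_annotations.py` | |
| `src/plinder/data/splits.py` | 256-257, 661 |
| `src/plinder/data/pipeline/config.py` | |
| `src/plinder/data/pipeline/io.py` | 154-161, 190 |
| `src/plinder/data/pipeline/utils.py` | |
| `src/plinder/data/utils/annotations/aggregate_annotations.py` | 167, 339, 349, 1212-1213 |
| `src/plinder/data/utils/annotations/get_ligand_validation.py` | |
| `src/plinder/data/utils/annotations/interaction_utils.py` | 461 |
| `src/plinder/data/utils/annotations/ligand_utils.py` | 330, 426, 1099, 1164, 1214 |
| `src/plinder/data/utils/annotations/rdkit_utils.py` | 397, 399, 414 |
| `src/plinder/data/utils/annotations/save_utils.py` | |
| `src/plinder/eval/docking/utils.py` | |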
This report was generated by python-coverage-comment-action
TODO:
Crystal contacts
- [ ] add fraction of crystal contacts as part of validation criteria
- [ ] add number of atoms to `symmetry_mate_contacts` instead of residues
- [ ] add tests
Binding affinity
- [ ] drop IC50
- [ ] add to split as prioritization
Split criteria
- [ ] add min/max number of pocket residues and interactions for test
system.cif
- [ ] remove `system.cif`
- [ ] add unique plip counter
I was trying these out, and neither is immediately trivial:

Removing `system.cif` itself is easy, but it requires changing `final_structure_qc` to load the receptor and ligand separately. @yusuf1759: it would also need changing all the `complex_path`s.

Similarity scoring for unique PLI: what's the strategy here? Do we count each type of interaction once per residue, or just the residues themselves? E.g.

`system_1: {res1: [hbond, hbond, saltbridge], res2: [hydrophobic, hbond]}`
`system_2: {res1: [hbond, saltbridge, saltbridge, hydrophobic], res2: [hbond], res3: [hydrophobic]}`

Do we want the similarity to be 2/2 for system_1 vs system_2, since they both share res1 and res2 as interacting residues (irrespective of the actual interactions)? Or do we want to compare system_1_res1: {hbond, saltbridge} vs system_2_res1: {hbond, saltbridge, hydrophobic} (i.e. taking the set and ignoring the count)? Or both, with the former being `pocket_interacting_qcov` and the latter being `pli_unique_qcov`?
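To make the two options concrete, here is a minimal sketch that scores the example systems both ways. The function names `residue_qcov` and `unique_pli_qcov` are hypothetical and this is not the plinder implementation; it also assumes the residues of both systems have already been mapped onto shared keys (`res1`, `res2`, ...), which in practice would come from an alignment.

```python
# Hypothetical helpers for illustration only; not the plinder implementation.
# Assumption: residues of query and target are already mapped to shared keys.

def residue_qcov(query, target):
    """Fraction of the query's interacting residues that also interact in the target."""
    if not query:
        return 0.0
    shared = set(query) & set(target)
    return len(shared) / len(query)


def unique_pli_qcov(query, target):
    """Fraction of the query's unique (residue, interaction type) pairs found in the target."""
    q_pairs = {(res, kind) for res, kinds in query.items() for kind in kinds}
    t_pairs = {(res, kind) for res, kinds in target.items() for kind in kinds}
    if not q_pairs:
        return 0.0
    return len(q_pairs & t_pairs) / len(q_pairs)


system_1 = {
    "res1": ["hbond", "hbond", "saltbridge"],
    "res2": ["hydrophobic", "hbond"],
}
system_2 = {
    "res1": ["hbond", "saltbridge", "saltbridge", "hydrophobic"],
    "res2": ["hbond"],
    "res3": ["hydrophobic"],
}

print(residue_qcov(system_1, system_2))     # 1.0  (2 of 2 residues shared)
print(unique_pli_qcov(system_1, system_2))  # 0.75 (3 of 4 unique pairs shared)
```

With system_1 as the query, the residue-level score is 2/2, while the set-based score drops to 3/4 because `res2: hydrophobic` has no counterpart in system_2.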
@Ninjani, this is a good question. I like `pocket_interacting_qcov` for matched residues that are interacting, but only as long as there is at least one matched interaction. E.g. if one residue has 'hydrophobic' and the other has 'salt bridge', I do not think it would be reasonable to match them. Such residues would already be matched by the "neighbouring" residues metric, `pocket_qcov`, so matching on interaction type seems more reasonable to me.

`pli_unique_qcov` would be the same as `pli_qcov`, but counting each unique match only once.
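A minimal sketch of the "count each unique match once" distinction, using the example systems from the question above. It assumes, purely for illustration, that the counted variant is a multiset overlap of (residue, interaction type) pairs divided by the query's total count; the actual `pli_qcov` computation in plinder may differ, and the function names here are hypothetical.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not the plinder implementation.
# Assumption: the counted score is a multiset overlap over the query total.

def pair_counts(system):
    """Count every (residue, interaction type) occurrence in a system."""
    return Counter((res, kind) for res, kinds in system.items() for kind in kinds)


def counted_qcov(query, target):
    """Multiset overlap: repeated interactions of the same type count separately."""
    q, t = pair_counts(query), pair_counts(target)
    total = sum(q.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared key.
    return sum((q & t).values()) / total


def unique_qcov(query, target):
    """Set overlap: each (residue, interaction type) pair counts once."""
    q, t = set(pair_counts(query)), set(pair_counts(target))
    if not q:
        return 0.0
    return len(q & t) / len(q)


system_1 = {
    "res1": ["hbond", "hbond", "saltbridge"],
    "res2": ["hydrophobic", "hbond"],
}
system_2 = {
    "res1": ["hbond", "saltbridge", "saltbridge", "hydrophobic"],
    "res2": ["hbond"],
    "res3": ["hydrophobic"],
}

print(counted_qcov(system_1, system_2))  # 0.6  (3 of 5 counted interactions)
print(unique_qcov(system_1, system_2))   # 0.75 (3 of 4 unique pairs)
```

Under these assumptions the second `hbond` on `res1` lowers the counted score but has no effect on the unique one.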
@OleinikovasV I've already implemented `pli_unique_qcov`. I would consider deferring `pocket_interacting_qcov` and the removal of `system.cif` to a later stage, so we can do the rerun and have the new test set asap.