plinder-org / plinder

Protein Ligand INteraction Dataset and Evaluation Resource
https://plinder.sh
Apache License 2.0

System 5gr1__1__1.A__1.D missing template ligand to holo ligand atom order stacks #76

Closed rachitk closed 1 month ago

rachitk commented 1 month ago

It seems that system 5gr1__1__1.A__1.D is missing valid data in ligand_template2resolved_atom_order_stacks (it returns an empty atom stack for both the template ligand and holo ligand atom orders).

Is this intended or is this an error? Curious if there is something I should do to work around this (currently, I just skip those systems, but I'm not sure if there's a better or recommended approach).

rachitk commented 1 month ago

After looking at the log, it seems this affects more than just this one system: a few others have the same or similar issues, seemingly resulting from a failure to align the template to the ligand mol.

2024-10-14 22:15:00,174 | plinder.core.structure.atoms:214 | WARNING : get_template_to_mol_matches: could not match template fully - retry with unmatched bonds set as UNSPECIFIED
Too many matching bond pairs (1854) so can't continue.
Too many matching bond pairs (1841) so can't continue.

For now, I'm skipping these systems, but I am curious if there is a filter I can apply when querying the plindex to avoid/prune these systems in advance.
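The skip-on-load workaround described above can be sketched roughly as follows. This is only an illustration and not part of plinder: `has_usable_stacks`, `usable_systems`, and the `load_stacks` callable are hypothetical names. The sketch assumes that ligand_template2resolved_atom_order_stacks behaves like a mapping from ligand ID to a (template order, holo order) pair.

```python
from typing import Callable, Iterable, Iterator, Mapping, Sequence, Tuple

# Assumed shape (not confirmed by the thread):
# ligand id -> (template atom order stack, holo atom order stack)
AtomOrderStacks = Mapping[str, Tuple[Sequence, Sequence]]


def has_usable_stacks(stacks: AtomOrderStacks) -> bool:
    """Return True only if there is at least one ligand and every ligand
    has a non-empty stack for both the template and the holo atom order."""
    if not stacks:
        return False
    return all(
        len(template) > 0 and len(holo) > 0
        for template, holo in stacks.values()
    )


def usable_systems(
    system_ids: Iterable[str],
    load_stacks: Callable[[str], AtomOrderStacks],
) -> Iterator[str]:
    """Yield only the system IDs whose atom order stacks are usable.

    `load_stacks` is a hypothetical, caller-supplied function that loads
    one system and returns its atom order stacks.
    """
    for system_id in system_ids:
        if has_usable_stacks(load_stacks(system_id)):
            yield system_id
```

With plinder, `load_stacks` would presumably wrap the construction of a PlinderSystem and return its ligand_template2resolved_atom_order_stacks. Since this check still loads every system once, it does not replace a real filter on the plindex.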

maciejwisniewski-drugdiscovery commented 1 month ago

I also ran into this problem when parsing RDKit molecules from the holo structure. Are there any cached molecules somewhere, so that conformer generation can be skipped instead of being rerun every time?

2024-10-15 15:11:50,521 | plinder.core.structure.atoms:140 | WARNING : generate_conformer: default EmbedMolecule - failed, try using useBasicKnowledge=False

OleinikovasV commented 1 month ago

@rachitk - thanks for reporting this. The issue was caused by the RascalMCES algorithm's default limit for maxBondMatchPairs being too low. I increased it to 5000 in the new PR, which fixes the given system and should make the problem unlikely to recur, but please let me know if you still experience problems with this.

@maciejwisniewski-drugdiscovery - this is a feature request. It sounded reasonable and quick to add, so it is included in the same PR. You can now load a PlinderSystem by passing an additional flag, skip_3d_confgen=True, e.g.:

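from plinder.core import PlinderSystem
system1 = PlinderSystem(system_id="5gr1__1__1.A__1.D", skip_3d_confgen=True)
# input conformer for ligand instance chain 1.D (2D coordinates only)
system1.holo_structure.input_ligand_conformers['1.D']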

This then simply returns 2D coordinates for the ligands without attempting to generate a reasonable 3D molecule. Loading a system this way is a couple of times faster, but it should only be used with methods that do not expect a 3D molecule as input.

PR for the above issues: https://github.com/plinder-org/plinder/pull/77
You can wait for it to be merged, or you can already start testing it from the branch. Closing for now, but feel free to re-open when appropriate. :)