Open yusuf1759 opened 1 month ago
Coverage report: files touched (lines missing)
src/plinder/core/index
  system.py (362-365)
src/plinder/core/loader
  __init__.py
  dataset.py
  featurizer.py
  transforms.py
  utils.py
src/plinder/core/scores
  index.py
src/plinder/core/split
  plot.py
  utils.py
src/plinder/core/structure
  atoms.py (397-401)
  diffdock_utils.py
  structure.py
  vendored.py
src/plinder/core/utils
  dataclass.py
src/plinder/eval/docking
  utils.py (97-100)
  write_scores.py
This report was generated by python-coverage-comment-action
Do you have the SDF file so that we could potentially fix this directly in OpenStructure?
I was able to reproduce the behavior by changing a bond type to 9 in an arbitrary SDF file manually.
There's a fix in OST now (upcoming 2.9.0 release branch) where you can set fault_tolerant=True (on the call to LoadSDF directly, or on the IO profile for LoadEntity) to force OST to read the file with the invalid bond type.
However, I'm not sure exactly in which context this came up. RDKit itself doesn't like SDF files with a bond type 9, and if I read one with Chem.SDMolSupplier I get a very similar warning:
[13:20:16] unrecognized query bond type, 9, found on line 16. Using an "any" query.
The bond type is then marked as unspecified in the resulting mol, not as dative:
>>> mol.GetBonds()[5].GetBondType()
rdkit.Chem.rdchem.BondType.UNSPECIFIED
I was also not able to trigger RDKit to save a V3000 file with a dative bond:
>>> mol.GetBonds()[5].SetBondType(Chem.rdchem.BondType.DATIVE)
>>> mol.GetBonds()[5].GetBondType()
rdkit.Chem.rdchem.BondType.DATIVE
>>> print(Chem.MolToMolBlock(mol))
Simple Ligand
     RDKit          2D

  6  6  0  0  1  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    1.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.0000    1.0000    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    2.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0000   -1.0000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  1  3  1  0
  1  6  1  0
  2  4  1  0
  3  4  1  0
  4  5  8  0
M  CHG  1   1   1
M  END
Clearly it looks like dative bonds (whatever they are) should not end up in SDF files to start with. Bond type 9 is not part of the SDF standard. I don't know how the invalid file was created.
Regarding the fix: I'm not sure what bond type number results from setting it to unspecified in RDKit. Bond types 4-8 should also not be in SDF files (they are reserved for queries). OpenStructure 2.9.0 will complain about them but read the file anyway. This is unlikely to affect any algorithm in OpenStructure, as we ignore bond order throughout. It might affect other external tools you are using in Plinder, though.
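To check which files carry such bond types before handing them to another tool, something like the following sketch could help. It is pure Python and only handles V2000 mol blocks (the V3000 layout is different). The names find_nonstandard_bonds and build_v2000 are made up for this illustration and are not part of RDKit, OpenStructure or Plinder. The default of treating only types 1-3 as ordinary follows the remark above that 4-8 are reserved for queries.

```python
def find_nonstandard_bonds(molblock, allowed=(1, 2, 3)):
    """Return (line_number, bond_type) for every V2000 bond line whose
    bond type is not in `allowed`. Line numbers are 1-based."""
    lines = molblock.splitlines()
    counts = lines[3]  # the counts line is the 4th line of a mol block
    if "V2000" not in counts:
        raise ValueError("this sketch only handles V2000 mol blocks")
    n_atoms = int(counts[0:3])  # fixed-width field, columns 1-3
    n_bonds = int(counts[3:6])  # fixed-width field, columns 4-6
    first_bond = 4 + n_atoms    # bond block follows the atom block
    hits = []
    for i in range(first_bond, first_bond + n_bonds):
        bond_type = int(lines[i][6:9])  # columns 7-9 hold the bond type
        if bond_type not in allowed:
            hits.append((i + 1, bond_type))
    return hits


def build_v2000(atoms, bonds, title="Simple Ligand"):
    """Hypothetical helper: assemble a minimal fixed-width V2000 mol block."""
    out = [title, "  sketch", ""]
    out.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for x, y, z, sym in atoms:
        out.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {sym:<3} 0" + "  0" * 11)
    for a, b, t in bonds:
        out.append(f"{a:3d}{b:3d}{t:3d}  0")
    out.append("M  END")
    return "\n".join(out)


# Same atoms and bonds as the mol block printed above.
atoms = [(0, 0, 0, "N"), (1, 0, 0, "C"), (0, 1, 0, "O"),
         (1, 1, 0, "S"), (2, 2, 0, "C"), (-1, -1, 0, "Cl")]
bonds = [(1, 2, 2), (1, 3, 1), (1, 6, 1), (2, 4, 1), (3, 4, 1), (4, 5, 8)]
molblock = build_v2000(atoms, bonds)

print(find_nonstandard_bonds(molblock))  # [(16, 8)]
```

Passing allowed=range(1, 9) instead would mirror the 1-8 range from the OpenStructure error message and flag only values such as 9.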
Context: RDKit automatically saves any molecule with a dative bond (e.g. HEM) as a V3000 SDF. However, ost can't load V3000 files with a DATIVE bond. It throws Exception: Bad bond line 100: Bond type number '9' not within accepted range (1-8).
Fix: Change DATIVE bond to UNSPECIFIED on the fly
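A minimal sketch of what such an on-the-fly replacement could look like is given below. It is not taken from the PR diff: replace_bond_type is a hypothetical helper, written against the GetBonds / GetBondType / SetBondType methods that the RDKit session above already uses. Which bond type number RDKit then writes for an UNSPECIFIED bond is not established in this thread, so the output file should still be checked.

```python
def replace_bond_type(mol, old, new):
    """Set every bond of type `old` to `new`, in place.

    `mol` is assumed to expose RDKit's Mol bond interface
    (GetBonds, and GetBondType / SetBondType on each bond).
    Returns the number of bonds that were changed.
    """
    changed = 0
    for bond in mol.GetBonds():
        if bond.GetBondType() == old:
            bond.SetBondType(new)
            changed += 1
    return changed


# Intended use with RDKit (assumes rdkit is installed and `mol` is a Chem.Mol):
#
#   from rdkit import Chem
#   n = replace_bond_type(
#       mol,
#       Chem.rdchem.BondType.DATIVE,
#       Chem.rdchem.BondType.UNSPECIFIED,
#   )
#   # write `mol` to SDF only after the replacement
```

Modifying the molecule in place keeps the sketch short. If the original bond types are needed later, work on a copy of the molecule instead.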