kienerj / pycdxml

Tools to automatically convert and process cdx and cdxml files in Python
GNU General Public License v3.0
35 stars 5 forks

Potential issue with representation of aromatic Ns with Hs #20

Closed (baoilleach closed 1 year ago)

baoilleach commented 1 year ago

Attachments: CHEMBL6509.from_cd.cdxml.txt, chembl6509.from_pycdxml.cdxml.txt, CHEMBL6509.mol.txt

pycdxml version: e2fefc82a44bda17b5c91de208947185f36ecaad

While roundtrip testing against the RDKit reader (mol -> cdxml -> cansmi), I found that the canonical SMILES mismatched for aromatic Ns that carry a hydrogen.

Here's a specific example: CHEMBL6509, a MOL file provided by ChEMBL. I've attached the original MOL file, along with the CDXML file written by PyCDXML. Opened in ChemDraw, the PyCDXML output looks fine. When read by RDKit, however, the H on the nitrogen is missing:

NCC1=C[N]c2ccccc21 # RDKit reading PyCDXML CDXML

At this point, I thought it could be an error in the RDKit reader. However, if I read the original MOL file in ChemDraw and save as CDXML (attached), then RDKit converts it as expected:

NCc1c[nH]c2ccccc12 # RDKit reading ChemDraw CDXML

baoilleach commented 1 year ago

Presumably <n id="4" p="67.57 66.77" Z="24" Element="7" NumHydrogens="0" AS="N"> should have NumHydrogens="1" instead?
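One way to spot such nodes without opening ChemDraw is to scan the CDXML for heteroatom nodes that declare zero hydrogens. This is only a sketch using the Python standard library. The helper name `find_zero_h_heteroatoms` is made up for illustration and is not part of pycdxml or RDKit. It assumes the `n` elements carry `Element` and `NumHydrogens` attributes as in the snippet above. A hit is not necessarily a bug (a pyridine-type N legitimately has no H), so it only flags candidates to inspect.

```python
import xml.etree.ElementTree as ET


def find_zero_h_heteroatoms(cdxml_text):
    """Return (id, atomic number) for non-carbon nodes with NumHydrogens="0".

    Hypothetical helper for illustration; not part of pycdxml or RDKit.
    """
    root = ET.fromstring(cdxml_text)
    hits = []
    # iter("n") walks every <n> node element at any depth.
    for node in root.iter("n"):
        # Assumption: a node without an Element attribute is carbon.
        element = node.get("Element", "6")
        if element != "6" and node.get("NumHydrogens") == "0":
            hits.append((node.get("id"), int(element)))
    return hits


# Minimal stand-in for a CDXML document; only the attributes used above matter.
sample = """<CDXML><page><fragment>
<n id="3" p="60.00 60.00" Z="23"/>
<n id="4" p="67.57 66.77" Z="24" Element="7" NumHydrogens="0" AS="N"/>
<n id="5" p="75.00 60.00" Z="25" Element="7" NumHydrogens="1"/>
</fragment></page></CDXML>"""

print(find_zero_h_heteroatoms(sample))  # [('4', 7)]
```

For a real file, the contents can be read and passed in the same way. Each flagged node still has to be compared against the source structure to decide whether the zero is wrong.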

baoilleach commented 1 year ago

Here's the fix - I'll submit a PR:

diff --git a/pycdxml/cdxml_converter/rdkit_chemdraw.py b/pycdxml/cdxml_converter/rdkit_chemdraw.py
index d38e99c..5e48a1b 100644
--- a/pycdxml/cdxml_converter/rdkit_chemdraw.py
+++ b/pycdxml/cdxml_converter/rdkit_chemdraw.py
@@ -98,7 +98,7 @@ def mol_to_document(mol: Chem.Mol, chemdraw_style: dict = None, conformer_id: in
         props = {"p": p, "Z": str(20 + object_id), "Element": str(atom.GetAtomicNum())}

         if atom.GetAtomicNum() != 6:
-            props["NumHydrogens"] = str(atom.GetNumImplicitHs())
+            props["NumHydrogens"] = str(atom.GetTotalNumHs())

         if atom.HasProp('_CIPCode'):
             props["AS"] = atom.GetProp('_CIPCode')