Closed baoilleach closed 1 year ago
Presumably <n id="4" p="67.57 66.77" Z="24" Element="7" NumHydrogens="0" AS="N"> should have NumHydrogens="1" instead?
Here's the fix - I'll submit a PR:
diff --git a/pycdxml/cdxml_converter/rdkit_chemdraw.py b/pycdxml/cdxml_converter/rdkit_chemdraw.py
index d38e99c..5e48a1b 100644
--- a/pycdxml/cdxml_converter/rdkit_chemdraw.py
+++ b/pycdxml/cdxml_converter/rdkit_chemdraw.py
@@ -98,7 +98,7 @@ def mol_to_document(mol: Chem.Mol, chemdraw_style: dict = None, conformer_id: in
         props = {"p": p, "Z": str(20 + object_id), "Element": str(atom.GetAtomicNum())}
         if atom.GetAtomicNum() != 6:
-            props["NumHydrogens"] = str(atom.GetNumImplicitHs())
+            props["NumHydrogens"] = str(atom.GetTotalNumHs())
         if atom.HasProp('_CIPCode'):
             props["AS"] = atom.GetProp('_CIPCode')
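To illustrate why the one-line change matters, here is a minimal sketch comparing RDKit's hydrogen-count accessors. It uses pyrrole parsed from SMILES rather than CHEMBL6509 itself, on the assumption that the affected nitrogens carry their hydrogen as an explicit H count, as an `[nH]` atom does. The helper name `nitrogen_h_counts` is made up for this example and is not part of pycdxml.

```python
from rdkit import Chem


def nitrogen_h_counts(smiles: str) -> list[tuple[int, int, int]]:
    """Return (explicit, implicit, total) H counts for each nitrogen atom."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return [
        (atom.GetNumExplicitHs(), atom.GetNumImplicitHs(), atom.GetTotalNumHs())
        for atom in mol.GetAtoms()
        if atom.GetAtomicNum() == 7
    ]


# Pyrrole: the hydrogen on the aromatic N is written explicitly as [nH],
# so it is stored as an explicit H and the implicit count is 0.
pyrrole = nitrogen_h_counts("c1cc[nH]c1")

# Methylamine: both hydrogens on N are inferred from valence (implicit).
methylamine = nitrogen_h_counts("CN")

print(pyrrole, methylamine)  # [(1, 0, 1)] [(0, 2, 2)]
```

In both cases `GetTotalNumHs()` reports the full hydrogen count, whereas `GetNumImplicitHs()` drops the hydrogen on the aromatic nitrogen, which is consistent with the `NumHydrogens="0"` seen in the generated CDXML.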
Attachments: CHEMBL6509.from_cd.cdxml.txt, chembl6509.from_pycdxml.cdxml.txt, CHEMBL6509.mol.txt
pycdxml version: e2fefc82a44bda17b5c91de208947185f36ecaad
From roundtrip testing versus the RDKit reader (mol->cdxml->cansmi), I found that mismatches in cansmi occurred when dealing with aromatic Ns with hydrogens.
Here's a specific example: CHEMBL6509, a MOL file provided by ChEMBL. I've attached the original MOL file, along with the PyCDXML CDXML file. If opened in ChemDraw it all looks fine. In contrast, when read by RDKit it's missing the H on the nitrogen:
At this point, I thought it could be an error in the RDKit reader. However, if I read the original MOL file in ChemDraw and save as CDXML (attached), then RDKit converts it as expected: