Open drkeoni opened 1 year ago
When you start with a SMILES string, there are no coordinates, so after you add Hs the coordinates have to be generated (the RDKit does this by default when generating a molblock).
If the molecule already has coordinates, which you get from making a molblock, AddHs doesn't add coordinates for the new hydrogens by default, and the RDKit thinks the structure already has coords.
So in one case you have
smiles -> mol block(gen coords) -> addHs -> molblock (thinks coords are already generated and doesn't fiddle with the hydrogens)
smiles -> addHs -> molblock (gen coords, even for Hs)
This is why the hydrogens don't have coords. But this is a long-winded way of saying you need addCoords=True in AddHs if you already have coordinates.
print(Chem.MolToMolBlock(Chem.AddHs(m_from_mol_block, addCoords=True)))
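To make the difference concrete, here is a minimal sketch that counts how many hydrogens end up at the origin with and without `addCoords=True`. The ethanol SMILES and the helper `hydrogens_at_origin` are illustrative assumptions, not part of the original report, and `addCoords=False` is passed explicitly so the comparison does not depend on the library default.

```python
from rdkit import Chem


def hydrogens_at_origin(mol):
    """Count H atoms sitting at (0, 0, 0) in the molecule's first conformer.

    Hypothetical helper written for this example only.
    """
    conf = mol.GetConformer()
    count = 0
    for atom in mol.GetAtoms():
        if atom.GetAtomicNum() != 1:
            continue
        pos = conf.GetAtomPosition(atom.GetIdx())
        if abs(pos.x) < 1e-6 and abs(pos.y) < 1e-6 and abs(pos.z) < 1e-6:
            count += 1
    return count


# Example molecule (an assumption; any small molecule would do).
m_from_smiles = Chem.MolFromSmiles("CCO")

# Round-trip through a mol block: the parsed molecule now carries a 2D conformer.
m_from_mol_block = Chem.MolFromMolBlock(Chem.MolToMolBlock(m_from_smiles))

# Without addCoords the new hydrogens get no coordinates of their own.
without_coords = Chem.AddHs(m_from_mol_block, addCoords=False)

# With addCoords=True the hydrogens are placed relative to their heavy atoms.
with_coords = Chem.AddHs(m_from_mol_block, addCoords=True)

print("H at origin, addCoords=False:", hydrogens_at_origin(without_coords))
print("H at origin, addCoords=True: ", hydrogens_at_origin(with_coords))
```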
I tend to think that AddHs should add coordinates if the molecule already has them as this follows the path of least surprise, but this is a change to current behavior.
I understand the explanation, thanks for helping me see this!
I'd still like advice on how to have the result of "creation from SMILES" and the result of "creation from Mol block" provide the same starting point. It sounds to me like I might need to add coordinates to the SMILES-produced compound before calling AddHs? My goal here isn't to change how AddHs is called (that was an example); my goal is to have both methods of creation start from the same starting state for molecular operations (except of course the exact coordinates would be different).
(In my example I'm demonstrating that round-tripping is not what one might think; MolFromMolBlock(MolToMolBlock(m)) does not provide the same result as m.)
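The round-trip difference can be seen by checking the conformer count on each molecule. The sketch below uses ethanol as an assumed example and shows one possible way to give the SMILES-derived molecule coordinates of its own, via `rdDepictor.Compute2DCoords`; the coordinates will generally not match those read from the mol block.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor

# Example molecule (an assumption for illustration).
m = Chem.MolFromSmiles("CCO")
round_tripped = Chem.MolFromMolBlock(Chem.MolToMolBlock(m))

# The chemistry is the same after the round trip...
same_smiles = Chem.MolToSmiles(m) == Chem.MolToSmiles(round_tripped)

# ...but only the round-tripped molecule carries a conformer.
conformers_before = (m.GetNumConformers(), round_tripped.GetNumConformers())

# Store 2D coordinates on the SMILES-derived molecule so both have a conformer.
rdDepictor.Compute2DCoords(m)
conformers_after = (m.GetNumConformers(), round_tripped.GetNumConformers())

print("same canonical SMILES:", same_smiles)
print("conformers before Compute2DCoords:", conformers_before)
print("conformers after Compute2DCoords: ", conformers_after)
```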
I understand the explanation, thanks for helping me see this!
I'd still like advice on how to have the result of "creation from SMILES" and the result of "creation from Mol block" provide the same starting point. It sounds to me like I might need to add coordinates to the SMILES-produced compound before calling AddHs?
Since the coordinate generation from SMILES will, in general, produce different coords than what you find in the molecule from the mol block, the only way to get to the "same starting point" is to remove the coordinates from the molecule that came from the mol block.
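A minimal sketch of that approach, assuming ethanol as the example molecule: `RemoveAllConformers()` drops the coordinates read from the mol block, after which the molecule is in the same coordinate-free state as one built from SMILES. Note that any information carried only by the coordinates is discarded along with them.

```python
from rdkit import Chem

# Example molecule (an assumption for illustration).
m_from_smiles = Chem.MolFromSmiles("CCO")
m_from_mol_block = Chem.MolFromMolBlock(Chem.MolToMolBlock(m_from_smiles))

# Drop the conformer that came from the mol block.
m_from_mol_block.RemoveAllConformers()

# With no conformer present, writing a mol block after AddHs generates
# coordinates for every atom, hydrogens included.
m_h = Chem.AddHs(m_from_mol_block)
block_with_hs = Chem.MolToMolBlock(m_h)
print(block_with_hs)
```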
My goal here isn't to change how AddHs is called (that was an example); my goal is to have both methods of creation start from the same starting state for molecular operations (except of course the exact coordinates would be different).
The coordinates themselves can actually matter, particularly for the assignment of stereochemistry.
I tend to think that AddHs should add coordinates if the molecule already has them as this follows the path of least surprise, but this is a change to current behavior.
I agree with this. We should change the default for addCoords to true and then make sure it doesn't try to add coordinates to anything that doesn't have a conformer.
This issue was marked as stale because it has been open for 90 days with no activity.
Describe the bug
When I create a molecule from SMILES and then AddHs, the 2D molecule behaves as expected. When that same molecule is created from a Mol block, AddHs places all hydrogens at the origin. I would like both molecules to be in the same starting state.
This is similar in spirit to #6349 but the specific test case is fairly different.
To Reproduce
In the first print out you'll see something like
In the second print out you'll see
Expected behavior
My issue isn't with AddHs; it's with the starting state of two molecules. I'd like to discover what operations or flags to set to put both of these molecules into the same state. Or if this is deemed a bug have it fixed in new code :)
Configuration (please complete the following information):