isayevlab / Auto3D_pkg

Auto3D generates low-energy conformers from SMILES/SDF
MIT License
148 stars, 34 forks

Question: why do you try to amend the SMILES of the diastereomers generated by RDKit? #18

Closed: EtienneReboul closed this issue 2 years ago

EtienneReboul commented 2 years ago

Hello,

I have a question: inside the class rd_isomer in isomer_engine.py, you go to great lengths to amend the diastereomers' SMILES using the amend_configuration_w function. Does RDKit have trouble generating all possible diastereomers?

Best, Etienne Reboul

LiuCMU commented 2 years ago

Hi,

Thanks for the question! RDKit is mostly fine at generating all possible diastereomers, but it can miss some configurations for stereo centers in a ring (for example, N=C1OC(CN2CC(C)OC(C)C2)CN1).

The total number of stereoisomers should be equal to 2**num_unspecified_steric_centers. If RDKit doesn't return all configurations, the missing ones are amended by finding the enantiomers of the existing SMILES (treating the SMILES as text and inverting the stereo centers).
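A minimal sketch of the text-based inversion described above, assuming only the plain `@` / `@@` tetrahedral tags appear in the SMILES. The helper names `invert_stereo_text` and `amend_by_enantiomers` are hypothetical and are not the functions in Auto3D; the comparison is done on raw strings, whereas real code would likely canonicalize the SMILES first.

```python
import re


def invert_stereo_text(smiles: str) -> str:
    """Swap every '@' with '@@' (and vice versa), treating the SMILES as text.

    Flipping all tetrahedral tags at once yields the mirror image (enantiomer).
    Assumption: no extended chirality classes such as @TH1 or @SP1 are present.
    """
    return re.sub(r"@@?", lambda m: "@" if m.group() == "@@" else "@@", smiles)


def amend_by_enantiomers(isomers, num_centers):
    """Hypothetical helper: add missing mirror images if fewer than
    2**num_centers isomers were enumerated."""
    expected = 2 ** num_centers
    found = list(dict.fromkeys(isomers))  # drop duplicates, keep order
    if len(found) >= expected:
        return found  # nothing is missing
    for smi in list(found):
        mirror = invert_stereo_text(smi)
        if mirror not in found:
            found.append(mirror)
    return found


# Illustrative input: two stereo centers, but only 3 of the 4 isomers present.
partial = [
    "C[C@H](O)[C@H](N)CC",
    "C[C@H](O)[C@@H](N)CC",
    "C[C@@H](O)[C@@H](N)CC",
]
amended = amend_by_enantiomers(partial, num_centers=2)
```

Here the missing fourth isomer is recovered as the mirror image of the second entry, since the mirror images of the first and third entries are already in the list.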

Please let me know if you have further questions.

Best, Zhen (Jack) Liu

EtienneReboul commented 2 years ago

Hello,

Thanks for your fast reply. So RDKit does miss some stereocenters for some reason, but it seems like a minor problem. I do have another question: in the utils.py script, inside the amend_configuration function, you are not using unspecified_steric_centers. It is computed but never used.

Since it is commented out in the script, you use the number of stereocenters instead. Is this an oversight, or am I missing something?

Best, Etienne Reboul

LiuCMU commented 2 years ago

Hi Etienne,

This is a great question! I did go back and forth between num_centers and num_unspecified_centers, but decided to use the total number of stereocenters, num_centers, here. This won't change the following condition, because num changes by a power of 2: https://github.com/isayevlab/Auto3D_pkg/blob/ee93d7beae190f7b6c454d0488be3f41f5a5eb62/src/Auto3D/utils.py#L652

Please note that the function under discussion is only triggered when the user sets enumerate_isomer = True. There are two scenarios. In the first, the SMILES does not contain any stereo information at all. If RDKit loses some configurations, we can safely recover all of them by finding enantiomers. This is also the case in which we found that RDKit missed some configurations in a ring. In the second, the SMILES specifies part of the stereo information. Here there is a risk that we get extra configurations by finding enantiomers (though this is rare).
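To make the second scenario concrete, here is a small sketch of how mirror images can introduce extra configurations when one center is already fixed in the input. Everything in it is an assumption for illustration: the example SMILES, the hypothetical helpers `invert_stereo_text` and `needs_amendment`, and the simple `2**n` threshold, which is not claimed to match the condition at the linked line in utils.py.

```python
import re


def invert_stereo_text(smiles: str) -> str:
    """Swap '@' and '@@' throughout the SMILES text (mirror image)."""
    return re.sub(r"@@?", lambda m: "@" if m.group() == "@@" else "@@", smiles)


def needs_amendment(isomers, n_centers: int) -> bool:
    """Hypothetical check: are fewer than 2**n_centers isomers present?"""
    return len(set(isomers)) < 2 ** n_centers


# Hypothetical input with one specified and one unspecified center:
#   C[C@H](O)C(N)CC
# An enumerator that respects the specified center would return two isomers.
specified_prefix = "C[C@H](O)"
enumerated = ["C[C@H](O)[C@H](N)CC", "C[C@H](O)[C@@H](N)CC"]

# Counting all centers (2) asks for 4 isomers; counting only the
# unspecified center (1) asks for 2, which the list already satisfies.
by_total = needs_amendment(enumerated, n_centers=2)
by_unspecified = needs_amendment(enumerated, n_centers=1)

# Mirror images flip the specified center as well, so they no longer
# match the stereochemistry the user asked for.
mirrors = [invert_stereo_text(s) for s in enumerated]
extras = [m for m in mirrors if not m.startswith(specified_prefix)]
```

In this toy case both mirror images are extras, because inverting the whole SMILES also flips the center that the input had fixed. When the input carries no stereo information, the same inversion can only produce isomers that belong to the full set.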

This is a trade-off between the two scenarios, and I thought that giving some extra enantiomers is better than missing some configurations. It would be much appreciated if you have an idea and want to contribute to this project :)

Best, Zhen