Error while calculating 3D descriptors: missing 3D coordinate (RNCS/RNCG/AtomicCharge/Propc/AtomicSurfaceArea)

cartilage-ftw commented 3 years ago

Description

I'm trying to calculate a whole bunch of descriptors, including some 3D descriptors, from a set of SMILES. Instead of values for RASA, TASA, TPSA, etc., I just get the error "missing 3D coordinate" in their place.

Code

I am loading a bunch of SMILES and collecting the corresponding RDKit Mol objects in a list (initialized as molecules_list = []). After doing

from mordred import Calculator, descriptors

# Names of the descriptors to keep (a mix of 2D and 3D descriptors)
desc_needed = ['SIC0', 'IC0', 'CIC0', 'nRot', 'nN', 'nH', 'nC', 'nS', 'nO', 'nHBDon', 'nHBAcc',
               'GeomDiameter', 'TopoPSA', 'SLogP', 'RASA', 'TASA', 'TPSA', 'RNCS', 'RPCS', 'RPSA']
calc = Calculator(descriptors, ignore_3D=False)
# Keep only the requested descriptors
calc.descriptors = [d for d in calc.descriptors if str(d) in desc_needed]
result = calc.pandas(molecules_list)

I get the following output

The particular text output for RASA is,

missing 3D coordinate (RNCS/RNCG/AtomicCharge/Propc/AtomicSurfaceArea)

In case you need some of my SMILES for reproducing this

C(C(CC(C(CO)O)O)=O)=O
C(CC(C(C(CO)O)O)=O)=O
C(C(C(CC(CO)O)=O)O)=O
C(CC(C=O)O)(C(CO)O)=O
C(C(C(C=O)O)O)(CCO)=O
C(C(C(CC(CO)=O)O)O)=O
C(C(C(C(C(C)=O)O)O)O)=O
C(CC(C(C(C=O)O)O)O)=O
C(CO)=O
C(C(C(CO)O)O)=O
C(CO)(C(C(C(CO)O)O)O)=O
C(C(C1C(C(C(O)O1)O)O)O)O
C(C1C(C(C(C(O)O1)O)O)O)O
C1C(C(C(C(C(O)O1)O)O)O)O

Environment

OS/distribution

Manjaro KDE Plasma Kernel: 5.8.18-1-MANJARO

conda or pip

Using conda (an environment called my-rdkit-env)

python version

Python 3.7.9

library version

Please execute the command and paste result.

conda

conda list
# packages in environment at /home/aayush/miniconda3/envs/my-rdkit-env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
argon2-cffi               20.1.0           py37h7b6447c_1    anaconda
async_generator           1.10             py37h28b3542_0    anaconda
attrs                     20.2.0                     py_0    anaconda
backcall                  0.2.0                      py_0    anaconda
blas                      1.0                         mkl  
bleach                    3.2.1                      py_0    anaconda
bzip2                     1.0.8                h7b6447c_0  
ca-certificates           2020.12.5            ha878542_0    conda-forge
cairo                     1.14.12              h8948797_3  
certifi                   2020.12.5        py37h89c1867_0    conda-forge
cffi                      1.14.3           py37he30daa8_0    anaconda
chemopy                   1.0                      pypi_0    pypi
cycler                    0.10.0                   py37_0  
dbus                      1.13.18              hb2f20db_0  
decorator                 4.4.2                      py_0    anaconda
defusedxml                0.6.0                      py_0    anaconda
entrypoints               0.3                      py37_0    anaconda
expat                     2.2.10               he6710b0_2  
fontconfig                2.13.0               h9420a91_0  
freetype                  2.10.4               h5ab3b9f_0  
glib                      2.66.1               h92f7085_0  
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb31296c_0  
icu                       58.2                 he6710b0_3  
importlib-metadata        2.0.0                      py_1    anaconda
importlib_metadata        2.0.0                         1    anaconda
intel-openmp              2020.2                      254  
ipykernel                 5.3.4            py37h5ca1d4c_0    anaconda
ipython                   7.18.1           py37h5ca1d4c_0    anaconda
ipython_genutils          0.2.0                    py37_0    anaconda
ipywidgets                7.5.1                      py_1    anaconda
jedi                      0.17.2                   py37_0    anaconda
jinja2                    2.11.2                     py_0    anaconda
jpeg                      9b                   h024ee3a_2  
jpype1                    1.1.2            py37hff7bd54_0  
jsonschema                3.2.0                      py_2    anaconda
jupyter                   1.0.0                    py37_7    anaconda
jupyter_client            6.1.7                      py_0    anaconda
jupyter_console           6.2.0                      py_0    anaconda
jupyter_core              4.6.3                    py37_0    anaconda
jupyterlab_pygments       0.1.2                      py_0    anaconda
kiwisolver                1.3.0            py37h2531618_0  
lcms2                     2.11                 h396b838_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libboost                  1.73.0              hf484d3e_11  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libsodium                 1.0.18               h7b6447c_0    anaconda
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_1  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.14                 h7b6447c_0  
libxml2                   2.9.10               hb55368b_3  
lz4-c                     1.9.2                heb0550a_3  
markupsafe                1.1.1            py37h14c3975_1    anaconda
matplotlib                3.3.2                h06a4308_0  
matplotlib-base           3.3.2            py37h817c723_0  
mistune                   0.8.4           py37h14c3975_1001    anaconda
mkl                       2020.2                      256  
mkl-service               2.3.0            py37he8ac12f_0  
mkl_fft                   1.2.0            py37h23d657b_0  
mkl_random                1.1.1            py37h0573a6f_0  
mordred                   1.2.0              pyhe5148d4_0    mordred-descriptor
nbclient                  0.5.1                      py_0    anaconda
nbconvert                 6.0.7                    py37_0    anaconda
nbformat                  5.0.8                      py_0    anaconda
ncurses                   6.2                  he6710b0_1  
nest-asyncio              1.4.1                      py_0    anaconda
networkx                  2.5                        py_0  
notebook                  6.1.4                    py37_0    anaconda
numpy                     1.19.2           py37h54aff64_0  
numpy-base                1.19.2           py37hfa32c7d_0  
olefile                   0.46                     py37_0  
openssl                   1.1.1i               h27cfd23_0  
packaging                 20.4                       py_0    anaconda
pandas                    1.1.3            py37he6710b0_0  
pandoc                    2.11                 hb0f4dca_0    anaconda
pandocfilters             1.4.2                    py37_1    anaconda
parso                     0.7.0                      py_0    anaconda
pcre                      8.44                 he6710b0_0  
pexpect                   4.8.0                    py37_1    anaconda
pickleshare               0.7.5                 py37_1001    anaconda
pillow                    8.0.1            py37he98fc37_0  
pip                       20.3.1           py37h06a4308_0  
pixman                    0.40.0               h7b6447c_0  
prometheus_client         0.8.0                      py_0    anaconda
prompt-toolkit            3.0.8                      py_0    anaconda
prompt_toolkit            3.0.8                         0    anaconda
ptyprocess                0.6.0                    py37_0    anaconda
py-boost                  1.73.0          py37h04863e7_11  
pycparser                 2.20                       py_2    anaconda
pygments                  2.7.1                      py_0    anaconda
pyparsing                 2.4.7                      py_0  
pyqt                      5.9.2            py37h05f1152_2  
pyrsistent                0.17.3           py37h7b6447c_0    anaconda
python                    3.7.9                h7579374_0  
python-dateutil           2.8.1                      py_0  
python_abi                3.7                     1_cp37m    conda-forge
pytz                      2020.4             pyhd3eb1b0_0  
pyzmq                     19.0.2           py37he6710b0_1    anaconda
qt                        5.9.7                h5867ecd_1  
qtconsole                 4.7.7                      py_0    anaconda
qtpy                      1.9.0                      py_0    anaconda
rdkit                     2020.09.1.0      py37hd50e099_1    rdkit
readline                  8.0                  h7b6447c_0  
send2trash                1.5.0                    py37_0    anaconda
setuptools                51.0.0           py37h06a4308_2  
sip                       4.19.8           py37hf484d3e_0  
six                       1.15.0           py37h06a4308_0  
sqlite                    3.33.0               h62c20be_0  
terminado                 0.9.1                    py37_0    anaconda
testpath                  0.4.4                      py_0    anaconda
tk                        8.6.10               hbc83047_0  
tornado                   6.1              py37h27cfd23_0  
tqdm                      4.55.0             pyhd3eb1b0_0  
traitlets                 5.0.5                      py_0    anaconda
typing_extensions         3.7.4.3                    py_0    conda-forge
wcwidth                   0.2.5                      py_0    anaconda
webencodings              0.5.1                    py37_1    anaconda
wheel                     0.36.2             pyhd3eb1b0_0  
widgetsnbextension        3.5.1                    py37_0    anaconda
xz                        5.2.5                h7b6447c_0  
zeromq                    4.3.3                he6710b0_3    anaconda
zipp                      3.3.1                      py_0    anaconda
zlib                      1.2.11               h7b6447c_3  
zstd                      1.4.5                h9ceee32_0

pip

(my-rdkit-env) [aayush@aayush-tuf ~]$ python -m pip list
Package             Version
------------------- -------------------
argon2-cffi         20.1.0
async-generator     1.10
attrs               20.2.0
backcall            0.2.0
bleach              3.2.1
certifi             2020.12.5
cffi                1.14.3
chemopy             1.0
cycler              0.10.0
decorator           4.4.2
defusedxml          0.6.0
entrypoints         0.3
importlib-metadata  2.0.0
ipykernel           5.3.4
ipython             7.18.1
ipython-genutils    0.2.0
ipywidgets          7.5.1
jedi                0.17.2
Jinja2              2.11.2
JPype1              1.1.2
jsonschema          3.2.0
jupyter             1.0.0
jupyter-client      6.1.7
jupyter-console     6.2.0
jupyter-core        4.6.3
jupyterlab-pygments 0.1.2
kiwisolver          1.3.0
MarkupSafe          1.1.1
matplotlib          3.3.2
mistune             0.8.4
mkl-fft             1.2.0
mkl-random          1.1.1
mkl-service         2.3.0
mordred             1.2.0
nbclient            0.5.1
nbconvert           6.0.7
nbformat            5.0.8
nest-asyncio        1.4.1
networkx            2.5
notebook            6.1.4
numpy               1.19.2
olefile             0.46
packaging           20.4
pandas              1.1.3
pandocfilters       1.4.2
parso               0.7.0
pexpect             4.8.0
pickleshare         0.7.5
Pillow              8.0.1
pip                 20.3.1
prometheus-client   0.8.0
prompt-toolkit      3.0.8
ptyprocess          0.6.0
pycparser           2.20
Pygments            2.7.1
pyparsing           2.4.7
pyrsistent          0.17.3
python-dateutil     2.8.1
pytz                2020.4
pyzmq               19.0.2
qtconsole           4.7.7
QtPy                1.9.0
Send2Trash          1.5.0
setuptools          51.0.0.post20201207
six                 1.15.0
terminado           0.9.1
testpath            0.4.4
tornado             6.1
tqdm                4.55.0
traitlets           5.0.5
typing-extensions   3.7.4.3
wcwidth             0.2.5
webencodings        0.5.1
wheel               0.36.2
widgetsnbextension  3.5.1
zipp                3.3.1

(my-rdkit-env) [aayush@aayush-tuf ~]$ python -c 'import rdkit; print("rdkit " + rdkit.__version__)'
rdkit 2020.09.1

plkx commented 3 years ago

You have to provide the 3D structures if you want to calculate 3D descriptors with Mordred. I've found DataWarrior to be an exceptionally easy to use free software package for going from smiles to 2D and/or 3D structures for LARGE compound sets.

Paul

cartilage-ftw commented 3 years ago

Hey, thank you for your reply, Paul. But how do you provide 3D molecules to Mordred using RDKit? DataWarrior is great but I'm not sure if it will cover the entire list of descriptors I need to calculate.

plkx commented 3 years ago

Put your smiles in a text document (attachment #1).

Open the text document in DWarrior.

Save it (default save, as DWAR file). This is not essential, but makes future work easier, such as adding a name column. (attachment #2, after deleting the .txt extension because github does not allow much).

Generate 2D atom coordinates ( Chemistry → Generate 2D Atom Coordinates…)

Generate 3D structures (Chemistry → Generate Conformers…; Leave Max. Conformer Count = 1; DO NOT CHECK SAVE TO FILE). In this case, I chose the systematic, low energy bias algorithm; initial torsions from the crystallographic database; and minimized energy using MMFF94s+ forcefield. That took ~3 seconds on my laptop, but can take minutes or hours for large compound sets, e.g. recently took ~20 minutes for an 1800 cmpd set. You can choose not to minimize for fastest results.

Save changes.

Now save it as an SDF (File → Save Special → SD-File). A dialog pops up. Click "save" under "MDL SD-files (.sdf)." In the next dialog: (1) leave Structure Column as is; (2) change SD-file version to version 2 (Mordred may not like version 3 files); (3) change Atom Coordinates from 2D to "3D if available"; (4) choose your compound name column from the dropdown. Keep in mind program limitations (e.g. names with commas will give garbled output from Mordred if you choose csv file output). I used the smiles you provided as the compound name column. (attachment #3, after deleting the .txt extension because github does not allow much).

The new SDF contains 3D structures of your smiles. Note: your smiles did not specify stereoisomers, but a 3D structure requires such specificity. Data Warrior creates one stereoisomer per smiles in this case, as indicated by the usual "R" & "S" atom labels.
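For anyone who would rather script the steps above than use the GUI, here is a rough RDKit sketch of the same idea: embed one conformer per SMILES, use the SMILES as the compound name, and write everything to one SDF. The helper name write_3d_sdf, the output file name and the fixed random seed are my own choices for illustration, and the conformers will not be identical to the ones DataWarrior generates.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def write_3d_sdf(smiles_list, path, seed=42):
    """Embed one conformer per SMILES and write them to an SDF; return the count written."""
    writer = Chem.SDWriter(path)
    written = 0
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        mol = Chem.AddHs(mol)  # explicit hydrogens give more sensible geometries
        # EmbedMolecule returns -1 when no conformer could be generated
        if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
            continue
        AllChem.MMFFOptimizeMolecule(mol)  # optional force-field cleanup
        mol.SetProp("_Name", smiles)  # use the SMILES as the compound name
        writer.write(mol)
        written += 1
    writer.close()
    return written


# Two of the SMILES from this issue; "from_rdkit.sdf" is a hypothetical file name
count = write_3d_sdf(["C(CO)=O", "C(C(C(CO)O)O)=O"], "from_rdkit.sdf")
print(count)
```

As with DataWarrior, SMILES without stereo labels end up as one arbitrary stereoisomer each.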

The attached files include (1) your smiles in a text file (2) the DWar file ready to save as an SDF and (3) a pruned SDF of 3D structures from your smiles. Pruning involved deleting columns such as "minimization energy" & smiles.

This is barely the beginning of what you can do using DataWarrior - I invite you to read the better-than-average (for free software) documentation with DataWarrior for the good stuff.

Good Luck,

Paul

smiles_cartilage-ftw.txt

Remove the .txt extension on this file to get the DWAR file: smiles_cartilage-ftw.dwar.txt

Remove the .txt extension on this file to get the SDF: smiles_cartilage-ftw_pruned.sdf.txt

plkx commented 3 years ago

The SDF file above is your input file for Mordred, as in

$ python -m mordred smiles_cartilage-ftw_pruned.sdf -o MORD_cartilage-ftw_pruned.csv

You supply the output file name (after -o). In this example, I chose MORD_cartilage-ftw_pruned.csv

plkx commented 3 years ago

An SDF serves as the input file for Mordred.

It is entirely possible to generate 3D structures from smiles using RDKit.
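A minimal sketch of what that could look like, assuming the goal is a list of Mol objects that each carry one 3D conformer. The helper name smiles_to_3d_mol and the fixed random seed are illustrative choices, not something taken from Mordred or from the original post.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_3d_mol(smiles, seed=42):
    """Return an RDKit Mol with explicit hydrogens and one embedded 3D conformer."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # add hydrogens before embedding
    # EmbedMolecule returns the new conformer id, or -1 if embedding failed
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError(f"3D embedding failed for {smiles!r}")
    AllChem.MMFFOptimizeMolecule(mol)  # optional force-field cleanup
    return mol


smiles_list = ["C(CO)=O", "C(C(C(CO)O)O)=O"]  # two of the SMILES from this issue
molecules_list = [smiles_to_3d_mol(s) for s in smiles_list]
for smi, mol in zip(smiles_list, molecules_list):
    # A Mol parsed straight from SMILES has 0 conformers; these should have 1, flagged 3D
    print(smi, mol.GetNumConformers(), mol.GetConformer().Is3D())
```

A list built this way should be usable with calc.pandas(molecules_list) from the original post (with ignore_3D=False) in place of the coordinate-free molecules.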

Personally, I find the DataWarrior GUI preferable, as it has extensive manipulation and visualization capabilities.

In the case exemplified above, you could combine the two tools: use the rdkit.Chem.EnumerateStereoisomers module to generate all possible stereoisomer SMILES, save them in a text file, and then import (open) that file in DataWarrior. Elimination of duplicates is trivial in DataWarrior (Data → Delete Rows → Duplicate Rows…).
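The enumeration step might look roughly like this. It is only a sketch with RDKit's default enumeration options; the helper name stereoisomer_smiles and the output file name are made up for the example.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers


def stereoisomer_smiles(smiles):
    """Return the sorted, de-duplicated isomeric SMILES of all enumerated stereoisomers."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # By default only stereocentres left unassigned in the input are enumerated
    isomers = EnumerateStereoisomers(mol)
    return sorted({Chem.MolToSmiles(iso, isomericSmiles=True) for iso in isomers})


# One of the SMILES from this issue; it has no stereo labels
isomers = stereoisomer_smiles("C(C(C(CO)O)O)=O")

# One SMILES per line, ready to open in DataWarrior ("stereoisomers.txt" is a hypothetical name)
with open("stereoisomers.txt", "w") as handle:
    handle.write("\n".join(isomers) + "\n")
print(len(isomers))
```

Collecting the SMILES in a set already removes exact duplicates, so the DataWarrior duplicate-row step mostly matters when several input SMILES describe the same compound.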

On another note, I do further molecular modeling for QM properties. I have found that DWar provides 3D structures that require less curation/refinement during geometry and energy minimization by semiempirical methods. Sometimes, unminimized 3D structures using torsions from the crystallographic database provide the best starting structures. This is especially the case for sets of homologous molecular structures because the initial conformers retain more apparent homolog consistency.

Of course, these are my personal experiences with molecule sets I have studied, versus either Open Babel or RDKit, so your mileage may vary.

Paul

plkx commented 3 years ago

Window capture from DataWarrior with all 3D structures. DWARCapture

plkx commented 3 years ago

Oops - I moused over a structure that was not selected, causing the 3D structure above to be for the compound above the highlighted heptacyclo compound. Here is a snip showing the same compound in all panes. DWAR_Capture

ky66 commented 3 years ago

Yeah Paul I tried this and it is missing all the 3D features still. It is supposed to have 1800+ features. Your SDF gives 1614 features only.
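A quick way to check a Mordred CSV for this is to count its columns and look for a few of the 3D names. This is only a sketch using the standard library: the list of expected names is taken from the descriptors requested in the original post (treating them as 3D descriptors is my reading of this thread), and it assumes the first CSV column holds the compound name.

```python
import csv

# Descriptor names from the original post that are expected to need 3D coordinates
EXPECTED_3D = ["GeomDiameter", "RASA", "TASA", "TPSA", "RNCS", "RPCS", "RPSA"]


def summarize_descriptor_csv(path, expected=EXPECTED_3D):
    """Return (number of descriptor columns, expected names that are absent)."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle))  # only the header row is needed
    columns = header[1:]  # assumption: the first column is the compound name
    missing = [name for name in expected if name not in columns]
    return len(columns), missing
```

Calling summarize_descriptor_csv("MORD_cartilage-ftw_pruned.csv") on the output file from the command above would return the column count together with whichever of these names are not in the header.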

xdn-github commented 1 year ago

Hi! Maybe it's too late to answer this, and the original poster may already have solved the problem, but I would like to write up my solution to help newcomers.

In mordred 1.2.0 you CANNOT get all the descriptors with the command below:

$ python -m mordred [molecular.sdf] -o [molecular_descriptors.csv]

You only get 1614 descriptors, as ky66 mentioned, which excludes the 3D descriptors.

What you should do instead is use a command like this:

$ python -m mordred -3 [molecular.sdf] -o [molecular_descriptors.csv]

Just add the "-3" flag! You then get all the descriptors in [molecular_descriptors.csv].

A complete sample you can try looks like this:

import subprocess
import sys

import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

# Use RDKit to create a 3D SDF file (or use DataWarrior as described above)
smiles = 'CCCC(=O)O[C@@H]1C[C@H]2C=CC[C@@H]21'
mol = Chem.AddHs(Chem.MolFromSmiles(smiles))  # add explicit hydrogens before embedding
AllChem.EmbedMolecule(mol)         # generate 3D coordinates
AllChem.MMFFOptimizeMolecule(mol)  # relax the geometry with the MMFF force field
writer = Chem.SDWriter("test.sdf")
writer.write(mol)
writer.close()

# Calculate all descriptors, including the 3D ones (-3), from the SDF file
subprocess.run([sys.executable, "-m", "mordred", "-3", "test.sdf", "-o", "test.csv"], check=True)
# In Jupyter the line above can instead be: !python -m mordred -3 test.sdf -o test.csv

# Inspect the descriptor file, e.g. in Jupyter
df = pd.read_csv('test.csv')
df.head()

Thanks a lot, @ky66! I found this solution in his assay. In the end, I think it's OK to close this issue?
