Closed by milicazmarkovic 1 month ago
I can look into the environment later this week (I no longer have access to the old environments), but could you provide the terminal output of the script in the meantime? I wonder whether, for example, all reactions were lost somewhere along the pipeline.
This is how it should look:
@milicazmarkovic And this is the exact environment I used just now on my laptop (not the one from when we wrote the paper). I am not aware of a package mismatch that could cause the output to be empty. If you send me the list of package versions you used, I will try to reproduce the error and track down the package that causes it; then we can pin that version in the environment.
# Name Version Build Channel
appnope 0.1.3 pyhd8ed1ab_0 conda-forge
argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge
argon2-cffi-bindings 21.2.0 py38hed1de0f_2 conda-forge
asttokens 2.0.5 pyhd8ed1ab_0 conda-forge
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
beautifulsoup4 4.11.1 pyha770c72_0 conda-forge
bleach 5.0.0 pyhd8ed1ab_0 conda-forge
boost 1.74.0 py38hb0f0857_5 conda-forge
boost-cpp 1.74.0 hdbf7018_7 conda-forge
brotli 1.0.9 h5eb16cf_7 conda-forge
brotli-bin 1.0.9 h5eb16cf_7 conda-forge
bzip2 1.0.8 h0d85af4_4 conda-forge
ca-certificates 2022.6.15 h033912b_0 conda-forge
cairo 1.16.0 h9e0e54b_1010 conda-forge
certifi 2022.6.15 py38h50d1736_0 conda-forge
cffi 1.15.0 py38h1a44b6c_0 conda-forge
cli-exit-tools 1.2.3.2 pypi_0 pypi
click 8.1.3 pypi_0 pypi
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h811a1a6_3 conda-forge
debugpy 1.6.0 py38h038c8f4_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
dill 0.3.5.1 pypi_0 pypi
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
enzymemap 0.0.0+1.geddd0ce.dirty dev_0 <develop>
executing 0.8.3 pyhd8ed1ab_0 conda-forge
expat 2.4.8 h96cf925_0 conda-forge
flit-core 3.7.1 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.14.0 h676cef8_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.33.3 py38h0dd4459_0 conda-forge
freetype 2.10.4 h4cff582_1 conda-forge
gettext 0.19.8.1 hd1a6beb_1008 conda-forge
giflib 5.2.1 hbcb3906_2 conda-forge
greenlet 1.1.2 py38h038c8f4_2 conda-forge
icu 69.1 he49afe7_0 conda-forge
importlib-metadata 4.11.4 py38h50d1736_0 conda-forge
importlib_resources 5.7.1 pyhd8ed1ab_1 conda-forge
ipykernel 6.13.1 py38h60dac5d_0 conda-forge
ipython 8.4.0 py38h50d1736_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.7.0 pyhd8ed1ab_0 conda-forge
jedi 0.18.1 py38h50d1736_1 conda-forge
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9e h5eb16cf_1 conda-forge
jsonschema 4.6.0 pyhd8ed1ab_0 conda-forge
jupyter 1.0.0 py38h50d1736_7 conda-forge
jupyter_client 7.3.3 pyhd8ed1ab_0 conda-forge
jupyter_console 6.4.3 pyhd8ed1ab_0 conda-forge
jupyter_core 4.10.0 py38h50d1736_0 conda-forge
jupyterlab_pygments 0.2.2 pyhd8ed1ab_0 conda-forge
jupyterlab_widgets 1.1.0 pyhd8ed1ab_0 conda-forge
kiwisolver 1.4.2 py38h8b7791e_1 conda-forge
krb5 1.19.3 hb98e516_0 conda-forge
lcms2 2.12 h577c468_0 conda-forge
lerc 3.0 he49afe7_0 conda-forge
lib-detect-testenv 2.0.2.2 pypi_0 pypi
libblas 3.9.0 15_osx64_openblas conda-forge
libbrotlicommon 1.0.9 h5eb16cf_7 conda-forge
libbrotlidec 1.0.9 h5eb16cf_7 conda-forge
libbrotlienc 1.0.9 h5eb16cf_7 conda-forge
libcblas 3.9.0 15_osx64_openblas conda-forge
libclang 13.0.1 default_he082bbe_0 conda-forge
libcxx 14.0.4 hc203e6f_0 conda-forge
libdeflate 1.10 h0d85af4_0 conda-forge
libedit 3.1.20191231 h0678c8f_2 conda-forge
libffi 3.4.2 h0d85af4_5 conda-forge
libgfortran 5.0.0 9_3_0_h6c81a4c_23 conda-forge
libgfortran5 9.3.0 h6c81a4c_23 conda-forge
libglib 2.70.2 hf1fb8c0_4 conda-forge
libiconv 1.16 haf1e3a3_0 conda-forge
liblapack 3.9.0 15_osx64_openblas conda-forge
libllvm13 13.0.1 h64f94b2_2 conda-forge
libopenblas 0.3.20 openmp_hb3cd9ec_0 conda-forge
libpng 1.6.37 h7cec526_2 conda-forge
libpq 14.3 h2b7167c_0 conda-forge
libsodium 1.0.18 hbcb3906_1 conda-forge
libtiff 4.4.0 hfca7e8f_0 conda-forge
libwebp 1.2.2 h28dabe5_0 conda-forge
libwebp-base 1.2.2 h0d85af4_1 conda-forge
libxcb 1.13 h0d85af4_1004 conda-forge
libzlib 1.2.12 h6c3fc93_0 conda-forge
llvm-openmp 14.0.4 ha654fa7_0 conda-forge
lz4-c 1.9.3 he49afe7_1 conda-forge
markupsafe 2.1.1 py38hed1de0f_1 conda-forge
matplotlib-base 3.5.2 py38h1b6b9d1_0 conda-forge
matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
mistune 0.8.4 py38h96a0964_1005 conda-forge
multiprocess 0.70.13 pypi_0 pypi
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mysql-common 8.0.29 h9525fdc_1 conda-forge
mysql-libs 8.0.29 h8d0e597_1 conda-forge
nbclient 0.6.4 pyhd8ed1ab_1 conda-forge
nbconvert 6.5.0 pyhd8ed1ab_0 conda-forge
nbconvert-core 6.5.0 pyhd8ed1ab_0 conda-forge
nbconvert-pandoc 6.5.0 pyhd8ed1ab_0 conda-forge
nbformat 5.4.0 pyhd8ed1ab_0 conda-forge
ncurses 6.3 h96cf925_1 conda-forge
nest-asyncio 1.5.5 pyhd8ed1ab_0 conda-forge
notebook 6.4.11 pyha770c72_0 conda-forge
nspr 4.32 hcd9eead_1 conda-forge
nss 3.78 ha8197d3_0 conda-forge
numpy 1.22.4 py38h3ad0702_0 conda-forge
openjpeg 2.4.0 h6e7aa92_1 conda-forge
openssl 3.0.5 hb81d4ab_1 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.4.2 py38h2b30649_2 conda-forge
pandoc 2.18 h694c41f_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
patsy 0.5.2 pyhd8ed1ab_0 conda-forge
pcre 8.45 he49afe7_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 9.1.1 py38h21af888_1 conda-forge
pip 22.1.2 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 hbcb3906_0 conda-forge
prometheus_client 0.14.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.29 pyha770c72_0 conda-forge
prompt_toolkit 3.0.29 hd8ed1ab_0 conda-forge
psutil 5.9.1 py38h0dd4459_0 conda-forge
pthread-stubs 0.4 hc929b4f_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
pycairo 1.21.0 py38h2e817b2_1 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygments 2.12.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyqt 5.12.3 py38hca2ab18_4 conda-forge
pyqt5-sip 4.19.18 pypi_0 pypi
pyqtchart 5.12 pypi_0 pypi
pyqtwebengine 5.12.1 pypi_0 pypi
pyrsistent 0.18.1 py38hed1de0f_1 conda-forge
python 3.8.13 h66c20e1_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-fastjsonschema 2.15.3 pyhd8ed1ab_0 conda-forge
python_abi 3.8 2_cp38 conda-forge
pytz 2022.1 pyhd8ed1ab_0 conda-forge
pyzmq 23.1.0 py38h34ba744_0 conda-forge
qt 5.12.9 h2a607e2_5 conda-forge
qtconsole 5.3.1 pyhd8ed1ab_0 conda-forge
qtconsole-base 5.3.1 pyha770c72_0 conda-forge
qtpy 2.1.0 pyhd8ed1ab_0 conda-forge
rdchiral_cpp 1.1.2 py38he65332d_0 conda-forge
rdkit 2021.09.5 py38hc2778ef_0 conda-forge
readline 8.1 h05e3726_0 conda-forge
reportlab 3.5.68 py38hf6ac518_1 conda-forge
scipy 1.9.0 py38hb261484_0 conda-forge
seaborn 0.11.2 hd8ed1ab_0 conda-forge
seaborn-base 0.11.2 pyhd8ed1ab_0 conda-forge
send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
setuptools 62.3.2 py38h50d1736_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
soupsieve 2.3.1 pyhd8ed1ab_0 conda-forge
sqlalchemy 1.4.37 py38h0dd4459_0 conda-forge
sqlite 3.38.5 hd9f0692_0 conda-forge
stack_data 0.2.0 pyhd8ed1ab_0 conda-forge
statsmodels 0.13.2 py38hc1426ef_0 conda-forge
templatecorr 0+untagged.21.g7095b4b dev_0 <develop>
terminado 0.15.0 py38h50d1736_0 conda-forge
tinycss2 1.1.1 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h5dbffcc_0 conda-forge
tornado 6.1 py38hed1de0f_3 conda-forge
tqdm 4.64.0 pyhd8ed1ab_0 conda-forge
traitlets 5.2.2.post1 pyhd8ed1ab_0 conda-forge
typing-extensions 4.2.0 hd8ed1ab_1 conda-forge
typing_extensions 4.2.0 pyha770c72_1 conda-forge
tzdata 2022a h191b570_0 conda-forge
unicodedata2 14.0.0 py38hed1de0f_1 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
widgetsnbextension 3.6.0 py38h50d1736_0 conda-forge
wrapt 1.14.1 pypi_0 pypi
wrapt-timeout-decorator 1.3.12.2 pypi_0 pypi
xorg-libxau 1.0.9 h35c211d_0 conda-forge
xorg-libxdmcp 1.1.3 h35c211d_0 conda-forge
xz 5.2.5 haf1e3a3_1 conda-forge
zeromq 4.3.4 he49afe7_1 conda-forge
zipp 3.8.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.12 h6c3fc93_0 conda-forge
zstd 1.5.2 ha9df2e0_1 conda-forge
Another question: which script exactly did you run? `correct.py` on `uspto_50k.csv`?
Ok, so I tried multiple versions of rdkit and rdchiral; I haven't touched the rest of the packages. I ran the following command:
python correct.py --path data/uspto_50k --reaction_column rxn_smiles --name template --nproc 20 --data_format csv
And here is the output that I am getting:
```
Reading file...
Preprocessing reactants...
[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
[Parallel(n_jobs=20)]: Done 10 tasks | elapsed: 11.6s
[Parallel(n_jobs=20)]: Done 308 tasks | elapsed: 11.7s
[Parallel(n_jobs=20)]: Done 14481 tasks | elapsed: 13.7s
[Parallel(n_jobs=20)]: Done 48545 tasks | elapsed: 17.1s
[Parallel(n_jobs=20)]: Done 49977 out of 50016 | elapsed: 17.2s remaining: 0.0s
[Parallel(n_jobs=20)]: Done 50016 out of 50016 | elapsed: 17.3s finished
Extracting templates (Radius 1 with special groups)...
100%|██████████████████████████████████████████████████████████| 50016/50016 [00:03<00:00, 15084.85it/s]
Extracting templates (Radius 1 without special groups)...
100%|██████████████████████████████████████████████████████████| 50016/50016 [00:03<00:00, 15408.17it/s]
Extracting templates (Radius 0 without special groups)...
100%|██████████████████████████████████████████████████████████| 50016/50016 [00:02<00:00, 17921.67it/s]
Hierarchically correcting templates...
...Unique templates in column template_r0 : 0
...Unique templates in column template_r1 : 0
...Correcting templates in column template_r1
[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
...Unique corrected templates in column template_r1 : 0
...Unique templates in column template_r1 : 0
...Unique templates in column template : 0
...Correcting templates in column template
[Parallel(n_jobs=20)]: Using backend LokyBackend with 20 concurrent workers.
...Unique corrected templates in column template : 0
Wrote dataframe to data/uspto_50k_corrected.csv
```
Additionally, I tried running the scripts/01 and 02 steps separately, but 02 failed because the output of 01 was empty...
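A quick way to confirm that an output file really contains no templates, before digging into package versions, is to count the non-empty entries in the template column. The snippet below is only a minimal sketch using the standard library: the column name `template` follows the `--name template` argument used above, the helper name `count_nonempty` is hypothetical, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io


def count_nonempty(csv_file, column):
    """Return (total rows, rows with a non-empty value in `column`)."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    total = filled = 0
    for row in reader:
        total += 1
        # Treat missing values and whitespace-only strings as empty.
        if (row[column] or "").strip():
            filled += 1
    return total, filled


# In-memory stand-in for an output file; these rows are invented examples.
sample = io.StringIO("rxn_smiles,template\nCCO>>CC=O,\nCC>>CC,\n")
total, filled = count_nonempty(sample, "template")
print(f"{filled} of {total} rows have a template")
if filled == 0:
    print("No templates extracted: check the extraction step first.")
```

To run it on a real file, the `io.StringIO` object could be replaced with `open("data/uspto_50k_corrected.csv", newline="")`.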
Here is my current package list. I assumed this was a compatibility or dependency issue because I have not changed the code and did not get any meaningful errors.
boost 1.74.0 py38h1e2c3d9_5 conda-forge
boost-cpp 1.74.0 h32e41df_4 conda-forge
brotli 1.1.0 hb547adb_1 conda-forge
brotli-bin 1.1.0 hb547adb_1 conda-forge
bzip2 1.0.8 h99b78c6_7 conda-forge
ca-certificates 2024.7.4 hf0a4a13_0 conda-forge
cairo 1.16.0 h302bd0f_5
certifi 2024.7.4 py38hca03da5_0
chardet 5.2.0 py38h10201cd_1 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
contourpy 1.1.1 py38h9afee92_1 conda-forge
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
expat 2.6.2 hebf3989_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_2 conda-forge
fontconfig 2.14.2 h82840c6_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.53.1 py38h3237794_0 conda-forge
freetype 2.12.1 hadb7bae_2 conda-forge
freetype-py 2.3.0 pyhd8ed1ab_0 conda-forge
gettext 0.21.0 h826f4ad_0
glib 2.78.4 h313beb8_0
glib-tools 2.78.4 h313beb8_0
greenlet 3.0.3 py38hb9fa5a8_0 conda-forge
icu 68.1 hc377ac9_0
importlib-resources 6.4.0 pyhd8ed1ab_0 conda-forge
importlib_resources 6.4.0 pyhd8ed1ab_0 conda-forge
joblib 1.4.2 pyhd8ed1ab_0 conda-forge
kiwisolver 1.4.5 py38h9afee92_1 conda-forge
krb5 1.21.3 h237132a_0 conda-forge
lcms2 2.16 ha0e7c42_0 conda-forge
lerc 4.0.0 h9a09cb3_0 conda-forge
libblas 3.9.0 23_osxarm64_openblas conda-forge
libbrotlicommon 1.1.0 hb547adb_1 conda-forge
libbrotlidec 1.1.0 hb547adb_1 conda-forge
libbrotlienc 1.1.0 hb547adb_1 conda-forge
libcblas 3.9.0 23_osxarm64_openblas conda-forge
libcxx 18.1.8 h5a72898_2 conda-forge
libdeflate 1.20 h93a5062_0 conda-forge
libedit 3.1.20191231 hc8eb9b7_2 conda-forge
libexpat 2.6.2 hebf3989_0 conda-forge
libffi 3.4.2 h3422bc3_5 conda-forge
libgfortran 5.0.0 13_2_0_hd922786_3 conda-forge
libgfortran5 13.2.0 hf226fd6_3 conda-forge
libglib 2.78.4 h0a96307_0
libiconv 1.17 h0d3ecfb_2 conda-forge
libintl 0.22.5 h8fbad5d_2 conda-forge
libjpeg-turbo 3.0.0 hb547adb_1 conda-forge
liblapack 3.9.0 23_osxarm64_openblas conda-forge
libopenblas 0.3.27 openmp_h517c56d_1 conda-forge
libpng 1.6.43 h091b4b1_0 conda-forge
libpq 16.3 h7afe498_0 conda-forge
libsqlite 3.46.0 hfb93653_0 conda-forge
libtiff 4.6.0 h07db509_3 conda-forge
libwebp-base 1.4.0 h93a5062_0 conda-forge
libxcb 1.15 hf346824_0 conda-forge
libxml2 2.9.14 h8c5e841_0
libzlib 1.2.13 hfb2fe0b_6 conda-forge
llvm-openmp 18.1.8 hde57baf_0 conda-forge
matplotlib-base 3.7.3 py38hef9d0d7_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
ncurses 6.5 hb89a1cb_0 conda-forge
numpy 1.24.4 py38ha84db1f_0 conda-forge
openjpeg 2.5.2 h9f1df11_0 conda-forge
openssl 3.3.1 hfb2fe0b_2 conda-forge
packaging 24.1 pyhd8ed1ab_0 conda-forge
pandas 2.0.3 py38hefb543e_1 conda-forge
pcre2 10.42 hb066dcc_1
pillow 10.3.0 py38h9ef4633_0 conda-forge
pip 24.2 pyhd8ed1ab_0 conda-forge
pixman 0.43.4 hebf3989_0 conda-forge
pthread-stubs 0.4 h27ca646_1001 conda-forge
pycairo 1.23.0 py38hc7d53f0_0
pyparsing 3.1.2 pyhd8ed1ab_0 conda-forge
python 3.8.19 h2469fbe_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.8 4_cp38 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
rdchiral 1.1.0 pypi_0 pypi
rdkit 2021.03.1 py38hbcbf861_0 conda-forge
readline 8.2 h92ec313_1 conda-forge
reportlab 4.2.2 py38h3237794_0 conda-forge
rlpycairo 0.2.0 pyhd8ed1ab_0 conda-forge
setuptools 72.1.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlalchemy 2.0.32 py38h3237794_0 conda-forge
templatecorr 0+untagged.21.g7095b4b.dirty dev_0
Python is 3.8.19 and I don't see any package compatibility issues, but the output is invariably empty regardless of the input.
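Comparing two long `conda list` outputs by eye is error-prone, so a small script can help narrow down the differences. The following is a rough sketch, not part of the repository: the helper names `parse_conda_list` and `diff_envs` are hypothetical, and the sample text contains only a few rows copied from the two lists posted in this thread.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` style text."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the '# Name Version Build Channel' header.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        versions[parts[0]] = parts[1]
    return versions


def diff_envs(reference, candidate):
    """Return (missing, extra, changed) relative to `reference`."""
    missing = sorted(set(reference) - set(candidate))
    extra = sorted(set(candidate) - set(reference))
    changed = {
        name: (reference[name], candidate[name])
        for name in sorted(set(reference) & set(candidate))
        if reference[name] != candidate[name]
    }
    return missing, extra, changed


# A few rows copied from the two package lists in this thread.
working = """\
# Name Version Build Channel
rdchiral_cpp 1.1.2 py38he65332d_0 conda-forge
rdkit 2021.09.5 py38hc2778ef_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
"""
failing = """\
rdchiral 1.1.0 pypi_0 pypi
rdkit 2021.03.1 py38hbcbf861_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
"""

missing, extra, changed = diff_envs(
    parse_conda_list(working), parse_conda_list(failing)
)
print("missing:", missing)
print("extra:", extra)
print("changed:", changed)
```

Packages that show up as missing or extra are usually the first candidates to check, ahead of plain version bumps.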
@milicazmarkovic Ok, so your template extraction does not work (and will not, whatever input csv you use). You can see in the terminal output that the extraction takes only 3 seconds and yields 0 templates. There is probably an error when rdchiral is called that is silently swallowed by an exception handler. From your list of packages, it seems that you did not install rdchiral_cpp, which the code requires. The regular rdchiral does not take arguments for the radius or special groups. Did you also try this with rdchiral_cpp?
If you want to, you could replace line 41 of extract_templates.py (https://github.com/hesther/templatecorr/blob/7095b4b8fecd6ea06f8c603b8e8641518b37d931/templatecorr/extract_templates.py#L41) with

    except Exception as e:
        print(e)

which will probably tell you that rdchiral does not accept the arguments you provided, because you installed rdchiral instead of rdchiral_cpp.
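To illustrate why a broad exception handler can hide this kind of mismatch, here is a self-contained sketch. It does not reproduce the templatecorr or rdchiral code: `extract_old` and `extract_new` are hypothetical stand-ins for an extractor without and with the extra arguments, and the keyword names `radius` and `no_special_groups` are assumptions chosen for illustration.

```python
def extract_old(reaction):
    """Hypothetical stand-in: an extractor without the extra arguments."""
    return {"reaction_smarts": "placeholder"}


def extract_new(reaction, radius=1, no_special_groups=False):
    """Hypothetical stand-in: an extractor that accepts the extra arguments."""
    return {"reaction_smarts": "placeholder"}


def get_template(extractor, reaction, radius, no_special_groups):
    """Call the extractor and report failures instead of hiding them."""
    try:
        result = extractor(
            reaction, radius=radius, no_special_groups=no_special_groups
        )
        return result["reaction_smarts"]
    except Exception as e:
        # Printing the exception makes a signature mismatch visible.
        print(f"Template extraction failed: {type(e).__name__}: {e}")
        return None


reaction = {"_id": 0}  # placeholder input, not a real reaction record
template_old = get_template(extract_old, reaction, radius=1, no_special_groups=False)
template_new = get_template(extract_new, reaction, radius=1, no_special_groups=False)
print(template_old, template_new)
```

With a bare `except:` that only assigns an empty template, the first call would fail just as quietly for every reaction, which would be consistent with a fast run that yields 0 templates.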
I will fix this in a PR soon so that it gives a meaningful error. Thank you very much for finding that bug!
I actually tried running it with rdchiral_cpp first but got the empty data frame, then switched to rdchiral and ended up with the same result. Still, this is super helpful! I just realized that this has to do with the installation of this package: I have weird issues with some conda packages because of the M1 Mac chip... I think I know how to fix this on my end now, and it has nothing to do with your code. :)
Thanks for the quick response and the confirmation that the code works as intended!
I created a small pull request with an option to run the script using Docker, which resolved this issue for me. Hopefully it helps other people with a similar problem :)
Thanks! I merged the change with the Docker container and also added a more meaningful error message for when the wrong rdchiral version is used. Thanks again for finding this bug!
Hello!
I tried replicating the results from the paper but ended up with an empty csv file as output. I suspect a divergence in the required package versions could be causing this, since I have not modified the code at all. Could you specify the versions of the required packages and the Python version in environment.yml? Also, if you know of any other issues that could cause this, let me know!