Closed (isty2e closed this 3 weeks ago)
Many thanks for pointing me to the numpy version issue! It has now been fixed by adding a version specifier to the numpy entry in pyproject.toml.
Regarding the collision: I followed your steps but could not reproduce the observed problem. Both code snippets execute without producing errors!!
Huh, that's strange. Presumably it depends on the system environment or something. I will try to reproduce it in a Docker container for better reproducibility, but I'm afraid I don't have time to do that for now.
Many thanks for your efforts! I did a completely fresh installation of Miniforge, maybe this information is useful for you.
Hi, I am able to reproduce it consistently. Under Ubuntu 22.04, do the following:
mamba create -n cdpkit_rdkit
mamba activate cdpkit_rdkit
mamba install python=3.10 rdkit "numpy<2" -c conda-forge
pip install cdpkit
Then run the following in a Python console:
>>> from rdkit import Chem
>>> import CDPL.Chem as CDPChem
>>> CDPChem.MoleculeReader("/home/sdoerr/Downloads/00-All-1700.sdf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Boost.Python.ArgumentError: Python argument types in
    MoleculeReader.__init__(MoleculeReader, str)
did not match C++ signature:
    __init__(_object* self, std::istream {lvalue} is, CDPL::Base::DataFormat fmt)
    __init__(_object* self, std::istream {lvalue} is, std::string fmt)
    __init__(_object* self, std::string file_name, CDPL::Base::DataFormat fmt, std::_Ios_Openmode mode=CDPL.Base._base.OpenMode(12))
    __init__(_object* self, std::string file_name, std::string fmt, std::_Ios_Openmode mode=CDPL.Base._base.OpenMode(12))
    __init__(_object* self, std::string file_name, std::_Ios_Openmode mode=CDPL.Base._base.OpenMode(12))
Reversing the import order causes a segmentation fault:
>>> import CDPL.Chem as CDPChem
>>> from rdkit import Chem
[1] 246425 segmentation fault (core dumped) python
This happens with rdkit 2024.03.5 and some previous versions which I've tested. It can be fixed by downgrading rdkit to version 2022.09.1.
It can be reproduced consistently on other machines (I had it happen on GitHub Actions as well as on two local machines). Here are some jobs all crashing with this issue; you can search for Boost.Python.ArgumentError.
I am also attaching the conda env file in case you want to install directly from it to reproduce: cdpkit_rdkit.zip
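For anyone who wants to script the check: since one of the two import orders kills the interpreter, each order can be run in its own child process and judged by its exit code. This is only a minimal sketch. `run_snippet` is a hypothetical helper name, and `example.sdf` is a placeholder path, not a file from this thread.

```python
import subprocess
import sys


def run_snippet(code):
    """Run `code` in a fresh Python interpreter and return its exit code.

    0 means success, 1 means an uncaught Python exception, and on POSIX a
    negative value means the child was killed by a signal.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# The two import orders reported above. "example.sdf" is a placeholder path.
SNIPPETS = {
    "rdkit first": (
        "from rdkit import Chem\n"
        "import CDPL.Chem as CDPChem\n"
        "CDPChem.MoleculeReader('example.sdf')\n"
    ),
    "CDPL first": (
        "import CDPL.Chem as CDPChem\n"
        "from rdkit import Chem\n"
    ),
}

if __name__ == "__main__":
    for label, code in SNIPPETS.items():
        print(f"{label}: exit code {run_snippet(code)}")
```

Going by the reports above, an affected setup would be expected to give exit code 1 for the first order (the uncaught ArgumentError) and a negative exit code on Linux for the second (the segmentation fault).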
Hi Stefan,
thanks for also looking into this! I was now able to reproduce the errors. During my first attempt I did not notice that my PYTHONPATH variable was still pointing to a compiled development version of the CDPKit CDPL Python bindings, which was then imported instead of the 1.1.1 release version installed via pip.
However, this mistake helped me to identify the cause of the reported problems: the 1.1.1 release of the CDPKit Python bindings and the RDKit package installed via conda are both compiled against the same Boost.Python version (1.84.0). RDKit as well as CDPKit bring their own copies of the Boost.Python library, and usually the dynamic linker will load them. However, the linker only loads a private copy if the Boost.Python library version it was linked against has not yet been loaded. When RDKit gets imported, its private copy of Boost.Python 1.84.0 is loaded, as expected. But during the subsequent import of the CDPL.Chem package, the dynamic linker no longer loads the copy provided by CDPL, since a library of that version is already present. Although both Boost.Python libraries are of exactly the same version, they seem to be incompatible, which leads to the observed problems.
Fixes:
For future binary releases on PyPI I will see if I can set the version of the private boost.python copy to something that is valid but does not exist anywhere else. This tweak should then fix the problem by forcing the dynamic linker to always load the installed private copy....
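One way to see which Boost.Python copies actually ended up in the process is to look at the file-backed mappings in /proc/self/maps after both imports. A rough, Linux-only sketch follows. `mapped_libraries` is a hypothetical helper, and the `boost_python` file-name pattern is an assumption about how the bundled copies are named.

```python
import re
from pathlib import Path


def mapped_libraries(pattern, maps_text=None):
    """Return the sorted paths of mapped files whose file name matches `pattern`.

    Reads /proc/self/maps (Linux only) unless `maps_text` is supplied.
    """
    if maps_text is None:
        maps_text = Path("/proc/self/maps").read_text()
    regex = re.compile(pattern)
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname].
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if regex.search(Path(path).name):
                found.add(path)
    return sorted(found)


if __name__ == "__main__":
    # Import rdkit and CDPL.Chem before this call to inspect the real process.
    for path in mapped_libraries(r"boost_python"):
        print(path)
```

If only one path is printed after importing both packages, that would be consistent with the second private copy not having been loaded.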
Wow, thanks for the deep investigation!
I hope the next versions fix this then. I will try the source installation from PyPI and see if it works. Btw, you have a minor typo in the installation command on that page. It says pip install cpkit --no-binary :all: but is missing the d in the package name.
Happy that I could help! Thanks for pointing me to the typo, that one was already fixed in the master branch.
Do you have an ETA for the next release which will fix the rdkit collision?
I am planning to release an official V1.2.0 before the end of 2024. However, I will try to find some time in the next two weeks to release a 1.2.0 pre-version (for Linux only) on PyPI. This pre-version will not yet provide all features that I planned for 1.2.0 but will at least fix the RDKit and nasty NumPy2 issue for Linux users. I will keep you posted regarding the pre-release!
@seidelt If you need support, I would be glad to help, because I am really keen on your codebase. Because of the conflicts when importing rdkit along with CDPKit, I have had to split my code into separate files instead of combining it into one file.
As promised, I made Linux wheels for a CDPKit 1.2.0 pre-release available on PyPI. The RDKit as well as NumPy2 compatibility issues are gone with this version!
The CDPKit 1.2.0 pre-release can be installed as follows: $ pip install CDPKit==1.2.0.dev1
Please let me know whether the new version also works for your setup!
Tested on a fresh conda environment.
This works without any problem:
from rdkit import Chem
import CDPL.Chem as CDPChem
reader = CDPChem.MoleculeReader("any-arbitrary-string")
But when the import order is changed like below,
import CDPL.Chem as CDPChem
from rdkit import Chem
the following error occurs:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/conda/envs/cdpkit-test/lib/python3.10/site-packages/rdkit/Chem/__init__.py", line 16, in <module>
    from rdkit.Chem import rdchem
ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.31' not found (required by /path/to/conda/envs/cdpkit-test/lib/python3.10/site-packages/rdkit/Chem/../../../../libboost_serialization.so.1.86.0)
Many thanks for checking!
I also encountered this error when the CDPL.Chem package was imported first. The reason for this is that CDPL will use whatever libstdc++ it finds. The default is to use the libstdc++ installed on the Linux system. The RDKit conda package brings its own, much newer libstdc++ library, which will be loaded when an RDKit package gets imported first. Since the libstdc++ provided by RDKit is newer than the system libstdc++, CDPL is fine with it and the import after RDKit does not fail. However, when a CDPL package is imported first, the system libstdc++ will be loaded, which is too old for RDKit, and the RDKit package import fails (as observed).
There is not much I can do about this since pip wheel files for Linux deposited on PyPI have to be built in a standardized Docker environment (-> manylinux_2_28, based on AlmaLinux 8, which uses much older libraries than a current conda distribution). This needs to be done to achieve maximum compatibility with a wide range of Linux distributions, and bundling private copies of basic system libraries is thus strongly discouraged. RDKit is to blame here since it has such stringent libstdc++ version requirements!
However, you can force CDPL to load a more recent libstdc++ that will also be compatible with RDKit as follows:
1) Install a current libstdc++ in your conda base environment: $ conda install libstdcxx-ng
2) Set and export LD_LIBRARY_PATH: $ export LD_LIBRARY_PATH=/path/to/your/conda/lib/directory
CDPL and RDKit package imports will then work in any order...
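To check whether a given libstdc++ actually provides the symbol version named in the ImportError (GLIBCXX_3.4.31), the version tags embedded in the library file can be scanned, much like running strings on it and grepping for GLIBCXX. A small sketch follows. `glibcxx_versions` and `provides` are hypothetical helper names, and the file name libstdc++.so.6 inside the conda lib directory is an assumption.

```python
import re
from pathlib import Path


def glibcxx_versions(libstdcxx_path):
    """Return the sorted GLIBCXX version tags found in a libstdc++ file.

    Each version is a tuple of ints, e.g. (3, 4, 31) for GLIBCXX_3.4.31.
    """
    data = Path(libstdcxx_path).read_bytes()
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    versions = {tuple(int(part) for part in tag.decode().split(".")) for tag in tags}
    return sorted(versions)


def provides(libstdcxx_path, required):
    """True if the library file contains the `required` version tag."""
    return tuple(required) in glibcxx_versions(libstdcxx_path)


if __name__ == "__main__":
    # System path taken from the traceback above; adjust for your machine.
    system_lib = "/usr/lib/x86_64-linux-gnu/libstdc++.so.6"
    if Path(system_lib).exists():
        print("system libstdc++ provides GLIBCXX_3.4.31:",
              provides(system_lib, (3, 4, 31)))
```

Running the same check against the libstdc++ in the directory that LD_LIBRARY_PATH points to should show whether the workaround can satisfy RDKit on a given machine.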
Statement of the problem
With recent versions of rdkit, when from rdkit import Chem is executed before importing CDPKit, CDPKit becomes broken for unknown reasons (maybe due to a namespace collision?). This does not happen for rdkit < 2023. An example of the error reproduced in the Python REPL:
To reproduce:
First, create a conda env on a Linux system (by the way, numpy < 2 is required for pip installation of cdpkit, but it is specified neither in pyproject.toml nor in setup.py):
This works:
but this does not: