REPL cannot load `libpython` when built and run under Python environment

vsbogd commented 1 year ago

I use Conda environment to manage Python versions and dependencies. When I build MeTTa REPL under Conda and try to run it after the build I see the following error:

$ ./target/debug/metta 
./target/debug/metta: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

PyO3 compile against and uses absolutely correct version of the Python library. I don't see any inconsistency here.

According to https://github.com/PyO3/pyo3/issues/1576 when using PyO3 app under Python env one need to add the path to the Python library into LD_LIBRARY_PATH. This kind of works but breaks other executables because lib directory in Conda env contains other libraries which overrides system library loading paths.

Steps to reproduce (Conda is expected to be installed):

conda create -n test python=3.7
conda activate test
cd repl
cargo run --features python

luketpeterson commented 1 year ago

When the repl is built, it looks like we can point pyo3 at the python version we want to use. https://pyo3.rs/v0.19.2/building_and_distribution.html#configuring-the-python-version

So I'll see what's involved in querying Conda, and potentially overriding pyo3's defaults if Conda is installed. The other option (which we might want to do regardless) is to document the PYO3_PYTHON environment variable in our docs.

vsbogd commented 1 year ago

I didn't mention it in an issue description, but in fact PyO3 compile against and uses absolutely correct version of the library. So I don't see any inconsistency. The issue is that this shared Python library from Conda is not found by OS when program is executed under the same environment. It sounds more like a Conda issue btw.

vsbogd commented 1 year ago

This is how Conda manages it for its packages:

On Linux, conda-build modifies any shared libraries or generated executables to use a relative dynamic link by calling the patchelf tool. On macOS, the install_name_tool tool is used.

vsbogd commented 1 year ago

I would say the proper way to fix is to have two different dynamic links for two different cases:

when executable is being built under Python virtual env (library is not visible to the system loader) we should use absolute link to the Python library inside executable
when executable is being built against system Python library (library is visible to the system loader) we use just the name of the library without path as a dynamic loading link.

luketpeterson commented 1 year ago

This is how Conda manages it for its packages:

On Linux, conda-build modifies any shared libraries or generated executables to use a relative dynamic link by calling the patchelf tool. On macOS, the install_name_tool tool is used.

I see. It seems like installing the REPL as part of a package under Conda's control would be the "right" fix from Conda's point of view.

I don't know anything about how Conda and pip interact, but would installing the REPL as part of the current Python package accomplish this?

vsbogd commented 1 year ago

would installing the REPL as part of the current Python package accomplish this?

I guess it is not a common use-case to share native executable as a part of a Python package. Thus I don't think it will work actually. At the link above I believe they are talking about a situation when a Conda package is being built. And to me it is an open question how to distribute the app which depends on Python. Probably like any other binary app.

For distribution purposes we should use the version without specific path to the Python library. Two cases above which are proposed in https://github.com/trueagi-io/hyperon-experimental/issues/432#issuecomment-1717355410 are for the local builds only.

Btw it makes sense to switch between Python and non-Python in runtime instead of build time. Thus user can install MeTTa and use it with Python only when it is required.

luketpeterson commented 1 year ago

For distribution purposes we should use the version without specific path to the Python library. Two cases above which are proposed in #432 (comment) are for the local builds only.

If you run patchelf yourself on the built metta repl binary, can you confirm this allows it to find the right Python shared library without setting the LD_LIBRARY_PATH variable?

Btw it makes sense to switch between Python and non-Python in runtime instead of build time. Thus user can install MeTTa and use it with Python only when it is required

This is actually how I had it originally, but it was a lot of complexity and I didn't see the benefit so I removed it.

There are two issues to doing this the "clean" way. 1.) the python shared lib dependency is a "Strong" link, so the OS won't even run our main unless the link can be resolved. To fix this we'd either need a way to make the dependency weak (~~I don't see a feature for that so I might need to submit a PR to pyo3. But I'd ask around first~~ update, it might be possible by hijacking the extension-module feature) The ugly hack-around is to have our own loader binary that springboards into another executable. 2.) We'd need to load the MeTTa libraries appropriate for the language we'll be interoperating with. This will get easier when one core stdlib can interoperate with any language, but right now we need to know in advance.

luketpeterson commented 1 year ago

A related but different issue is here: https://github.com/PyO3/pyo3/issues/1554 It's useful to gain additional background to the python init under Conda.

vsbogd commented 1 year ago

If you run patchelf yourself on the built metta repl binary, can you confirm this allows it to find the right Python shared library without setting the LD_LIBRARY_PATH variable?

Yes, sure, patchelf --replace-needed libpython3.7m.so.1.0 $HOME/miniconda3/envs/<env-name>/lib/libpython3.7m.so.1.0 metta helps.

2.) We'd need to load the MeTTa libraries appropriate for the language we'll be interoperating with.

I am not sure what is an issue with that. We load MeTTa libraries on the start anyway. Looks like we can choose one set or another depending on whether --python is passed via the command line.

1.) the python shared lib dependency is a "Strong" link, so the OS won't even run our main unless the link can be resolved.

Yes, it is more serious. If PyO3 cannot load libpython....so on the fly (and from the error message in the issue description it looks like it doesn't do this) then it probably required to add this feature into PyO3.

it might be possible by hijacking the extension-module feature) The ugly hack-around is to have our own loader binary that springboards into another executable.

Interesting.

A related but different issue is here: https://github.com/PyO3/pyo3/issues/1554 It's useful to gain additional background to the python init under Conda.

Looks like it was came up with conclusion it is a Windows Conda's version issue. Anyway not only Conda is under the trouble, we should also test Python virtualenv (another popular Python env isolation framework).

luketpeterson commented 1 year ago

Just to update the ticket with my current thinking / findings:

1.) pyo3's build.rs script successfully locates the Conda python and libpython. (I think @vsbogd already mentioned this, but I've confirmed it using the PYO3_PRINT_CONFIG environment variable) This is presumably why the build is successful in spite of the emitted binary not being able to find all its dependencies.

2.) It appears that the python interpreter binary statically links libpython, when running under Conda. So by extension, I would guess this is what the Conda designers intended for other apps that need access to libpython.

3.) So I tried switching to static linking of libpython, but I hit an issue with LTO (link-time-optimization) because of compiler mismatch. I spent way more time than I'd like to admit trying to get rustc not to attempt to LTO libpython, and I eventually found this comment in the pyo3 source. https://github.com/PyO3/pyo3/blob/8f4a26a66ecee4cfa473ada8cb27a57c4533d04f/pyo3-build-config/src/impl_.rs#L202

4.) So static linking of libpython is going to be fraught with problems. (They practically beg you not to do it here: https://pyo3.rs/main/building_and_distribution#statically-embedding-the-python-interpreter) And Conda doesn't like its libraries to be in a system-wide search path for dynamic linking. (Which seems silly to me because that seems like one of the basic properties of having an environment activated... But here we are.)

5.) As @vsbogd suggested, we could detect conda / pyenv in a build script somehow, and then update the binary to link libpython by absolute path. That would fix the issue, but would break if the user switched conda environments. (Which if I understand, is a common thing for Conda users to do)

6.) Apparently the idea of springboarding isn't so crazy in this case. I found this line in the pyo3 docs (https://pyo3.rs/main/building_and_distribution#dynamically-embedding-the-python-interpreter)

For distributing your program to non-technical users, you will have to consider including the Python shared library in your distribution as well as setting up wrapper scripts to set the right environment variables (such as LD_LIBRARY_PATH on UNIX, or PATH on Windows).

So in conclusion: A.) We know pyo3's config logic succeeds in locating conda's python in all cases. B.) But we don't know how to get an interpreter binary that reliably links Python in every case.

Therefore, it seems to me that the right solution is to write a wrapper / springboard that uses the python location logic from pyo3 to find the correct libpython for the current environment, and the execs the real binary.

vsbogd commented 1 year ago

5.) As @vsbogd suggested, we could detect conda / pyenv in a build script somehow, and then update the binary to link libpython by absolute path. That would fix the issue, but would break if the user switched conda environments. (Which if I understand, is a common thing for Conda users to do)

Well I would not bother about it. In fact Conda users should be aware that each environment is separated and if you built some app under environment A it is not expected to work under environment B.

Therefore, it seems to me that the right solution is to write a wrapper / springboard that uses the python location logic from pyo3 to find the correct libpython for the current environment, and the execs the real binary.

I would not complicate things here to that degree. First just be aware that updating LD_LIBRARY_PATH is not a viable solution for a Conda env, because it mixes Conda's libraries with system libraries as a result system apps stop working.

Second, when we are distributing binaries it is natural to expect the library is installed in the system and can be loaded by OS loader. If we want to distribute binaries for the Conda users we need to prepare and publish Conda package (they have separate package management system) which is separate issue. Anyway I would not distribute Rust app through PyPi/Conda packages. I would use OS specific or universal Unix packages (like Snap or similar).

Third if user is brave enough to build an application I would simplify the task to him by ensuring the app can be used in environment it was built for out of the box but not further. For example if it is built under specific Conda or virtualenv it should be usable from this env but not necessary with a system Python. If it is built for a system Python it should be usable with system Python even after library update. But if user switches the environments it is his responsibility to understand what he does.

I am writing this I am trying to encourage you to do a simpler solution and save the time because experimenting with a big number of complex setups is time consuming. I would vote for making a trampoline layer for the loading libpython only in case it is a quickest possible solution.

luketpeterson commented 1 year ago

First just be aware that updating LD_LIBRARY_PATH is not a viable solution for a Conda env, because it mixes Conda's libraries with system libraries as a result system apps stop working.

Do you mean we should not set LD_LIBRARY_PATH globally? ie. in the user's shared configs. If so, then I completely agree.

But I think setting LD_LIBRARY_PATH locally, ie. for the environment of the app we're going to jump into, is the recommended solution.

vsbogd commented 1 year ago

But I think setting LD_LIBRARY_PATH locally, ie. for the environment of the app we're going to jump into, is the recommended solution.

Yes, sure, locally it should work as far as I see.

vsbogd commented 4 months ago

According to #748 PyEnv is also affected.

luketpeterson commented 4 months ago

Yes. It's fundamentally a Python issue, not specific to PyEnv or Conda. It's not even really Pyo3's fault as it's doing what any other application is supposed to do to load a library.

The fact that Python's compatibility solution requires multiple entire separate virtual environments is the source of the issue.

If we want to fix this we have to "enter through the door" provided by the environment manager. ie. probably by locating the python executable by scanning directories in the $PATH environment variable. (or letting the shell do that)

luketpeterson commented 4 months ago

Re-reading the issue: I think we had it right last year: https://github.com/trueagi-io/hyperon-experimental/issues/432#issuecomment-1720709565

Therefore, it seems to me that the right solution is to write a wrapper / springboard that uses the python location logic from pyo3 to find the correct libpython for the current environment, and the execs the real binary.

This should be robust against all python environments and environment switching after build, etc.

I wonder if somebody else has already made a nice wrapper to handle this... I didn't see one when I looked last year, but maybe something exists now.

UPDATE: Doesn't look like it... My thread https://github.com/PyO3/pyo3/discussions/3454 is still the top Google hit for me although I may not be searching with the right terms.

vsbogd commented 4 months ago

First thing which comes into my mind is to split REPL on Python extension library and springboard executable and run python -m extension from springboard after looking for the correct Python executable.

vsbogd commented 4 months ago

Adding a separate shell script which find correct Python version and runs the binary has many problems. It means user should be aware that cargo run cannot be used. Shell script can be different for each supported platform. Shell script should know where REPL binaries are. I would say Ideally it should be done from the metta-repl executable itself.

luketpeterson commented 4 months ago

Adding a separate shell script which find correct Python version and runs the binary has many problems. It means user should be aware that cargo run cannot be used. Shell script can be different for each supported platform. Shell script should know where REPL binaries are. I would say Ideally it should be done from the metta-repl executable itself.

Either would solve the problem, and you are right that a single binary avoids many of the issues of a trampoline script. A tool re-execing itself is a tried and true unix pattern, although it's always struck me as a little hackish. But I guess we are in the land of hacks*.

If you think it's a high priority, I may be able to try to make something to do this soon (maybe this weekend).

*I mean Conda and PyEnv are both essentially crazy hacks to get around the idea that Python is not designed to be self-contained, but real-world uses require multiple versions of Python to be installed.

vsbogd commented 4 months ago

If you think it's a high priority, I may be able to try to make something to do this soon (maybe this weekend).

Thanks Luke, yes, I think we need to fix it after all. Looks like we found all information we need here.

trueagi-io / hyperon-experimental

REPL cannot load `libpython` when built and run under Python environment #432