aaliddell opened this issue 4 years ago
A similar issue occurs when trying to install PySide2 the same way. The shiboken2 package gets mangled from `shiboken2/files.dir/shibokensupport/...` to `shiboken2/files/dir/shibokensupport/...`, so again it's the `.` that's getting replaced with `/`. Likewise, fixing the output directory layout to match the wheel (and also adding `filesystem_importer = True` this time) appears to make the import work.
This seems like a legitimate bug report. Thank you for filing.
Would you be willing to try reproducing this using the latest code in the main branch? Instructions for that are at https://pyoxidizer.readthedocs.io/en/latest/getting_started.html#installing-pyoxidizer. Please note the backwards-incompatible changes from the 0.7 release at https://pyoxidizer.readthedocs.io/en/latest/history.html#version-history. If you don't have time to reproduce, don't worry: I'll likely get around to reproducing it myself as part of fixing it. But if you can help me save some time, it would be very much appreciated!
I'm making a hard push in 0.8 for better compatibility with extension modules, and I'm very interested in fixing problems like this.
I tried with main but was hitting #254 and haven't used the workaround there yet. I'll have another look later.
Should you want to try it too: the numpy example is pretty easy to replicate, just create a new config project and change those two lines. When running the executable in `install`, `import numpy` will either give you the error or succeed, and you can look at the file paths given too.
I've made a lot of progress towards better support for extension modules and shared library dependencies in the past few weeks. However, numpy is special and triggers a currently unhandled corner case. What's happening is that numpy installs some shared libraries in unconventional locations on the filesystem. This confuses PyOxidizer's parser for discovering resources. In some cases, numpy's shared libraries are improperly being detected as Python extension modules. In other cases, they aren't being detected at all because they are inside a `sys.path` directory that doesn't have an `__init__.py` file.
While this is a corner case, the expectation is for PyOxidizer to just work. So I'll need to teach PyOxidizer to handle this scenario, which will require some additional work.
Please let me know the current state of matters with the latest commit on the main branch. If it is just numpy that is broken, we should probably close this issue in favor of an issue dedicated to numpy compatibility.
Ok, excellent. I couldn't get main to work for other reasons when I tried last time, but I'm probably missing something obvious and will try again. PySide2 was the other example you can look at to see if they're doing anything in common with numpy.
PyOxidizer 0.9 has a new files mode which can be leveraged to enable PyOxidizer to work with complex packages like NumPy. See https://pyoxidizer.readthedocs.io/en/v0.9.0/packaging_additional_files.html#installing-unclassified-files-on-the-filesystem for an example of how to get NumPy working with PyOxidizer.
Would the new "files mode" help fix the issue @aaliddell had with PySide2?
I can now get numpy and PySide2 to work on 0.10.3, after following similar steps to what's in that guide.
In the PySide2 case, the path I originally gave (`.../files.dir/...`) remains mangled until the 'classify' mode is disabled, as with numpy.
With regard to resolving the classification failure: am I correct in believing that, from Python's perspective, a `.so` file can be either a Python extension or just a normal shared library?
As you mention, scanning the symbols of the shared library would be a workable, if unpleasant, solution; CPython itself checks for the `PyInit_<moduleName>`/`PyInitU_<moduleName>` (with `_` prefix on BSDs) dynamic symbol to determine whether a library it has loaded is a valid extension.
As a quick check, I tried the following with the numpy files:
```
> objdump -T build/x86_64-unknown-linux-gnu/debug/install/lib/numpy/linalg/lapack_lite.cpython-38-x86_64-linux-gnu.so | grep PyInit
0000000000001f00 g    DF .text  0000000000000262  Base        PyInit_lapack_lite

> objdump -T build/x86_64-unknown-linux-gnu/debug/install/lib/numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so | grep PyInit
<no output>
```
Another observation is that Python extension shared libraries don't follow the `libsomething.so` naming convention and optionally have the `cpython-...` or `abiX` infix, so a really simple (but error-prone) classification would be to just look for those. This would fail for any Python extension whose name happens to start with `lib`, but it could be used as a heuristic to reduce the number of `.so` files whose symbols you need to scan:
| | `cpython-...` infix | `abiX` infix | no `cpython-...` or `abiX` infix |
|---|---|---|---|
| `lib` prefix | Most likely is an extension. Perhaps scan symbols | Indecisive. Scan symbols | Most likely not an extension. Scan symbols |
| no `lib` prefix | Most likely is an extension. Common case. Don't scan symbols | Most likely is an extension. Scan symbols | Indecisive. Scan symbols |
Other heuristics could be the presence of extra `.` characters or invalid module-name characters in the filename, since extensions appear to only split into two or three elements: `modulename.cpython-blah.so`, `modulename.abiX.so`, or `modulename.so`. See PEP 3149.
Obviously if symbol scanning is fast enough, you wouldn't bother with these heuristics and instead just always scan a .so file.
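If it helps, here's a minimal sketch of that filename-only triage (my own illustration; the function name and return values are made up). It only decides whether a file can be classified from its name alone or needs its symbols scanned:

```python
import pathlib
import re

# PEP 3149-style tag, e.g. lapack_lite.cpython-38-x86_64-linux-gnu.so
CPYTHON_TAG_RE = re.compile(r'\.cpython-[^.]+\.so$')


def quick_classify(file_path):
    """Return 'extension' when the filename alone is conclusive per the table
    above, otherwise 'scan-symbols' to fall back to symbol scanning."""
    name = pathlib.Path(file_path).name
    if CPYTHON_TAG_RE.search(name) and not name.startswith('lib'):
        # The common, unambiguous case: no 'lib' prefix plus a cpython-... infix
        return 'extension'
    # Every other cell in the table falls back to scanning the dynamic symbols
    return 'scan-symbols'
```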
Here's an example of some code that attempts to classify the `.so` files based on the presence of the relevant `PyInit`/`PyInitU` symbol from PEP 489:
```python
import pathlib

from elftools.elf.dynamic import DynamicSegment
from elftools.elf.elffile import ELFFile


def is_python_extension(file_path):
    path = pathlib.Path(file_path)

    # Get the init hook symbol name, as per
    # https://www.python.org/dev/peps/pep-0489/#export-hook-name
    expected_modulename = path.name.partition('.')[0]
    if expected_modulename.isascii():
        expected_symbol_name = 'PyInit_' + expected_modulename
    else:
        expected_symbol_name = 'PyInitU_' + expected_modulename.encode('punycode').replace(b'-', b'_').decode('ascii')

    # Search for the symbol in the dynamic segment
    is_python_ext = False
    with path.open('rb') as f:
        ef = ELFFile(f)
        for seg in ef.iter_segments():
            if isinstance(seg, DynamicSegment):
                for sym in seg.iter_symbols():
                    if sym.name == expected_symbol_name:
                        is_python_ext = True
                        break

            # Don't bother scanning any more segments if we've already found PyInit
            if is_python_ext:
                break

    return is_python_ext
```
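For reference, a minimal driver along these lines (the install path is just an assumption matching my build tree) can produce the sort of listing shown below:

```python
# Walk the install tree and print the classification of every shared library
import pathlib

install_root = pathlib.Path('build/x86_64-unknown-linux-gnu/debug/install/lib')
for so_path in sorted(install_root.rglob('*.so*')):
    print(so_path.relative_to(install_root), is_python_extension(so_path))
```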
Running this against the `.so` files from numpy gives:
```
numpy/linalg/lapack_lite.cpython-38-x86_64-linux-gnu.so True
numpy/linalg/_umath_linalg.cpython-38-x86_64-linux-gnu.so True
numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so True
numpy/core/_umath_tests.cpython-38-x86_64-linux-gnu.so True
numpy/core/_multiarray_tests.cpython-38-x86_64-linux-gnu.so True
numpy/core/_operand_flag_tests.cpython-38-x86_64-linux-gnu.so True
numpy/core/_struct_ufunc_tests.cpython-38-x86_64-linux-gnu.so True
numpy/core/_rational_tests.cpython-38-x86_64-linux-gnu.so True
numpy/random/_sfc64.cpython-38-x86_64-linux-gnu.so True
numpy/random/_pcg64.cpython-38-x86_64-linux-gnu.so True
numpy/random/mtrand.cpython-38-x86_64-linux-gnu.so True
numpy/random/bit_generator.cpython-38-x86_64-linux-gnu.so True
numpy/random/_common.cpython-38-x86_64-linux-gnu.so True
numpy/random/_generator.cpython-38-x86_64-linux-gnu.so True
numpy/random/_philox.cpython-38-x86_64-linux-gnu.so True
numpy/random/_mt19937.cpython-38-x86_64-linux-gnu.so True
numpy/random/_bounded_integers.cpython-38-x86_64-linux-gnu.so True
numpy/fft/_pocketfft_internal.cpython-38-x86_64-linux-gnu.so True
numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so False
numpy.libs/libquadmath-2d0c479f.so.0.0.0 False
numpy.libs/libgfortran-2e0d59d6.so.5.0.0 False
numpy.libs/libz-eb09ad1d.so.1.2.3 False
```
This appears to classify correctly in this particular case, and I have also checked it against PySide2/shiboken2. It is not particularly fast, but it isn't using any of the above heuristics and it's pure Python.
I unfortunately don't know enough Rust to be helpful with a PR :confused:
But #183 looks like it has the fundamentals for ELF file symbol parsing, although it's missing the `PyInitU` variant.
I may also have completely missed the point here :roll_eyes:
Thank you for all the investigations here.
If PyOxidizer moves forward with extension module file detection, the code will be implemented in pure Rust using a symbol name sniffing strategy similar to what people in this issue have implemented.
When trying to use 0.7.0 to build a basic app with only numpy added, the new `prefer-in-memory-fallback-filesystem-relative` resource policy can be used. This almost works, but one of the shared libraries is copied into the incorrect location within the specified folder.

The `pyoxidizer.bzl` file is effectively the default generated file from the getting started guide, but with the following lines:

```python
resources_policy='prefer-in-memory-fallback-filesystem-relative:lib',
exe.add_filesystem_relative_python_resources('lib', dist.pip_install(['numpy']))
```
When running the binary and then doing `import numpy`, the following (trimmed) error occurs:

```
Original error was: libopenblasp-r0-ae94cfde.3.9.dev.so: cannot open shared object file: No such file or directory
```
When looking at the file tree alongside the binary, there is a file at `build/x86_64-unknown-linux-gnu/debug/install/lib/numpy/libs/libopenblasp-r0-ae94cfde/3/9/dev.so`, which should instead be at `build/x86_64-unknown-linux-gnu/debug/install/lib/numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so`.
It appears the `.` characters in the file name (`libopenblasp-r0-ae94cfde.3.9.dev.so`) and folder name (`numpy.libs`) have been converted to `/`. If you look in the numpy wheel, you can see the expected file tree structure. Having a vague dig through the PyOxidizer code, I can't see why this one file is getting mis-copied, particularly when there are three other shared libraries in that folder which are copied correctly; the only difference is the order of the version numbers around the `.so`. Is the shared lib path perhaps getting interpreted as a Python module name?

If I manually move that file to the `numpy.libs` folder, numpy now imports correctly and works for the few things I've tested. If this can be resolved, numpy will effectively work out of the box and resolve #65.
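For what it's worth, the observed transformation looks consistent with everything before the final suffix being treated as a dotted module name. A quick illustration of that idea (my own guess at the mechanism, not PyOxidizer's actual code):

```python
# Treat everything before the final suffix as a dotted name and turn the dots
# into path separators, keeping the suffix: this reproduces the mangled path
# observed above.
name = 'numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so'
stem, _, suffix = name.rpartition('.')
print(stem.replace('.', '/') + '.' + suffix)
# numpy/libs/libopenblasp-r0-ae94cfde/3/9/dev.so
```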