lmmx / impscan

Command line tool to identify minimal imports list and repository sources by parsing package dependency trees
MIT License

Verify (if needed) dynamic modules are importable #10

Closed lmmx closed 3 years ago

lmmx commented 3 years ago

If needed (i.e. if the name to be verified is not already importable from a .py file, as it often is), it's advisable to verify that a would-be importable dynamic module name (exposed as a path in site-packages/) actually is importable; otherwise you get:

ImportError: dynamic module does not define module export function (PyInit_libmlpack)

The 'recipe' for the function name to check is simply PyInit_ followed by the would-be imported dynamic module name (i.e. the part of the filename before the first ".", such as libmlpack in libmlpack.so).
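A minimal sketch of this recipe, assuming the module name is everything in the filename before the first dot (which also covers ABI-tagged filenames such as _snack.cpython-39-x86_64-linux-gnu.so). expected_init_symbol is a hypothetical helper name, not part of impscan:

```python
from pathlib import Path


def expected_init_symbol(so_path: str) -> str:
    """Return the PyInit_ function name an extension module file must export.

    Hypothetical helper: the module name is taken to be the part of the
    filename before the first dot, so platform/ABI tags are dropped.
    """
    module_name = Path(so_path).name.split(".", 1)[0]
    return f"PyInit_{module_name}"


print(expected_init_symbol("site-packages/libmlpack.so"))
# PyInit_libmlpack
print(expected_init_symbol("_snack.cpython-39-x86_64-linux-gnu.so"))
# PyInit__snack
```

For a submodule inside a package (e.g. pkg/_ext.so imported as pkg._ext), Python uses only the last component of the module name, which is again the filename stem.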

In the case of libmlpack.so, the "file" ~/miniconda3/envs/mlpack/lib/python3.9/site-packages/libmlpack.so resolves via symlinks to ../../libmlpack.so.3.4, which can be inspected with nm -D --defined-only libmlpack.so.3.4. Its output does not contain the function name PyInit_libmlpack, so libmlpack cannot be imported as an extension module.

As a counterexample, take another package, newt, which installs _snack.so and snack.py into site-packages. Both import _snack and import snack succeed, meaning the dynamic module _snack.so either is, or is symlinked to, a library which I expect nm will show exports PyInit__snack.

In fact, the file in the site-packages directory is _snack.cpython-39-x86_64-linux-gnu.so, and it does export the expected symbol:

nm -D --defined-only _snack.cpython-39-x86_64-linux-gnu.so | grep -Eo "PyInit_.*"

PyInit__snack
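Alternatively, the import itself can be attempted. A minimal sketch, assuming it is acceptable to execute the candidate module's initialisation code: it runs the import in a fresh interpreter (sys.executable) so that a failing or crashing extension cannot affect the calling process. can_import_from is a hypothetical helper, and this runtime check is a swapped-in technique, not the nm-based inspection used in this issue:

```python
import subprocess
import sys


def can_import_from(module_name: str, search_dir: str) -> bool:
    """Return True if `import module_name` succeeds with search_dir first on
    sys.path, trying it in a separate Python process."""
    # Only accept (dotted) identifiers, since the name is pasted into code
    if not all(part.isidentifier() for part in module_name.split(".")):
        raise ValueError(f"not a valid module name: {module_name!r}")
    code = f"import sys; sys.path.insert(0, {search_dir!r}); import {module_name}"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    # A failed import, such as "dynamic module does not define module export
    # function", exits with a non-zero status and a traceback on stderr
    return result.returncode == 0
```

With a hypothetical site_packages_dir for each environment, can_import_from("_snack", site_packages_dir) should succeed for newt, while can_import_from("libmlpack", site_packages_dir) would be expected to fail with the ImportError quoted above.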
lmmx commented 3 years ago

For mlpack (via https://conda.anaconda.org/conda-forge/linux-64/mlpack-3.4.2-py39hd7fc29b_0.tar.bz2), lib/python3.9/site-packages/libmlpack.so is a symlink to ../../libmlpack.so, which in turn is a symlink to libmlpack.so.3.4; the latter can be read with nm:

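import subprocess

# libmlpack.so.3.4 is the real file behind the libmlpack.so symlinks
so_path = "lib/libmlpack.so.3.4"
# List only the dynamic symbols the library defines (i.e. exports)
cmd = ["nm", "-D", "--defined-only", so_path]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
exported_funcs = result.stdout.splitlines()
# An importable extension module must export a PyInit_<name> function
py_funcs = [f for f in exported_funcs if "PyInit_" in f]
print(f"{py_funcs=}")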

py_funcs=[]
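Since the site-packages entry can be a chain of symlinks (as with libmlpack.so above), it may help to list every hop inside the package archive. A minimal sketch using the standard library tarfile module, assuming relative link targets and member names without a leading "./"; member_symlink_chain is a hypothetical helper:

```python
from __future__ import annotations

import posixpath
import tarfile


def member_symlink_chain(tar: tarfile.TarFile, name: str) -> list[str]:
    """Follow symlink members inside an open archive, returning every hop.

    Link targets are resolved relative to the directory of the link member;
    archive member names always use forward slashes, hence posixpath.
    """
    hops = [name]
    member = tar.getmember(name)
    while member.issym():
        target = posixpath.normpath(
            posixpath.join(posixpath.dirname(member.name), member.linkname)
        )
        if target in hops:
            raise ValueError(f"symlink loop at {target}")
        hops.append(target)
        member = tar.getmember(target)  # KeyError if the target is missing
    return hops
```

Called on the open mlpack tar.bz2 with "lib/python3.9/site-packages/libmlpack.so", this should walk through lib/libmlpack.so to lib/libmlpack.so.3.4, the file that nm actually needs to read.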
lmmx commented 3 years ago

For newt (via https://conda.anaconda.org/conda-forge/linux-64/newt-0.52.21-py39h3811e60_5.tar.bz2), lib/python3.9/site-packages/_snack.cpython-39-x86_64-linux-gnu.so can be read in the same way:

import subprocess

# The ABI-tagged extension module file shipped in site-packages
so_path = "lib/python3.9/site-packages/_snack.cpython-39-x86_64-linux-gnu.so"
cmd = ["nm", "-D", "--defined-only", so_path]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
exported_funcs = result.stdout.splitlines()
# Keep only the module initialisation function(s)
py_funcs = [f for f in exported_funcs if "PyInit_" in f]
print(f"{py_funcs=}")

py_funcs=['0000000000006a00 T PyInit__snack']

You could then extract it as:

{f.split("PyInit_", 1)[1] for f in py_funcs}

{'_snack'}
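Splitting on "PyInit_" works for these examples. A slightly stricter sketch, assuming nm's default three-column "address type symbol" output, parses each line and keeps only symbols of type T (defined in the text, i.e. code, section). Both helper names are hypothetical:

```python
from __future__ import annotations


def pyinit_module_names(nm_lines: list[str]) -> set[str]:
    """Extract module names from `nm -D --defined-only` output lines."""
    names = set()
    for line in nm_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        _address, sym_type, symbol = parts
        if sym_type == "T" and symbol.startswith("PyInit_"):
            names.add(symbol[len("PyInit_"):])
    return names


def exports_init_for(module_name: str, nm_lines: list[str]) -> bool:
    """True if the nm output shows a PyInit_<module_name> export."""
    return module_name in pyinit_module_names(nm_lines)


print(pyinit_module_names(["0000000000006a00 T PyInit__snack", ""]))
# {'_snack'}
```

With the outputs above, exports_init_for("_snack", ...) is True for newt, while exports_init_for("libmlpack", []) is False for mlpack.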
lmmx commented 3 years ago

nm will not accept /dev/fd/0 as a valid file when the bytes are passed over stdin (it's unclear whether this is due to null bytes being stripped by cat or to nm checking the file extension), so it seems necessary to read the bytes from the path inside the archive, write them to a copy of the .so in a temporary directory, and then call subprocess.run with capture_output=True to run nm -D --defined-only on that temporary file.
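A minimal sketch of that workaround for a tar.bz2 package, assuming nm is on PATH; the function names are hypothetical and not impscan's actual code. The temporary copy keeps the member's original filename, in case nm does care about the extension. tarfile's extractfile() returns a readable file for regular members and for links, following a symlink member to its target inside the archive:

```python
from __future__ import annotations

import shutil
import subprocess
import tarfile
import tempfile
from pathlib import Path


def extract_member_to(archive_path: str, member_name: str, dest_dir: str) -> Path:
    """Copy one archive member's bytes into dest_dir, keeping its filename."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        src = tar.extractfile(member_name)  # follows symlink members
        if src is None:
            raise ValueError(f"{member_name} is not a regular file or link")
        out_path = Path(dest_dir) / Path(member_name).name
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path


def defined_dynamic_symbols(archive_path: str, member_name: str) -> list[str]:
    """Run `nm -D --defined-only` on a temporary copy of an archive member."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        so_copy = extract_member_to(archive_path, member_name, tmp_dir)
        result = subprocess.run(
            ["nm", "-D", "--defined-only", str(so_copy)],
            capture_output=True,
            text=True,
            check=True,  # raise if nm rejects the file
        )
    return result.stdout.splitlines()
```

For newt, defined_dynamic_symbols on the tar.bz2 with member "lib/python3.9/site-packages/_snack.cpython-39-x86_64-linux-gnu.so" should yield the same lines as running nm on the extracted file, including the PyInit__snack entry.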

lmmx commented 3 years ago

The above is now also implemented for the tar.bz2 format in a63e665b96287ae61e509defdfc617998de84da7