tlambert03 / nd2

Full-featured nd2 (Nikon NIS Elements) file reader for python. Outputs to numpy, dask, and xarray. Exhaustive metadata extraction
https://tlambert03.github.io/nd2
BSD 3-Clause "New" or "Revised" License

Shared library problems #53

Closed shenker closed 2 years ago

shenker commented 2 years ago

Description

I think I am running into two separate issues:

1) When building locally, with either pip install or pip install -e, the shared library is not installed correctly (related to https://github.com/tlambert03/nd2/issues/24).
2) Even when I work around the first problem by running pytest from the root of the repo, liblimfile.so complains about undefined symbols.

I spent a couple hours playing around and am rapidly running against the limits of my understanding of modern Python packaging, so any assistance or hints you could provide @tlambert03 would be much appreciated!

What I Did

git clone git@github.com:tlambert03/nd2.git
cd nd2
mamba create -n nd2test
mamba activate nd2test
mamba install python=3.10.4 pip gcc
pip install -e .[dev]

(installing gcc in the virtualenv is necessary because the default gcc on HMS O2 is ancient)

If I then run pytest from the root of the repo, I get the error

============================= test session starts ==============================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jqs1/_temp/nd2, configfile: setup.cfg
plugins: cov-3.0.0
collected 30 items / 2 errors

==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_aicsimage.py ___________________
ImportError while importing test module '/home/jqs1/_temp/nd2/tests/test_aicsimage.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../mambaforge/envs/nd2test/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_aicsimage.py:11: in <module>
    from aicsimageio.readers.nd2_reader import ND2Reader
E   ModuleNotFoundError: No module named 'aicsimageio'
______________________ ERROR collecting tests/test_sdk.py ______________________
ImportError while importing test module '/home/jqs1/_temp/nd2/tests/test_sdk.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../mambaforge/envs/nd2test/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_sdk.py:5: in <module>
    from nd2._sdk import latest
src/nd2/_sdk/__init__.py:1: in <module>
    from . import latest
E   ImportError: src/sdk/Linux/x86_64/lib/liblimfile.so: symbol TIFFReadRGBATileExt, version LIBTIFF_4.0 not defined in file libtiff.so.5 with link time reference
=========================== short test summary info ============================
ERROR tests/test_aicsimage.py
ERROR tests/test_sdk.py
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 2 errors in 1.14s ===============================

If I then pip install aicsimageio and re-run pytest:

============================= test session starts ==============================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jqs1/_temp/nd2, configfile: setup.cfg
plugins: cov-3.0.0
collected 40 items / 1 error

==================================== ERRORS ====================================
______________________ ERROR collecting tests/test_sdk.py ______________________
ImportError while importing test module '/home/jqs1/_temp/nd2/tests/test_sdk.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../mambaforge/envs/nd2test/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_sdk.py:5: in <module>
    from nd2._sdk import latest
src/nd2/_sdk/__init__.py:1: in <module>
    from . import latest
E   ImportError: src/sdk/Linux/x86_64/lib/liblimfile.so: symbol TIFFReadRGBATileExt, version LIBTIFF_4.0 not defined in file libtiff.so.5 with link time reference
=========================== short test summary info ============================
ERROR tests/test_sdk.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 2.08s ===============================

From the root of the repo, if I run python -c "from nd2._sdk import latest", I get:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jqs1/_temp/nd2/src/nd2/_sdk/__init__.py", line 1, in <module>
    from . import latest
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by src/sdk/Linux/x86_64/lib/liblimfile.so)
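The GLIBCXX_3.4.20 message means that the libstdc++.so.6 picked up at load time (here /lib64/libstdc++.so.6) does not export that symbol version. One way to compare candidate libraries is to scan each file for its GLIBCXX version tags, much like running strings on it. The following is a minimal sketch; the function names are made up for illustration, and the assumption that the conda environment has its own copy under $CONDA_PREFIX/lib may not hold for every setup.

```python
import os
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the GLIBCXX_x.y[.z] version numbers found in a shared library."""
    data = Path(lib_path).read_bytes()
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    # Sort numerically so that 3.4.9 comes before 3.4.19.
    return sorted(
        (tag.decode() for tag in tags),
        key=lambda version: tuple(int(part) for part in version.split(".")),
    )


def provides(lib_path, required="3.4.20"):
    """True if the library advertises the required GLIBCXX version."""
    return required in glibcxx_versions(lib_path)


if __name__ == "__main__":
    # Candidate locations: the system library from the traceback and,
    # as an assumption, a copy inside the active conda environment.
    candidates = [Path("/lib64/libstdc++.so.6")]
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        candidates.append(Path(prefix) / "lib" / "libstdc++.so.6")
    for candidate in candidates:
        if candidate.exists():
            print(candidate, "provides GLIBCXX_3.4.20:", provides(candidate))
        else:
            print(candidate, "not present on this machine")
```

If only the environment's copy provides the required version, the loader has to be pointed at that directory before the system one for the import to succeed.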

If I cd outside the repo, the same command gives:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jqs1/_temp/nd2/src/nd2/_sdk/__init__.py", line 1, in <module>
    from . import latest
ImportError: liblimfile.so: cannot open shared object file: No such file or directory
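The import behaves differently depending on the working directory, which suggests (though it does not prove) that the path to liblimfile.so is resolved relative to the current directory rather than relative to the extension module. Running ldd on the compiled extension from each directory would show what the loader can and cannot find. The sketch below wraps that check; the helper name is hypothetical, and it assumes a glibc-style ldd that prints "=> not found" for unresolved dependencies.

```python
import shutil
import subprocess
import sys


def unresolved_libraries(ldd_output):
    """Return names of dependencies that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tliblimfile.so => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else None
    if target and shutil.which("ldd"):
        result = subprocess.run(
            ["ldd", target], capture_output=True, text=True, check=False
        )
        print("unresolved:", unresolved_libraries(result.stdout))
    else:
        print("usage: python check_deps.py /path/to/extension.so (requires ldd)")
```

Running it once from the repo root and once from another directory, against the same latest.cpython-310-x86_64-linux-gnu.so file, should make any working-directory dependence visible.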

I was thinking this had something to do with the editable pip install, so I tried installing via pip install .[dev] (no -e). This produces the same results as above. Here's a clue: find /home/jqs1/mambaforge/envs/nd2test/lib/python3.10/site-packages/nd2/ -name "*.so" gives

/home/jqs1/mambaforge/envs/nd2test/lib/python3.10/site-packages/nd2/_sdk/latest.cpython-310-x86_64-linux-gnu.so

Namely, the Nikon SDK shared libraries (liblimfile.so and libnd2readsdk-shared.so) aren't even being installed. I noticed that the paths in MANIFEST.in are wrong. I believe

graft src/sdk/*/lib
graft src/sdk/*/include

should be

graft src/sdk/*/*/lib
graft src/sdk/*/*/include

Unfortunately, this change does not appear to fix the problem: the Nikon shared libraries are still not installed.
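The difference between the two patterns can be checked against the directory layout seen in the tracebacks (src/sdk/Linux/x86_64/lib). The sketch below uses Python's glob module as a stand-in for setuptools' own pattern matching, which is an assumption; the two may differ in edge cases. It rebuilds the layout in a temporary directory and shows which pattern matches.

```python
import glob
import os
import tempfile


def build_demo_tree(root):
    """Recreate the src/sdk/<OS>/<arch>/{lib,include} layout under root."""
    for sub in ("lib", "include"):
        os.makedirs(os.path.join(root, "src", "sdk", "Linux", "x86_64", sub))


def matching_dirs(root, pattern):
    """Return directories under root that match a glob pattern, relative to root."""
    hits = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(hit, root) for hit in hits if os.path.isdir(hit))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        for pattern in ("src/sdk/*/lib", "src/sdk/*/*/lib"):
            print(pattern, "->", matching_dirs(root, pattern))
```

Note that MANIFEST.in mainly controls what goes into the source distribution. Whether files end up in site-packages depends on the package data configuration as well, which may be why correcting the pattern alone did not change what gets installed.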

tlambert03 commented 2 years ago

Operating System: Linux 3.10.0-1160.62.1.el7.x86_64 (this is on HMS O2 with conda-forge python installed via mamba)

ohhhh, you're running on centos. Yeah, support for centos has been very hard, and I suspect that's the actual issue here. I was able to get it to build and ship for CentOS on conda forge, so you could use their docker setup if you want. But I've never gotten it to compile correctly locally.

I can try to take a crack at it on O2... but in general, haven't been developing there.

shenker commented 2 years ago

Gotcha.

I can believe that the undefined symbols (problem 2) are probably entirely a CentOS issue. I was hoping that installing all the relevant libraries (gcc, libstdcxx-ng, etc.) into a virtualenv from conda-forge would offer a way around the issue, if I can find the right versions of everything. This does appear to have been possible, at least a couple years ago. Some googling turned up:

It would be nice to have it build on O2, since that's where all my big ND2 files are (and where I can run dask distributed). I'm assuming the way to go about doing this is:

1) play around and see if we can find versions of libstdcxx, libtiff, etc. that make things work
2) if that fails, just use the conda-forge docker setup via singularity on O2 (I've never used singularity before)

It's less obvious to me why CentOS would cause the SDK .so files to be not installed at all (problem 1). I'm very confused by this and don't know quite how to attack it.
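To narrow down problem 1, it helps to list exactly which shared objects an install actually contains and compare them with the two SDK libraries that should be there. The following is a small sketch of that check, equivalent to the find command used earlier; the helper names are hypothetical, and it locates the package with importlib without importing it, so it works even when the import itself fails.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Locate a top-level package's directory without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def installed_shared_objects(directory):
    """Return all *.so files under directory, as sorted relative paths."""
    root = Path(directory)
    return sorted(str(path.relative_to(root)) for path in root.rglob("*.so"))


if __name__ == "__main__":
    # The two Nikon SDK libraries named earlier in this thread.
    expected = {"liblimfile.so", "libnd2readsdk-shared.so"}
    root = package_dir("nd2")
    if root is None:
        print("nd2 is not installed in this environment")
    else:
        found = installed_shared_objects(root)
        print("package directory:", root)
        print("shared objects:", found)
        print("missing:", sorted(expected - {Path(item).name for item in found}))
```

Running this in both a locally built environment and one with the conda-forge package would show whether the two installs differ in which files they ship or only in where the files live.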

tlambert03 commented 2 years ago

If you (or anyone) wants to dig into this, I think digging through the logs for the successful centos builds using the conda-forge CI would be a good start. The last one is here: https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=474173&view=results

Note the build & host requirements in the recipe ... all the env vars during build here. After building, I think there is still an important (re)localization step, and I think it starts around here.

Note that most of the difficulty here comes from the fact that we're using precompiled sdk shared library files provided by LIM ... so library linking and dependencies are more complicated as a result

shenker commented 2 years ago

Finally got around to playing around with this more. I can run pytest locally on O2 with nd2 from conda-forge with all tests passing, and I can build nd2 on O2 and get all tests to pass except those in tests/test_dask_dispatch.py and tests/test_segfaults.py. (See MWEs below.) There the memmaps don't seem to be closed when they should be, and I cannot fathom why it works for the conda-forge package but doesn't when I build myself on O2. Unless you recognize the BufferErrors I'm seeing, I think I'm going to give up (being able to run most of the test suite is good enough).

Building locally

mamba env create -n nd2test
mamba activate nd2test
mamba install -c conda-forge cython black flake8 flake8-docstrings imagecodecs aicsimageio ipython isort mypy pre-commit psutil pydocstyle pytest pytest-cov wurlitzer xarray resource_backed_dask_array python=3.10.4 requests
git clone git@github.com:tlambert03/nd2.git
cd nd2
python scripts/download_samples.py
make
pip install --no-deps -e .
LD_LIBRARY_PATH=$CONDA_PREFIX/lib pytest

This gives 512 errors, all of them either BufferError or AssertionError. I rerun with pytest tests/test_segfaults.py to show the error for a single test (it's the same as all the other failed tests):

============================= test session starts ==============================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jqs1/projects/nd2, configfile: setup.cfg
plugins: cov-3.0.0
collected 525 items / 524 deselected / 1 selected

tests/test_segfaults.py FE                                               [100%]

==================================== ERRORS ====================================
________________________ ERROR at teardown of test_seg _________________________

    @pytest.fixture(autouse=True)
    def no_files_left_open():
        files_before = {p for p in psutil.Process().open_files() if p.path.endswith("nd2")}
        yield
        files_after = {p for p in psutil.Process().open_files() if p.path.endswith("nd2")}
>       assert files_before == files_after == set()
E       AssertionError: assert set() == {popenfile(pa...flags=557056)}
E         Extra items in the right set:
E         popenfile(path='/home/jqs1/projects/nd2/tests/data/jonas_header_test2.nd2', fd=16, position=0, mode='r', flags=557056)
E         Use -v to get more diff

tests/conftest.py:46: AssertionError
=================================== FAILURES ===================================
___________________________________ test_seg ___________________________________

    def test_seg():
        with nd2.ND2File(str(DATA / "jonas_header_test2.nd2")) as f:
            img = f.to_xarray(delayed=True, squeeze=False, position=0)

>       a = img.compute()

tests/test_segfaults.py:13:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../mambaforge/envs/nd2/lib/python3.10/site-packages/xarray/core/dataarray.py:947: in compute
    return new.load(**kwargs)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/xarray/core/dataarray.py:921: in load
    ds = self._to_temp_dataset().load(**kwargs)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/xarray/core/dataset.py:861: in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/base.py:600: in compute
    results = schedule(dsk, keys, **kwargs)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/threaded.py:81: in get
    results = get_async(
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/local.py:508: in get_async
    raise_exception(exc, tb)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/local.py:316: in reraise
    raise exc
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/local.py:221: in execute_task
    result = _execute_task(task, data)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/core.py:119: in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/optimization.py:990: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/utils.py:41: in apply
    return func(*args, **kwargs)
../../mambaforge/envs/nd2/lib/python3.10/site-packages/dask/array/core.py:514: in _pass_extra_kwargs
    return func(*args[len(keys) :], **kwargs)
src/nd2/nd2file.py:351: in _dask_block
    self.close()
src/nd2/nd2file.py:82: in close
    self._rdr.close()
src/nd2/_sdk/latest.pyx:57: in nd2._sdk.latest.ND2Reader.close
    cpdef close(self):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   self._mmap.close()
E   BufferError: cannot close exported pointers exist

src/nd2/_sdk/latest.pyx:60: BufferError
=========================== short test summary info ============================
FAILED tests/test_segfaults.py::test_seg - BufferError: cannot close exported...
ERROR tests/test_segfaults.py::test_seg - AssertionError: assert set() == {po...
================== 1 failed, 524 deselected, 1 error in 3.45s =================
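The BufferError at the bottom of this traceback is the one Python's mmap module raises when close() is called while something still holds a buffer exported from the mapping. The sketch below reproduces that behaviour with the standard library only. It is an illustration of the general mechanism, not a reconstruction of what nd2 does internally, and the function name is made up for the example.

```python
import mmap
import tempfile


def close_with_live_export():
    """Try to close an mmap while a memoryview of it is still alive.

    Returns the error message, or None if the close succeeded.
    """
    with tempfile.TemporaryFile() as fh:
        fh.write(b"\x00" * 16)
        fh.flush()
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        view = memoryview(mm)  # exports a pointer into the mapping
        try:
            mm.close()  # refused while the export is outstanding
            error = None
        except BufferError as exc:
            error = str(exc)
        finally:
            view.release()  # drop the export first...
            mm.close()      # ...then closing works
        return error


if __name__ == "__main__":
    print("BufferError:", close_with_live_export())
```

Because the mapping cannot be closed, the underlying file also stays open, which would be consistent with the no_files_left_open fixture failing at teardown in the same run.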

Using the conda-forge build

To determine whether this is a compile-time or run-time problem, I want to verify that the conda-forge version passes all tests when running on CentOS (it does):

mamba env create -n nd2test
mamba activate nd2test
mamba install -c conda-forge python=3.10.4 nd2 aicsimageio xarray resource_backed_dask_array pytest requests
git clone git@github.com:tlambert03/nd2.git
cd nd2
python scripts/download_samples.py
pytest

gives

============================= test session starts ==============================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jqs1/projects/nd2, configfile: setup.cfg
collected 525 items

tests/test_aicsimage.py ..........                                       [  1%]
tests/test_dask_dispatch.py ........                                     [  3%]
tests/test_reader.py ................................................... [ 13%]
........................................................................ [ 26%]
...................................................sss.................. [ 40%]
........................................................................ [ 54%]
...................................ssssss............................... [ 68%]
............................................xxxx.x....x...xx............ [ 81%]
....................x.............                                       [ 88%]
tests/test_readme.py .                                                   [ 88%]
tests/test_rescue.py .                                                   [ 88%]
tests/test_sdk.py ...................................................... [ 98%]
.....                                                                    [ 99%]
tests/test_segfaults.py .                                                [100%]

============= 507 passed, 9 skipped, 9 xfailed in 63.28s (0:01:03) =============

tlambert03 commented 2 years ago

E BufferError: cannot close exported pointers exist

this is what I just fixed in https://github.com/tlambert03/nd2/pull/55

are you by chance using a main that is more than 6 days old?

shenker commented 2 years ago

Oh, I'm an idiot. All tests pass. Thanks Talley!

tlambert03 commented 2 years ago

Nice! That was easy 😂