nv-legate / legate.core

The Foundation for All Legate Libraries
https://docs.nvidia.com/legate/24.06/
Apache License 2.0

[BUG] HDF5 version mismatch with legate driver #927

Closed: CharlelieLrt closed this issue 8 months ago

CharlelieLrt commented 8 months ago

Software versions

Python      : 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:10:55) [GCC 12.3.0]
Platform    : Linux-4.14.0-115.35.1.3chaos.ch6a.ppc64le-ppc64le-with-glibc2.17
Legion      : v23.11.00.dev-37-gac081ac
Legate      : 23.11.00.dev+37.gac081ac
Cunumeric   : 23.11.00.dev+31.gc1bfd9d0
Numpy       : 1.26.2
Scipy       : 1.11.3
Numba       : 0.58.1
CTK package : (failed to detect)
GPU driver  : 510.47.03
GPU devices :
  GPU 0: Tesla V100-SXM2-16GB
  GPU 1: Tesla V100-SXM2-16GB
  GPU 2: Tesla V100-SXM2-16GB
  GPU 3: Tesla V100-SXM2-16GB

Expected behavior

After installing legate and cunumeric, h5py was installed in the conda environment with mamba install h5py. This installed h5py 3.10.0 and hdf5 1.14.3. The system already had an install of hdf5 1.8.12.

Trying to run import h5py with the standard python interpreter works just fine and h5py is usable as expected. This is expected to also work using the legate driver.

Observed behavior

However, running import h5py with the legate driver produces this error:

UserWarning: h5py is running against HDF5 1.8.12 when it was built against 1.14.3, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/g/g92/laurent3/miniforge3/envs/legate_base/lib/python3.10/site-packages/h5py/__init__.py", line 45, in <module>
    from ._conv import register_converters as _register_converters, \
  File "h5py/_conv.pyx", line 1, in init h5py._conv
  File "h5py/h5r.pyx", line 1, in init h5py.h5r
  File "h5py/h5p.pyx", line 1, in init h5py.h5p
  File "h5py/h5t.pyx", line 232, in init h5py.h5t
  File "h5py/h5t.pyx", line 468, in h5py.h5t.TypeID.copy
  File "h5py/defs.pyx", line 3707, in h5py.defs.H5Tcopy
  File "h5py/_errors.pyx", line 134, in h5py._errors.set_exception
RuntimeError: Failed to extract top-level error description

This suggests that the legate driver is picking up the system install of HDF5 (1.8.12) instead of the one in the conda environment (1.14.3). The traceback shows that h5py itself is still loaded from the conda environment.
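One way to check which HDF5 shared library a process has actually loaded is to read /proc/self/maps on Linux. The following is only a minimal sketch, not part of Legate or h5py: the function name loaded_libraries and the script name check_hdf5.py are hypothetical, and it assumes the library file name contains "libhdf5".

```python
from pathlib import Path


def loaded_libraries(maps_text, name="libhdf5"):
    """Return the sorted, unique paths of mapped files whose file name contains `name`.

    `maps_text` is the content of a Linux /proc/<pid>/maps file. Each line has the
    form: address perms offset dev inode [pathname].
    """
    paths = set()
    for line in maps_text.splitlines():
        # The pathname is the optional sixth field; anonymous mappings have none.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        path = fields[5].strip()
        # Keep only real files (not [heap], [stack], ...) whose basename matches.
        if path.startswith("/") and name in path.rsplit("/", 1)[-1]:
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    maps = Path("/proc/self/maps")  # Linux only; assumption of this sketch
    if maps.exists():
        for path in loaded_libraries(maps.read_text()):
            print(path)
```

Saving this as check_hdf5.py and running it once with python and once with the legate driver (for example, legate check_hdf5.py) should show whether a system libhdf5 is already mapped before h5py is imported. Adding a guarded import h5py before the check would additionally show the library that h5py pulls in.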

Example code or instructions

Legate and cunumeric were installed by running: ./scripts/generate-conda-envs.py --python 3.10 --ctk 12.0 --os linux, followed by: ./install.py --max-dim 5 --network gasnet1 --cuda --arch volta --openmp --hdf5 --conduit ibv

Then, h5py was installed in the conda environment with mamba install h5py. This installed h5py 3.10.0 and hdf5 1.14.3. The system already had an install of hdf5 1.8.12.

manopapad commented 8 months ago

Do you get the same error if you build Legate w/o --hdf5? This is an old option, recently removed, that compiles Legion with its built-in HDF5 support, but that's not really used by Legate. Removing it shouldn't affect Legate in any way, but this may be what's adding the dependency on the system HDF5.

CharlelieLrt commented 8 months ago

I confirm that I don't have this bug when building legate without the --hdf5 option.

manopapad commented 8 months ago

Thanks for confirming. Since the --hdf5 build option has already been removed, I think we can close this.