tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Apache License 2.0

_load_library fails on RHEL distributions due to `platlib` being different from `purelib` #1648

Open teocasse opened 2 years ago

teocasse commented 2 years ago

I am having an issue with tensorflow-io 0.24.0 when installing it in Docker.

My setup is as follows: to test and build my project, I am using tox, which also takes care of installing the project dependencies from pip (including tensorflow-io) in a dedicated virtual environment.

When I run tox on my Mac, tensorflow-io is installed correctly in the virtual environment and all tests pass.

As part of our build system, however, I need to run the tests and the build in a dedicated Docker image. In this scenario, after tox initializes the virtual environment and installs the dependencies, the tests fail when importing tensorflow-io with the following error:

.../.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: cannot open shared object file: No such file or directory'

I did some investigation, and the underlying issue seems to be that the tensorflow-io Python files are installed under .../.tox/py38-unit-test-build/lib/python3.8/site-packages/tensorflow_io, while the shared object is installed under lib64 instead (i.e. it can be found at .../.tox/py38-unit-test-build/lib64/python3.8/site-packages/tensorflow_io/python/ops/libtensorflow_io.so). In the tox environment on my Mac, by contrast, the shared object is at the expected lib location.

Any clues?

teocasse commented 2 years ago

Some more details:

teocasse commented 2 years ago

Anyone? As I see it, there are only 2 possible explanations:

I would appreciate some feedback to know how I should deal with this. Thanks!

teocasse commented 2 years ago

I have revisited this issue now and I have isolated its root cause.

The problem is unrelated to Docker: on RHEL distributions, platlib is different from purelib; see for instance https://stackoverflow.com/a/27882460/7414397 and https://github.com/pypa/virtualenv/issues/1751.
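A quick way to check whether a given interpreter or virtual environment is affected is to compare the two install paths reported by the standard-library `sysconfig` module. This is only a minimal diagnostic sketch; the helper name `site_packages_paths` is made up for illustration and is not part of tensorflow-io:

```python
import sysconfig


def site_packages_paths():
    """Return the (purelib, platlib) install directories of this interpreter.

    Hypothetical helper for illustration only.
    """
    return sysconfig.get_path("purelib"), sysconfig.get_path("platlib")


if __name__ == "__main__":
    purelib, platlib = site_packages_paths()
    print("purelib:", purelib)
    print("platlib:", platlib)
    # When this prints False, pure-Python files and compiled extensions
    # may end up in different site-packages directories (e.g. lib vs lib64).
    print("same directory:", purelib == platlib)
```

Running this inside the failing tox environment and inside the working one should show whether the two paths diverge there.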

The issue in _load_library is that the logic for identifying the path to the shared library does not take that possibility into account, and instead it implicitly assumes that platlib and purelib are the same.
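One possible direction for a fix, sketched under assumptions: instead of resolving the shared library relative to a single directory, look for it under both purelib and platlib and take the first match. The function `find_shared_library` below is hypothetical and does not reflect how `_load_library` is actually written; it only illustrates the lookup idea using standard-library calls.

```python
import os
import sysconfig


def find_shared_library(relative_path, roots=None):
    """Return the first existing file `relative_path` under `roots`, or None.

    Hypothetical helper, not the tensorflow-io implementation. By default the
    roots are the interpreter's purelib and platlib directories, which differ
    on systems that split site-packages between lib and lib64.
    """
    if roots is None:
        roots = [sysconfig.get_path("purelib"), sysconfig.get_path("platlib")]
    seen = set()
    for root in roots:
        if root in seen:
            # purelib and platlib are identical on many systems; check once.
            continue
        seen.add(root)
        candidate = os.path.join(root, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `find_shared_library("tensorflow_io/python/ops/libtensorflow_io.so")` would return the lib64 copy in the failing environment described above, assuming the file really is installed under platlib there.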

This is definitely a bug that needs to be fixed in tensorflow-io, otherwise it won't work on RHEL distributions. Can someone assign this please? @yongtang