Open trevor-m opened 11 months ago
@angerson It seems like this might be related to hermetic python. Any thoughts?
Yeah, this is probably due to unsetting PATH while using a (generated?) shebang of the form #!/usr/bin/env python3.
It breaks build isolation since if it works it picks up system /bin/python3 or /usr/bin/python3 on Linux instead of the Python Tensorflow was instructed to use, see https://linux.die.net/man/3/execl:
The file is sought in the colon-separated list of directory pathnames specified in the PATH environment variable. If this variable isn't defined, the path list defaults to the current directory followed by the list of directories returned by confstr(_CS_PATH). (This confstr(3) call typically returns the value "/bin:/usr/bin".)
Unsetting PATH may be fine but then execute env - <absolute path to python interpreter> ./script.py instead of env - ./script.py, or use the absolute path in the shebang (but note that has downsides too since the relevant executable may be in a long path, and Linux has a shebang line limit).
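To illustrate the lookup described in the man page excerpt above, here is a minimal Python sketch of which python3 a PATH-less `env - python3` would probably find. The helper names `default_exec_search_path` and `resolve_without_path` are made up for this example, and `shutil.which` only approximates the libc search (it does not add the current directory, for instance), so treat the result as a hint rather than an exact prediction.

```python
import os
import shutil


def default_exec_search_path() -> str:
    """Return the directory list used as a fallback when PATH is unset.

    Falls back to "/bin:/usr/bin" (the value quoted in the man page)
    if confstr is unavailable on this platform.
    """
    try:
        return os.confstr("CS_PATH") or "/bin:/usr/bin"
    except (AttributeError, ValueError, OSError):
        return "/bin:/usr/bin"


def resolve_without_path(program: str):
    """Approximate which binary `env - program` would run, or None."""
    return shutil.which(program, path=default_exec_search_path())


if __name__ == "__main__":
    print("fallback search path:", default_exec_search_path())
    print("python3 resolves to:", resolve_without_path("python3"))
```

If the second line prints None, a build step that runs a `#!/usr/bin/env python3` script under `env -` would be expected to fail with the "No such file or directory" error from this issue; if it prints a path, that system interpreter is the one being picked up.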
What commands are you using to start a build? It's been working fine in our nightly and continuous tests on the tensorflow/build containers.
Hi @angerson, I'm using ./configure && bazel build -c opt --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1 --java_runtime_version=remotejdk_11 tensorflow/tools/pip_package:build_pip_package.
This error only occurs with our manylinux build container, which does not contain a python3 in the "unset PATH" directories that @haampie mentioned (/bin:/usr/bin/:/usr/local/bin). I believe the tensorflow/build containers are Ubuntu-based and will have a system python3 in one of those directories, which is currently being inadvertently used for these build rules.
It breaks build isolation since if it works it picks up system /bin/python3 or /usr/bin/python3 on Linux instead of the Python Tensorflow was instructed to use, see https://linux.die.net/man/3/execl:
Yes, this appears to be exactly what's happening.
Unsetting PATH may be fine but then execute env - <absolute path to python interpreter> ./script.py instead of env - ./script.py, or use the absolute path in the shebang (but note that has downsides too since the relevant executable may be in a long path, and Linux has a shebang line limit).
This makes sense. I think this change needs to be in bazel? It sounds like py_binary should be setting up the command to use the hermetic python environment and it is not.
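The workaround quoted above (passing the interpreter by absolute path instead of relying on a PATH lookup) can be sketched in Python as follows. This is only an illustration, not how bazel or py_binary build their commands: `run_with_empty_env` is a hypothetical helper, and `sys.executable` stands in for whatever interpreter the build was configured with.

```python
import os
import subprocess
import sys
import tempfile


def run_with_empty_env(interpreter: str, script: str) -> str:
    """Run `script` with `interpreter` under an empty environment.

    Roughly equivalent to `env - <interpreter> <script>`: no PATH lookup
    is needed because the interpreter is given as an absolute path.
    """
    result = subprocess.run(
        [interpreter, script],
        env={},  # cleared environment, like `env -`
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "script.py")
        with open(script, "w") as f:
            f.write("import sys\nprint(sys.executable)\n")
        # sys.executable is a stand-in for the configured interpreter.
        print(run_with_empty_env(sys.executable, script))
```

Because the interpreter path is explicit, the script's shebang is never consulted, so it does not matter whether a python3 exists in /bin or /usr/bin.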
I bisected it to 539673ead2b66a9c2dce3fb90e3767efda5deef5
539673ead2b66a9c2dce3fb90e3767efda5deef5 is the first bad commit
commit 539673ead2b66a9c2dce3fb90e3767efda5deef5
Author: Marc Fisher II <fisherii@google.com>
Date: Fri Sep 8 09:27:41 2023 -0700
Switch to using new API generation.
ci/official/wheel_test/test_import_api_packages.py | 1 +
tensorflow/BUILD | 58 ++++++++--------------
.../python/tools/api/generator2/generate_api.bzl | 52 +++++++++++++++++--
3 files changed, 69 insertions(+), 42 deletions(-)
Ping @DrMarcII
I don't know bazel well enough to quickly see how to solve it, let's leave that to googlers ;p
Revert of 539673ead2b66a9c2dce3fb90e3767efda5deef5 applies cleanly to 2.15, but then the build fails with
ImportError: _pywrap_tensorflow_internal.so: cannot open shared object file: No such file or directory
so more is necessary. If someone could take over to fix it that'd be great.
If you want to reproduce, run mv /usr/bin/python3 /usr/bin/python3.tmp and do an ordinary build (with another python).
I think that this may be related to https://github.com/bazelbuild/rules_python/issues/691. 539673ead2b66a9c2dce3fb90e3767efda5deef5 added an aspect that runs a py_binary on each py_library. It looks like the py_binary bootstrap script currently has an implicit dependency on a system interpreter being installed.
Can you set the shebang line to PYTHON_BIN_PATH? The linked issue mentions stubs for shebangs.
Or, if possible, invoke the script directly: $PYTHON_BIN_PATH script.py. This is more robust as it allows for longer paths to the python executable.
See https://www.in-ulm.de/~mascheck/various/shebang/#issues for reference.
the length of the #! is much smaller than the maximum path length
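As a rough illustration of the shebang length concern, the sketch below checks whether an interpreter path would fit in a shebang line. The 127-byte limit is an assumption, chosen because it is commonly cited for older Linux kernels; the real limit depends on the kernel, so check the reference above for your platform. `ASSUMED_SHEBANG_LIMIT` and `shebang_fits` are names invented for this example.

```python
# Assumption: a 127-byte limit; the actual value varies by kernel.
ASSUMED_SHEBANG_LIMIT = 127


def shebang_fits(interpreter: str, limit: int = ASSUMED_SHEBANG_LIMIT) -> bool:
    """Return True if '#!' + interpreter fits within `limit` bytes."""
    line = "#!" + interpreter
    return len(line.encode()) <= limit


if __name__ == "__main__":
    for path in ("/usr/bin/python3", "/" + "very-long-sandbox-dir/" * 10 + "python3"):
        print(len(path), shebang_fits(path))
```

Deeply nested interpreter paths, such as those inside a build sandbox, can exceed such a limit, which is the reason given above for invoking the interpreter explicitly instead.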
Any updates on this? Would be great to be able to build TF without assuming that /usr/bin/python3 exists.
This affects multiple package managers that don't have a /usr/bin/python3:
and possibly others. Can someone have a look at it?
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No
Source
source
TensorFlow version
2.15
Custom code
No
OS platform and distribution
Manylinux 2.28 (AlmaLinux 8) quay.io/pypa/manylinux_2_28
Mobile device
No response
Python version
3.10
Bazel version
6.1.0
GCC/compiler version
11.2.1
CUDA/cuDNN version
12.3
GPU model and memory
No response
Current behavior?
Compiling TF 2.15 from source fails with the error
env: 'python3': No such file or directory
coming from all instances of py_strict_library or pytype_strict_library. Based on the commands bazel is issuing from --verbose_failures, bazel is using env - to clear the environment, which causes it to be unable to find python3 since it is no longer in the PATH. For example:
Standalone code to reproduce the issue
Relevant log output