Closed BishopWolf closed 11 months ago
After adding the missing dependencies (roctracer and gcc11), I'm also getting this error.
Full build log: https://github.com/arch4edu/cactus/actions/runs/3059355491/jobs/4937911118
Currently the package depends on a lot of standalone rocm packages. Fortunately, the opencl-amd-dev package is functional and contains everything you could possibly need.
Switching to it would be a step towards avoiding all the rocm dependency problems.
The files listed in the error message are not missing. They exist but are just not declared in the rule.
PS. The rule is located in tensorflow/stream_executor/rocm/BUILD.
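To double-check the claim that the listed files exist and are only undeclared, something like the following could be used. It is a minimal sketch: check_undeclared_headers is a hypothetical helper name, and it assumes the build log prints each header on its own line as a single-quoted absolute path, as in the error output in this thread.

```shell
# Hypothetical helper: read a Bazel build log on stdin, pick out the
# single-quoted absolute paths listed under an "undeclared inclusion(s)"
# error, and report whether each one exists on disk.
check_undeclared_headers() {
    # Keep only lines that consist of one quoted absolute path.
    sed -nE "s#^[[:space:]]*'(/[^']+)'[[:space:]]*\$#\1#p" |
    while IFS= read -r header; do
        if [ -e "$header" ]; then
            printf 'exists   %s\n' "$header"
        else
            printf 'missing  %s\n' "$header"
        fi
    done
}
```

Usage would be along the lines of `check_undeclared_headers < build.log`, where build.log is whatever file the build output was saved to. If every path is reported as existing, the problem is in the rule's declarations rather than in the installation.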
@petronny Could you please expand on your description? How can I fix this issue?
I haven't figured it out either. But building from 2.11.0 would be a good start.
At 2.10.0, the rules declared in tensorflow/stream_executor/rocm/BUILD differ from the rules in the tensorflow-amd upstream, which work. At 2.11.0 they are the same.
However, just upgrading pkgver to 2.11.0 in PKGBUILD won't fix the issue.
I'm getting the same error for version 2.12.0-3 of the package, using rocm 5.6.0. The build worked fine before, I suppose a rocm update broke it.
ERROR: /home/bmonkey/code/aur/tensorflow-rocm/src/tensorflow-2.12.0-opt-rocm/tensorflow/compiler/xla/stream_executor/rocm/BUILD:406:11: Compiling tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc failed: undeclared inclusion(s) in rule '//tensorflow/compiler/xla/stream_executor/rocm:rocm_helpers':
this rule is missing dependency declarations for the following files included by 'tensorflow/compiler/xla/stream_executor/rocm/rocm_helpers.cu.cc':
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_runtime_wrapper.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/cmath'
'/opt/rocm/llvm/lib/clang/16.0.0/include/stddef.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_libdevice_declares.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_math.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/algorithm'
'/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/new'
'/opt/rocm/llvm/lib/clang/16.0.0/include/limits.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/stdint.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_stdlib.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_cuda_math_forward_declares.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_hip_cmath.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__clang_cuda_complex_builtins.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/cuda_wrappers/complex'
'/opt/rocm/llvm/lib/clang/16.0.0/include/__stddef_max_align_t.h'
'/opt/rocm/llvm/lib/clang/16.0.0/include/stdarg.h'
Maybe I should also mention that I needed the following hacks to get this far, as apparently the rocm package paths have changed:
sudo ln -s /opt/rocm/bin/hipcc /opt/rocm/hip/bin/hipcc
sudo ln -s /opt/rocm/bin/hipcc.pl /opt/rocm/hip/bin/hipcc.pl
Otherwise I was getting:
sh: line 1: /opt/rocm/hip/bin/hipcc: No such file or directory
Can't open perl script "/opt/rocm/hip/bin//hipcc.pl": No such file or directory
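For anyone repeating this workaround, here is a slightly more defensive sketch of the same two symlinks. link_hipcc_compat is a hypothetical helper name, and creating the hip/bin directory when it is absent is an assumption, since the commands above imply it already existed on that system.

```shell
# Hypothetical helper: recreate the legacy hipcc paths only where they are
# missing. The ROCm prefix defaults to /opt/rocm and can be overridden.
link_hipcc_compat() {
    prefix="${1:-/opt/rocm}"
    # Assumption: the legacy directory may not exist, so create it if needed.
    mkdir -p "$prefix/hip/bin" || return 1
    for tool in hipcc hipcc.pl; do
        src="$prefix/bin/$tool"
        dst="$prefix/hip/bin/$tool"
        # Skip tools that are absent and never overwrite an existing file or link.
        if [ -e "$src" ] && [ ! -e "$dst" ] && [ ! -L "$dst" ]; then
            ln -s "$src" "$dst" || return 1
        fi
    done
}
```

Against the real /opt/rocm this needs root privileges. It remains a hack around the path change rather than a fix for it.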
@lubosz, I never ran into this issue. Have you tried building in a clean chroot? If your rocm installation is borked and pacman -Syu doesn't fix it, doing a chroot build may be more practical than removing and reinstalling the entire rocm toolchain. It is surprisingly easy.
I have started an evening build and will report back tomorrow if it succeeds/fails.
Edit: Forgot to mention that, in addition to making the change export GCC_HOST_COMPILER_PATH=/usr/bin/gcc-12, I also replace all instances of gcc with gcc-12 and g++ with g++-12. Might be important idk.
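The comment does not say which file the replacement was made in or how. As a rough sketch, assuming it is a plain text substitution over something like the PKGBUILD and that GNU sed is available, a filter could look like this (pin_gcc12 is a hypothetical name):

```shell
# Hypothetical filter: rewrite bare gcc / g++ to their -12 variants.
# Reads text on stdin and writes the result to stdout.
# Assumes GNU sed (for \b word boundaries in -E mode).
pin_gcc12() {
    sed -E \
        -e 's/\bgcc(-[0-9]+)?\b/gcc-12/g' \
        -e 's/g\+\+(-[0-9]+)?/g++-12/g'
}
```

Already versioned names such as gcc-12 are left unchanged, and names like gcc11 are not touched, but hyphenated names such as gcc-libs would be rewritten too, so the resulting diff should be reviewed by hand before building.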
nvm, I'm getting the hipcc issue in a clean chroot build.
sh: line 1: /opt/rocm/hip/bin/hipcc: No such file or directory
I will confirm that my own tensorflow-amd-git package still works before getting back to this. Edit: dang, tensorflow-upstream is also broken.
sh: line 1: /opt/rocm/hip/bin/hipcc: No such file or directory
@mpeschel10 Opened a new bug report for this issue: https://github.com/rocm-arch/tensorflow-rocm/issues/57
So it turns out this issue is a bazel feature. It happens when bazel runs into unexpected includes. This can happen due to caching issues, as stated above, or, as in my current case, due to actual changes in the system includes.
Further reading: https://stackoverflow.com/questions/43921911/how-to-resolve-bazel-undeclared-inclusions-error https://github.com/tensorflow/tensorflow/issues/10665#issuecomment-308931453
In my case, tensorflow maintains a list of llvm rocm headers in their build system. Version by version. That version got bumped to 16, build went bad.
I have a fix for rocm 5.6 available on my branch of this package: https://github.com/lubosz/tensorflow-rocm/commit/9d540c96c60d38f73ab374c1194a7efb34160034
It's fixed on the master branch of tensorflow. Up to llvm 17. Future proof.
I confirm that this PKGBUILD builds without errors and appears to be as functional as it was before the 5.6 update. I did manually link /opt/rocm/hip/bin/hipcc to /opt/rocm/bin/hipcc, so I can't confirm if this also resolves issue #57, but it's probably cool.
(I can't thoroughly test it; I get a std::bad_variant_access when I call model.fit(), but that was happening before the update.)
@lubosz thanks for the detailed investigation! Can you link the exact commit where the change occurs in upstream tensorflow?
Edit: Found it here: https://github.com/tensorflow/tensorflow/commit/c97cec76fc145c25543b0e7545d5ea3ad4f8e764
Closed, since 2.15.0 has the fix.
The error I get is always the same: