Open snadampal opened 3 weeks ago
The issue is not specific to arm64. I see the same missing-headers problem on other platforms too; I have reproduced it on linux-x86_64 with Ubuntu 22.04. From the code, it looks like it happens on every platform.
I have root-caused the issue: the dist_download step is skipped for the native build, but dist_download is the step that sets up all the native headers required for the javacpp build. The non-native build works because the dist_download step executes there.
<!--
Download TensorFlow native libraries
This will download the official Python distribution for the active platform, and extract the `tensorflow_cc` library
from it so that we can generate the JavaCPP API bindings and distribute it as a JAR. This will be executed only
when not building a full native build.
-->
<id>dist-download</id>
<phase>initialize</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<skip>${dist.download.skip}</skip> <!-- skipped when full native build is enabled -->
<executable>bash</executable>
<arguments>
<argument>scripts/dist_download.sh</argument>
<argument>${dist.download.folder}</argument>
</arguments>
<environmentVariables>
<PLATFORM>${native.classifier}</PLATFORM>
</environmentVariables>
<workingDirectory>${project.basedir}</workingDirectory>
</configuration>
</execution>
</executions>
</plugin>
The backtrace:
[INFO] g++ -I/home/ubuntu/java/tensorflow-core/tensorflow-core-native/src/main/native/org/tensorflow/internal/c_api -I/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/org_tensorflow -I/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/org_tensorflow/third_party/xla/third_party/tsl -I/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/execroot/tensorflow_java/bazel-out/k8-opt/bin/external/org_tensorflow -I/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/com_google_protobuf/src -I/usr/lib/jvm/java-11-openjdk-amd64/include -I/usr/lib/jvm/java-11-openjdk-amd64/include/linux /home/ubuntu/java/tensorflow-core/tensorflow-core-native/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp /home/ubuntu/java/tensorflow-core/tensorflow-core-native/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnijavacpp.cpp -march=x86-64 -m64 -O3 -s -std=c++17 -Wl,-rpath,$ORIGIN/ -Wl,-z,noexecstack -Wl,-Bsymbolic -Wall -fPIC -pthread -shared -o libjnitensorflow.so -L/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/execroot/tensorflow_java/bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow -Wl,-rpath,/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/execroot/tensorflow_java/bazel-out/k8-opt/bin/external/org_tensorflow/tensorflow -ltensorflow_framework -ltensorflow_cc
In file included from /home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/org_tensorflow/third_party/xla/third_party/tsl/tsl/c/tsl_status_internal.h:19,
from /home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/org_tensorflow/tensorflow/c/tf_status_internal.h:19,
from /home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/org_tensorflow/tensorflow/c/c_api_internal.h:32,
from /home/ubuntu/java/tensorflow-core/tensorflow-core-native/src/main/native/org/tensorflow/internal/c_api/tfj_graph_impl.cc:18,
from /home/ubuntu/java/tensorflow-core/tensorflow-core-native/src/main/native/org/tensorflow/internal/c_api/tfj_graph.h:31,
from /home/ubuntu/java/tensorflow-core/tensorflow-core-native/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp:115:
/home/ubuntu/.cache/bazel/_bazel_ubuntu/255b14aaecc232d3c121b5bd17b6e1a3/external/org_tensorflow/third_party/xla/third_party/tsl/tsl/platform/status.h:28:10: fatal error: absl/base/attributes.h: No such file or directory
28 | #include "absl/base/attributes.h"
We modified where it's looking for the headers just before the rc1 release to fix this kind of issue. I tested it on macOS, and I thought I had tested it on a few Linuxes as well. I'll rerun the Linux build to see what's going on.
So it looks like the problem is that we used to get the absl headers from Bazel, but something has changed in the TF build process so it's not putting the absl repo in the bazel-tensorflow-core-native folder like it used to. We'd missed this because the clean is inconsistent between bazel & non-bazel builds.
Hi @Craigacp, it's not just absl; several other packages are missing too, like Eigen, ml_dtypes, protobuf... They exist in the repo, but the workspaces are not cloned.
I can replicate this, but we couldn't replicate it on Karl's machine, even after a clean of bazel. Both machines are running macOS 14.5 with the latest Xcode and the same version of bazel, so I'm pretty confused as to what's causing the issue.
I'm surprised by the working case: where is it getting all the absl/Eigen/ml_dtypes headers from? Checking the include paths for the libjnitensorflow.cpp compilation might give some clue. By the way, it's consistently failing on Linux.
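One way to check the include paths is sketched below. It pulls the `-I` directories out of a compile command like the g++ line in the log above and reports which of them, if any, contain a given header. The helper names `include_dirs` and `find_header` are hypothetical and are not part of the TensorFlow Java build.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def include_dirs(compile_command: str) -> list[str]:
    """Return the directories passed via -I flags, in command-line order."""
    dirs = []
    tokens = shlex.split(compile_command)
    for i, tok in enumerate(tokens):
        if tok == "-I" and i + 1 < len(tokens):
            # Form "-I <dir>": the directory is the next token.
            dirs.append(tokens[i + 1])
        elif tok.startswith("-I") and len(tok) > 2:
            # Form "-I<dir>": the directory is attached to the flag.
            dirs.append(tok[2:])
    return dirs


def find_header(compile_command: str, header: str) -> list[str]:
    """Return the include directories that actually contain `header`."""
    return [d for d in include_dirs(compile_command)
            if (Path(d) / header).is_file()]


if __name__ == "__main__":
    # Placeholder command; paste the full g++ line from the Maven log instead.
    cmd = "g++ -I/usr/include -I/nonexistent/dir foo.cpp -o foo.so"
    print(include_dirs(cmd))
    print(find_header(cmd, "absl/base/attributes.h"))
```

An empty result from `find_header` for `absl/base/attributes.h` would mean that none of the configured include directories provides the header. Running it on both the working and the failing machine would show whether they differ.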
No, in some cases the external folder in bazel-tensorflow-core-native has extra folders in it linking to the dependencies we need the headers for, which we add to the include path in the pom. I'm not sure why bazel only puts them there some of the time. I haven't yet ruled out some memory of an earlier build on the machine that works.
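To compare a working and a failing machine, a small check like the following sketch could list which dependency folders are absent from the external folder. The repository names in `EXPECTED_REPOS` are assumptions: only `com_google_protobuf` appears in the log above, and the others are guesses at how Bazel names those workspaces. The `missing_repos` helper is likewise hypothetical.

```python
from pathlib import Path

# Assumed Bazel repository names; adjust them to match the include paths in the pom.
EXPECTED_REPOS = (
    "com_google_absl",
    "eigen_archive",
    "ml_dtypes",
    "com_google_protobuf",
)


def missing_repos(external_dir, expected=EXPECTED_REPOS):
    """Return the expected repository folders that are absent from external_dir.

    Path.exists() follows symlinks, so a dangling link counts as missing.
    """
    root = Path(external_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Path assumed to be relative to the tensorflow-core-native module.
    print(missing_repos("bazel-tensorflow-core-native/external"))
```

Running this right after the bazel step on both machines would show whether the set of linked folders actually differs between them.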
System information
java -version: openjdk 11.0.23 2024-04-16
Describe the problem
TensorFlow Java source builds are failing on an aarch64 Linux system because of missing native headers. Please let me know how it's built for the x86_64 Linux platform.
Based on my debugging so far, it looks like the dependency comes from this commit, which added a C API extension for custom gradient functions and introduced headers and .cc files that require several third_party libraries from native TensorFlow, but none of those bazel workspaces are cloned.
I tried to manually clone the missing workspaces into the bazel cache, but the cycle is never-ending: it's missing tsl, eigen, ml_dtypes, absl, protobuf, and now the compiled headers for protobuf...