Ettrai closed this issue 7 years ago
We do not have official support for RHEL (however, I think I have a solution below). Bazel is complaining that the "nasm" package TensorFlow depends on is missing explicit build dependencies for the listed headers. The problem looks unrelated to CUDA or TensorFlow itself. It may be related to this build file, but for core system libraries and headers Bazel should not be asking for explicit dependencies.
A quick search shows me that this error looks very similar to this one: https://github.com/tensorflow/tensorflow/issues/3431#issuecomment-234131699
So in your case, let's try this. Right after this line: https://github.com/tensorflow/tensorflow/blob/master/third_party/gpus/crosstool/CROSSTOOL.tpl#L124 could you try adding: `cxx_builtin_include_directory: "/opt/rh/devtoolset-4/root/usr/lib/gcc/x86_64-redhat-linux/5.3.1/include"`
Then run configure and build again. Please let me know if it works or not.
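In context, the tail of the CUDA toolchain's include list in the generated CROSSTOOL would then look roughly like this (the devtoolset path is specific to this machine's GCC 5.3.1 install; adjust it to your compiler version):

```
  # Include directory for cuda headers.
  cxx_builtin_include_directory: "%{cuda_include_path}"
  # Added so Bazel accepts headers pulled in from the devtoolset-4 GCC
  # (machine-specific path; match it to the output of `gcc -print-search-dirs`).
  cxx_builtin_include_directory: "/opt/rh/devtoolset-4/root/usr/lib/gcc/x86_64-redhat-linux/5.3.1/include"
```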
@gunan thank you for your reply! I did as you suggested and that error has not shown up anymore.
I now have a new error that I have just started to troubleshoot:
ERROR: /home/emt1627/.cache/bazel/_bazel_emt1627/aeec3eab67314b40e280b02ed0028dfc/external/nasm/BUILD:8:1: Linking of rule '@nasm//:nasm' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/emt1627/.cache/bazel/_bazel_emt1627/aeec3eab67314b40e280b02ed0028dfc/execroot/tensorflow-nightly && \
exec env - \
LD_LIBRARY_PATH=/home/emt1627/opt/cudnn-8.0-linux-x64-v5.1:/usr/lib64/nvidia:/home/emt1627/opt/cuda-8.0/lib64:/opt/rh/devtoolset-4/root/usr/lib64:/opt/rh/devtoolset-4/root/usr/lib:/opt/rh/python27/root/usr/lib64 \
PATH=/home/emt1627/virtualenv/tensorflow-nightly-GPU/bin:/home/emt1627/opt/cuda-8.0/bin:/home/emt1627/opt/git-2.11/bin:/home/emt1627/opt/htop-2.0.2/bin:/home/emt1627/opt/jdk1.8.0_112/bin:/home/emt1627/opt/bazel-0.4.3-dist/output:/sbin:/usr/sbin:/usr/local/sbin:/opt/rh/devtoolset-4/root/usr/bin:/opt/rh/rh-java-common/root/usr/bin:/opt/rh/python27/root/usr/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin \
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/host/bin/external/nasm/nasm -Wl,-no-as-needed -B/usr/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,-S -Wl,--gc-sections -Wl,@bazel-out/host/bin/external/nasm/nasm-2.params): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
/usr/bin/ld: unrecognized option '-plugin'
/usr/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
Not sure what is going on there. Maybe this helps? http://stackoverflow.com/questions/24890865/usr-bin-ld-unrecognized-option-plugin-error
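For what it's worth, the `-plugin` option is passed by GCC's collect2 to the linker for plugin/LTO support, and the stock binutils `ld` on RHEL 6 is too old to understand it, while the devtoolset-4 linker is not. A quick way to see which `ld` is actually being picked up (the devtoolset path below is specific to this setup):

```shell
# Show which ld the active gcc resolves to, and its binutils version.
gcc -print-prog-name=ld
ld --version | head -n1
# On this machine, the newer linker would live at a path like
# /opt/rh/devtoolset-4/root/usr/bin/ld (check with: ld --version there).
```

If `gcc -print-prog-name=ld` points at the old `/usr/bin/ld`, that explains the error.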
I ended up using this solution: https://github.com/bazelbuild/bazel/issues/361
I managed to compile the nightly version of TensorFlow; now I am rebuilding r0.12.1 to try some of the bundled examples (e.g. models/images/mnist/convolutional.py).
I tried to run those with the nightly version but I ended up experiencing this issue : https://github.com/tensorflow/models/issues/857
As soon as I manage to run that example properly I will post my diff.
I managed to build TensorFlow 0.12.1 with GPU support on the following configuration:
Red Hat EL 6.8 (no root access)
Python 2.7.8
virtualenv 13.1.0
devtoolset-4 (GCC 5.3.1)
Bazel 0.4.3 (built from source)
GeForce GTX 680 (compute capability 3.0)
CUDA Toolkit 8.0
cuDNN 5.1
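For completeness, the build itself followed the usual from-source sequence for TF 0.12 (a sketch only; target names are from the standard pip-package workflow and the output wheel name will vary):

```
./configure
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```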
This is my final diff:
diff --git a/configure b/configure
index 3fc0b5909..33e73b8d0 100755
--- a/configure
+++ b/configure
@@ -22,7 +22,7 @@ function bazel_clean_and_fetch() {
# bazel clean --expunge currently doesn't work on Windows
# TODO(pcloudy): Re-enable it after bazel clean --expunge is fixed.
if ! is_windows; then
- bazel clean --expunge
+ bazel clean --expunge_async
fi
bazel fetch //tensorflow/...
}
diff --git a/tensorflow/tensorflow.bzl b/tensorflow/tensorflow.bzl
index d78cb7b57..42bf7c8b6 100644
--- a/tensorflow/tensorflow.bzl
+++ b/tensorflow/tensorflow.bzl
@@ -792,7 +792,7 @@ def tf_custom_op_library(name, srcs=[], gpu_srcs=[], deps=[]):
)
def tf_extension_linkopts():
- return [] # No extension link opts
+ return ["-lrt"]
def tf_extension_copts():
return [] # No extension c opts
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 06e16cdb0..d1ac0544e 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -228,7 +228,7 @@ def tf_workspace(path_prefix = "", tf_repo_name = ""):
native.new_http_archive(
name = "zlib_archive",
- url = "http://zlib.net/zlib-1.2.8.tar.gz",
+ url = "http://zlib.net/fossils/zlib-1.2.8.tar.gz",
sha256 = "36658cb768a54c1d4dec43c3116c27ed893e88b02ecfcb44f2166f9c0b7f2a0d",
strip_prefix = "zlib-1.2.8",
build_file = str(Label("//:zlib.BUILD")),
diff --git a/third_party/gpus/crosstool/CROSSTOOL.tpl b/third_party/gpus/crosstool/CROSSTOOL.tpl
index 3ce6b74a5..06e572691 100644
--- a/third_party/gpus/crosstool/CROSSTOOL.tpl
+++ b/third_party/gpus/crosstool/CROSSTOOL.tpl
@@ -55,7 +55,9 @@ toolchain {
# and the device compiler to use "-std=c++11".
cxx_flag: "-std=c++11"
linker_flag: "-lstdc++"
- linker_flag: "-B/usr/bin/"
+ linker_flag: "-lm"
+ linker_flag: "-lrt"
+ linker_flag: "-B/opt/rh/devtoolset-4/root/usr/bin"
%{gcc_host_compiler_includes}
tool_path { name: "gcov" path: "/usr/bin/gcov" }
@@ -121,6 +123,8 @@ toolchain {
# Include directory for cuda headers.
cxx_builtin_include_directory: "%{cuda_include_path}"
+ cxx_builtin_include_directory: "/opt/rh/devtoolset-4/root/usr/lib"
+ cxx_builtin_include_directory: "/opt/rh/devtoolset-4/root/usr/include"
compilation_mode_flags {
mode: DBG
I hope this helps, and thank you @gunan!
I will try to see if I can add this to either an FAQ in our docs, or incorporate the modifications through our bazel switches. I will keep the issue open until then.
Thanks for reporting this bug!
@Sinan81, I hope this saved you some time!
Looks like this problem is resolved. I suspect some of the problems we ran into here were due to your not having full access to the system. I was able to test the build on a CentOS Docker container without any modifications, but I will try to see if I can reproduce your problems.
Thanks for patiently working through all the issues and documenting your steps here!
Hello, I have been trying for days to take advantage of the GPU in one of the machines I have access to. Given that I have no root access, I had to compile everything from source. I tried both the latest stable release and the current master branch, but I had no luck running TensorFlow on the GPU.
My setup is the following:
Red Hat EL 6.8 (no root access)
Python 2.7.8
virtualenv 13.1.0
devtoolset-4 (GCC 5.3.1)
Bazel 0.4.3 (built from source)
GeForce GTX 680 (compute capability 3.0)
CUDA Toolkit 8.0
cuDNN 5.1
I had to modify a few configuration files so that the configure script could complete successfully:
At the point where I have to ask Bazel to build TensorFlow I hit a strange problem. If I use --config=cuda8.0 the build completes, but the GPU is never used or detected.
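When the GPU is silently not detected, it can help to first confirm that the CUDA runtime and cuDNN shared libraries are even resolvable from the environment. A minimal, TensorFlow-independent check (the `.so` names assume CUDA 8.0 and cuDNN 5.1 as in the setup above):

```python
import ctypes
import os

# If either library fails to load, TensorFlow will silently fall back to CPU.
for lib in ("libcudart.so.8.0", "libcudnn.so.5"):
    try:
        ctypes.CDLL(lib)
        print(lib, "found")
    except OSError:
        print(lib, "NOT found; check LD_LIBRARY_PATH =",
              os.environ.get("LD_LIBRARY_PATH", "(unset)"))
```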
If I use --config=cuda the build fails with the following error.
If I run it again I get a similar error.
I have also tried different combinations of CUDA toolkit and cuDNN versions, but none of them led to a solution.