Closed trungnt13 closed 8 years ago
Since we depend on bazel, this sounds like a bazel issue.
Feel free to re-open if bazel ends up supporting 2.12 or lower, and we can see what we can do.
Am I right that you depend on bazel only at build-time? If this is true then it can be viewed as something you could do something about too... You could also release static-linked packages that would be very useful to people stuck on clusters with old libraries...
So did anyone find some way past this problem? I'm using redhat 6.4, as is my entire corporation. We're stuck on redhat 6.4. I'm not sure how to end up running tensorflow on such a machine...
I managed to have it running on a CentOS 6.7 : http://stackoverflow.com/a/34897674/1990516 :) Tell me if it works for you.
Edit: I proposed an alternative solution also: http://stackoverflow.com/a/34900471/1990516
Thanks man! I'll look into it as soon as I can.
Could you let me know if this worked? I can't seem to get any of these other solutions working.
@ttrouill only says he got it working on 6.7, so I didn't actually check whether this works on 6.4...
Both solutions seem to work, but they're not optimal. TensorFlow and Python seem to run okay, but if I try and run IPython, then with the first solution I get an Invalid ELF error, and with the second solution there is a memory leak and IPython continues to absorb all memory with time. I believe that this can also happen with other Python imports that rely on libraries that were compiled using the older libc.
I'd love to see a straightforward how-to-compile-bazel-with-old-glibc guide, but I haven't come across one yet.
Also https://github.com/bazelbuild/bazel/issues/760 is relevant, but it's far from straightforward and my attempt to build bazel using this guide failed. Hopefully within the next few weeks I can give it some more time and continue that thread with the errors I end up getting.
Compiling on CentOS still isn't all that straightforward, but I figured I'd give an overview here for now. This works for me with CentOS 6.7 and gcc 4.8.2, with GPU support (Cuda 7.0, cuDNN 4.0.7). A bazel modification for building with a custom gcc is in the works (https://github.com/bazelbuild/bazel/issues/760) and should help streamline this later on.
The instructions here are specific to my base gcc path of /cm/shared/apps/gcc/4.8.2, but it should work for other configurations just by modifying the base path.
Paths for reference:
gcc path: /cm/shared/apps/gcc/4.8.2/bin/gcc
cpp path: /cm/shared/apps/gcc/4.8.2/bin/cpp
lib64 path: /cm/shared/apps/gcc/4.8.2/lib64
include1 dir: /cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include
include2 dir: /cm/shared/apps/gcc/4.8.2/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include-fixed
include3 dir: /cm/shared/apps/gcc/4.8.2/include/c++/4.8.2
Building Bazel:
1. git clone https://github.com/bazelbuild/bazel.git && cd bazel
2. Edit tools/cpp/CROSSTOOL:
   - Replace all occurrences of /usr/bin/gcc with gcc path.
   - Replace all occurrences of /usr/bin/cpp with cpp path.
   - After the toolpath containing gcc path, add the lines
     linker_flag: "-Wl,-Rlib64 path"
     cxx_builtin_include_directory: "include1 dir"
     cxx_builtin_include_directory: "include2 dir"
     cxx_builtin_include_directory: "include3 dir"
3. Edit scripts/bootstrap/buildenv.sh: comment out the line atexit "rm -fr ${DIR}"
4. export EXTRA_BAZEL_ARGS='-s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone --jobs 8'
5. ./compile.sh
Building TensorFlow:
1. git clone --recurse-submodules https://github.com/tensorflow/tensorflow && cd tensorflow
2. Edit third_party/gpus/crosstool/CROSSTOOL, making the same changes we made for Bazel. (/usr/bin/gcc etc. likely won't need to be replaced, though.)
3. Edit third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:
   - Replace all occurrences of /usr/bin/gcc with gcc path.
   - Make as findable by commenting out the line cmd = 'PATH=' + PREFIX_DIR + ' ' + cmd. (For me, this is necessary to find as.)
4. ./configure
5. export EXTRA_BAZEL_ARGS='-s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone --jobs 8'
6. bazel build -c opt --config=cuda --linkopt '-lrt' --copt="-DGPR_BACKWARDS_COMPATIBILITY_MODE" --conlyopt="-std=c99" //tensorflow/tools/pip_package:build_pip_package
   (With an old libc, we'll otherwise get an error about secure_getenv.)
7. bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg
8. pip install ~/tensorflow_pkg/*
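For anyone who would rather script the CROSSTOOL edits than do them by hand, here is a minimal Python sketch of the three changes described above. It is not part of the original instructions: the helper name patch_crosstool is made up, and it assumes each tool_path entry sits on a single line, which may not hold for every Bazel version. It inserts the extra lines once, after the first tool_path line that names the custom gcc.

```python
def patch_crosstool(text, gcc_root, lib64, include_dirs):
    """Return CROSSTOOL text that points at a custom gcc (hypothetical helper)."""
    gcc_path = gcc_root + "/bin/gcc"
    cpp_path = gcc_root + "/bin/cpp"
    # Replace all occurrences of the system compiler paths.
    text = text.replace("/usr/bin/gcc", gcc_path)
    text = text.replace("/usr/bin/cpp", cpp_path)

    # Lines to add after the toolpath containing the gcc path.
    extra = ['linker_flag: "-Wl,-R%s"' % lib64]
    extra += ['cxx_builtin_include_directory: "%s"' % d for d in include_dirs]

    out = []
    inserted = False
    for line in text.splitlines():
        out.append(line)
        # Assumption: a tool_path entry is written on one line.
        if not inserted and "tool_path" in line and gcc_path in line:
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + e for e in extra)
            inserted = True
    return "\n".join(out) + "\n"


# Tiny made-up CROSSTOOL fragment, only to show the effect.
sample = (
    'toolchain {\n'
    '  tool_path { name: "cpp" path: "/usr/bin/cpp" }\n'
    '  tool_path { name: "gcc" path: "/usr/bin/gcc" }\n'
    '}\n'
)
GCC_ROOT = "/cm/shared/apps/gcc/4.8.2"
patched = patch_crosstool(
    sample,
    GCC_ROOT,
    GCC_ROOT + "/lib64",
    [
        GCC_ROOT + "/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include",
        GCC_ROOT + "/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/include-fixed",
        GCC_ROOT + "/include/c++/4.8.2",
    ],
)
print(patched)
```

To use it on a real checkout you would read tools/cpp/CROSSTOOL, pass its contents through the function, and write the result back, ideally after keeping a copy of the original file.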
Update: Previous process was for a commit after release 7.
Here are necessary changes for commit 1d4fd06, which is after release 8:
1. Build Bazel with ./compile.sh (it now compiles out of the box as long as CC is set correctly). Thank you @damienmg!
2. For TensorFlow, still edit CROSSTOOL etc. as before. (For some reason the bazel auto config doesn't work here.)
3. Edit third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc and replace #!/usr/bin/env python2.7 with #!/usr/bin/env /full/path/to/python2.7. This is a hack to keep bazel's confined environment from failing to pick up our custom Python location.
4. Edit bazel-out/host/bin/tensorflow/swig and add export LD_LIBRARY_PATH=custom:paths:$LD_LIBRARY_PATH before swig is run. Otherwise swig won't find libraries that exist in our LD_LIBRARY_PATH. This is another hack to get around the confined environment.
5. Run the same bazel build command from above: bazel build -c opt --config=cuda --linkopt '-lrt' --copt="-DGPR_BACKWARDS_COMPATIBILITY_MODE" --conlyopt="-std=c99" //tensorflow/tools/pip_package:build_pip_package
6. cd bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles and cp -r __main__/* . (This is a hack associated with https://github.com/tensorflow/tensorflow/issues/2040.)
7. Run bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tensorflow_pkg, and
8. pip install ~/tensorflow_pkg/*
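The two "confined environment" hacks above are plain text edits, so they can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are invented, the paths in the example are placeholders, and it assumes the wrapper and the swig script each start with a shebang line.

```python
def pin_python_shebang(text, python_path):
    """Point a '#!/usr/bin/env python2.7' style shebang at a full interpreter path."""
    lines = text.splitlines()
    if lines and lines[0].startswith("#!") and "python" in lines[0]:
        lines[0] = "#!/usr/bin/env " + python_path
    return "\n".join(lines) + "\n"


def export_library_path(text, dirs):
    """Add an LD_LIBRARY_PATH export near the top of a shell script."""
    export = "export LD_LIBRARY_PATH=%s:$LD_LIBRARY_PATH" % ":".join(dirs)
    lines = text.splitlines()
    # Keep an existing shebang first; the export goes directly below it.
    at = 1 if lines and lines[0].startswith("#!") else 0
    lines.insert(at, export)
    return "\n".join(lines) + "\n"


# Placeholder inputs, not taken from a real checkout.
wrapper = "#!/usr/bin/env python2.7\nimport os\n"
swig_script = "#!/bin/bash\nexec swig \"$@\"\n"

print(pin_python_shebang(wrapper, "/full/path/to/python2.7"))
print(export_library_path(swig_script, ["/custom/lib", "/other/lib"]))
```

Placing the export directly under the shebang guarantees it runs before swig is invoked, which is the only ordering the hack needs.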
Our administrator managed to run the pip-installed tensorflow package on a RHEL 6.7 server (without building bazel and tensorflow from source). The core idea is to get a separate, newer version of glibc:
Fast test:
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a + b))
Note: this approach is only for running Python scripts. Remember that every time you add $libcroot to your path, all shell commands break (i.e. you cannot use ls, cd, ...). You might start bash -l, screen, or byobu before you try this so you don't mess up your own session.
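One way to avoid touching the session's own library path is to start only the Python process through the newer glibc's dynamic loader, using the loader's --library-path option. The sketch below just assembles such a command line and prints it. The directory layout under the glibc root, the loader file name, and the script name are assumptions for illustration, and this does not address the IPython/Jupyter problems reported elsewhere in this thread.

```python
import sys


def loader_command(libc_root, python, script, extra_lib_dirs=()):
    """Build an argv that runs `python script` under a separately unpacked glibc."""
    lib_dir = libc_root + "/lib64"              # assumed layout of the unpacked glibc
    loader = lib_dir + "/ld-linux-x86-64.so.2"  # assumed loader file name
    search = ":".join([lib_dir] + list(extra_lib_dirs))
    # The loader does not search PATH, so `python` should be an absolute path.
    return [loader, "--library-path", search, python, script]


# /opt/glibc-2.17 and hello_tf.py are placeholders.
cmd = loader_command("/opt/glibc-2.17", sys.executable, "hello_tf.py", ["/usr/lib64"])
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.call, so only that one process sees the newer libc and ls, cd and friends keep working in the shell.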
Yeah that was described here a while back, but as you mention, it's not ideal. For example if you run Jupyter it'll lead to a memory leak / crash (at least on the system I tried it with).
@rdipietro
Edit tools/cpp/CROSSTOOL
After the toolpath containing gcc path, add the lines
linker_flag: "-Wl,-Rlib64 path"
cxx_builtin_include_directory: "include1 dir"
cxx_builtin_include_directory: "include2 dir"
cxx_builtin_include_directory: "include3 dir"
Should these lines be added after every occurrence of the toolpath containing gcc path, i.e. twice, wherever I changed /usr/bin/gcc?
I don't know what you mean by twice. I'm pretty sure I only inserted those lines once, although if you were to insert them in multiple places it probably wouldn't do any harm.
@kskp @rdipietro : is that still needed with latest version of Bazel? If yes then we have an issue in the C++ detection code.
Bazel compiles out of the box as long as I set CC correctly. I haven't tried with TensorFlow 0.9, but as of 0.8, I still had to make manual changes on CentOS.
You mean change to the cuda crosstool file?
Yes. My May 17 comment above includes everything I needed to do. Specifically, needed to edit CROSSTOOL and needed to introduce two hacks to get bazel to find things outside of its isolated environment.
@rdipietro Thanks for your reply. Sorry for my ignorance, but could you please tell me what the toolpath is? I am assuming it is the block of code where the gcc path had to be changed. I did that twice in the entire file (since it said to replace all occurrences of /usr/bin/gcc). So do I have to add those lines after each block of code where I changed the /usr/bin/gcc path?
@rdipietro @damienmg I am not using the latest version of Bazel. I need the 0.2.2b version. I ultimately have to run Syntaxnet on Cent OS 6.7.
0.2.2b should work too.
Oh, I tried a couple of weeks ago but it did not work. Will do it again today. Thanks for your reply.
note that you still have to do the CUDA CROSSTOOL modification for doing it with --config cuda
Oops, I am not configuring it with CUDA support. Is it a must?
You need to update tensorflow's CROSSTOOL for CUDA support. @davidzchen is making the change to TF to have the same support but it has not yet landed.
FYI Here is the tracking bug for CUDA autoconfiguration: #2873.
It is partially working, but I still need to fix the remaining path issues, such as getting the Python SWIG wrapper to find the tensorflow library correctly.
@damienmg @rdipietro Bazel still does not compile.
Just for your information, my system info:
[sree@ds1 bazel]$ gcc -v
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)
[sree@ds1 bazel]$ ldd --version
ldd (GNU libc) 2.12
[sree@ds1 bazel]$ which gcc
/usr/bin/gcc
[sree@ds1 bazel]$ g++ -v
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC)
[sree@ds1 bazel]$ which g++
/usr/bin/g++
To build bazel, I do the following:
./compile.sh gives:
[sree@ds1 bazel]$ ./compile.sh
INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO: ./compile.sh compile /path/to/bazel
🍃 Building Bazel from scratch......
🍃 Building Bazel with Bazel.
INFO: Found 1 target...
ERROR: /home/sree/bazel/src/main/cpp/util/BUILD:24:1: C++ compilation of rule '//src/main/cpp/util:md5' failed: gcc failed: error executing command
(cd /tmp/bazel.NO5ObMNe/out/bazel && \
exec env - \
PATH=/home/sree/anaconda2/bin:/home/sree/bazel:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/sree/bin \
/usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -iquote . -iquote bazel-out/local-fastbuild/genfiles -iquote external/bazel_tools -iquote bazel-out/local-fastbuild/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-DDATE="redacted"' '-DTIMESTAMP="redacted"' '-DTIME="redacted"' '-frandom-seed=bazel-out/local-fastbuild/bin/src/main/cpp/util/_objs/md5/src/main/cpp/util/md5.pic.o' -MD -MF bazel-out/local-fastbuild/bin/src/main/cpp/util/_objs/md5/src/main/cpp/util/md5.pic.d -fPIC -c src/main/cpp/util/md5.cc -o bazel-out/local-fastbuild/bin/src/main/cpp/util/_objs/md5/src/main/cpp/util/md5.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Target //src:bazel failed to build
INFO: Elapsed time: 3.147s, Critical Path: 0.07s
Building output/bazel
Am I even doing it right? I did not make any changes to tools/cpp/CROSSTOOL file.
What does echo | gcc -E -xc++ - -v return?
@damienmg
Using built-in specs. COLLECT_GCC=gcc Target: x86_64-redhat-linux Configured with: ../configure --prefix=/opt/rh/devtoolset-2/root/usr --mandir=/opt/rh/devtoolset-2/root/usr/share/man --infodir=/opt/rh/devtoolset-2/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,fortran,lto --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj --with-isl=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/isl-install --with-cloog=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/cloog-install --with-mpc=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/mpc-install --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64' cc1plus -E -quiet -v -iprefix /usr/bin/../lib/gcc/x86_64-redhat-linux/4.8.2/ -D_GNU_SOURCE - -mtune=generic -march=x86-64 gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Also, I installed gcc 4.8.2 using the instructions given at: http://superuser.com/questions/381160/how-to-install-gcc-4-7-x-4-8-x-on-centos. And since nothing happened, I did the following:
sudo mv /usr/bin/gcc /usr/bin/gcc.bak
sudo cp /opt/rh/devtoolset-2/root/usr/bin/gcc /usr/bin/gcc
sudo mv /usr/bin/g++ /usr/bin/g++.bak
sudo cp /opt/rh/devtoolset-2/root/usr/bin/g++ /usr/bin/g++
export CC=/opt/rh/devtoolset-2/root/usr/bin/gcc
./compile.sh
should work (at least it works in our integration test).
I believe the cp made gcc a bit confused.
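A quick way to see whether a given gcc binary can locate its own cc1plus is gcc's -print-prog-name option, which typically prints a full path when the helper is found and just the bare name when it is not. Here is a small sketch of that check; the function names are illustrative, and the interpretation of the output is an assumption based on how gcc usually behaves.

```python
import os
import subprocess


def helper_resolved(prog_name_output):
    """True if gcc reported a full path for the helper rather than its bare name."""
    return os.path.isabs(prog_name_output.strip())


def find_helper(gcc, helper="cc1plus"):
    """Ask a specific gcc binary where it would look for one of its helpers."""
    out = subprocess.check_output([gcc, "-print-prog-name=" + helper])
    return out.decode().strip()


# Example use (not executed here):
#   location = find_helper("/opt/rh/devtoolset-2/root/usr/bin/gcc")
#   print(location, helper_resolved(location))
```

A copied /usr/bin/gcc that cannot find cc1plus would be expected to answer with the bare name, while the devtoolset binary in its original location should answer with a path under its own libexec directory.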
Thanks, Now I have different errors:
[sree@ds1 bazel]$ ./compile.sh
INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO: ./compile.sh compile /path/to/bazel
🍃 Building Bazel from scratch......
🍃 Building Bazel with Bazel.
INFO: Found 1 target...
ERROR: /home/sree/bazel/src/main/tools/BUILD:3:1: C++ compilation of rule '//src/main/tools:network-tools' failed: gcc failed: error executing command
(cd /tmp/bazel.7v8MzbLT/out/bazel && \
exec env - \
PATH=/home/sree/anaconda2/bin:/home/sree/bazel:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/sree/bin \
/opt/rh/devtoolset-2/root/usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/opt/rh/devtoolset-2/root/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -iquote . -iquote bazel-out/local-fastbuild/genfiles -iquote external/bazel_tools -iquote bazel-out/local-fastbuild/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 '-std=c99' -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-DDATE="redacted"' '-DTIMESTAMP="redacted"' '-DTIME="redacted"' '-frandom-seed=bazel-out/local-fastbuild/bin/src/main/tools/_objs/network-tools/src/main/tools/network-tools.pic.o' -MD -MF bazel-out/local-fastbuild/bin/src/main/tools/_objs/network-tools/src/main/tools/network-tools.pic.d -fPIC -c src/main/tools/network-tools.c -o bazel-out/local-fastbuild/bin/src/main/tools/_objs/network-tools/src/main/tools/network-tools.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
cc1: error: unrecognized command line option '-quiet'
cc1: error: bazel-out/local-fastbuild/bin/src/main/tools/_objs/network-tools/src/main/tools/network-tools.pic.d: No such file or directory
cc1: error: unrecognized command line option '-quiet'
cc1: error: unrecognized command line option '-auxbase-strip bazel-out/local-fastbuild/bin/src/main/tools/_objs/network-tools/src/main/tools/network-tools.pic.o'
Target //src:bazel failed to build
INFO: Elapsed time: 3.917s, Critical Path: 0.31s
What does echo | /opt/rh/devtoolset-2/root/usr/bin/gcc -E -xc++ - -v say?
It seems like your compiler doesn't like your own installation. Can you try to restore /usr/bin/gcc and /usr/bin/g++ to the default value?
[sree@ds1 ~]$ echo | /opt/rh/devtoolset-2/root/usr/bin/gcc -E -xc++ - -v Using built-in specs. COLLECT_GCC=/opt/rh/devtoolset-2/root/usr/bin/gcc Target: x86_64-redhat-linux Configured with: ../configure --prefix=/opt/rh/devtoolset-2/root/usr --mandir=/opt/rh/devtoolset-2/root/usr/share/man --infodir=/opt/rh/devtoolset-2/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,fortran,lto --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj --with-isl=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/isl-install --with-cloog=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/cloog-install --with-mpc=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/mpc-install --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64' /opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/cc1plus -E -quiet -v -D_GNU_SOURCE - -mtune=generic -march=x86-64 ignoring nonexistent directory "/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include-fixed" ignoring nonexistent directory "/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../x86_64-redhat-linux/include"
/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2 /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2/x86_64-redhat-linux /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../include/c++/4.8.2/backward /opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include /usr/local/include /opt/rh/devtoolset-2/root/usr/include /usr/include End of search list.
COMPILER_PATH=/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/:/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/:/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/:/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/:/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/ LIBRARY_PATH=/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/:/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/opt/rh/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-E' '-v' '-mtune=generic' '-march=x86-64'
It seems what you said is right. I will restore both files to their default values.
My which gcc says: /usr/bin/gcc But echo $CC says: /opt/rh/devtoolset-2/root/usr/bin/gcc
And hence even after restoring older gcc, I still get gcc version as 4.8.2.
Did I ruin everything? I was super nervous that I might break the core by making changes to gcc on centos 6.
Is there a way I can rollback all the changes or can you point me to where I can get a good gcc latest version?
gcc -v still says 4.8.2?
What does ./compile.sh result in now?
gcc -v is still 4.8.2
./compile.sh still results in an error:
[sree@ds1 bazel]$ ./compile.sh
INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO: ./compile.sh compile /path/to/bazel
🍃 Building Bazel from scratch......
🍃 Building Bazel with Bazel.
INFO: Found 1 target...
ERROR: /home/sree/bazel/src/main/cpp/BUILD:53:1: C++ compilation of rule '//src/main/cpp:blaze_abrupt_exit' failed: gcc failed: error executing command
(cd /tmp/bazel.HegZ1Mxo/out/bazel && \
exec env - \
PATH=/home/sree/anaconda2/bin:/home/sree/bazel:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/sree/bin \
/usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wl,-z,-relro,-z,now -B/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -iquote . -iquote bazel-out/local-fastbuild/genfiles -iquote external/bazel_tools -iquote bazel-out/local-fastbuild/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-DDATE="redacted"' '-DTIMESTAMP="redacted"' '-DTIME="redacted"' '-frandom-seed=bazel-out/local-fastbuild/bin/src/main/cpp/_objs/blaze_abrupt_exit/src/main/cpp/blaze_abrupt_exit.pic.o' -MD -MF bazel-out/local-fastbuild/bin/src/main/cpp/_objs/blaze_abrupt_exit/src/main/cpp/blaze_abrupt_exit.pic.d -fPIC -c src/main/cpp/blaze_abrupt_exit.cc -o bazel-out/local-fastbuild/bin/src/main/cpp/_objs/blaze_abrupt_exit/src/main/cpp/blaze_abrupt_exit.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Target //src:bazel failed to build
INFO: Elapsed time: 3.592s, Critical Path: 0.12s
Building output/bazel
TensorFlow builds successfully for CPU; however, the GPU build fails.
I keep getting this error, even though I changed every path in CROSSTOOL and crosstool_wrapper... from /usr/bin to my gcc path:
ERROR: /homeappl/home/trungnt/.cache/bazel/_bazel_trungnt/07601e513c2336fd42387644d3f95e2b/external/protobuf/BUILD:331:1: Linking of rule '@protobuf//:protoc' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /homeappl/home/trungnt/.cache/bazel/_bazel_trungnt/07601e513c2336fd42387644d3f95e2b/execroot/tensorflow && \
exec env - \
third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/host/bin/external/protobuf/protoc bazel-out/host/bin/external/protobuf/_objs/protoc/external/protobuf/src/google/protobuf/compiler/main.o bazel-out/host/bin/external/protobuf/libprotoc_lib.a bazel-out/host/bin/external/protobuf/libprotobuf.a bazel-out/host/bin/external/protobuf/libprotobuf_lite.a -lpthread -lstdc++ -B/appl/opt/gcc/4.9.1/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,-S -Wl,--gc-sections): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
collect2: fatal error: cannot find 'ld'
compilation terminated.
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 71.231s, Critical Path: 56.80s
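Errors such as cannot find 'ld' usually mean the binutils programs are not reachable from the PATH the compiler actually sees (note that the command above runs under exec env - with no PATH at all). As a rough diagnostic, the sketch below reports which of as, ld and nm cannot be found in a given PATH string. The function name is made up; shutil.which is the standard-library lookup.

```python
import os
import shutil


def missing_tools(path_value, tools=("as", "ld", "nm")):
    """Return the tools that cannot be found as executables in `path_value`."""
    return [t for t in tools if shutil.which(t, path=path_value) is None]


# Check the current session's PATH.
print("missing from current PATH:", missing_tools(os.environ.get("PATH", "")))

# Check the kind of PATH a wrapper might end up with; this directory is the
# -B prefix from the log above and is used here only as an example value.
print("missing from gcc bin dir:", missing_tools("/appl/opt/gcc/4.9.1/bin"))
```

If the second call reports ld as missing while the first does not, the tool exists on the machine but is hidden by the restricted PATH, which matches the PATH-related wrapper edit described earlier in the thread.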
Hello,
@rdipietro : I am trying to install tensorflow/0.9.0 on a cluster running CentOS 6.7. I have bazel installed already. Here is the error I am getting.
ERROR: /gpfs_home/mdave/.cache/bazel/_bazel_mdave/541ff47a1a214f62e91d090e1e816e43/external/highwayhash/BUILD:17:1: C++ compilation of rule '@highwayhash//:sip_hash' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 36 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 127.
/gpfs/runtime/opt/python/2.7.3/bin/python2.7: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build
I suppose the fix for this, as mentioned by you in the step-wise directions, is:
Edit bazel-out/host/bin/tensorflow/swig and add export LD_LIBRARY_PATH=custom:paths:$LD_LIBRARY_PATH before swig is run. Otherwise swig won't find libraries that exist in our LD_LIBRARY_PATH. This is another hack to get around the confined environment.
This should add the python library path while setting up the build, but I cannot find a file such as bazel-out/host/bin/tensorflow/swig in the source tree, although the bazel-out/host/bin/tensorflow directory does exist. If I create a file named swig myself and add the command to export the paths, it still does not work. Any ideas? I have followed all other steps as mentioned.
Thank you for the help. Your responses here have already been very helpful. :)
Hi @mukul1992
Sorry, I'm still working with 0.8, so haven't battled with the 0.9 changes yet.
Here is a suggestion:
Use --verbose_failures with bazel, so that error messages aren't truncated. Then sift through the failure to find out which script ends up causing the issue. Then try putting export LD_LIBRARY_PATH=your:custom:paths:$LD_LIBRARY_PATH at the top of that file.
Hopefully that might help. I don't think I'll have the time to get around to compiling 0.9 for a while. If that doesn't work, I suggest shooting back to 0.8 for now (assuming you don't need something that's cutting edge?).
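To work out which directories belong in that export line, one option is to search the custom Python prefix for the library named in the error. The sketch below is only an illustration: the function name is made up, and the prefix in the example is the one from the error message in this thread, which may differ on other systems.

```python
import os


def find_library_dirs(root, libname):
    """Return every directory under `root` that contains a file named `libname`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if libname in filenames:
            hits.append(dirpath)
    return sorted(hits)


# Prefix and library name taken from the error message above.
dirs = find_library_dirs("/gpfs/runtime/opt/python/2.7.3", "libpython2.7.so.1.0")
if dirs:
    print("export LD_LIBRARY_PATH=%s:$LD_LIBRARY_PATH" % ":".join(dirs))
else:
    print("library not found under that prefix")
```

If nothing is found, the Python build may be static or the library may live outside the prefix, in which case the export hack would not help on its own.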
Hi @rdipietro , thanks for replying.
So, I switched back to 0.8. I am now using Bazel 0.3.0 (any previous version which would work better?). Here is the output. I am just including the ERROR part which is in Bold. Again, I did complete other steps. I cannot figure out where to add the LD_LIBRARY_PATH thing so that it picks up the libpython library.
[mdave@login001 tensorflow]$ bazel build -c opt --config=cuda --linkopt '-lrt' --copt="-DGPR_BACKWARDS_COMPATIBILITY_MODE" --conlyopt="-std=c99" //tensorflow/tools/pip_package:build_pip_package -s --verbose_failures --ignore_unsupported_sandboxing --genrule_strategy=standalone --spawn_strategy=standalone Warning: ignoring LD_PRELOAD in environment.
INFO: Found 1 target...
.(cd /gpfs_home/mdave/.cache/bazel/_bazel_mdave/c9818020e0087a4155dff2f5c73aa150/execroot/tensorflow && \ exec env - \ PATH=/gpfs/runtime/opt/git/2.2.1/bin:/gpfs/runtime/opt/gcc/4.9.2/bin:/gpfs/runtime/opt/java/8u66/bin:/gpfs/runtime/opt/bazel/0.3.0/bin:/gpfs/runtime/opt/matlab/R2014a/bin:/gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/mdave/bin \ third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -g0 '-std=c++11' '-frandom-seed=bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o' -iquote external/re2 -iquote bazel-out/host/genfiles/external/re2 -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem external/re2 -isystem bazel-out/host/genfiles/external/re2 -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -Wno-builtin-macro-redefined '-DDATE="redacted"' '-DTIMESTAMP="redacted"' '-DTIME="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.d -c external/re2/re2/compile.cc -o bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o)
ERROR: /gpfs_home/mdave/.cache/bazel/_bazel_mdave/c9818020e0087a4155dff2f5c73aa150/external/re2/BUILD:9:1: C++ compilation of rule '@re2//:re2' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command (cd /gpfs_home/mdave/.cache/bazel/_bazel_mdave/c9818020e0087a4155dff2f5c73aa150/execroot/tensorflow && \ exec env - \ PATH=/gpfs/runtime/opt/git/2.2.1/bin:/gpfs/runtime/opt/gcc/4.9.2/bin:/gpfs/runtime/opt/java/8u66/bin:/gpfs/runtime/opt/bazel/0.3.0/bin:/gpfs/runtime/opt/matlab/R2014a/bin:/gpfs/runtime/opt/perl/5.18.2/bin:/gpfs/runtime/opt/python/2.7.3/bin:/gpfs/runtime/opt/intel/2013.1.106/bin:/gpfs/runtime/opt/centos-updates/6.3/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/ibutils/bin:/gpfs/runtime/bin:/users/mdave/bin \ third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -g0 '-std=c++11' '-frandom-seed=bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o' -iquote external/re2 -iquote bazel-out/host/genfiles/external/re2 -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem external/re2 -isystem bazel-out/host/genfiles/external/re2 -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -Wno-builtin-macro-redefined '-DDATE="redacted"' '-DTIMESTAMP="redacted"' '-DTIME="redacted"' -fno-canonical-system-headers -MD -MF bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.d -c external/re2/re2/compile.cc -o bazel-out/host/bin/external/re2/_objs/re2/external/re2/re2/compile.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 127. 
/gpfs/runtime/opt/python/2.7.3/bin/python2.7: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory
Target //tensorflow/tools/pip_package:build_pip_package failed to build INFO: Elapsed time: 13.883s, Critical Path: 5.22s
@rdipietro Hi, I have tried everything you gave here (changed the CROSSTOOL files and everything) but it does not work. I started fresh again and believe I have bazel working. Can you please look at my description here (https://github.com/tensorflow/models/issues/276) and suggest something? Thanks a lot!
I really don't know what to suggest, other than perhaps asking TensorFlow to build binaries for CentOS 6.7. I think this would save a lot of people a lot of trouble with every new release, but I don't know if they're willing to do it.
@rdipietro Sorry, but didn't you mention you had tensorflow running on centos 6.7 and gcc 4.8.2? Were you able to run Syntaxnet also? I am stuck with the Centos 6.6 cluster and need to get Syntaxnet running on this. It works fine on Centos 7. :(
@kskp I created a Dockerfile that compiles TensorFlow 0.9 CPU for CentOS 6. I tested it on CentOS 6 and Red Hat EL 6.5. You can use a standalone machine to generate the TensorFlow package and then test it at your site. (Your standalone machine will need to have Docker; I tested on Linux and on macOS with Docker for Mac installed.)
https://github.com/cirocavani/tensorflow-poc/tree/master/tensorflow_centos6
(main.sh is the procedure script)
I also made an installer for TensorFlow with Miniconda2 that runs on Red Hat 6.5 without any prerequisite software.
https://github.com/cirocavani/tensorflow-poc/tree/master/tensorflow_installer
(main.sh is the procedure script)
This procedure creates the installer file tensorflow.sh with Miniconda2, TensorFlow 0.9, deps and a Python program (executing this file will install Miniconda, install TensorFlow and run the training script).
My main case is to run TensorFlow in Hadoop (Red Hat EL 6.5), there is another POC for this:
https://github.com/cirocavani/tensorflow-poc/tree/master/yarn_training
With this setup, I am running the TF Learn's Wide and Deep Example in Hadoop.
I have succeeded in compiling a GPU, Python 3.5 version of TensorFlow 0.10.0 on a CentOS 6 Docker, and it ran well on our university's CentOS 6 cluster. Check https://github.com/leelabcnbc/DevOps/tree/master/Docker/tensorflow/0.10.0/centos6/py35. Basically, it's replacing some hardcoded lines in CROSSTOOL-related items, and adding -lm to everything to prevent errors like #2291. I think Google can make compiling TensorFlow on CentOS less frustrating, if they make some hardcoded stuff link to correct locations.
I've just managed to build tensorflow 0.12rc0 on CentOS6.5, which only had gcc-4.4.7 compiler by default, without having root privileges. (At least, it's successfully passing most simple tests, like this one).
In short, I had to:
1. Build a newer gcc, hardcoding paths to as, ld and nm (a workaround for gcc: error trying to exec 'as': execvp: No such file or directory).
2. Since I used a gcc installed to my own $HOME, I had to explicitly specify the correct linker library directories here (a workaround for version 'GLIBCXX_3.4.20' not found (required by bazel-out/host/bin/external/protobuf/protoc)).
3. Add the -lrt and -lm linker flags to the same place (just as suggested by @zym1010).
I built the latest TensorFlow (GitHub master branch) with GPU support at a supercomputing center (CentOS 6.7 with gcc 4.9.2, or more generally with a customized cc toolchain). I pointed out some of the environment variable settings that are necessary for a successful build. Just documenting it here for future reference:
Thanks @rdipietro ! I have been able to successfully install r0.12 with Bazel 0.4.3 on a cluster. Some of your suggestions needed to be modified to cater to the changes in the new version of TF and Bazel. But, your suggestions provided a solid starting point. When I get the time, I will write up the changes that I had to make.
Many cluster systems use modules on Red Hat or CentOS < 7, which ship glibc 2.12.
Since bazel requires glibc 2.14 and the prebuilt version for Linux requires glibc 2.17, it is hopeless to make tensorflow run on such clusters.
I referred to this issue reported on bazel: https://github.com/bazelbuild/bazel/issues/583
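To check quickly which side of those thresholds a machine falls on, the glibc version can be read from Python. This is a small sketch, not an official check: the 2.14 and 2.17 thresholds are simply the numbers quoted above, the function names are made up, and os.confstr("CS_GNU_LIBC_VERSION") is only expected to work on glibc-based Linux systems.

```python
import os


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.12' into a tuple like (2, 12)."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "glibc":
        raise ValueError("not a glibc version string: %r" % (text,))
    return tuple(int(p) for p in parts[1].split("."))


def detect_glibc_version():
    """Ask the C library for its version; returns None if that is not possible."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (ValueError, OSError, AttributeError):
        return None


def is_new_enough(found, required):
    """Compare version tuples, e.g. (2, 12) against (2, 17)."""
    return found is not None and found >= required


# Thresholds quoted in the issue text above.
BAZEL_MIN = (2, 14)
PREBUILT_MIN = (2, 17)

found = detect_glibc_version()
print("glibc:", found)
print("enough for bazel:", is_new_enough(found, BAZEL_MIN))
print("enough for the prebuilt package:", is_new_enough(found, PREBUILT_MIN))
```

On a stock CentOS 6 machine this would be expected to report (2, 12) and False for both checks, which is the situation this issue describes.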