snipsco / tensorflow-build

A set of scripts to (cross-)build the Tensorflow C lib for various architectures / OS
MIT License

Linking problem #20

Open ddresser opened 6 years ago

ddresser commented 6 years ago

Hello, I am trying to cross-compile for armv7 using a custom toolchain. I am using bazel 0.8.1 and trying to compile TensorFlow HEAD (30b64a8d78b32db8f30957294efc9cac902b9fd3).

I made a small change to 'tf-crosscompile.patch' to make it apply successfully and used the following command:

./cross-compile.sh /home/ddresser/.gradle/var/idexx/compilers/acadia arm-linux-gnueabihf HEAD

I also updated the cross-compile.sh script with '-march=armv7'

Here is the beginning of the build:

using gcc : /home/ddresser/.gradle/var/idexx/compilers/acadia/bin/arm-linux-gnueabihf-gcc version 6.3.1
Cloning into 'tensorflow'...
remote: Counting objects: 281972, done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 281972 (delta 7), reused 9 (delta 2), pack-reused 281950
Receiving objects: 100% (281972/281972), 139.08 MiB | 1.18 MiB/s, done.
Resolving deltas: 100% (220740/220740), done.
Checking connectivity... done.
Your branch is up-to-date with 'origin/master'.
Extracting Bazel installation...
You have bazel 0.8.1 installed.
Please specify the location of python. [Default is /usr/bin/python]:

Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: No CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with MPI support? [y/N]: No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Add "--config=mkl" to your bazel command to build with MKL support. Please note that MKL on MacOS or windows is still not supported. If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: Not configuring the WORKSPACE for Android builds.

Configuration finished
launching bazel with flags ''
........
Analyzing: target //tensorflow:libtensorflow.so (16 packages loaded)

Everything seems to compile fine, but I get this error when it tries to link:

ERROR: /home/ddresser/src/tensorflow-build/target/tensorflow/tensorflow/cc/BUILD:422:1: Linking of rule '//tensorflow/cc:ops/logging_ops_gen_cc' failed (Exit 1): gcc failed: error executing command
  (cd /home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/ddresser/bin:/home/ddresser/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
  /usr/bin/gcc -o bazel-out/host/bin/tensorflow/cc/ops/logging_ops_gen_cc '-Wl,-rpath,$ORIGIN/../../../_solib_local/_U_S_Stensorflow_Scc_Cops_Slogging_Uops_Ugen_Ucc___Utensorflow' -Lbazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slogging_Uops_Ugen_Ucc___Utensorflow '-Wl,-rpath,$ORIGIN/,-rpath,$ORIGIN/..,-rpath,$ORIGIN/../..' -pthread -B/usr/bin/ -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,--gc-sections -Wl,-S -Wl,@bazel-out/host/bin/tensorflow/cc/ops/logging_ops_gen_cc-2.params)
bazel-out/host/bin/tensorflow/cc/libcc_op_gen_main.a(cc_op_gen_main.o): In function `main':
cc_op_gen_main.cc:(.text.startup.main+0x45): undefined reference to `tensorflow::port::InitMain(char const*, int*, char***)'
cc_op_gen_main.cc:(.text.startup.main+0x130): undefined reference to `tensorflow::StringPiece::find(char, unsigned long) const'
cc_op_gen_main.cc:(.text.startup.main+0x137): undefined reference to `tensorflow::StringPiece::npos'
cc_op_gen_main.cc:(.text.startup.main+0x24b): undefined reference to `tensorflow::OpList::OpList()'
cc_op_gen_main.cc:(.text.startup.main+0x250): undefined reference to `tensorflow::OpRegistry::Global()'
cc_op_gen_main.cc:(.text.startup.main+0x265): undefined reference to `tensorflow::OpRegistry::Export(bool, tensorflow::OpList*) const'
cc_op_gen_main.cc:(.text.startup.main+0x291): undefined reference to `tensorflow::Env::Default()'
cc_op_gen_main.cc:(.text.startup.main+0x496): undefined reference to `tensorflow::io::internal::JoinPathImpl[abi:cxx11]'
cc_op_gen_main.cc:(.text.startup.main+0x4dd): undefined reference to `tensorflow::Env::FileExists(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
cc_op_gen_main.cc:(.text.startup.main+0x52d): undefined reference to `tensorflow::TfCheckOpHelperOutOfLine[abi:cxx11](tensorflow::Status const&, char const*)'
cc_op_gen_main.cc:(.text.startup.main+0x575): undefined reference to `tensorflow::internal::LogMessageFatal::LogMessageFatal(char const*, int)'
cc_op_gen_main.cc:(.text.startup.main+0x595): undefined reference to `tensorflow::internal::LogMessageFatal::~LogMessageFatal()'
cc_op_gen_main.cc:(.text.startup.main+0x624): undefined reference to `tensorflow::OpList::~OpList()'
bazel-out/host/bin/tensorflow/cc/libcc_op_gen_main.a(cc_op_gen.o): In function `tensorflow::(anonymous namespace)::MakeComment(tensorflow::StringPiece, tensorflow::StringPiece)':
cc_op_gen.cc:(.text._ZN10tensorflow12_GLOBAL__N_111MakeCommentENS_11StringPieceES1_+0xd7): undefined reference to `tensorflow::StringPiece::substr(unsigned long, unsigned long) const'
cc_op_gen.cc:(.text._ZN10tensorflow12_GLOBAL__N_111MakeCommentENS_11StringPieceES1_+0x12a): undefined reference to `tensorflow::strings::StrAppend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&)'
cc_op_gen.cc:(.text._ZN10tensorflow12_GLOBAL__N_111MakeCommentENS_11StringPieceES1_+0x1b8): undefined reference to `tensorflow::strings::StrAppend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&)'
bazel-out/host/bin/tensorflow/cc/libcc_op_gen_main.a(cc_op_gen.o): In function `tensorflow::(anonymous namespace)::PrintString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
cc_op_gen.cc:(.text._ZN10tensorflow12_GLOBAL__N_111PrintStringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x40): undefined reference to `tensorflow::str_util::CEscape[abi:cxx11]'
cc_op_gen.cc:(.text._ZN10tensorflow12_GLOBAL__N_111PrintStringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x83): undefined reference to `tensorflow::strings::StrCat[abi:cxx11](tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&)'
bazel-out/host/bin/tensorflow/cc/libcc_op_gen_main.a(cc_op_gen.o): In function `tensorflow::(anonymous namespace)::OpInfo::GetConstructorDecl(tensorflow::StringPiece, bool) const':
cc_op_gen.cc:(.text._ZNK10tensorflow12_GLOBAL__N_16OpInfo18GetConstructorDeclENS_11StringPieceEb+0x77): undefined reference to `tensorflow::strings::StrCat[abi:cxx11](tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&)'
cc_op_gen.cc:(.text._ZNK10tensorflow12_GLOBAL__N_16OpInfo18GetConstructorDeclENS_11StringPieceEb+0x129): undefined reference to `tensorflow::strings::StrAppend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&, tensorflow::strings::AlphaNum const&)'
cc_op_gen.cc:(.text._ZNK10tensorflow12_GLOBAL__N_16OpInfo18GetConstructorDeclENS_11StringPieceEb+0x166): undefined reference to `tensorflow::strings::StrAppend(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tensorflow::strings::AlphaNum const&)'

It is curious to me that it is using /usr/bin/gcc to link when I have specified another toolchain. I have confirmed that the .o files that are created are ARM binaries. For example:

file /home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow/bazel-out/armeabi-opt/bin/tensorflow/core/_objs/framework_internal_impl/tensorflow/core/util/tensor_format.pic.o

/home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow/bazel-out/armeabi-opt/bin/tensorflow/core/_objs/framework_internal_impl/tensorflow/core/util/tensor_format.pic.o: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV), not stripped

However, the .so file I see in the build output seems to be for x86_64:

file /home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/libtensorflow_framework.so

/home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/libtensorflow_framework.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[md5/uuid]=ae5ad2a42b549582f28457f02a1d5932, not stripped
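
One detail worth noting in the two paths above: the ARM object lives under bazel-out/armeabi-opt, while the x86-64 library lives under bazel-out/host. A minimal sketch for pulling that configuration directory out of an output path, so that `file` results can be grouped by configuration. The function name `bazel_config_of` is hypothetical, and it assumes only the bazel-out/<config>/ layout visible in this thread:

```shell
#!/bin/sh
# Hypothetical helper: print the directory that follows "bazel-out/" in a path.
# Assumes the layout seen in this thread (bazel-out/host, bazel-out/armeabi-opt).
bazel_config_of() {
  case "$1" in
    bazel-out/*|*/bazel-out/*)
      rest=${1#*bazel-out/}         # drop everything up to and including "bazel-out/"
      printf '%s\n' "${rest%%/*}"   # keep only the first remaining path component
      ;;
    *)
      printf 'unknown\n'            # not a Bazel output path
      ;;
  esac
}
```

Calling `bazel_config_of` on the libtensorflow_framework.so path above prints `host`, which suggests that particular library was built for the build machine rather than for the ARM target.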

Any ideas on this error and next steps?

Thanks, Derek

fredszaq commented 6 years ago

Hello !

I haven't had time to update the scripts for the latest versions of TF, so this is quite possibly linked to that. The last version I built is git hash 107cc777af7880c140d089e44ad898a6ba929286, which is basically 1.3.1 with some bazel fixes.

Could you try building this version with your toolchain, so that we know whether the problem is related to the toolchain or to the patch / scripts?

Regarding the use of /usr/bin/gcc: it does seem odd for that file. It is normal to use the build system's gcc during the build, since some of the programs that get built are executed later in the build. Maybe the bazel configuration for the crosstool has changed? The last time I built, I was using bazel 0.5.1.

Do not hesitate to do a pull request if you manage to get it working !

MarcTreySonos commented 6 years ago

Hello Derek,

I have just built HEAD using the provided script. It needs some minor changes:

In tensorflow/BUILD, set S3 support to false (I was getting some undefined references to AWS):

define_values = {"with_s3_support": "false"}

Make sure there is a define for RASPBERRY_PI in

tensorflow/core/platform/platform.h

This last part may be the root cause of your issue.

ddresser commented 6 years ago

Thanks so much for your helpful responses. I have to admit I'm new to bazel and tensorflow so I'm not quite sure how to accomplish what you suggested.

I wasn't sure what 'config_setting' to add the 'define_values = {"with_s3_support": "false"}' to so I tried adding a new one:

config_setting(
    name = "without_s3_support",
    define_values = {"with_s3_support": "false"},
    visibility = ["//visibility:public"],
)

and referencing it on the bazel command line in cross-compile.sh with '--config=without_s3_support'

but got this warning.

WARNING: Config values are not defined in any .rc file: without_s3_support

Any chance you can provide the patch that worked for you building HEAD?

Thanks again for your help. -Derek
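
The warning is consistent with how Bazel treats the two mechanisms. `--config=<name>` selects a group of options from a .bazelrc file (lines beginning with `build:<name>`), whereas a config_setting with define_values = {"with_s3_support": "false"} is matched when the build runs with `--define with_s3_support=false`. A minimal sketch, assuming the target //tensorflow:libtensorflow.so from the build log; the helper name `s3_define_flag` is hypothetical, and the command is printed rather than executed:

```shell
#!/bin/sh
# Hypothetical helper: emit the --define flag that a config_setting with
# define_values = {"with_s3_support": "<value>"} is matched against.
s3_define_flag() {
  printf '%s%s\n' '--define=with_s3_support=' "$1"
}

# Print the command instead of running it, so the sketch works without Bazel.
echo "bazel build $(s3_define_flag false) //tensorflow:libtensorflow.so"
```

A new config_setting has no effect by itself; it only matters if some select() in a BUILD file refers to it, so whether this flag actually drops the S3 code depends on how the TensorFlow revision being built wires up with_s3_support.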

MarcTreySonos commented 6 years ago

I will make a clean patch this weekend; I'm already out of time here :)

you can simply edit the tensorflow/workspace.bzl file and replace all with_s3_support with :

-    define_values = {"with_s3_support": "true"},
+    define_values = {"with_s3_support": "false"},

ddresser commented 6 years ago

Thanks a lot. A clean patch would be appreciated. I'll keep plugging away at it. The problem I seem to be running into is that some dependencies are being compiled with the cross compiler, and some are being compiled for x86_64 by /usr/bin/gcc. For example, in my build log, protobuf_archive is being built for x86_64:

SUBCOMMAND: # @protobuf_archive//:js_embed [action 'Compiling external/protobuf_archive/src/google/protobuf/compiler/js/embed.cc [for host]']
(cd /home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/ddresser/.gradle/var/idexx/compilers/acadia/bin:/home/ddresser/bin:/home/ddresser/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
  /usr/bin/gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -g0 -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-std=c++0x' -g0 -MD -MF bazel-out/host/bin/external/protobuf_archive/_objs/js_embed/external/protobuf_archive/src/google/protobuf/compiler/js/embed.d '-frandom-seed=bazel-out/host/bin/external/protobuf_archive/_objs/js_embed/external/protobuf_archive/src/google/protobuf/compiler/js/embed.o' -iquote external/protobuf_archive -iquote bazel-out/host/genfiles/external/protobuf_archive -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/protobuf_archive/src/google/protobuf/compiler/js/embed.cc -o bazel-out/host/bin/external/protobuf_archive/_objs/js_embed/external/protobuf_archive/src/google/protobuf/compiler/js/embed.o)

...while other dependencies are being compiled for arm using the 'arm-linux-gnueabihf-gcc' compiler:

SUBCOMMAND: # @sqlite_archive//:sqlite [action 'Compiling external/sqlite_archive/sqlite3.c']
(cd /home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow && \
  exec env - \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_NEED_CUDA=0 \
    TF_NEED_OPENCL_SYCL=0 \
  /home/ddresser/.gradle/var/idexx/compilers/acadia/bin/arm-linux-gnueabihf-gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-march=armv5' -DRASPBERRY_PI '-mfpu=vfp' -funsafe-math-optimizations -ftree-vectorize -fomit-frame-pointer -MD -MF bazel-out/armeabi-opt/bin/external/sqlite_archive/_objs/sqlite/external/sqlite_archive/sqlite3.pic.d -fPIC -iquote external/sqlite_archive -iquote bazel-out/armeabi-opt/genfiles/external/sqlite_archive -iquote external/bazel_tools -iquote bazel-out/armeabi-opt/genfiles/external/bazel_tools -isystem external/sqlite_archive -isystem bazel-out/armeabi-opt/genfiles/external/sqlite_archive -isystem external/bazel_tools/tools/cpp/gcc3 -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -no-canonical-prefixes -fno-canonical-system-headers -c external/sqlite_archive/sqlite3.c -o bazel-out/armeabi-opt/bin/external/sqlite_archive/_objs/sqlite/external/sqlite_archive/sqlite3.pic.o)
SUBCOMMAND: # @lmdb//:lmdb [action 'Compiling external/lmdb/midl.c']
(cd /home/ddresser/.cache/bazel/_bazel_ddresser/07218640e155d2003dbb6761ed58c2d0/execroot/org_tensorflow && \
  exec env - \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python \
    PYTHON_LIB_PATH=/usr/local/lib/python2.7/dist-packages \
    TF_NEED_CUDA=0 \
    TF_NEED_OPENCL_SYCL=0 \
  /home/ddresser/.gradle/var/idexx/compilers/acadia/bin/arm-linux-gnueabihf-gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-march=armv5' -DRASPBERRY_PI '-mfpu=vfp' -funsafe-math-optimizations -ftree-vectorize -fomit-frame-pointer -MD -MF bazel-out/armeabi-opt/bin/external/lmdb/_objs/lmdb/external/lmdb/midl.pic.d -fPIC -iquote external/lmdb -iquote bazel-out/armeabi-opt/genfiles/external/lmdb -iquote external/bazel_tools -iquote bazel-out/armeabi-opt/genfiles/external/bazel_tools -isystem external/bazel_tools/tools/cpp/gcc3 -w -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -no-canonical-prefixes -fno-canonical-system-headers -c external/lmdb/midl.c -o bazel-out/armeabi-opt/bin/external/lmdb/_objs/lmdb/external/lmdb/midl.pic.o)

Not sure why.

Thanks for your help. -Derek

MarcTreySonos commented 6 years ago

building protobuf for the host is expected
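
To see how a build splits between the two, the SUBCOMMAND lines printed by `bazel build -s` can be tallied. A rough sketch, assuming host actions carry the "[for host]" suffix in their description as the protobuf_archive entry quoted earlier does; the function name `count_actions` and the log file argument are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count host vs. target actions in a log saved from
# `bazel build -s ...`. Assumes host actions are tagged "[for host]".
count_actions() {
  log=$1
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  total=$(grep -c '^ *SUBCOMMAND:' "$log" || true)
  host=$(grep '^ *SUBCOMMAND:' "$log" | grep -c '\[for host\]' || true)
  echo "host=$host target=$((total - host))"
}
```

Tools that run during the build itself, such as code generators, are the ones expected to show up in the host count.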

ddresser commented 6 years ago

Thanks. I am having much more success with bazel 0.5.1; previously I was using 0.8.1, the Ubuntu default. I have been able to compile libtensorflow.so with both the 'arm-bcm2708' compiler and my 'arm-linux-gnueabihf' compiler at the specified commit (107cc777af7880c140d089e44ad898a6ba929286). I'll try HEAD next.

ddresser commented 6 years ago

It seems tensorflow HEAD has a check for bazel version >= 0.5.4.

I upgraded to bazel 0.5.4 and am still able to build 107cc777af7880c140d089e44ad898a6ba929286, but am seeing the original linking issues when trying to build head.
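
When switching between bazel releases like this, a quick pre-flight comparison can save a failed configure run. A small sketch, assuming a `sort` that supports version ordering via `-V` (GNU coreutils does); the function name `version_ge` is hypothetical and this is not the check TensorFlow itself performs:

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example with the versions mentioned in this thread.
if version_ge 0.5.1 0.5.4; then
  echo "0.5.1 satisfies >= 0.5.4"
else
  echo "0.5.1 is older than 0.5.4"
fi
```

In practice the first argument could come from the installed bazel's reported version, parsed however suits the local setup.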

mtrey, I am curious what version of bazel you used to successfully build HEAD.

I appreciate the help. I'm learning a bunch about bazel in this process.

ddresser commented 6 years ago

Today I discovered that tensorflow cross compiles for the raspberry pi as part of their CI build.

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/pi/build_raspberry_pi.sh

I was able to easily modify that script to use my compiler and build tensorflow HEAD. That satisfies my need for now. I have included my patch if it is helpful to anyone.

Thank you both very much for your support on this. -Derek

build_raspberry_pi.sh.patch.txt

AntoineWeber commented 5 years ago

Hi @ddresser, I know this post is very old, but I'm running into problems cross-compiling TensorFlow for the Raspberry Pi. I also want to use my own compiler, so I modified the build_raspberry_pi.sh script, but I get this error:

C Compiler (/opt/cross-pi-gcc/bin/arm-linux-gnueabihf-gcc) is something wrong.

Do you remember encountering such a problem?