tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

tensorflow for Nvidia TX1 #851

Closed jmtatsch closed 7 years ago

jmtatsch commented 8 years ago

Hello,

@maxcuda recently got tensorflow running on the TK1, as documented in the blog post http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html, but has since been unable to build it repeatably. I am now trying to get tensorflow running on a TX1 Tegra platform and need some support.

Much of the trouble seems to come from Eigen's variadic templates and C++11 initializer lists, both of which should work according to http://devblogs.nvidia.com/parallelforall/cplusplus-11-in-cuda-variadic-templates/. In theory -std=c++11 should be set according to the crosstool file. Nevertheless, nvcc happily crashes on all of them. This smells as if the "-std=c++11" flag is not properly set. How can I verify/enforce this?

Also, tensorflow.bzl says variadic templates in Eigen are disabled ("We have to disable variadic templates in Eigen for NVCC even though std=c++11 is enabled"). Is that still necessary?

Here is my build workflow:

git clone --recurse-submodules git@github.com:jmtatsch/tensorflow.git
cd tensorflow
grep -Rl "lib64"| xargs sed -i 's/lib64/lib/g' # no lib64 for tx1 yet 
./configure
bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer

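One way to answer the "-std=c++11" question above: bazel's -s flag echoes every subcommand it executes, so the flag can be grepped out of the build output. The nvcc invocation below is a fabricated sample for demonstration; the real check is the commented bazel command.

```shell
# Real check (on the device): print each compiler command bazel runs and
# look for the C++11 flag:
#   bazel build -s --config=cuda //tensorflow/cc:tutorials_example_trainer 2>&1 | grep -- '-std=c++11'
# Demonstrated here on a captured sample invocation (illustrative line):
sample='nvcc -std=c++11 -x cu tensorflow/core/kernels/cross_op_gpu.cu.cc'
printf '%s\n' "$sample" | grep -o -- '-std=c++11'
```

If the grep prints nothing, the flag never made it onto the nvcc command line.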
bhack commented 8 years ago

See http://devblogs.nvidia.com/parallelforall/power-cpp11-cuda-7/

bhack commented 8 years ago

Are you using jetpack 2?

jmtatsch commented 8 years ago

No, JetPack does not support running directly on the L4T platform.

bhack commented 8 years ago

I meant if you have flashed the board with jetpack 2 to have cuda 7 support.

jmtatsch commented 8 years ago

Ah, yes, I have CUDA 7 support and used JetPack 2. To be more precise, the target is not actually a Jetson TX1 but a repurposed Nvidia Shield TV flashed with L4T 23.1 for Jetson.

vincentvanhoucke commented 8 years ago

@Yangqing FYI

benoitsteiner commented 8 years ago

I think there is a TX1 that I could use to take a look. I'll see what I can do.

robagar commented 8 years ago

In theory, can TensorFlow run usefully on the TK1? Or is the 2G memory too small for, say, face verification?

benoitsteiner commented 8 years ago

@robagar It all depends on how large your network is and whether you intend to train the model on TK1 or just run inference. Two GB of memory is plenty to run inference on almost any model.
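As a rough sanity check of the 2 GB claim: weight memory for inference scales with parameter count times bytes per parameter (activations add more, but weights often dominate). The numbers below are illustrative, AlexNet-scale figures, not measurements from a TK1.

```python
# Back-of-envelope: approximate weight storage for inference.
def weight_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in megabytes (float32 by default)."""
    return num_params * bytes_per_param / 1024 ** 2

# A ~60M-parameter net in float32 needs on the order of a few hundred MB,
# comfortably inside the TK1's 2 GB:
print(round(weight_memory_mb(60_000_000)))
```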

benoitsteiner commented 8 years ago

I have worked around an issue that prevented nvcc from compiling the Eigen codebase on Tegra X1 (https://bitbucket.org/eigen/eigen/commits/d0950ac79c0404047379eb5a927a176dbb9d12a5). However, so far I haven't succeeded in setting up bazel on the Tegra X1, so I haven't been able to start working on the other issues reported in http://cudamusing.blogspot.de/2015/11/building-tensorflow-for-jetson-tk1.html

jmtatsch commented 8 years ago

That's good news ;) What's the problem with bazel? maxcuda's instructions for building bazel worked quite well for me.

jmtatsch commented 8 years ago

For building bazel I had to use a special Java build which can cope with the 32-bit rootfs on a 64-bit machine:

wget http://www.java.net/download/jdk8u76/archive/b02/binaries/jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz
sudo tar -zxvf jdk-8u76-ea-bin-b02-linux-arm-vfp-hflt-04_jan_2016.tar.gz -C /usr/lib/jvm
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_76/bin/java" 1
sudo update-alternatives --config java

There seems to be one eigen issue I can't get around:

bazel build -c opt --local_resources 2048,0.5,1.0 --verbose_failures --config=cuda //tensorflow/cc:tutorials_example_trainer
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.io/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
INFO: From Compiling tensorflow/core/kernels/cross_op_gpu.cu.cc:
At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

At end of source: warning: routine is both "inline" and "noinline"

external/eigen_archive/eigen-eigen-c5e90d9e764e/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(125): warning: routine is both "inline" and "noinline"

./tensorflow/core/lib/strings/strcat.h(195): internal error: assertion failed at: "/dvs/p4/build/sw/rel/gpu_drv/r346/r346_00/drivers/compiler/edg/EDG_4.9/src/decl_inits.c", line 3251

1 catastrophic error detected in the compilation of "/tmp/tmpxft_0000682d_00000000-8_cross_op_gpu.cu.cpp4.ii".
Compilation aborted.
Aborted
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: output 'tensorflow/core/_objs/gpu_kernels/tensorflow/core/kernels/cross_op_gpu.cu.o' was not created.
ERROR: /opt/tensorflow/tensorflow/core/BUILD:331:1: not all outputs were created.
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 2271.358s, Critical Path: 2260.25s

Can you have a look at TensorEvaluator.h please?

benoitsteiner commented 8 years ago

I still haven't been able to install bazel. That said, the assertion failure you're facing seems to be triggered by the variadic template at line 195 of ./tensorflow/core/lib/strings/strcat.h. I would just comment this code out and see how it goes.

ggaabe commented 8 years ago

When you say maxcuda has "been unable to repeatedly build it" since then, does that mean that tensorflow is no longer working on the TK1 again? Because I just ordered the TK1 with the express purpose of being able to run tensorflow :-/

maxcuda commented 8 years ago

Yes, I have been unable to recompile the latest versions. The wheel I built around Thanksgiving should still work but it is quite an old version.

jmtatsch commented 8 years ago

Commenting out the variadic template at line 195 helps a little, but at line 234 there is another template that seems to be required. Any hints on how to rewrite that in an nvcc-friendly manner?

jmtatsch commented 8 years ago

@benoitsteiner any suggestions on how this could be rewritten in an nvcc-compatible manner?

// Support 5 or more arguments
template <typename... AV>
inline void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
                      const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
                      const AV &... args) {
  internal::AppendPieces(dest,
                         {a.Piece(), b.Piece(), c.Piece(), d.Piece(), e.Piece(),
                          static_cast<const AlphaNum &>(args).Piece()...});
}
martinwicke commented 8 years ago

@damienmg FYI

jas0n1ee commented 8 years ago

Hi folks, I'm also working on building everything from scratch on the TX1. There are lots of discussions here and on the NVIDIA developer forums, but so far I haven't seen any well-summarized instructions besides the TK1 ones. Can we start another repo or script file so people can work on this more efficiently?

jmtatsch commented 8 years ago

Imho we first have to solve the fundamental issue of variadic templates not working with nvcc. Either the developers do without those templates, which is backwards and probably not going to happen, or NVIDIA has to step up and make nvcc more compatible. In theory nvcc should already be able to deal with your own variadic templates, but external (e.g. STL) headers won't "just work" because of the need to annotate all functions called on the device with "__host__ __device__". Maybe someone knows a good way to get around this issue...

benoitsteiner commented 8 years ago

@jmtatsch At the moment, the version of CUDA that ships with the Tegra X1 has problems with variadic templates. NVIDIA is aware of this and working on a fix. I updated Eigen a few weeks ago to disable the use of variadic templates when compiling on Tegra X1, and that seems to fix the bulk of the problem. However, StrCat and StrAppend still rely on variadic templates. Until NVIDIA releases a fix, the best solution is to comment out the variadic versions of StrCat and StrAppend and create non-variadic versions with up to 11 arguments (since that's what TensorFlow currently needs).

There are a couple of ways to avoid the STL issues. A brittle solution is to only compile optimized kernels: the compiler then inlines the STL code, at which point the lack of __host__ __device__ annotations doesn't matter, since there is no function call left to resolve. A better solution is to replace all the STL functionality with custom code. We've started to do this in Eigen by reimplementing most of the STL functions we need in the Eigen::numext namespace. This is tedious but much more reliable than relying on inlining to bypass the problem.

maxcuda commented 8 years ago

I have a build of TF 0.8 but it requires a new 7.0 compiler that is not yet available to the general public. I am building a wheel on a Jetson TK1, I will make it available after some testing. I will update the instructions on how to build from source on cudamusing.

robagar commented 8 years ago

Good work @maxcuda! Will it build on the TX1 too?

maxcuda commented 8 years ago

Yes, it will build on the TX1 too. I fixed a problem with the new memory allocator to take into account the 32-bit OS. Some basic tests are passing, but the label_image test is giving the wrong results, so there may be some other places with 32-bit issues.

maxcuda commented 8 years ago

@benoitsteiner, with the new compiler your change to Eigen is not required anymore (and it forces editing a bunch of files). Could you please remove the check and re-enable variadic templates?

benoitsteiner commented 8 years ago

@maxcuda Where can I download the new cuda compiler? I'd like to make sure that I don't introduce new problems when I enable variadic templates again.

tylerfox commented 8 years ago

@maxcuda is the new 7.0 compiler you were referencing part of Jetpack 2.2 that was just released?

maxcuda commented 8 years ago

Yes, you can get it with: wget http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb

The good news is that I was able to build v0.8, but some of the results are incorrect. I will update the blog with the changes. With v0.9 I had a problem with the cudnn.cc file; it looks like it cannot handle cuDNN v2.

tylerfox commented 8 years ago

Thanks so much. Looking forward to your post so I can get tensorflow running on the TX1

maxcuda commented 8 years ago

I updated my building instruction on cudamusing and also posted a wheel file.

syed-ahmed commented 8 years ago

Has anyone tested this on the Jetson TX1? I can't seem to get bazel to build on aarch64.

jaeoh2 commented 8 years ago

@syed-ahmed I tested it on TX1. These are my configurations.

tylerfox commented 8 years ago

@syed-ahmed I got it to build on an aarch64 TX1. I mostly followed the instructions for the TK1 at cudamusing.blogspot.de. The only additional things I did were:

  • add aarch64 to the ARM enum in /bazel/src/main/java/com/google/devtools/build/lib/util/CPU.java by changing line 28 to "ARM("arm", ImmutableSet.of("arm", "armv7l", "aarch64"))," without quotes
  • add aarch64 as a valid ARM machine type in /bazel/scripts/bootstrap/buildenv.sh by changing line 35 to "if [ "${MACHINE_TYPE}" = 'arm' -o "${MACHINE_TYPE}" = 'armv7l' -o "${MACHINE_TYPE}" = 'aarch64' ]; then" without quotes

Or, if you prefer, here is the bazel executable for aarch64 I ended up with: https://drive.google.com/file/d/0B8Gc_oVaYC7CWEhOMHJhc0hLY0U/view?usp=sharing

martinwicke commented 8 years ago

Maybe make a PR against bazel?
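The CPU.java change tylerfox describes can be sketched standalone: bazel maps the machine-type string (uname -m / os.arch) to a canonical CPU value, and "aarch64" must appear among the ARM aliases for the TX1 to be recognized. The class below loosely mirrors bazel's enum for illustration only; it is not the actual bazel source.

```java
import java.util.Set;

public class CpuSketch {
    enum CPU {
        X86_64("x86_64", Set.of("x86_64", "amd64")),
        // The fix: "aarch64" added to the ARM alias set.
        ARM("arm", Set.of("arm", "armv7l", "aarch64")),
        UNKNOWN("unknown", Set.of());

        private final String canonicalName;
        private final Set<String> archs;

        CPU(String canonicalName, Set<String> archs) {
            this.canonicalName = canonicalName;
            this.archs = archs;
        }

        // Resolve a raw machine-type string to a canonical CPU value.
        static CPU fromMachineType(String machine) {
            for (CPU cpu : values()) {
                if (cpu.archs.contains(machine)) return cpu;
            }
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        // On a TX1 running 64-bit L4T, uname -m reports aarch64.
        System.out.println(CPU.fromMachineType("aarch64")); // prints "ARM"
    }
}
```

Without the aarch64 alias, the lookup falls through to UNKNOWN and bazel refuses to configure the toolchain.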

syed-ahmed commented 8 years ago

@tylerfox Thank you! I'll try your suggestions. In the meanwhile, any thoughts on this: https://github.com/bazelbuild/bazel/issues/1264 and @wtfuzz 's change for cc_configure.bzl. I was getting a toolchain error. So wondering if you encountered it.

Did you also build with the latest bazel release or 0.1.4? And how about the tensorflow version - r0.8?

tylerfox commented 8 years ago

@syed-ahmed yes, changing the buildenv.sh should fix that issue. Also it's worth noting that I used bazel 0.1.4 per the instructions on cudamusing. I should probably also test on the current version of bazel, but for now I know 0.1.4 works

syed-ahmed commented 8 years ago

I am trying to build the tensorflow r0.9 release. I got bazel 0.2.1 installed following @tylerfox's suggestions. I get the following error when trying to build tensorflow. Any thoughts? I appreciate all the help.

>>>>> # @farmhash_archive//:configure [action 'Executing genrule @farmhash_archive//:configure [for host]']
(cd /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow && \
  exec env - \
    PATH=/usr/local/cuda-7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ubuntu/bazel/output/ \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; pushd external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; workdir=$(mktemp -d -t tmp.XXXXXXXXXX); cp -a * $workdir; pushd $workdir; ./configure; popd; popd; cp $workdir/config.h bazel-out/host/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; rm -rf $workdir;')
ERROR: /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/farmhash_archive/BUILD:5:1: Executing genrule @farmhash_archive//:configure failed: bash failed: error executing command 
  (cd /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow && \
  exec env - \
    PATH=/usr/local/cuda-7.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ubuntu/bazel/output/ \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; pushd external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; workdir=$(mktemp -d -t tmp.XXXXXXXXXX); cp -a * $workdir; pushd $workdir; ./configure; popd; popd; cp $workdir/config.h bazel-out/host/genfiles/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260; rm -rf $workdir;'): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
/home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow
/tmp/tmp.ZKGtjQ4mLO /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260 /home/ubuntu/.cache/bazel/_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/tensorflow
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... /tmp/tmp.ZKGtjQ4mLO/missing: Unknown `--is-lightweight' option
Try `/tmp/tmp.ZKGtjQ4mLO/missing --help' for more information
configure: WARNING: 'missing' script is too old or missing
./config.guess: unable to guess system type

This script, last modified 2010-08-21, has failed to recognize
the operating system you are using. It is advised that you
download the most up to date version of the config scripts from

  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD
and
  http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD

If the version you run (./config.guess) is already up to date, please
send the following data and any information you think might be
pertinent to <config-patches@gnu.org> in order to provide the needed
information to handle your system.

config.guess timestamp = 2010-08-21

uname -m = aarch64
uname -r = 3.10.96-tegra
uname -s = Linux
uname -v = #1 SMP PREEMPT Tue May 17 16:29:05 PDT 2016

/usr/bin/uname -p = 
/bin/uname -X     = 

hostinfo               = 
/bin/universe          = 
/usr/bin/arch -k       = 
/bin/arch              = 
/usr/bin/oslevel       = 
/usr/convex/getsysinfo = 

UNAME_MACHINE = aarch64
UNAME_RELEASE = 3.10.96-tegra
UNAME_SYSTEM  = Linux
UNAME_VERSION = #1 SMP PREEMPT Tue May 17 16:29:05 PDT 2016
configure: error: cannot guess build type; you must specify one

syed-ahmed commented 8 years ago

Does anyone know what farmhash is used for in tensorflow r0.9? My motivation for installing tensorflow 0.9 on the Jetson TX1 is solely to use some of the fp16 ops. Hence, if farmhash is not doing anything important, maybe I could remove the farmhash-related code and build without it. Here is the farmhash commit.

lpralf commented 8 years ago

Temporary sources used in the build process can be found in ~/.cache/bazel. cd to this directory and search for config.guess: find ./ -name "config.guess". You might get several files, but the paths should give you a clue which config.guess is the one from farmhash. In my case it is ./_bazel_ubuntu/ad1e09741bb4109fbc70ef8216b59ee2/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260/config.guess. In this file, replace the line UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown with UNAME_MACHINE=armhf.

On my machine (Nvidia Shield TV flashed to L4T 23.1) farmhash built successfully after this change.
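The edit lpralf describes can be applied with a one-line sed. Demonstrated here on a scratch copy — in a real build the file lives under the ~/.cache/bazel path given above:

```shell
# Make a stand-in for the relevant line of farmhash's config.guess,
# then pin UNAME_MACHINE so configure stops trying to guess.
tmp=$(mktemp)
printf 'UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown\n' > "$tmp"
sed -i 's/^UNAME_MACHINE=.*/UNAME_MACHINE=armhf/' "$tmp"
cat "$tmp"   # UNAME_MACHINE=armhf
rm -f "$tmp"
```

Note the cached copy is regenerated whenever bazel re-fetches the farmhash archive, so the edit may need repeating after a clean.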

shingchuang commented 8 years ago

I successfully built tensorflow on TX1 24.1 64-bit with the following patch. But running the example failed with the following kernel message: tutorials_examp[31026]: unhandled level 1 translation fault (11) at 0xffffffffffffe8, esr 0x92000005

Maybe farmhash.BUILD with --build=arm-linux-gnu is wrong? But I failed to compile it with --build=aarch64-linux-gnu. I'm still trying to figure out what causes the runtime failure. tx1_patch.zip

suharshs commented 8 years ago

@benoitsteiner has re-enabling variadic templates been verified to work?

alephman commented 8 years ago

@shingchuang have you found the root cause of segmentation fault issue? I have the same problem on aarch64 platform.

benoitsteiner commented 8 years ago

I tried to reenable variadic templates last night after upgrading the cuda compiler using http://developer.download.nvidia.com/embedded/L4T/r24_Release_v1.0/CUDA/cuda-repo-l4t-7-0-local_7.0-76_armhf.deb. This new compiler appears to fix some of the issues but I still get some crashes.

I noticed that nvidia released an even more recent version of the compiler. @maxcuda, is there a debian package that I can use to install the latest version of the cuda sdk ?

dusty-nv commented 8 years ago

Re-install / re-flash using JetPack 2.3 because the latest release also updated to Ubuntu 16.04 aarch64 in addition to CUDA 8 and L4T R24.2. The underlying CUDA version is tied to the L4T BSP in JetPack.

sladomic commented 8 years ago

Hi all. I'm trying to build TensorFlow for the Google Pixel C in order to use its TX1 GPU. Do you build it on your machine (e.g. a Mac) or on the device itself (e.g. the Pixel C)? Does anyone have the already-generated files for the TX1, or can someone point me in the right direction? Thanks.

dwightcrow commented 8 years ago

Hi all - I haven't gotten TensorFlow r0.11 working yet, but I do have a working path to an r0.9 TensorFlow install on the TX1 with JetPack 2.3. I have tested basic MLP/LSTM/conv nets and it seems to work, though it OOMs out pretty easily on bigger convs.

I wrote down all my steps and patches below in case it's helpful to anyone. Much appreciation to everyone above - the commentary was critical to tracking down the right path.

http://stackoverflow.com/questions/39783919/tensorflow-on-nvidia-tx1/

gwljf commented 8 years ago

@dwightcrow, I tried your solution and it works on TX1, thank you. And version 0.11.0rc0 can be built with bazel 0.3.2.

dwightcrow commented 8 years ago

That's fantastic. Bazel 0.3.2 builds fairly easily on TX1?

sunils27 commented 8 years ago

Wondering if there's a concise summary of everything in this issue? It would definitely make it easier for others trying to get TF working on a TX1.

sunils27 commented 8 years ago

Following up on the request for a summary to build tensorflow on a Jetson TX1. Any help is appreciated.