peterlee0127 / tensorflow-nvJetson

TensorFlow for NVIDIA Jetson; also includes the patch and script for building.
https://tfjetson.peterlee.app

ERROR: No default_toolchain found for cpu 'aarch64'. #27

Closed lchia closed 6 years ago

lchia commented 6 years ago

When I run ./buildTensorflow.sh, the following error occurs. Why?

nvidia@tegra-ubuntu:/media/nvidia/udisk/nvidiaTX2TF/nvJethson$ sudo ./buildTensorflow.sh 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-numpy is already the newest version (1:1.11.0-1ubuntu1).
python-wheel is already the newest version (0.29.0-1).
python-dev is already the newest version (2.7.12-1~16.04).
python-pip is already the newest version (8.1.1-2ubuntu0.4).
0 upgraded, 0 newly installed, 0 to remove and 304 not upgraded.
Cloning into 'tensorflow'...
remote: Counting objects: 407886, done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 407886 (delta 7), reused 11 (delta 4), pack-reused 407851
Receiving objects: 100% (407886/407886), 187.59 MiB | 908.00 KiB/s, done.
Resolving deltas: 100% (324757/324757), done.
Checking connectivity... done.
Checking out files: 100% (12827/12827), done.
Checking out files: 100% (7870/7870), done.
Note: checking out 'v1.8.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 93bc2e2... Merge pull request #18928 from tensorflow/release-patch-4-1
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.16.1- (@non-git) installed.
Please specify the location of python. [Default is /usr/bin/python]: 

Invalid python path: 2 cannot be found.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python

Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/lib/aarch64-linux-gnu

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/aarch64-linux-gnu]:

Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]: 2.2

Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/local

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]

Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
Configuration finished
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: No default_toolchain found for cpu 'aarch64'. Valid cpus are: [
  k8,
  piii,
  arm,
  darwin,
  ppc,
]
INFO: Elapsed time: 95.184s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded)
./buildTensorflow.sh: 13: ./buildTensorflow.sh: bazel-bin/tensorflow/tools/pip_package/build_pip_package: not found
alejandroandreu commented 6 years ago

Does it also happen when trying to build TF 1.9?

lchia commented 6 years ago

@alejandroandreu Yes, same error.

WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: No default_toolchain found for cpu 'aarch64'. Valid cpus are: [
  k8,
  piii,
  arm,
  darwin,
  ppc,
]

alejandroandreu commented 6 years ago

Can you post the whole output from running the script with bash -x? Maybe the patch file is not being applied correctly. For me it just works, to be honest :man_shrugging:

peterlee0127 commented 6 years ago

@lchia What's your Bazel version?

$ bazel version
lchia commented 6 years ago

@peterlee0127 bazel version: 0.16.1

Build label: 0.16.1- (@non-git)
Build target: bazel-out/aarch64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Aug 21 14:15:42 2018 (1534860942)
Build timestamp: 1534860942
Build timestamp as int: 1534860942

lchia commented 6 years ago

@alejandroandreu The whole output of bash -x buildTensorflow.sh is as follows:

nvidia@tegra-ubuntu:~/workspaces/tfworkspace/tensorflow-nvJetson-master$ bash -x buildTensorflow.sh

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:

git checkout -b <new-branch-name>

HEAD is now at 93bc2e2... Merge pull request #18928 from tensorflow/release-patch-4-1

Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Do you wish to build TensorFlow with TensorRT support? [y/N]: y TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/aarch64-linux-gnu]:

Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]: 2.2

Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/local

Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]

Do you want to use clang as CUDA compiler? [y/N]: n nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]: n No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
Configuration finished

peterlee0127 commented 6 years ago

Could you try downgrading Bazel to 0.15.2?
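
On the TX2 there is no official aarch64 Bazel binary, so a downgrade usually means building 0.15.2 from the dist archive, roughly like this (a sketch only; installBazel.sh in this repository does something similar):

$ wget https://github.com/bazelbuild/bazel/releases/download/0.15.2/bazel-0.15.2-dist.zip
$ unzip bazel-0.15.2-dist.zip -d bazel-0.15.2 && cd bazel-0.15.2
$ ./compile.sh                                 # builds output/bazel from source
$ sudo cp output/bazel /usr/local/bin/bazel
$ bazel version                                # should now report 0.15.2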

And for the Jetson, NCCL should be set to 1.3.

Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]: 1.3
lchia commented 6 years ago

@peterlee0127 I tried, but got another error.

INFO: From Compiling tensorflow/core/kernels/strided_slice_op_gpu.cu.cc:
Killed
ERROR: /data/tfworkspace/tensorflow-nvJetson-master/tensorflow/tensorflow/core/kernels/BUILD:99:1: output 'tensorflow/core/kernels/_objs/strided_slice_op_gpu/tensorflow/core/kernels/strided_slice_op_gpu.cu.o' was not created
ERROR: /data/tfworkspace/tensorflow-nvJetson-master/tensorflow/tensorflow/core/kernels/BUILD:99:1: not all outputs were created or valid
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1819.720s, Critical Path: 128.77s
INFO: 2249 processes: 2249 local.
FAILED: Build did NOT complete successfully
Thu Aug 23 14:35:57 CST 2018 : === Using tmpdir: /tmp/tmp.xNmvnF9fQk
~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow
~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow /tmp/tmp.xNmvnF9fQk ~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow
Thu Aug 23 14:36:00 CST 2018 : === Building wheel
error: can't copy 'external/protobuf_archive/python/google/protobuf/field_mask_pb2.py': doesn't exist or not a regular file
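
A side note: a Killed message while compiling a CUDA kernel such as strided_slice_op_gpu.cu.cc usually means the TX2 ran out of memory and the kernel's OOM killer stopped the compiler. Adding a swap file, as set up later in this thread, or capping Bazel's parallelism works around it, for example:

$ bazel build --config=opt --config=cuda --jobs=2 //tensorflow/tools/pip_package:build_pip_package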

alejandroandreu commented 6 years ago

I can confirm that downgrading to Bazel 0.15.2 does fix the problem (0.16.1 causes issues). I compiled TensorFlow 1.10 just fine with it.

peterlee0127 commented 6 years ago

The reason for the ERROR: No default_toolchain found for cpu 'aarch64' is that newer Bazel releases define aarch64 (arm64) as its own CPU value, while the toolchain configuration used for the TensorFlow build only defines arm, not aarch64, so the build fails. Just use Bazel 0.15.2.

Configuration finished
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: No default_toolchain found for cpu 'aarch64'. Valid cpus are: [
  k8,
  piii,
  arm,
  darwin,
  ppc,
]

(See Bazel's CPU.java.)
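
Purely as an illustration of what is missing (not the recommended fix, and the toolchain_identifier value below is only a placeholder), the kind of CROSSTOOL entry the build cannot find would look like:

  default_toolchain {
    cpu: "aarch64"
    toolchain_identifier: "local_linux"
  }

Bazel 0.15.2 still maps the Jetson's CPU to arm (note the arm-opt output directories in the later logs), so the existing arm toolchain entry is found and the error goes away.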

lchia commented 6 years ago

@peterlee0127 Thanks. I tried Bazel 0.15.2 with TensorFlow r1.10 and the 'ERROR: No default_toolchain found for cpu 'aarch64'' is solved, but I got another error: Linking of rule '//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed:

Did I miss something? Thanks very much.

ERROR: /data/tfworkspace/tensorflow-nvJetson-master/tensorflow/tensorflow/contrib/nccl/BUILD:24:1: Linking of rule '//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed (Exit 1)
/usr/bin/ld: skipping incompatible bazel-out/arm-opt/bin/_solib_local/_U@local_Uconfig_Unccl_SS Cnccl_Uexternal_Slocal_Uconfig_Unccl_Snccl_Slib/libnccl.so.2 when searching for -l:libnccl.so.2
/usr/bin/ld: skipping incompatible bazel-out/arm-opt/bin/_solib_local/_U @local_Uconfig_Unccl_SS Cnccl_Uexternal_Slocal_Uconfig_Unccl_Snccl_Slib/libnccl.so.2 when searching for -l:libnccl.so.2
/usr/bin/ld: skipping incompatible //usr/local/lib/libnccl.so.2 when searching for -l:libnccl.so.2
/usr/bin/ld: skipping incompatible //usr/local/lib/libnccl.so.2 when searching for -l:libnccl.so.2
/usr/bin/ld: cannot find -l:libnccl.so.2
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 13855.829s, Critical Path: 639.73s
INFO: 5610 processes: 5610 local.
FAILED: Build did NOT complete successfully
Fri Aug 24 02:44:11 CST 2018 : === Preparing sources in dir: /tmp/tmp.24Rxzm44CJ
~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow ~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow
~/workspaces/tfworkspace/tensorflow-nvJetson-master/tensorflow
Fri Aug 24 02:45:15 CST 2018 : === Building wheel
warning: no files found matching '.dll' under directory ''
warning: no files found matching '.lib' under directory ''
warning: no files found matching '.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '' under directory 'tensorflow/include/Eigen'
warning: no files found matching '.h' under directory 'tensorflow/include/google'
warning: no files found matching '' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
Fri Aug 24 02:45:29 CST 2018 : === Output wheel file is in: /tmp/tensorflow_pkg

peterlee0127 commented 6 years ago

Can you try re-cloning TensorFlow into your home folder? Remember to export LD_LIBRARY_PATH.

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
next step ....

I am not sure why it fails here. It seems you are missing some package, or the library path is not set up correctly.

/usr/bin/ld: cannot find -l:libnccl.so.2
collect2: error: ld returned 1 exit status
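
The skipping incompatible lines usually mean the linker found a libnccl.so.2 built for a different architecture (for example an x86_64 NCCL package installed on the aarch64 Jetson). A quick diagnostic sketch, using the path from your log:

$ file /usr/local/lib/libnccl.so.2      # should report ARM aarch64, not x86-64
$ ldconfig -p | grep nccl               # confirm which libnccl the linker will pick up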

Did you install all the requirements of JetPack?

$ ls /var/cuda-repo-9-0-local
7fa2af80.pub                                     cuda-libraries-9-0_9.0.252-1_arm64.deb
cuda-command-line-tools-9-0_9.0.252-1_arm64.deb  cuda-libraries-dev-9-0_9.0.252-1_arm64.deb
cuda-core-9-0_9.0.252-1_arm64.deb                cuda-license-9-0_9.0.252-1_arm64.deb
cuda-cublas-9-0_9.0.252-1_arm64.deb              cuda-minimal-build-9-0_9.0.252-1_arm64.deb
cuda-cublas-dev-9-0_9.0.252-1_arm64.deb          cuda-misc-headers-9-0_9.0.252-1_arm64.deb
cuda-cudart-9-0_9.0.252-1_arm64.deb              cuda-npp-9-0_9.0.252-1_arm64.deb
cuda-cudart-dev-9-0_9.0.252-1_arm64.deb          cuda-npp-dev-9-0_9.0.252-1_arm64.deb
cuda-cufft-9-0_9.0.252-1_arm64.deb               cuda-nvgraph-9-0_9.0.252-1_arm64.deb
cuda-cufft-dev-9-0_9.0.252-1_arm64.deb           cuda-nvgraph-dev-9-0_9.0.252-1_arm64.deb
cuda-curand-9-0_9.0.252-1_arm64.deb              cuda-nvml-dev-9-0_9.0.252-1_arm64.deb
cuda-curand-dev-9-0_9.0.252-1_arm64.deb          cuda-nvrtc-9-0_9.0.252-1_arm64.deb
cuda-cusolver-9-0_9.0.252-1_arm64.deb            cuda-nvrtc-dev-9-0_9.0.252-1_arm64.deb
cuda-cusolver-dev-9-0_9.0.252-1_arm64.deb        cuda-runtime-9-0_9.0.252-1_arm64.deb
cuda-cusparse-9-0_9.0.252-1_arm64.deb            cuda-samples-9-0_9.0.252-1_arm64.deb
cuda-cusparse-dev-9-0_9.0.252-1_arm64.deb        cuda-toolkit-9-0_9.0.252-1_arm64.deb
cuda-documentation-9-0_9.0.252-1_arm64.deb       Packages.gz
cuda-driver-dev-9-0_9.0.252-1_arm64.deb          Release
cuda-gdb-src-9-0_9.0.252-1_arm64.deb             Release.gpg

Anyway, you can try installing my TensorFlow build.

lchia commented 6 years ago

@peterlee0127 @alejandroandreu I got it working by choosing NCCL 1.3 in ./configure. Thanks very much; your replies were a great help. My working setup is as follows:

NVIDIA Jetson TX2

  • (1) JetPack 3.3, installed following https://www.jetsonhacks.com/2017/03/21/jetpack-3-0-nvidia-jetson-tx2-development-kit/ (the better video) or https://tm3.ghost.io/2018/07/06/setting-up-the-nvidia-jetson-tx2/
  • (2) Bazel 0.15.2 by running ./installBazel.sh
  • (3) an 8 GB swap file, created by running ./createswapfile.sh:

    # Create a swapfile for Ubuntu at the current directory location
    SWAPDIRECTORY=$PWD
    SWAPSIZE=8
    fallocate -l $SWAPSIZE"G" $SWAPDIRECTORY"/swapfile"
    cd $SWAPDIRECTORY
    ls -lh swapfile
    sudo chmod 600 swapfile
    ls -lh swapfile
    sudo mkswap swapfile
    sudo swapon swapfile
    swapon -s

  • (4) TensorFlow installed by running ./buildTensorflow.sh, modified as follows (a quick verification sketch follows after this list):

    #! /bin/sh
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
    sudo apt-get install python-numpy python-dev python-pip python-wheel
    git clone https://github.com/tensorflow/tensorflow.git
    cd tensorflow
    git checkout r1.10
    git apply ../patch/tensorflow1.10.0rc1.patch
    ./configure
    bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
    bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so
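
Not part of the original script, but as a quick sanity check you can install the wheel that ends up in /tmp/tensorflow_pkg and list the devices TensorFlow sees; on a working CUDA build a GPU device should appear. (The wheel filename glob is an assumption; the exact name depends on the build.)

$ sudo pip install /tmp/tensorflow_pkg/tensorflow-*.whl
$ python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"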
alejandroandreu commented 6 years ago

No worries. FYI, and I guess you already know, the compute capability of the Jetson TX2 should be 6.2.