If you are trying to do a native compile, don't use the --arm option (which is for cross compiling).
Here's the command line I used to build a Python wheel with TensorRT support on a Jetson Nano; you may customize it to suit your purposes.
./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
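A quick sanity check before building (a sketch; exact library filenames can vary across JetPack versions) is to confirm that the paths passed to the flags above actually contain the toolkits. On Jetson, the cuDNN and TensorRT libraries both live under /usr/lib/aarch64-linux-gnu, which is why both home flags point there:
# Confirm the locations the build flags above refer to
ls /usr/local/cuda/bin/nvcc
ls /usr/lib/aarch64-linux-gnu/libcudnn.so*
ls /usr/lib/aarch64-linux-gnu/libnvinfer.so*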
Thanks for your comment, but I am getting this error:
Scanning dependencies of target onnxruntime_providers_cuda
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/activation/activations.cc.o
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/if.cc.o
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc.o
In file included from /home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.h:6:0,
from /home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc:4:
/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc: In function ‘onnxruntime::common::Status onnxruntime::cuda::ConcatenateGpuOutput(std::vector<OrtValue>&, void*, size_t)’:
/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc:61:85: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
ORT_ENFORCE(static_cast<gsl::byte*>(cur_output) - static_cast<gsl::byte*>(output) == output_size_in_bytes,
^
/home/askari/repos/onnxruntime/include/onnxruntime/core/common/common.h:112:9: note: in definition of macro ‘ORT_ENFORCE’
if (!(condition)) \
^
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/scan.cc.o
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_allocator.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_execution_provider.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_fence.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_pch.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_provider_factory.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc.o
In file included from /home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc:4:0:
/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.h:42:12: error: expected type-specifier before ‘cudnnRNNDataDescriptor_t’
operator cudnnRNNDataDescriptor_t() const { return tensor_; }
^
What is the minimum CUDA version requirement? I have CUDA 9.0.
By the way, in your command, why is tensorrt_home pointing to /usr/lib/aarch64-linux-gnu?
@jywu-msft could you please let me know if you know what might be the problem?
The Jetson Nano I tested a while back was using JetPack 4.2.1, which comes with CUDA 10.0, cuDNN 7.5, and TensorRT 5.1.6. The latest version should work as well; see https://developer.nvidia.com/embedded/jetpack-archive
If you want to use the latest onnxruntime master branch with TensorRT, it requires TensorRT 6. That means if you use JetPack, you will need JetPack 4.3, which comes with TensorRT 6.
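If you're not sure what your board has installed, here is a quick way to check (a sketch, assuming a standard JetPack image):
# L4T release installed by JetPack
cat /etc/nv_tegra_release
# Installed TensorRT and cuDNN package versions
dpkg -l | grep -iE 'tensorrt|nvinfer|cudnn'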
I also have a TX1 with JetPack 4.3 installed and CUDA 10.0:
askari@askari-desktop:~$ dpkg -l | grep TensorRT
ii graphsurgeon-tf 6.0.1-1+cuda10.0 arm64 GraphSurgeon for TensorRT package
ii libnvinfer-bin 6.0.1-1+cuda10.0 arm64 TensorRT binaries
ii libnvinfer-dev 6.0.1-1+cuda10.0 arm64 TensorRT development libraries and headers
ii libnvinfer-doc 6.0.1-1+cuda10.0 all TensorRT documentation
ii libnvinfer-plugin-dev 6.0.1-1+cuda10.0 arm64 TensorRT plugin libraries
ii libnvinfer-plugin6 6.0.1-1+cuda10.0 arm64 TensorRT plugin libraries
ii libnvinfer-samples 6.0.1-1+cuda10.0 all TensorRT samples
ii libnvinfer6 6.0.1-1+cuda10.0 arm64 TensorRT runtime libraries
ii libnvonnxparsers-dev 6.0.1-1+cuda10.0 arm64 TensorRT ONNX libraries
ii libnvonnxparsers6 6.0.1-1+cuda10.0 arm64 TensorRT ONNX libraries
ii libnvparsers-dev 6.0.1-1+cuda10.0 arm64 TensorRT parsers libraries
ii libnvparsers6 6.0.1-1+cuda10.0 arm64 TensorRT parsers libraries
ii nvidia-container-csv-tensorrt 6.0.1.10-1+cuda10.0 arm64 Jetpack TensorRT CSV file
ii python-libnvinfer 6.0.1-1+cuda10.0 arm64 Python bindings for TensorRT
ii python-libnvinfer-dev 6.0.1-1+cuda10.0 arm64 Python development package for TensorRT
ii python3-libnvinfer 6.0.1-1+cuda10.0 arm64 Python 3 bindings for TensorRT
ii python3-libnvinfer-dev 6.0.1-1+cuda10.0 arm64 Python 3 development package for TensorRT
ii tensorrt 6.0.1.10-1+cuda10.0 arm64 Meta package of TensorRT
ii uff-converter-tf 6.0.1-1+cuda10.0 arm64 UFF converter for TensorRT package
With the latest onnxruntime I get stuck right at the beginning:
Scanning dependencies of target onnxruntime_mlas
[ 0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/platform.cpp.o
[ 0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/threading.cpp.o
[ 0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/dgemm.cpp.o
[ 0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/sgemm.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/qgemm.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/convolve.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/pooling.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/reorder.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/snchwc.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/activate.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/logistic.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/tanh.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/erf.cpp.o
[ 1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp.o
/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp: In function ‘void MlasQuantizeLinearKernel(const float*, OutputType*, size_t, float, int32_t)’:
/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp:179:79: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
vst1q_lane_u8((uint8_t*)Output + n, vreinterpretq_s32_u8(IntegerVector), 0);
^
/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp:179:79: error: cannot convert ‘__vector(4) int’ to ‘uint8x16_t {aka __vector(16) unsigned char}’ for argument ‘1’ to ‘int32x4_t vreinterpretq_s32_u8(uint8x16_t)’
CMakeFiles/onnxruntime_mlas.dir/build.make:231: recipe for target 'CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp.o' failed
make[2]: *** [CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp.o] Error 1
CMakeFiles/Makefile2:883: recipe for target 'CMakeFiles/onnxruntime_mlas.dir/all' failed
make[1]: *** [CMakeFiles/onnxruntime_mlas.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 1051, in <module>
sys.exit(main())
File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 983, in main
build_targets(cmake_path, build_dir, configs, args.parallel)
File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 415, in build_targets
run_subprocess(cmd_args)
File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 197, in run_subprocess
completed_process = subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '/mnt/sdcard/onnxruntime/build/Linux/Release', '--config', 'Release']' returned non-zero exit status 2
Finally got it working. I am giving step-by-step instructions here to reproduce what I did. First, here is my system information (pay close attention to the CUDA, cuDNN, TensorRT and JetPack versions):
* Name: NVIDIA Jetson NANO/TX1 e-mail: raffaello@rnext.it
* Type: NANO/TX1
* Jetpack: 4.3 [L4T 32.3.1]
* GPU-Arch: 5.3
* SN: 0323216129961
- Libraries:
* CUDA: 10.0.326
* cuDNN: 7.6.3.28-1+cuda10.0
* TensorRT: 6.0.1.10-1+cuda10.0
* VisionWorks: 1.6.0.500n
* OpenCV: 4.1.1 compiled CUDA: YES
The latest release (at commit c33dab3) is not compatible with the above CUDA, cuDNN and TensorRT versions. The problem most likely resides in the quantization file (onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp). I used the previous release (commit b783805). Here are the step-by-step instructions:
1- git clone --recursive https://github.com/Microsoft/onnxruntime
2- cd onnxruntime
3- git checkout b783805
4- export CUDACXX="/usr/local/cuda/bin/nvcc"
5- Modify tools/ci_build/build.py
- "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "ON"),
+ "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "OFF"),
6- Modify cmake/CMakeLists.txt
- set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_50,code=sm_50") # M series
+ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_53,code=sm_53") # Jetson support
7- ./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
Step 4 is most likely unnecessary, because in the latest release CMake automatically finds the correct CUDA.
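After step 7 completes, a quick way to install and smoke-test the result (a sketch: the dist path below matched my build layout, and the wheel filename will vary with your Python and onnxruntime versions):
# Install the freshly built wheel from the build output directory
pip3 install build/Linux/Release/dist/onnxruntime_gpu_tensorrt-*.whl
# Verify the package imports and reports a GPU device
python3 -c "import onnxruntime; print(onnxruntime.get_device())"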
Note: the Jetson TX1 with the latest JetPack uses up to 14GB of the available 16GB disk space. A native build of onnxruntime requires at least 3GB of disk space and 2GB of RAM. I mounted an SD card with an ext4 filesystem and cloned the repo onto it (it will be extremely slow; it took me 3-4h to compile the code, so a better option would be adding an SSD drive). Here is my df -Th:
Filesystem Type Size Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4 14G 13G 766M 95% /
none devtmpfs 1.7G 0 1.7G 0% /dev
tmpfs tmpfs 2.0G 4.0K 2.0G 1% /dev/shm
tmpfs tmpfs 2.0G 20M 2.0G 1% /run
tmpfs tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
tmpfs tmpfs 396M 12K 396M 1% /run/user/120
tmpfs tmpfs 396M 120K 396M 1% /run/user/1000
/dev/mmcblk2p1 ext4 15G 3.1G 11G 22% /media/askari/sd_card
/dev/loop0 vfat 16M 78K 16M 1% /media/askari/L4T-README
You can make a swap file to make sure you have enough RAM, and you can disable the GUI and desktop at boot time to free up memory.
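For reference, a minimal sketch of both tricks (the 4G swap size is an arbitrary choice; adjust to your build):
# Create and enable a 4GB swap file
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Boot into a text console instead of the desktop to free up memory
sudo systemctl set-default multi-user.target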
I am closing the issue since the original problem is resolved.
thanks for documenting this!
@tracysh, @askhade FYI, it seems like onnxruntime/core/mlas/lib/quantize.cpp does not build on some platforms (Nvidia Jetson in this case). Can we fix it?
To compile for ARM64 (CPU only), follow the exact same procedure, except use the following command to compile:
./build.sh --config Release --update --build --build_wheel
FYI, the build error you encountered in the latest release was fixed in master by https://github.com/microsoft/onnxruntime/commit/ebf23744eb1e587e12e3459b1891a89a840fc687
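For anyone who hit the quantize.cpp error on an older checkout, a minimal sketch of moving to a commit that contains the fix (assuming master, or anything at or after that commit, otherwise works for your setup):
cd onnxruntime
git fetch origin
# Commit that fixed the quantize.cpp build error on ARM64
git checkout ebf23744eb1e587e12e3459b1891a89a840fc687
git submodule update --init --recursive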
thanks again for documenting the steps and reporting the error!
Awesome! Thanks
Hi guys, after ./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
how do I install it so I can use it in my Python code? :) Thanks in advance
Depending on your build target, there will be a wheel file. For instance, for a TensorRT build, you will have a file named onnxruntime_gpu_tensorrt-1.0.0-cp36-cp36m-linux_aarch64. This file should be located in your build directory.
Thanks man, but I don't see any file like that. In the build folder I have two different folders: lib and Linux. The Linux folder contains Release, and in Release there is:
build gmock.pc libonnxruntime_providers_tensorrt.a onnxruntime onnxruntime_test_python_nuphar.py CMakeCache.txt gtest_main.pc libonnxruntime_session.a onnxruntime_config.h onnxruntime_test_python.py CMakeFiles gtest.pc libonnxruntime_test_utils.a onnxruntime_gpu_tensorrt.egg-info onnx_test_runner cmake_install.cmake libonnxruntime_common.a libonnxruntime_test_utils_for_framework.a onnxruntime_mlas_test opaque_api_test CPackConfig.cmake libonnxruntime_framework.a libonnxruntime_util.a onnxruntime_perf_test pybind11 CPackSourceConfig.cmake libonnxruntime_graph.a libonnx_test_data_proto.a onnxruntime_profile__2020-01-22_15-11-51.json PythonApiTestOptimizedModel.onnx CTestTestfile.cmake libonnxruntime_mlas.a libonnx_test_runner_common.a onnxruntime_pybind11_state.so testdata dist libonnxruntime_optimizer.a Makefile onnxruntime_test_all tml.pb.cc external libonnxruntime_providers.a onnx onnxruntime_test_python_backend.py tml.pb.h gmock_main.pc libonnxruntime_providers_cuda.a onnx_backend_test_series.py onnxruntime_test_python_keras.py VERSION_NUMBER
The build folder inside that has bdist.linux-aarch64 and lib, and the lib folder has only: backend capi datasets init.py LICENSE Privacy.md ThirdPartyNotices.txt tools
Any other help, or can you show me where I made a mistake, please?
If your build ended successfully, you should have a wheel. In the build directory, run the following command:
find ./ -name "*.whl"
This should point you to the wheel.
I see you have a dist directory. The .whl file should be in there.
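Once you've located it, installing is a plain pip install (a sketch, assuming a single wheel in dist):
# Install the wheel from the dist directory
pip3 install dist/*.whl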
@hossein1387 @jywu-msft thanks a lot guys :)
I followed the instructions above, but I am getting an unused-parameter error on onnxruntime rel-1.1.1:
In file included from /home/workdir/Onnx/onnxruntime/onnxruntime/core/session/inference_session.cc:49:0:
/home/workdir/Onnx/onnxruntime/onnxruntime/core/providers/cpu/cpu_execution_provider.h: In constructor ‘onnxruntime::CPUExecutionProvider::CPUExecutionProvider(const onnxruntime::CPUExecutionProviderInfo&)’:
/home/workdir/Onnx/onnxruntime/onnxruntime/core/providers/cpu/cpu_execution_provider.h:28:65: error: unused parameter ‘info’ [-Werror=unused-parameter]
explicit CPUExecutionProvider(const CPUExecutionProviderInfo& info)
Thanks @hossein1387 @jywu-msft! Following these instructions worked like a charm on a TX2. Just note that the TX2 requires the sm_62 architecture, i.e. use -gencode=arch=compute_62,code=sm_62 in step 6.
This guide has been really useful to me, thanks! If you have any insight into the issues I am facing here I would be super grateful! https://github.com/microsoft/onnxruntime/issues/3240
I'm facing issues while building onnxruntime on my Jetson TX2 (JetPack 4.2.2) with TensorRT 5.1.6 and CUDA 10. I checked out both commits b783805 and ebf2374, but neither worked. Your help would be much appreciated.
Are you able to upgrade to JetPack 4.3 (with TensorRT 6.0.1)? If you need to stick with JetPack 4.2.2 (TensorRT 5.1.6), you will need to use https://github.com/microsoft/onnxruntime/commit/4bb6385dcab5e6bb0af2bf494538b92fbdde1683, which is the commit prior to TensorRT 6.x support being added. Or you can use the rel-0.5.0 branch for an official release that supports TensorRT 5.x. The commits you reference are only compatible with TensorRT 6.x.
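Concretely, that would look like this (a sketch of the two options):
# Option 1: last commit before TensorRT 6.x support was added
git checkout 4bb6385dcab5e6bb0af2bf494538b92fbdde1683
# Option 2: official release branch that supports TensorRT 5.x
git checkout rel-0.5.0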
btw, further issues related to TensorRT support on Nvidia Tegra/ARM64 should be opened as new issues. This issue is already closed and the thread has gotten too long. thanks!
Describe the bug
Unable to do a native build from source on TX2.
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux tx2 4.4.38-tegra #1 SMP PREEMPT Thu Mar 1 20:49:20 PST 2018 aarch64 aarch64 aarch64 GNU/Linux
ONNX Runtime installed from (source or binary): source, at commit c767e264c52c3bac2c319b630d37f541f4d2a677
ONNX Runtime version:
Python version: Python 3.5.2
Visual Studio version (if applicable):
GCC/Compiler version (if compiling from source): gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CUDA/cuDNN version:
GPU model and memory:
To Reproduce
I followed the instructions provided in the BUILD.md file. The process worked perfectly fine on a Raspberry Pi (an ARMv7 32-bit machine), but on the TX2, which is an ARMv8 64-bit machine, it failed with the following error message:
2019-12-17 18:36:06,187 Build [INFO] - Build started 2019-12-17 18:36:06,188 Build [ERROR] - Only Windows ARM(64) cross-compiled builds supported currently through this script
Expected behavior
The error message is very ambiguous to me, since I am not cross compiling; I am doing a native build on the TX2. I was wondering if someone could help me figure out how to do a native build (not a cross compile) on the TX2.