microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.14k stars 2.85k forks

Instructions to build for ARM 64bit #2684

Closed hossein1387 closed 4 years ago

hossein1387 commented 4 years ago

Describe the bug

Unable to do a native build from source on TX2.

System information

To Reproduce

sudo apt-get update
sudo apt-get install -y \
    sudo \
    build-essential \
    curl \
    libcurl4-openssl-dev \
    libssl-dev \
    wget \
    python3 \
    python3-pip \
    python3-dev \
    git \
    tar
pip3 install --upgrade pip
pip3 install --upgrade setuptools
pip3 install --upgrade wheel
pip3 install numpy
cd /code
git clone --recursive https://github.com/Microsoft/onnxruntime

cd /code/onnxruntime
./build.sh --config MinSizeRel --arm --update --build

2019-12-17 18:36:06,187 Build [INFO] - Build started
2019-12-17 18:36:06,188 Build [ERROR] - Only Windows ARM(64) cross-compiled builds supported currently through this script

Expected behavior

I followed the instructions provided in BUILD.md. The process worked perfectly on a Raspberry Pi (ARMv7, 32-bit machine) but failed on the TX2, which is an ARMv8 64-bit machine. The following error message appeared:

2019-12-17 18:36:06,187 Build [INFO] - Build started
2019-12-17 18:36:06,188 Build [ERROR] - Only Windows ARM(64) cross-compiled builds supported currently through this script

The error message is very ambiguous to me, since I am not cross-compiling; I am doing a native build on the TX2. I was wondering if someone could help me figure out how to do a native build (not a cross-compile) on the TX2.

jywu-msft commented 4 years ago

If you are trying to do a native compile, don't use the --arm option (which is for cross-compiling).

Here's the command line I used to build a Python wheel with TensorRT support on a Jetson Nano. You may customize it to suit your purposes.

./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
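
Before running that, it can help to confirm that the CUDA, cuDNN and TensorRT paths actually exist on your device (a quick sanity check; the paths below assume a standard JetPack layout):

# nvcc should exist under --cuda_home
ls /usr/local/cuda/bin/nvcc
# the cuDNN and TensorRT libraries should exist under --cudnn_home / --tensorrt_home
ls /usr/lib/aarch64-linux-gnu/libcudnn.so* /usr/lib/aarch64-linux-gnu/libnvinfer.so*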

hossein1387 commented 4 years ago

Thanks for your comment, but I am getting this error:

Scanning dependencies of target onnxruntime_providers_cuda
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/activation/activations.cc.o
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/if.cc.o
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc.o
In file included from /home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.h:6:0,
                 from /home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc:4:
/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc: In function ‘onnxruntime::common::Status onnxruntime::cuda::ConcatenateGpuOutput(std::vector<OrtValue>&, void*, size_t)’:
/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/loop.cc:61:85: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   ORT_ENFORCE(static_cast<gsl::byte*>(cur_output) - static_cast<gsl::byte*>(output) == output_size_in_bytes,
                                                                                     ^
/home/askari/repos/onnxruntime/include/onnxruntime/core/common/common.h:112:9: note: in definition of macro ‘ORT_ENFORCE’
   if (!(condition))                                                           \
         ^
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/controlflow/scan.cc.o
[ 25%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_allocator.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_call.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_execution_provider.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_fence.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_pch.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cuda_provider_factory.cc.o
[ 26%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc.o
In file included from /home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.cc:4:0:
/home/askari/repos/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.h:42:12: error: expected type-specifier before ‘cudnnRNNDataDescriptor_t’
   operator cudnnRNNDataDescriptor_t() const { return tensor_; }
            ^

hossein1387 commented 4 years ago

What is the minimum CUDA version requirement? I have CUDA 9.0.

hossein1387 commented 4 years ago

By the way, in your command, why is tensorrt_home pointing to /usr/lib/aarch64-linux-gnu?

hossein1387 commented 4 years ago

@jywu-msft could you please let me know if you know what might be the problem?

jywu-msft commented 4 years ago

The Jetson Nano I tested a while back was using JetPack 4.2.1, which comes with CUDA 10.0, cuDNN 7.5, and TensorRT 5.1.6. The latest version should work as well. See https://developer.nvidia.com/embedded/jetpack-archive

jywu-msft commented 4 years ago

If you want to use the latest onnxruntime master branch with TensorRT, it requires TensorRT 6. That means if you use JetPack, you will need JetPack 4.3, which comes with TensorRT 6.
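
If you are not sure which versions your JetPack image ships with, something like the following should report them (a rough sketch; the package names match what JetPack installs via apt):

# CUDA toolkit version
cat /usr/local/cuda/version.txt
# cuDNN and TensorRT packages installed by JetPack
dpkg -l | grep -E 'libcudnn|nvinfer|tensorrt'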

hossein1387 commented 4 years ago

I also have a TX1 that has jetpack 4.3 installed with cuda 10.0:

askari@askari-desktop:~$ dpkg -l | grep TensorRT
ii  graphsurgeon-tf                               6.0.1-1+cuda10.0                                arm64        GraphSurgeon for TensorRT package
ii  libnvinfer-bin                                6.0.1-1+cuda10.0                                arm64        TensorRT binaries
ii  libnvinfer-dev                                6.0.1-1+cuda10.0                                arm64        TensorRT development libraries and headers
ii  libnvinfer-doc                                6.0.1-1+cuda10.0                                all          TensorRT documentation
ii  libnvinfer-plugin-dev                         6.0.1-1+cuda10.0                                arm64        TensorRT plugin libraries
ii  libnvinfer-plugin6                            6.0.1-1+cuda10.0                                arm64        TensorRT plugin libraries
ii  libnvinfer-samples                            6.0.1-1+cuda10.0                                all          TensorRT samples
ii  libnvinfer6                                   6.0.1-1+cuda10.0                                arm64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                          6.0.1-1+cuda10.0                                arm64        TensorRT ONNX libraries
ii  libnvonnxparsers6                             6.0.1-1+cuda10.0                                arm64        TensorRT ONNX libraries
ii  libnvparsers-dev                              6.0.1-1+cuda10.0                                arm64        TensorRT parsers libraries
ii  libnvparsers6                                 6.0.1-1+cuda10.0                                arm64        TensorRT parsers libraries
ii  nvidia-container-csv-tensorrt                 6.0.1.10-1+cuda10.0                             arm64        Jetpack TensorRT CSV file
ii  python-libnvinfer                             6.0.1-1+cuda10.0                                arm64        Python bindings for TensorRT
ii  python-libnvinfer-dev                         6.0.1-1+cuda10.0                                arm64        Python development package for TensorRT
ii  python3-libnvinfer                            6.0.1-1+cuda10.0                                arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                        6.0.1-1+cuda10.0                                arm64        Python 3 development package for TensorRT
ii  tensorrt                                      6.0.1.10-1+cuda10.0                             arm64        Meta package of TensorRT                      
ii  uff-converter-tf                              6.0.1-1+cuda10.0                                arm64        UFF converter for TensorRT package

With the latest onnxruntime I get stuck right at the beginning:

Scanning dependencies of target onnxruntime_mlas
[  0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/platform.cpp.o
[  0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/threading.cpp.o
[  0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/dgemm.cpp.o
[  0%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/sgemm.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/qgemm.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/convolve.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/pooling.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/reorder.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/snchwc.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/activate.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/logistic.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/tanh.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/erf.cpp.o
[  1%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp.o
/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp: In function ‘void MlasQuantizeLinearKernel(const float*, OutputType*, size_t, float, int32_t)’:
/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp:179:79: note: use -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts
         vst1q_lane_u8((uint8_t*)Output + n, vreinterpretq_s32_u8(IntegerVector), 0);
                                                                               ^
/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp:179:79: error: cannot convert ‘__vector(4) int’ to ‘uint8x16_t {aka __vector(16) unsigned char}’ for argument ‘1’ to int32x4_t vreinterpretq_s32_u8(uint8x16_t)’
CMakeFiles/onnxruntime_mlas.dir/build.make:231: recipe for target 'CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp.o' failed
make[2]: *** [CMakeFiles/onnxruntime_mlas.dir/mnt/sdcard/onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp.o] Error 1
CMakeFiles/Makefile2:883: recipe for target 'CMakeFiles/onnxruntime_mlas.dir/all' failed
make[1]: *** [CMakeFiles/onnxruntime_mlas.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 1051, in <module>
    sys.exit(main())
  File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 983, in main
    build_targets(cmake_path, build_dir, configs, args.parallel)
  File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 415, in build_targets
    run_subprocess(cmd_args)
  File "/mnt/sdcard/onnxruntime/tools/ci_build/build.py", line 197, in run_subprocess
    completed_process = subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '/mnt/sdcard/onnxruntime/build/Linux/Release', '--config', 'Release']' returned non-zero exit status 2

hossein1387 commented 4 years ago

Finally got it working. I am giving step-by-step instructions here to reproduce what I did. First, here is my system information (pay close attention to the CUDA, cuDNN, TensorRT and JetPack versions):

    * Name:           NVIDIA Jetson NANO/TX1
    * Type:           NANO/TX1
    * Jetpack:        4.3 [L4T 32.3.1]
    * GPU-Arch:       5.3
    * SN:             0323216129961
  - Libraries:
    * CUDA:           10.0.326
    * cuDNN:          7.6.3.28-1+cuda10.0
    * TensorRT:       6.0.1.10-1+cuda10.0
    * VisionWorks:    1.6.0.500n
    * OpenCV:         4.1.1 compiled CUDA: YES

The latest release (at commit c33dab3) is not compatible with the above CUDA, cuDNN and TensorRT versions. The problem most likely resides in the quantization file (onnxruntime/onnxruntime/core/mlas/lib/quantize.cpp). I used the previous release (commit b783805). Here are the step-by-step instructions:

1- git clone --recursive https://github.com/Microsoft/onnxruntime
2- cd onnxruntime
3- git checkout b783805
4- export CUDACXX="/usr/local/cuda/bin/nvcc"
5- Modify  tools/ci_build/build.py
    - "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "ON"),
    + "-Donnxruntime_DEV_MODE=" + ("OFF" if args.android else "OFF"),
6- Modify cmake/CMakeLists.txt
    -  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_50,code=sm_50") # M series
    +  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_53,code=sm_53") # Jetson support
7- ./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu

Step 4 is most likely unnecessary because in the latest release, CMake automatically finds the correct CUDA.
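
If you prefer not to edit the files by hand, steps 5 and 6 can also be applied with sed (a sketch, assuming those lines in build.py and CMakeLists.txt still match the exact text shown above):

# step 5: build with onnxruntime_DEV_MODE=OFF regardless of platform
sed -i 's/"OFF" if args.android else "ON"/"OFF" if args.android else "OFF"/' tools/ci_build/build.py
# step 6: generate code for the Jetson Nano/TX1 GPU (compute_53/sm_53)
sed -i 's/compute_50,code=sm_50/compute_53,code=sm_53/' cmake/CMakeLists.txt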

Note: the Jetson TX1 with the latest JetPack uses up to 14 GB of the available 16 GB of disk space. The native build of onnxruntime requires at least 3 GB of disk space and 2 GB of RAM. I mounted an SD card formatted as ext4 and cloned the repo onto it (it will be extremely slow; compiling took me 3-4 hours; a better option would be adding an SSD drive). Here is my df -Th:

Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1 ext4       14G   13G  766M  95% /
none           devtmpfs  1.7G     0  1.7G   0% /dev
tmpfs          tmpfs     2.0G  4.0K  2.0G   1% /dev/shm
tmpfs          tmpfs     2.0G   20M  2.0G   1% /run
tmpfs          tmpfs     5.0M  4.0K  5.0M   1% /run/lock
tmpfs          tmpfs     2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs          tmpfs     396M   12K  396M   1% /run/user/120
tmpfs          tmpfs     396M  120K  396M   1% /run/user/1000
/dev/mmcblk2p1 ext4       15G  3.1G   11G  22% /media/askari/sd_card
/dev/loop0     vfat       16M   78K   16M   1% /media/askari/L4T-README

You can create a swap file to make sure you have enough RAM, and you can disable the GUI and desktop at boot time to free up memory.
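
For reference, a swap file can be set up roughly as follows, and the desktop can be disabled at boot with systemd (a sketch; the 4G size is only an example):

# create and enable a 4 GB swap file
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# boot to a text console instead of the desktop to free memory
sudo systemctl set-default multi-user.target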

hossein1387 commented 4 years ago

I am closing the issue since the original problem is resolved.

jywu-msft commented 4 years ago

thanks for documenting this!

@tracysh, @askhade FYI, it seems like onnxruntime/core/mlas/lib/quantize.cpp does not build on some platforms (NVIDIA Jetson in this case). Can we fix it?

hossein1387 commented 4 years ago

To compile for ARM64 (CPU only), follow the exact same procedure, except use the following command to compile: ./build.sh --config Release --update --build --build_wheel

jywu-msft commented 4 years ago

To compile for ARM64 (CPU only), follow the exact same procedure, except use the following command to compile: ./build.sh --config Release --update --build --build_wheel

FYI, the build error you encountered in the latest release was fixed in master by https://github.com/microsoft/onnxruntime/commit/ebf23744eb1e587e12e3459b1891a89a840fc687

thanks again for documenting the steps and reporting the error!

hossein1387 commented 4 years ago

Awesome! Thanks

Kammmil commented 4 years ago

Hi guys, after running ./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu

how do I install it so I can use it in my Python code? :) Thanks in advance

hossein1387 commented 4 years ago

Depending on your build target, there will be a wheel file. For instance, for a TensorRT build you will have a file named onnxruntime_gpu_tensorrt-1.0.0-cp36-cp36m-linux_aarch64; this file should be located in your build directory.
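
Once you have located the wheel, installing it with pip3 and checking the import should be enough to use it from Python (the path below is illustrative; the exact wheel name and location depend on your build):

pip3 install build/Linux/Release/dist/onnxruntime_gpu_tensorrt-1.0.0-cp36-cp36m-linux_aarch64.whl
python3 -c "import onnxruntime; print(onnxruntime.__version__)"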

Kammmil commented 4 years ago

Depending on your build target, there will be a wheel file. For instance, for a TensorRT build you will have a file named onnxruntime_gpu_tensorrt-1.0.0-cp36-cp36m-linux_aarch64; this file should be located in your build directory.

Thanks, but I don't see any file like that. In the build folder I have two different folders: lib and Linux. The Linux folder contains Release, and in Release there is: build gmock.pc libonnxruntime_providers_tensorrt.a onnxruntime onnxruntime_test_python_nuphar.py CMakeCache.txt gtest_main.pc libonnxruntime_session.a onnxruntime_config.h onnxruntime_test_python.py CMakeFiles gtest.pc libonnxruntime_test_utils.a onnxruntime_gpu_tensorrt.egg-info onnx_test_runner cmake_install.cmake libonnxruntime_common.a libonnxruntime_test_utils_for_framework.a onnxruntime_mlas_test opaque_api_test CPackConfig.cmake libonnxruntime_framework.a libonnxruntime_util.a onnxruntime_perf_test pybind11 CPackSourceConfig.cmake libonnxruntime_graph.a libonnx_test_data_proto.a onnxruntime_profile__2020-01-22_15-11-51.json PythonApiTestOptimizedModel.onnx CTestTestfile.cmake libonnxruntime_mlas.a libonnx_test_runner_common.a onnxruntime_pybind11_state.so testdata dist libonnxruntime_optimizer.a Makefile onnxruntime_test_all tml.pb.cc external libonnxruntime_providers.a onnx onnxruntime_test_python_backend.py tml.pb.h gmock_main.pc libonnxruntime_providers_cuda.a onnx_backend_test_series.py onnxruntime_test_python_keras.py VERSION_NUMBER

The build folder inside that has bdist.linux-aarch64 and lib, and the lib folder has only: backend capi datasets init.py LICENSE Privacy.md ThirdPartyNotices.txt tools

So any other help, or can you show me where I made a mistake, please?

hossein1387 commented 4 years ago

If your build ended successfully, you should have a wheel. In the build directory, run the following command:

find ./ -name "*.whl"

This should point you to the wheel.

jywu-msft commented 4 years ago

I see you have a dist directory. The whl file should be in there.

Kammmil commented 4 years ago

I see you have a dist directory. The whl file should be in there.

@hossein1387 @jywu-msft thanks a lot guys :)

CoderHam commented 4 years ago

I followed the instructions above, but I am getting an unused parameter error on onnxruntime rel-1.1.1:

In file included from /home/workdir/Onnx/onnxruntime/onnxruntime/core/session/inference_session.cc:49:0:
/home/workdir/Onnx/onnxruntime/onnxruntime/core/providers/cpu/cpu_execution_provider.h: In constructor ‘onnxruntime::CPUExecutionProvider::CPUExecutionProvider(const onnxruntime::CPUExecutionProviderInfo&)’:
/home/workdir/Onnx/onnxruntime/onnxruntime/core/providers/cpu/cpu_execution_provider.h:28:65: error: unused parameter ‘info’ [-Werror=unused-parameter]
   explicit CPUExecutionProvider(const CPUExecutionProviderInfo& info)

jcarlosgarcia commented 4 years ago

Thanks @hossein1387 @jywu-msft! Following these instructions worked like a charm on a TX2. Just note that the TX2 requires the 62 architecture (compute_62 and sm_62).
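
For a TX2, step 6 from the instructions above would therefore substitute the 62 architecture, e.g. (a sketch, assuming the original compute_50 line is still present in cmake/CMakeLists.txt):

sed -i 's/compute_50,code=sm_50/compute_62,code=sm_62/' cmake/CMakeLists.txt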

thirdeye-callum commented 4 years ago

This guide has been really useful to me, thanks! If you have any insight into the issues I am facing here I would be super grateful! https://github.com/microsoft/onnxruntime/issues/3240

arjunbinu commented 4 years ago

The Jetson Nano I tested a while back was using JetPack 4.2.1, which comes with CUDA 10.0, cuDNN 7.5, and TensorRT 5.1.6. The latest version should work as well. See https://developer.nvidia.com/embedded/jetpack-archive

I'm facing issues while building onnxruntime on my Jetson TX2 (JetPack 4.2.2) with TensorRT 5.1.6 and CUDA 10. I checked out both commits b783805 and ebf2374, but neither worked. Your help would be much appreciated.

jywu-msft commented 4 years ago

Are you able to upgrade to JetPack 4.3 (with TensorRT 6.0.1)? If you need to stick with JetPack 4.2.2 (TensorRT 5.1.6), you will need to use https://github.com/microsoft/onnxruntime/commit/4bb6385dcab5e6bb0af2bf494538b92fbdde1683, which is the commit prior to TensorRT 6.x support being added. Or you can use the rel-0.5.0 branch for an official release that supports TensorRT 5.x. The commits you referenced are only compatible with TensorRT 6.x.
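
Concretely, that would look something like this (a sketch):

# last commit before TensorRT 6.x support was added
git checkout 4bb6385dcab5e6bb0af2bf494538b92fbdde1683
# or, for an official release that supports TensorRT 5.x:
# git checkout rel-0.5.0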

jywu-msft commented 4 years ago

By the way, further issues related to TensorRT support on NVIDIA Tegra/ARM64 should be opened as new issues. This issue is already closed and the thread has gotten too long. Thanks!