Closed AndreV84 closed 4 years ago
The error is from CMake trying to validate that the CUDA compiler works. CMake generates a really simple program using the following
```
file(WRITE ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/main.cu
"#ifndef __CUDACC__\n"
"# error \"The CMAKE_CUDA_COMPILER is set to an invalid CUDA compiler\"\n"
"#endif\n"
"int main(){return 0;}\n")
```
It tries to compile it using the below command, and the errors you see are from this step:
```
/usr/local/cuda-10.0/bin/nvcc -cudart shared -Xcompiler=-fPIE -x cu -c /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_bb43d.dir/main.cu.o
```
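For debugging, that compiler check can be replicated by hand. A minimal sketch, assuming the nvcc path and flags quoted above (adjust for your install); it writes the same test program and builds the command list, but leaves running it to you:

```python
# Sketch: recreate CMake's CUDA-compiler sanity check by hand.
# The nvcc path and flags mirror the command quoted above; adjust for your setup.
import os

# Same trivial program CMake writes to CMakeTmp/main.cu
CUDA_TEST_SRC = (
    '#ifndef __CUDACC__\n'
    '# error "The CMAKE_CUDA_COMPILER is set to an invalid CUDA compiler"\n'
    '#endif\n'
    'int main(){return 0;}\n'
)

def build_check_command(workdir, nvcc="/usr/local/cuda-10.0/bin/nvcc"):
    """Write main.cu into workdir and return the nvcc command CMake would run."""
    src = os.path.join(workdir, "main.cu")
    with open(src, "w") as f:
        f.write(CUDA_TEST_SRC)
    obj = os.path.join(workdir, "main.cu.o")
    return [nvcc, "-cudart", "shared", "-Xcompiler=-fPIE",
            "-x", "cu", "-c", src, "-o", obj]
```

Running the returned command (e.g. via `subprocess.run`) on the target machine should reproduce the same output CMake reports during the check.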
Is your CUDA installation valid?
Hi @skottmckay Thank you for your response. I shall validate the CUDA installation and will get back to you.
I can run the following:
```c
#include "stdio.h"

int main() {
    printf("Hello, world\n");
    return 0;
}
```

```
nvcc hello1.cu
./a.out
Hello, world
```
Moreover, I can see that there is no file main.cu in the folder
/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/
Thanks
The code below also builds and executes:
```c
#include "stdio.h"

__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main() {
    int a, b, c;
    int *dev_c;
    a = 3;
    b = 4;
    cudaMalloc((void**)&dev_c, sizeof(int));
    add<<<1,1>>>(a, b, dev_c);
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d is %d\n", a, b, c);
    cudaFree(dev_c);
    return 0;
}
```
The main.cu is temporary and cmake will most likely clean it up before the attempted build exits.
Can you try with the exact same command line that cmake is generating, with the only change being the path to the cu file?
change it how?
If I omit the Release parameter, it defaults to the Debug folder:
```
./build.sh --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
```
```
protobuf from submodule
--
-- 3.11.3.0
-- The CUDA compiler identification is NVIDIA 10.0.326
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - broken
CMake Error at /usr/local/share/cmake-3.17/Modules/CMakeTestCUDACompiler.cmake:46 (message):
  The CUDA compiler
    "/usr/local/cuda/bin/nvcc"
  is not able to compile a simple test program.
  It fails with the following output:
    Change Dir: /code/onnxruntime/build/Linux/Debug/CMakeFiles/CMakeTmp
```
Do you mean I should try hardcoding the path parameter in /tools/ci_build/build.py?
I will try to delete the folder, clone it from github and try again
where exactly do I have to change the path, could you extend, please?
Instead of this:
nvcc hello1.cu
Please try this:
/usr/local/cuda-10.0/bin/nvcc -cudart shared -Xcompiler=-fPIE -x cu -c <path to>/hello1.cu -o <path to>/hello1.cu.o
so that you're more exactly replicating the command cmake is generating.
the output is as follows:
```
/usr/local/cuda-10.0/bin/nvcc -cudart shared -Xcompiler=-fPIE -x cu -c /home/nvidia/hello1.cu -o /home/nvidia/hello1.cu.o
nvidia@nvidia-desktop:~$ ls
add     cmake-3.12.3.tar.gz           Downloads         hello1       out       Templates
add.cu  code edited                   hello1.cu         Music        Pictures  txttt workdir
add.o   deploy                        examples.desktop  hello1.cu.o  NeMo      pilot up
Desktop gst-rtsp-server-1.14.1        hello.cu          Public       Videos
apex    Documents gst-rtsp-server-1.14.1.tar.xz  i o4.jpg  super  VisionWorks-SFM-0.90-Samples
nvidia@nvidia-desktop:~$ chmod +x hello1.cu.o
nvidia@nvidia-desktop:~$ ./hello1.cu.o
-bash: ./hello1.cu.o: cannot execute binary file: Exec format error
```
The CUDA compiler seems to be able to generate the object file, so that's good. But linking it to create the executable may be the issue.
If you try this command, is it successful, or does it replicate the problem? If it works it should create an executable called 'hello1':
/usr/bin/g++ <path to>/hello1.cu.o -o hello1 -lcudadevrt -lcudart_static -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart
the output is as follows output.log
Similar errors are said to be characteristic of cases where the -ldl flag is missing (reference: https://forum.hdfgroup.org/t/h5pl-c-text-0x749-undefined-reference-to-dlsym/4913)
or where the -lpthread flag is missing (reference: https://stackoverflow.com/questions/23556042/undefined-reference-issues-using-semaphores)
```
/usr/bin/g++ /home/nvidia/hello1.cu.o -o hello1 -lcudadevrt -lcudart_static -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart -ldl -lrt
/usr/bin/ld: /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): undefined reference to symbol 'pthread_rwlockattr_init@@GLIBC_2.17'
//lib/aarch64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
```
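The "DSO missing from command line" error means the link line uses libpthread symbols without listing the library explicitly. A minimal sketch (a hypothetical helper, not part of the actual build) of how the failing g++ command would be amended, assuming the static CUDA runtime needs -ldl, -lpthread, and -lrt:

```python
# Sketch: append the libraries the static CUDA runtime needs to a g++ link
# command, skipping any that are already present. Illustrative only; the real
# fix belongs in the build system, not in post-processing a command line.

REQUIRED_FOR_STATIC_CUDART = ["-ldl", "-lpthread", "-lrt"]

def fix_link_command(cmd):
    """Return a copy of the link command with missing runtime libs appended."""
    fixed = list(cmd)
    for flag in REQUIRED_FOR_STATIC_CUDART:
        if flag not in fixed:
            fixed.append(flag)
    return fixed
```

Applied to the command above, this would yield a link line ending in `... -lcudart -ldl -lrt -lpthread` (only the flags that were absent get appended).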
Unfortunately I don't have a machine I can replicate this on. Seems like a fundamental issue for the link to fail when it's just involving some simple CMake + CUDA pieces and no TensorRT components. Given that the cmake setup seems to work for everyone else (I haven't seen an Issue where it's failing in the CMake test for CUDA) there's a good chance there's something about your particular setup that is making it unhappy. If adding '-ldl' was necessary I'd expect it to have shown up as an issue far sooner.
The folks who can dig into this further will be in the office on Tuesday PST as Monday is the President's Day holiday in the US.
@jywu-msft do you have any insights into this issue?
@skottmckay @jywu-msft Facing the same issue on Jetson Xavier as well. CUDA 10.0, JetPack 4.3, onnxruntime 1.1.1
Unfortunately we don't have a Jetson Xavier available to reproduce this issue. We've tested on Jetson Nano, and others have tested on TX1 and TX2. Adding @hossein1387, who has participated in threads about ORT on Jetson, to see if he has any insights.
thank you for following up!
folks said:
A CUDA program needs to be compiled with nvcc; it won't work to compile it with a standard g++ with CUDA linked. source: (https://devtalk.nvidia.com/default/topic/1071650/jetson-agx-xavier/validation-of-cuda-installation/post/5431992/#5431992)
@AndreV84 , @CoderHam
are both of you using CMake 3.17rc1?
Yes @jywu-msft I am using cmake-3.17.0-rc1
```
cmake --version
cmake version 3.17.0-rc1
```
It looks to me like cmake 3.17 rc1 has a behavior change where it uses the static CUDA runtime by default, which may be why the link command doesn't have the additional -ldl -lpthread.
Can you use the latest stable cmake version 3.16 instead? or, try building with additional cmake option for setting the runtime library type (I haven't tested this out yet)
./build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu --cmake_extra_defines CMAKE_CUDA_RUNTIME_LIBRARY=Shared
tried the command above and the output is still the same: log.log. Trying with cmake-3.16.4.
I haven't tested it myself; it's possible other changes need to be made in onnxruntime/cmake/CMakeLists.txt. Your log shows it's still trying to link against the static CUDA runtime:
```
/usr/bin/g++ CMakeFiles/cmTC_eb5cb.dir/main.cu.o -o cmTC_eb5cb -lcudadevrt -lcudart_static -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o)
```
Would it be possible for you to try cmake 3.16 instead?
I used to download from here: https://github.com/Kitware/CMake/releases/download/v3.16.4/cmake-3.16.4.tar.gz
Upd: I can try with this one: https://github.com/Kitware/CMake/releases/download/v3.16.0/cmake-3.16.0.tar.gz
I used to download from https://cmake.org/download/ (https://github.com/Kitware/CMake/releases/download/v3.16.4/cmake-3.16.4.tar.gz). Where do I get 3.16 from? Could you provide the exact URL, please? I am still building 3.16.4 and will report once tested.
Sorry, what I meant was 3.16.x, so I think 3.16.4 should be fine.
cmake --version reports cmake version 3.16.4; the output is attached: 16.log
The current master branch of onnxruntime requires TRT 7.0.x for integration. TRT 7 is not yet available in JetPack. To build onnxruntime with TRT 6, please use an older version of onnxruntime. please follow https://github.com/microsoft/onnxruntime/issues/2684#issuecomment-568548387
please replace step 3 with
git checkout ebf2374
Thanks @jywu-msft :)
btw, thanks for confirming that cmake 3.16.4 works. I can see in your log that the CUDA compiler check succeeded.
```
-- The CUDA compiler identification is NVIDIA 10.0.326
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc -- works
```
we will look into how to get it fixed for cmake 3.17, but it is still RC1, so I suggest sticking to the latest stable release (currently cmake 3.16.4)
```
[ 32%] Built target onnx
Scanning dependencies of target onnxruntime_providers
[ 32%] Building CXX object CMakeFiles/onnxruntime_providers.dir/home/nvidia/code/onnxruntime/onnxruntime/core/providers/cpu/activation/activations.cc.o
In file included from /home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/Core:313:0,
                 from /home/nvidia/code/onnxruntime/onnxruntime/core/util/math_cpuonly.h:31,
                 from /home/nvidia/code/onnxruntime/onnxruntime/core/providers/cpu/activation/activations.h:8,
                 from /home/nvidia/code/onnxruntime/onnxruntime/core/providers/cpu/activation/activations.cc:4:
/home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/src/Core/products/GeneralBlockPanelKernel.h: In member function 'void Eigen::internal::gebp_traits<float, float, false, false, 4, 0>::updateRhs(const RhsScalar*, Eigen::internal::gebp_traits<float, float, false, false, 4, 0>::RhsPacketx4&) const':
/home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/src/Core/products/GeneralBlockPanelKernel.h:1079:55: error: unused parameter 'b' [-Werror=unused-parameter]
   EIGEN_STRONG_INLINE void updateRhs(const RhsScalar* b, RhsPacketx4& dest) const
/home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/src/Core/products/GeneralBlockPanelKernel.h:1079:71: error: unused parameter 'dest' [-Werror=unused-parameter]
/home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/src/Core/products/GeneralBlockPanelKernel.h: In member function 'void Eigen::internal::gebp_traits<double, double, false, false, 4>::updateRhs(const RhsScalar*, Eigen::internal::gebp_traits<double, double, false, false, 4>::RhsPacketx4&) const':
/home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/src/Core/products/GeneralBlockPanelKernel.h:1148:55: error: unused parameter 'b' [-Werror=unused-parameter]
/home/nvidia/code/onnxruntime/cmake/external/eigen/Eigen/src/Core/products/GeneralBlockPanelKernel.h:1148:71: error: unused parameter 'dest' [-Werror=unused-parameter]
cc1plus: all warnings being treated as errors
CMakeFiles/onnxruntime_providers.dir/build.make:62: recipe for target 'CMakeFiles/onnxruntime_providers.dir/home/nvidia/code/onnxruntime/onnxruntime/core/providers/cpu/activation/activations.cc.o' failed
make[2]: *** [CMakeFiles/onnxruntime_providers.dir/home/nvidia/code/onnxruntime/onnxruntime/core/providers/cpu/activation/activations.cc.o] Error 1
CMakeFiles/Makefile2:562: recipe for target 'CMakeFiles/onnxruntime_providers.dir/all' failed
make[1]: *** [CMakeFiles/onnxruntime_providers.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "/home/nvidia/code/onnxruntime/tools/ci_build/build.py", line 1048, in <module>
    sys.exit(main())
  File "/home/nvidia/code/onnxruntime/tools/ci_build/build.py", line 980, in main
    build_targets(cmake_path, build_dir, configs, args.parallel)
  File "/home/nvidia/code/onnxruntime/tools/ci_build/build.py", line 412, in build_targets
    run_subprocess(cmd_args)
  File "/home/nvidia/code/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
    completed_process = subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '/home/nvidia/code/onnxruntime/build/Linux/Release', '--config', 'Release']' returned non-zero exit status 2.
```
rt6.log
Please make sure you follow step 5 of the linked instructions (turning off Dev mode which treats build warnings as errors)
seems done; thank you for your assistance
Could you propose a basic sample to execute via the Python 3 interface to verify that the GPU component works and won't fail?
@AndreV84 take a look at this simple mnist demo that I made to test inference performance for Openvino and Onnxruntime:
https://github.com/hossein1387/int8_experiments/tree/master/mnist/onnxrt
in the onnxrt folder, run the following command:
python3 main.py -x LENET.onnx
I do not have access to a TX1 right now, but on an 8-core x86 (Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz) using onnxruntime (NOT onnxruntime-gpu) I am getting the following:
[INFO ] =======================================================================
[INFO ] Accuracy = 97.25% out of 10000.0 tests, avg inference=0.03732ms per image
[INFO ] =======================================================================
Executed on an 8-core Jetson AGX; I wouldn't say that it uses the GPU.
Is there any chance to run something that makes use of the GPU?
```
[INFO ] =======================================================================
[INFO ] Accuracy = 97.13% out of 10000.0 tests, avg inference=0.02901ms per image
[INFO ] =======================================================================
9920512it [01:00, 164042.39it/s]
```
If you have onnxruntime-gpu installed, you should automatically see the usage of Jetson GPUs.
@hossein1387 thank you for your response. Yes, I have built and installed onnxruntime-gpu as a whl with pip. However, before starting the example I see GR3D_FREQ 0%@1377,
and while running it I see GR3D_FREQ 6%@1377 or GR3D_FREQ 1%@1377,
or it also remains 0; when the execution ended there were also fluctuations up to 5%.
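To make the GPU-load observation less eyeball-based, the GR3D_FREQ field (assuming these figures come from tegrastats output, which reports GPU utilization in that format) can be parsed programmatically. A minimal sketch; the field format is the only assumption:

```python
import re

# GR3D_FREQ appears in tegrastats-style output as e.g. "GR3D_FREQ 6%@1377"
# (utilization percent @ clock frequency).
_GR3D = re.compile(r"GR3D_FREQ\s+(\d+)%@(\d+)")

def parse_gr3d(line):
    """Return (utilization_percent, frequency) from a stats line, or None."""
    m = _GR3D.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

Sampling this once a second while the model runs gives a simple utilization trace, which is easier to reason about than watching the numbers scroll by.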
For CMake version 3.17RC1, I found that if you replace the following line in onnxruntime/cmake/CMakeLists.txt
string(APPEND CMAKE_CUDA_FLAGS "-cudart shared")
with
set(CMAKE_CUDA_RUNTIME_LIBRARY Shared)
the CUDA compiler check should succeed.
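A version-guarded variant could keep one CMakeLists.txt working for both CMake generations. This is a sketch, untested across versions; it relies on CMAKE_CUDA_RUNTIME_LIBRARY existing only from CMake 3.17 onward:

```cmake
# Use the 3.17+ abstraction when available, else fall back to the raw nvcc flag.
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.17")
  set(CMAKE_CUDA_RUNTIME_LIBRARY Shared)
else()
  string(APPEND CMAKE_CUDA_FLAGS " -cudart shared")
endif()
```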
@jywu-msft: thank you for the update! Could you suggest a basic CUDA check to run from Python to confirm that the onnxruntime-gpu installation can successfully use the GPU rather than the CPU, please?
you can turn on verbose logging prior to your sess.run() call, and it will show which nodes were assigned to execute on GPU. see https://github.com/microsoft/onnxruntime/issues/2404#issuecomment-554276559
```python
import onnxruntime as ort
ort.set_default_logger_severity(0)
sess = ort.InferenceSession(...)
sess.run(...)
```
@jywu-msft Thank you for following up!
I added
rt.set_default_logger_severity(0)
to the previously proposed mnist CPU example
python3 main.py -x LENET.onnx
verbose.log
It doesn't seem to call CUDA;
perhaps I need some example of ort.InferenceSession(...) code that calls CUDA on execution, so I can see it reflected in the log
tried another code
verboise2.log
Is there any GPU sample around that I might execute with the verbose logging?
It is working. You actually built the TensorRT Execution Provider (the CUDA Execution Provider is a different provider). They both execute on GPU.
your verbose.log shows
```
2020-02-26 18:13:28.123485238 [V:onnxruntime:, inference_session.cc:642 TransformGraph] All nodes have been placed on [TensorrtExecutionProvider].
```
That means all nodes are executing on GPU via TensorRT
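A small sketch for spotting that line without reading the whole verbose log; the message format is taken from the log line quoted above:

```python
import re

# Matches e.g. "... TransformGraph] All nodes have been placed on [TensorrtExecutionProvider]."
_PLACED = re.compile(r"All nodes have been placed on \[(\w+)\]")

def placed_provider(log_text):
    """Return the provider name from an ORT verbose log, or None if absent."""
    m = _PLACED.search(log_text)
    return m.group(1) if m else None
```

If this returns 'CPUExecutionProvider' (or nothing), the model is not running on the GPU; 'TensorrtExecutionProvider' or 'CUDAExecutionProvider' confirms GPU execution.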
You can test on ONNX models from the ONNX model zoo, e.g. squeezenet. You can then use the Python API set_providers() to test 'CPUExecutionProvider' vs 'TensorrtExecutionProvider' or 'CUDAExecutionProvider'.
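One way to do that comparison is to time sess.run() under each provider. A sketch: the timing helper below is generic, while the commented usage assumes the set_providers() API mentioned above and a hypothetical model file name:

```python
import time

def avg_latency(run_fn, runs=50, warmup=5):
    """Average seconds per call of run_fn, after a few warmup calls."""
    for _ in range(warmup):
        run_fn()  # warmup: first calls often include lazy initialization
    start = time.perf_counter()
    for _ in range(runs):
        run_fn()
    return (time.perf_counter() - start) / runs

# Hypothetical usage (requires onnxruntime-gpu and a model such as squeezenet.onnx):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("squeezenet.onnx")
#   for providers in (["CPUExecutionProvider"], ["CUDAExecutionProvider"],
#                     ["TensorrtExecutionProvider"]):
#       sess.set_providers(providers)
#       print(providers[0], avg_latency(lambda: sess.run(None, feed)))
```

A large latency gap between the CPU provider and the GPU providers is a second, behavioral confirmation that the GPU path is actually being exercised.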
see example
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
I tried the solution proposed here: `../build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu 2020-02-14 14:34:50,960 Build [INFO] - Build started 2020-02-14 14:34:50,960 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'sync', '--recursive'] Synchronizing submodule url for 'cmake/external/DNNLibrary' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/flatbuffers' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/glog' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/pybind11' Synchronizing submodule url for 'cmake/external/DNNLibrary/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/cub' Synchronizing submodule url for 'cmake/external/date' Synchronizing submodule url for 'cmake/external/eigen' Synchronizing submodule url for 'cmake/external/gemmlowp' Synchronizing submodule url for 'cmake/external/googletest' Synchronizing submodule url for 'cmake/external/grpc' Synchronizing submodule url for 'cmake/external/grpc/third_party/abseil-cpp' Synchronizing submodule url for 'cmake/external/grpc/third_party/benchmark' Synchronizing submodule url for 
'cmake/external/grpc/third_party/bloaty' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/libFuzzer' Synchronizing submodule url for 'cmake/external/grpc/third_party/bloaty/third_party/re2' Synchronizing submodule url for 'cmake/external/grpc/third_party/boringssl' Synchronizing submodule url for 'cmake/external/grpc/third_party/boringssl-with-bazel' Synchronizing submodule url for 'cmake/external/grpc/third_party/cares/cares' Synchronizing submodule url for 'cmake/external/grpc/third_party/data-plane-api' Synchronizing submodule url for 'cmake/external/grpc/third_party/gflags' Synchronizing submodule url for 'cmake/external/grpc/third_party/gflags/doc' Synchronizing submodule url for 'cmake/external/grpc/third_party/googleapis' Synchronizing submodule url for 'cmake/external/grpc/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/libcxx' Synchronizing submodule url for 'cmake/external/grpc/third_party/libcxxabi' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/protoc-gen-validate' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/grpc/third_party/upb/third_party/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/grpc/third_party/zlib' Synchronizing submodule url for 'cmake/external/mimalloc' Synchronizing submodule url 
for 'cmake/external/nsync' Synchronizing submodule url for 'cmake/external/onnx' Synchronizing submodule url for 'cmake/external/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/onnx-tensorrt' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/benchmark' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/pybind11' Synchronizing submodule url for 'cmake/external/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' Synchronizing submodule url for 'cmake/external/protobuf' Synchronizing submodule url for 'cmake/external/protobuf/third_party/benchmark' Synchronizing submodule url for 'cmake/external/protobuf/third_party/googletest' Synchronizing submodule url for 'cmake/external/re2' Synchronizing submodule url for 'cmake/external/spdlog' Synchronizing submodule url for 'cmake/external/tvm' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/HalideIR' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/dlpack' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/dmlc-core' Synchronizing submodule url for 'cmake/external/tvm/3rdparty/rang' Synchronizing submodule url for 'cmake/external/wil' 2020-02-14 14:34:52,305 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'update', '--init', '--recursive'] 2020-02-14 14:34:54,502 Build [INFO] - Generating CMake build tree 2020-02-14 14:34:54,504 Build [DEBUG] - Running subprocess in '/code/onnxruntime/build/Linux/Release' ['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', 
'-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release'] Use gtest from submodule -- Found PythonInterp: /usr/bin/python3 (found version "3.6.9") -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "3.5") Use protobuf from submodule -- The CUDA compiler identification is NVIDIA 10.0.326 
```
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc - broken
CMake Error at /usr/local/share/cmake-3.17/Modules/CMakeTestCUDACompiler.cmake:46 (message):
  The CUDA compiler
    "/usr/local/cuda-10.0/bin/nvcc"
  is not able to compile a simple test program.
  It fails with the following output:
    Change Dir: /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp
Run Build Command(s): /usr/bin/make cmTC_bb43d/fast && /usr/bin/make -f CMakeFiles/cmTC_bb43d.dir/build.make CMakeFiles/cmTC_bb43d.dir/build
make[1]: Entering directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_bb43d.dir/main.cu.o
/usr/local/cuda-10.0/bin/nvcc -cudart shared -Xcompiler=-fPIE -x cu -c /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_bb43d.dir/main.cu.o
Linking CUDA executable cmTC_bb43d
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_bb43d.dir/link.txt --verbose=1
/usr/bin/g++ CMakeFiles/cmTC_bb43d.dir/main.cu.o -o cmTC_bb43d -lcudadevrt -lcudart_static -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverEntrypoints()':
:(.text+0x23488): undefined reference to `dlsym'
:(.text+0x234b0): undefined reference to `dlsym'
:(.text+0x234d4): undefined reference to `dlsym'
:(.text+0x234f8): undefined reference to `dlsym'
:(.text+0x2351c): undefined reference to `dlsym'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o)::(.text+0x23540): more undefined references to `dlsym' follow
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::loadDriverInternal()':
:(.text+0x288cc): undefined reference to `dlopen'
:(.text+0x28904): undefined reference to `dlclose'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::__loadDriverInternalUtil()':
:(.text+0x289e0): undefined reference to `dlopen'
:(.text+0x28a14): undefined reference to `dlclose'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverInternal()':
:(.text+0x2b664): undefined reference to `dlclose'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInit()':
:(.text+0x5c7bc): undefined reference to `dlerror'
:(.text+0x5c7c8): undefined reference to `dlopen'
:(.text+0x5c7dc): undefined reference to `dlsym'
:(.text+0x5c7e4): undefined reference to `dlerror'
:(.text+0x5c7f4): undefined reference to `dlclose'
:(.text+0x5c838): undefined reference to `dlerror'
:(.text+0x5c844): undefined reference to `dlopen'
:(.text+0x5c858): undefined reference to `dlsym'
:(.text+0x5c860): undefined reference to `dlerror'
:(.text+0x5c870): undefined reference to `dlclose'
:(.text+0x5c8b4): undefined reference to `dlerror'
:(.text+0x5c8c0): undefined reference to `dlopen'
:(.text+0x5c8d4): undefined reference to `dlsym'
:(.text+0x5c8dc): undefined reference to `dlerror'
:(.text+0x5c8ec): undefined reference to `dlclose'
:(.text+0x5c930): undefined reference to `dlerror'
:(.text+0x5c93c): undefined reference to `dlopen'
:(.text+0x5c950): undefined reference to `dlsym'
:(.text+0x5c958): undefined reference to `dlerror'
:(.text+0x5c968): undefined reference to `dlclose'
:(.text+0x5c9a0): undefined reference to `dlerror'
:(.text+0x5c9ac): undefined reference to `dlopen'
:(.text+0x5c9c0): undefined reference to `dlsym'
:(.text+0x5c9c8): undefined reference to `dlerror'
:(.text+0x5c9d8): undefined reference to `dlclose'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreCreate(sem_t*, int)':
:(.text+0x5d910): undefined reference to `sem_init'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreDestroy(sem_t*)':
:(.text+0x5d92c): undefined reference to `sem_destroy'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreWait(sem_t*, unsigned int)':
:(.text+0x5da10): undefined reference to `sem_timedwait'
:(.text+0x5da48): undefined reference to `sem_wait'
:(.text+0x5da60): undefined reference to `sem_trywait'
/usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
```
cudart::cuosSemaphoreSignal(sem_t)': :(.text+0x5dab0): undefined reference tosem_post' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosVirtualReserveInRangeBug1778973WARInit()': :(.text+0x5f448): undefined reference topthread_mutexattr_init' :(.text+0x5f464): undefined reference to
pthread_mutexattr_settype' :(.text+0x5f474): undefined reference topthread_mutexattr_setpshared' :(.text+0x5f484): undefined reference to
pthread_mutexattr_setprotocol' :(.text+0x5f4a4): undefined reference topthread_mutexattr_destroy' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosPosixInit()': :(.text+0x5f4f0): undefined reference todlerror' :(.text+0x5f4fc): undefined reference to
dlopen' :(.text+0x5f510): undefined reference todlsym' :(.text+0x5f518): undefined reference to
dlerror' :(.text+0x5f528): undefined reference todlclose' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosVirtualReserveInRange(unsigned long, void, void, unsigned long)': :(.text+0x5f768): undefined reference topthread_once' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosLoadLibrary(char const)': :(.text+0x5fc8c): undefined reference todlerror' :(.text+0x5fca0): undefined reference to
dlopen' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibraryUnsafe(char const)': :(.text+0x5fcb4): undefined reference todlerror' :(.text+0x5fcc8): undefined reference to
dlopen' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosFreeLibrary(void*)': :(.text+0x5fcd4): undefined reference to
dlclose' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosGetProcAddress(void*, char const*)': :(.text+0x5fce8): undefined reference to
dlsym' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTlsAlloc(void (*)(void*))': :(.text+0x5fdec): undefined reference to
pthread_key_create' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTlsFree(unsigned int)': :(.text+0x5fe10): undefined reference to
pthread_key_delete' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTlsGetValue(unsigned int)': :(.text+0x5fe18): undefined reference to
pthread_getspecific' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTlsSetValue(unsigned int, void*)': :(.text+0x5fe28): undefined reference to
pthread_setspecific' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosInitializeCriticalSectionWithSharedFlag(pthread_mutex_t*, int)': :(.text+0x5fef4): undefined reference to
pthread_mutexattr_init' :(.text+0x5ff14): undefined reference topthread_mutexattr_settype' :(.text+0x5ff24): undefined reference to
pthread_mutexattr_setpshared' :(.text+0x5ff34): undefined reference topthread_mutexattr_setprotocol' :(.text+0x5ff50): undefined reference to
pthread_mutexattr_destroy' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosInitializeCriticalSection(pthread_mutex_t*)': :(.text+0x5ff70): undefined reference to
pthread_mutexattr_init' :(.text+0x5ff8c): undefined reference topthread_mutexattr_settype' :(.text+0x5ff9c): undefined reference to
pthread_mutexattr_setpshared' :(.text+0x5ffac): undefined reference topthread_mutexattr_setprotocol' :(.text+0x5ffc8): undefined reference to
pthread_mutexattr_destroy' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosInitializeCriticalSectionShared(pthread_mutex_t*)': :(.text+0x5ffe8): undefined reference to
pthread_mutexattr_init' :(.text+0x60004): undefined reference topthread_mutexattr_settype' :(.text+0x60014): undefined reference to
pthread_mutexattr_setpshared' :(.text+0x60024): undefined reference topthread_mutexattr_setprotocol' :(.text+0x60040): undefined reference to
pthread_mutexattr_destroy' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTryEnterCriticalSection(pthread_mutex_t*)': :(.text+0x60058): undefined reference to
pthread_mutex_trylock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosInitRWLockEx(void**, void*, unsigned long)': :(.text+0x600b4): undefined reference to
pthread_rwlockattr_init' :(.text+0x600c4): undefined reference topthread_rwlockattr_setpshared' :(.text+0x600d4): undefined reference to
pthread_rwlock_init' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosInitRWLock(void**)': :(.text+0x60114): undefined reference to
pthread_rwlockattr_init' :(.text+0x60144): undefined reference topthread_rwlockattr_setpshared' :(.text+0x60154): undefined reference to
pthread_rwlock_init' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosAcquireReaderLock(void**)': :(.text+0x60164): undefined reference to
pthread_rwlock_rdlock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosAcquireWriterLock(void**)': :(.text+0x6016c): undefined reference to
pthread_rwlock_wrlock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTryAcquireReaderLock(void**)': :(.text+0x6017c): undefined reference to
pthread_rwlock_tryrdlock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosTryAcquireWriterLock(void**)': :(.text+0x601a4): undefined reference to
pthread_rwlock_trywrlock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosReleaseReaderLock(void**)': :(.text+0x601c4): undefined reference to
pthread_rwlock_unlock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosReleaseWriterLock(void**)': :(.text+0x601cc): undefined reference to
pthread_rwlock_unlock' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosDestroyRWLockEx(void**)': :(.text+0x601d4): undefined reference to
pthread_rwlock_destroy' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosDestroyRWLock(void**)': :(.text+0x601ec): undefined reference to
pthread_rwlock_destroy' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosOnce(int*, void (*)())': :(.text+0x60210): undefined reference to
pthread_once' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosCondCreateWithSharedFlag(pthread_cond_t*, int)': :(.text+0x60250): undefined reference to
pthread_condattr_setpshared' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosCondCreate(pthread_cond_t*)': :(.text+0x602b0): undefined reference to
pthread_condattr_setpshared' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosCondCreateShared(pthread_cond_t*)': :(.text+0x60310): undefined reference to
pthread_condattr_setpshared' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In functioncudart::cuosThreadCreateWithName(cudart::CUOSthread_st**, int (*)(void*), void*, char const*)': :(.text+0x60564): undefined reference to
pthread_create' :(.text+0x60578): undefined reference topthread_setname_np' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosThreadCreate(cudart::CUOSthread_st, int ()(void), void)': :(.text+0x60640): undefined reference topthread_create' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosThreadJoin(cudart::CUOSthread_st, int)': :(.text+0x606a8): undefined reference topthread_join' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosThreadDetach(cudart::CUOSthread_st)': :(.text+0x60708): undefined reference topthread_detach' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosHasThreadExited(cudart::CUOSthread_st)': :(.text+0x60758): undefined reference topthread_kill' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosShmCreateNamedEx(void, char const*, unsigned long, cudart::cuosShmInfoEx_st)': :(.text+0x60ee0): undefined reference toshm_unlink' :(.text+0x60ef8): undefined reference to
shm_open' :(.text+0x60f98): undefined reference toshm_unlink' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosShmOpenNamedEx(void, char const, unsigned long, cudart::cuosShmInfoEx_st)': :(.text+0x61124): undefined reference toshm_open' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosShmCloseEx(cudart::cuosShmInfoEx_st, unsigned int, unsigned int)': :(.text+0x61370): undefined reference toshm_unlink' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
cudart::cuosSetThreadName(cudart::CUOSthread_st, char const)': :(.text+0x62294): undefined reference topthread_setname_np' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
CUOSdlsymLoader<int ()(int, sockaddr, unsigned int, int)>::~CUOSdlsymLoader()': :(.text._ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED2Ev[_ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED5Ev]+0x18): undefined reference todlclose' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
CUOSdlsymLoader<int ()(int, int)>::~CUOSdlsymLoader()': :(.text._ZN15CUOSdlsymLoaderIPFiPiiEED2Ev[_ZN15CUOSdlsymLoaderIPFiPiiEED5Ev]+0x18): undefined reference todlclose' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
CUOSdlsymLoader<int ()(unsigned long, unsigned long, unsigned long const)>::~CUOSdlsymLoader()': :(.text._ZN15CUOSdlsymLoaderIPFimmPKmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPKmEED5Ev]+0x18): undefined reference todlclose' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
CUOSdlsymLoader<int ()(unsigned long, unsigned long, unsigned long)>::~CUOSdlsymLoader()': :(.text._ZN15CUOSdlsymLoaderIPFimmPmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPmEED5Ev]+0x18): undefined reference todlclose' /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function
CUOSdlsymLoader<int (*)()>::~CUOSdlsymLoader()': :(.text._ZN15CUOSdlsymLoaderIPFivEED2Ev[_ZN15CUOSdlsymLoaderIPFivEED5Ev]+0x18): undefined reference to `dlclose' collect2: error: ld returned 1 exit status CMakeFiles/cmTC_bb43d.dir/build.make:103: recipe for target 'cmTC_bb43d' failed make[1]: ** [cmTC_bb43d] Error 1 make[1]: Leaving directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp' Makefile:138: recipe for target 'cmTC_bb43d/fast' failed make: [cmTC_bb43d/fast] Error 2CMake will not be able to correctly generate this project. Call Stack (most recent call first): CMakeLists.txt:715 (enable_language)
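Note that none of the missing symbols are CUDA symbols: they are ordinary POSIX entry points that glibc provides via libdl, libpthread, and librt. libcudart_static.a references them but does not define them, so any link line that pulls in the static CUDA runtime also needs -ldl -lpthread -lrt (on a working setup, nvcc or CMake normally appends these automatically). A minimal sanity check, assuming a glibc Linux host, that these symbols resolve once those libraries are in the process:

```python
import ctypes

# The symbols the log reports as undefined are standard libdl/libpthread
# entry points.  They resolve in any process linked against those
# libraries -- e.g. the Python interpreter itself.
proc = ctypes.CDLL(None)  # dlopen(NULL): handle to the current process
for symbol in ("dlopen", "dlsym", "dlclose", "dlerror",
               "sem_init", "sem_post", "pthread_create", "pthread_once"):
    getattr(proc, symbol)  # raises AttributeError if the symbol is absent
    print(symbol, "resolves")
```

If these resolve on your Jetson but the try-compile link still fails, the problem is the generated link line (missing -ldl -lpthread -lrt), not the CUDA installation itself.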
-- Configuring incomplete, errors occurred!
See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeOutput.log".
See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeError.log".
Traceback (most recent call last):
  File "/code/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
    sys.exit(main())
  File "/code/onnxruntime/tools/ci_build/build.py", line 972, in main
    args, cmake_extra_args)
  File "/code/onnxruntime/tools/ci_build/build.py", line 422, in generate_build_tree
    run_subprocess(cmake_args + ["-DCMAKE_BUILD_TYPE={}".format(config)], cwd=config_build_dir)
  File "/code/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
    return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
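The traceback itself is just plumbing: run_subprocess in build.py calls subprocess.run with check=True, which converts any non-zero exit status into a CalledProcessError. The failure to chase is the CUDA try-compile above, not build.py. A minimal illustration of that wrapper behaviour, using /bin/false as a stand-in for the failing cmake invocation:

```python
import subprocess

# check=True makes subprocess.run raise instead of returning a failed
# CompletedProcess -- the same mechanism that produced the traceback above.
try:
    subprocess.run(["false"], check=True)
except subprocess.CalledProcessError as exc:
    print("command failed with exit status", exc.returncode)  # prints 1
```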