ndeep27 opened 4 months ago
Hi @ndeep27, can you try one of the latest stable versions to see if the issue persists? 22.04 is more than two years old, and there has been a lot of development since then.
@sourabh-burnwal Even the latest version (24.07) fails without giving any specific error. For instance, below is what I see in the log:
```
gmake[3]: Leaving directory '/tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-build'
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-build && /usr/local/bin/cmake -E touch /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-stamp/grpc-install
[ 84%] Completed 'grpc'
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build && /usr/local/bin/cmake -E make_directory /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/CMakeFiles
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build && /usr/local/bin/cmake -E touch /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/CMakeFiles/grpc-complete
cd /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build && /usr/local/bin/cmake -E touch /tmp/citritonbuild2406/tritonserver/build/_deps/repo-third-party-build/grpc/src/grpc-stamp/grpc-done
gmake[2]: Leaving directory '/tmp/citritonbuild2406/tritonserver/build'
[ 84%] Built target grpc
gmake[1]: Leaving directory '/tmp/citritonbuild2406/tritonserver/build'
gmake: *** [all] Error 2
```
I am not sure what the exact error is here.
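For reference, one way to surface the underlying error (a sketch, assuming the build tree above) is to re-run the failing build serially, so the first real failure isn't buried in the parallel gmake output:

```sh
# Re-run the top-level build with a single job and capture the log;
# the first real compile/link error then appears just before the failure.
cd /tmp/citritonbuild2406/tritonserver/build
make -j1 2>&1 | tee build.log

# Search the log for the underlying error rather than the summary line.
grep -nE "error:|Error [0-9]" build.log
```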
@sourabh-burnwal Can you please help with the above?
@sourabh-burnwal Is there a CPU-only version of Triton with the PyTorch backend released in open source?
Hi @ndeep27, is there any specific reason you want to build Triton, and CPU-only at that? You can always control device access when starting the container, or from the model config.
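For instance, simply starting the container without the `--gpus` flag gives it no GPU access (a sketch; the model repository path is illustrative):

```sh
# No --gpus flag, so the container sees no GPU devices (CPU-only).
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:24.07-pyt-python-py3 \
  tritonserver --model-repository=/models
```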
@sourabh-burnwal We do set the device via the model config, where we specify CPU, but the issue is that Triton libraries like libtorch_cpu need libcudart and other CUDA libraries, which keeps us from running on CPU-only machines.
@ndeep27, I have run the NGC Triton image on a CPU-only system without any problems. Can you share what issue you are getting?
@sourabh-burnwal Can you send me the exact docker image you used to run it?
For instance, I downloaded nvcr.io/nvidia/tritonserver:24.07-pyt-python-py3, and when I start it and run the commands below inside the container, I see libraries like libcudart linked:
```
root@031a384b7f38:/opt/tritonserver# ldd backends/pytorch/libtorch_cpu.so
        linux-vdso.so.1 (0x00007fff1d4cf000)
        libc10.so => /opt/tritonserver/backends/pytorch/libc10.so (0x00007c0f0a949000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007c0f0a93c000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007c0f0a91c000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007c0f0a917000)
        libmkl_intel_lp64.so.1 => /opt/tritonserver/backends/pytorch/libmkl_intel_lp64.so.1 (0x00007c0f09a00000)
        libmkl_gnu_thread.so.1 => /opt/tritonserver/backends/pytorch/libmkl_gnu_thread.so.1 (0x00007c0f07a00000)
        libmkl_core.so.1 => /opt/tritonserver/backends/pytorch/libmkl_core.so.1 (0x00007c0eff800000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007c0f0a910000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007c0f0a829000)
        libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007c0f0a7df000)
        libcupti.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.12 (0x00007c0efee00000)
        libmpi.so.40 => /opt/hpcx/ompi/lib/libmpi.so.40 (0x00007c0f098e1000)
        libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12 (0x00007c0efea00000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007c0efe7d4000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007c0efe5ab000)
        /lib64/ld-linux-x86-64.so.2 (0x00007c0f1699b000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007c0f0a7d8000)
        libopen-rte.so.40 => /opt/hpcx/ompi/lib/libopen-rte.so.40 (0x00007c0f07941000)
        libopen-pal.so.40 => /opt/hpcx/ompi/lib/libopen-pal.so.40 (0x00007c0efece9000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007c0f0a7ba000)

root@031a384b7f38:/opt/tritonserver# ldd lib/libtritonserver.so
        linux-vdso.so.1 (0x00007ffc46cfb000)
        libnuma.so.1 => /lib/x86_64-linux-gnu/libnuma.so.1 (0x0000705fc1e05000)
        libcudart.so.12 => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12 (0x0000705fc1a00000)
        libdcgm.so.3 => /lib/x86_64-linux-gnu/libdcgm.so.3 (0x0000705fc16a3000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x0000705fc1de9000)
        libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x0000705fc1d40000)
        libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x0000705fc15ff000)
        libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x0000705fc141d000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000705fc11f1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000705fc110a000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000705fc1d20000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000705fc0ee1000)
        /lib64/ld-linux-x86-64.so.2 (0x0000705fc32b1000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000705fc1d19000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000705fc1d14000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000705fc1d0f000)
        libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x0000705fc1ce5000)
        libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x0000705fc1cc4000)
        librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x0000705fc0ec2000)
        libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x0000705fc0e55000)
        libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x0000705fc0e41000)
        libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x0000705fc09fd000)
        libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x0000705fc09a9000)
        libldap-2.5.so.0 => /lib/x86_64-linux-gnu/libldap-2.5.so.0 (0x0000705fc094a000)
        liblber-2.5.so.0 => /lib/x86_64-linux-gnu/liblber-2.5.so.0 (0x0000705fc0939000)
        libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x0000705fc086a000)
        libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x0000705fc1cb2000)
        libicuuc.so.70 => /lib/x86_64-linux-gnu/libicuuc.so.70 (0x0000705fc066f000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x0000705fc0644000)
        libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x0000705fc049a000)
        libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x0000705fc02af000)
        libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x0000705fc0267000)
        libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x0000705fc0221000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x0000705fc019f000)
        libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x0000705fc00d2000)
        libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x0000705fc00a3000)
        libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x0000705fc009d000)
        libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x0000705fc008f000)
        libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x0000705fc0074000)
        libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x0000705fc0051000)
        libicudata.so.70 => /lib/x86_64-linux-gnu/libicudata.so.70 (0x0000705fbe431000)
        libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x0000705fbe2f6000)
        libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x0000705fbe2de000)
        libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x0000705fbe2d7000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x0000705fbe2c3000)
        libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x0000705fbe2b4000)
```
On CPU-only hosts these CUDA libraries are not available.
We also cannot use these docker images directly, since our OS is a variant of RHEL; we have to build Triton from source for our OS.
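(For reference, if the files were copied onto a CPU-only host — the path here is hypothetical — the unresolvable CUDA dependencies show up directly in ldd:)

```sh
# On a host without CUDA installed, the CUDA libraries appear as "not found".
ldd /opt/tritonserver/backends/pytorch/libtorch_cpu.so | grep "not found"
```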
@ndeep27 I see. Those files get shipped with the docker image.
Can you start a container with that 24.07 docker image without giving it GPU device access, then try loading your PyTorch model after specifying the device as CPU in config.pbtxt?
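For reference, a minimal config.pbtxt sketch that pins a PyTorch model to CPU (the model name and tensor shapes are illustrative):

```
name: "my_pytorch_model"   # illustrative name
backend: "pytorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
# KIND_CPU places every instance of this model on CPU.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```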
Regarding your use case of running this on a RHEL/CentOS-based system: as long as you have Docker installed and configured, I think you should be able to run it.
@sourabh-burnwal What I mean is that we cannot directly use that docker image (these are built for Ubuntu, but internally we have to build on RHEL due to security constraints). We have to build it from source. I am guessing these containers were also built from source, right? We needed a way to build from source so that it works on CPU too.
```
root@0d33724e583c:/opt/tritonserver# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```
If I copy the files (under /opt/tritonserver) from this docker image and add them to our custom inference pipeline, will that work?
I don't think this will work, as the build might also contain system dependencies. I can try to reproduce your issue, but that will take some time as I am currently using Ubuntu.
I encountered the same problem when building a CPU-only image. The command is:

```sh
python3 build.py --enable-logging --endpoint http --endpoint grpc --backend python
```

I built from the latest main branch code (v2.50), and the build fails with:
```
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/port_platform.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/string_util.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_abseil.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_custom.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_generic.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_posix.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/sync_windows.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/thd_id.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/time.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc/support/workaround_list.h
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libgrpc_authorization_provider.a
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpc++/impl/codegen/config_protobuf.h
-- Up-to-date: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/impl/codegen/config_protobuf.h
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libgrpc_plugin_support.a
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/include/grpcpp/ext/channelz_service_plugin.h
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libgrpcpp_channelz.a
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/libupb.a
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_cpp_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_csharp_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_node_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_objective_c_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_php_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_python_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/bin/grpc_ruby_plugin
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCTargets.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCTargets-release.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCPluginTargets.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCPluginTargets-release.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCConfig.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/gRPCConfigVersion.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/modules/Findc-ares.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/modules/Findre2.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/cmake/grpc/modules/Findsystemd.cmake
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/share/grpc/roots.pem
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/gpr.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc_unsecure.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc++.pc
-- Installing: /tmp/tritonbuild/tritonserver/build/third-party/grpc/lib/pkgconfig/grpc++_unsecure.pc
[ 87%] Completed 'grpc'
[ 87%] Built target grpc
gmake: *** [Makefile:136: all] Error 2
error: build failed
```
Description
The Triton Server build with the PyTorch backend does not work for CPU only: it expects libraries like libcudart.so even though the build was configured for CPU. Below is how we invoke the build. From another issue thread, we learned that the CPU-only build was fixed in v22.04 onwards.
```sh
python ./build.py --cmake-dir=$(pwd) --build-dir=/tmp/citritonbuild \
    --endpoint=http --endpoint=grpc \
    --enable-logging --enable-stats --enable-tracing --enable-metrics \
    --backend=pytorch:${tritonversion} \
    --repo-tag=common:${tritonversion} --repo-tag=core:${tritonversion} \
    --repo-tag=backend:${tritonversion} --repo-tag=thirdparty:${tritonversion} \
    --no-container-build \
    --extra-core-cmake-arg=TRITON_ENABLE_GPU=OFF \
    --extra-core-cmake-arg=TRITON_ENABLE_ONNXRUNTIME_TENSORRT=OFF \
    --extra-backend-cmake-arg=pytorch:TRITON_ENABLE_GPU=OFF \
    --upstream-container-version=22.04
```
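Once a CPU-only build does succeed, a quick sanity check (a sketch; the install paths under the build directory are illustrative) is to confirm the produced binaries have no CUDA linkage:

```sh
# Verify the CPU-only build links no CUDA libraries;
# both commands should print nothing (install paths are illustrative).
ldd /tmp/citritonbuild/opt/tritonserver/lib/libtritonserver.so | grep -i cuda
ldd /tmp/citritonbuild/opt/tritonserver/backends/pytorch/libtorch_cpu.so | grep -i cuda
```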
Triton Information
What version of Triton are you using? 22.04

Are you using the Triton container or did you build it yourself? We built it ourselves.

To Reproduce
Steps to reproduce the behavior: mentioned above.