NDNM1408 closed this issue 6 months ago.
Hi @NDNM1408, I was wondering which Triton version you are using. Could you also provide the model config and any steps required for us to reproduce the issue?
Hi @krishung5, I ran into the same problem ("Stub process 'add_sub_0_0' is not healthy.") when trying to build a triton_python_backend_stub for python3.8. The error looked like this:
I0509 08:28:58.312840 1635 grpc_server.cc:2466] Started GRPCInferenceService at 0.0.0.0:8001
I0509 08:28:58.313128 1635 http_server.cc:4636] Started HTTPService at 0.0.0.0:8000
I0509 08:28:58.354728 1635 http_server.cc:320] Started Metrics Service at 0.0.0.0:8002
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
Signal (6) received.
0# 0x00005648F7FA23BD in tritonserver
1# 0x00007F57FBD1E520 in /lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
3# raise in /lib/x86_64-linux-gnu/libc.so.6
4# abort in /lib/x86_64-linux-gnu/libc.so.6
5# 0x00007F57FBFA7B9E in /lib/x86_64-linux-gnu/libstdc++.so.6
6# 0x00007F57FBFB320C in /lib/x86_64-linux-gnu/libstdc++.so.6
7# 0x00007F57FBFB21E9 in /lib/x86_64-linux-gnu/libstdc++.so.6
8# __gxx_personality_v0 in /lib/x86_64-linux-gnu/libstdc++.so.6
9# 0x00007F57FE17F884 in /lib/x86_64-linux-gnu/libgcc_s.so.1
10# _Unwind_RaiseException in /lib/x86_64-linux-gnu/libgcc_s.so.1
11# __cxa_throw in /lib/x86_64-linux-gnu/libstdc++.so.6
12# 0x00007F57E8FD8E1A in /opt/tritonserver/backends/python/libtriton_python.so
13# 0x00007F57E8F94EB0 in /opt/tritonserver/backends/python/libtriton_python.so
14# 0x00007F57E8FA2BBA in /opt/tritonserver/backends/python/libtriton_python.so
15# 0x00007F57E8F8E193 in /opt/tritonserver/backends/python/libtriton_python.so
16# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/python/libtriton_python.so
17# 0x00007F57FC71DD74 in /opt/tritonserver/bin/../lib/libtritonserver.so
18# 0x00007F57FC71E0DB in /opt/tritonserver/bin/../lib/libtritonserver.so
19# 0x00007F57FC8329BD in /opt/tritonserver/bin/../lib/libtritonserver.so
20# 0x00007F57FC721D64 in /opt/tritonserver/bin/../lib/libtritonserver.so
21# 0x00007F57FBFE1253 in /lib/x86_64-linux-gnu/libstdc++.so.6
22# 0x00007F57FBD70AC3 in /lib/x86_64-linux-gnu/libc.so.6
23# clone in /lib/x86_64-linux-gnu/libc.so.6
Below is my process of building triton_python_backend_stub and running the official add_sub example, following the README (https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#building-custom-python-backend-stub):
Step 1. Clone the python_backend repo:
git clone https://github.com/triton-inference-server/python_backend -b main
cd python_backend
Step 2. Download boost and install libarchive:
curl -O https://archives.boost.io/release/1.79.0/source/boost_1_79_0.tar.gz
sudo apt-get install libarchive-dev
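The boost tarball still needs to be unpacked and its headers made visible before the stub build. boost::interprocess is header-only, so something like the following is enough; the /usr/include destination is an assumption:
tar xzf boost_1_79_0.tar.gz
cp -r boost_1_79_0/boost /usr/include/boost  # header-only, no build step needed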
Step 3. Build and install rapidjson:
cd ..
git clone https://github.com/Tencent/rapidjson.git
cd rapidjson
git submodule update --init
mkdir build && cd build
cmake ..
make
make install
Step 4. Build the stub:
cd ../../python_backend
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=OFF -DTRITON_BACKEND_REPO_TAG=main -DTRITON_COMMON_REPO_TAG=main -DTRITON_CORE_REPO_TAG=main -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DPYTHON_EXECUTABLE=$(which python3.8) .. # my python3.8 version is python3.8.12
make triton-python-backend-stub
Step 5. When I executed "ldd triton_python_backend_stub", I saw the output below, as expected per the README:
linux-vdso.so.1 (0x00007fff66d6c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fea9ce70000)
libpython3.8.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (0x00007fea9c71b000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fea9c392000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fea9c17a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fea9bf5b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fea9bb6a000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007fea9b938000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fea9b71b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fea9b517000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fea9b314000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fea9af76000)
/lib64/ld-linux-x86-64.so.2 (0x00007fea9d078000)
Step 6. Start the Triton server container:
docker run -itd --privileged --network host --name victor.chen_triton -e HOME=/home/victor.chen -v /root:/root -v /home:/home -v /data:/data --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --ulimit nofile=65536 nvcr.io/nvidia/tritonserver:24.03-py3
Step 7. Install python3.8 inside the container:
add-apt-repository ppa:deadsnakes/ppa
apt update
apt install python3.8 # Here I could only install python3.8.19 inside the server docker.
apt install python3.8-distutils
apt install python3.8-dev
python3.8 -m pip install setuptools
Step 8. Set up the add_sub model repository with my stub, start the server, and run the client:
cd python_backend
mkdir -p models/add_sub/1/
cp examples/add_sub/model.py models/add_sub/1/model.py
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
cp build/triton_python_backend_stub models/add_sub/ # My py3.8 stub is expected to be used by this line.
tritonserver --model-repository `pwd`/models
The server was successfully started.
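For reference, the resulting model repository layout is:
models/
└── add_sub/
    ├── config.pbtxt
    ├── triton_python_backend_stub
    └── 1/
        └── model.py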
python3.8 -m pip install tritonclient[http] opencv-python-headless # with my python3.8.12
python3.8 python_backend/examples/add_sub/client.py
Then my client.py got an "unhealthy" exception: tritonclient.utils.InferenceServerException: [500] Failed to process the request(s) for model instance 'add_sub_0_0', message: Stub process 'add_sub_0_0' is not healthy. And my Triton server crashed with the same boost::interprocess::lock_exception trace shown above.
Step 9. However, if I remove my stub from models/add_sub/ and run python3.8 python_backend/examples/add_sub/client.py again, everything will go right:
root@vir-115-46-001:~/triton# python3.8 python_backend/examples/add_sub/client.py
INPUT0 ([0.1482776 0.73882675 0.68110615 0.46473113]) + INPUT1 ([0.7229217 0.18495801 0.7215026 0.2987663 ]) = OUTPUT0 ([0.8711993 0.92378473 1.4026088 0.7634974 ])
INPUT0 ([0.1482776 0.73882675 0.68110615 0.46473113]) - INPUT1 ([0.7229217 0.18495801 0.7215026 0.2987663 ]) = OUTPUT1 ([-0.57464415 0.5538688 -0.04039645 0.16596484])
PASS: add_sub
I think step 9 shows that my triton_python_backend_stub (built in steps 1 through 4) is the main cause of the problem.
Could you please tell me what I was doing wrong? How can I make my python3.8 stub healthier?
@victorsoda You need to compile the same branch of the repo as the server. I think the issue is that you're using the `main` branch with the 24.03 version of the server. Could you try building the Python backend from the `r24.03` branch and let us know if you're still running into an error?
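Concretely, a rebuild along these lines should match the 24.03 container (same flags as step 4 above, with the branch and repo tags swapped to r24.03):
git clone https://github.com/triton-inference-server/python_backend -b r24.03
cd python_backend
mkdir build && cd build
cmake -DTRITON_ENABLE_GPU=OFF -DTRITON_BACKEND_REPO_TAG=r24.03 -DTRITON_COMMON_REPO_TAG=r24.03 -DTRITON_CORE_REPO_TAG=r24.03 -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DPYTHON_EXECUTABLE=$(which python3.8) ..
make triton-python-backend-stub
cp triton_python_backend_stub models/add_sub/ # destination path assumed; replaces the old stub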
@Tabrizian Thanks a lot for your quick reply! After changing the branch from `main` to `r24.03` for the python_backend repo, the problem is solved. That's awesome!
Just want to be sure here: is manually upgrading python_backend also needed if I'm running the nvcr.io/nvidia/tritonserver:23.07-py3 docker image?
I want to use the Python backend with Triton to deploy a TTS model using HiFi-GAN and FastPitch. When I run inference on HiFi-GAN, I get this error:
tritonclient.utils.InferenceServerException: [400] Failed to process the request(s) for model instance 'hifigan_0', message: Stub process is not healthy.
This is the content of my model.py file:
import json
import triton_python_backend_utils as pb_utils
from nemo.collections.tts.models import HifiGanModel
import torch

class TritonPythonModel:
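Roughly, the rest of the file follows the standard Python backend pattern. A minimal sketch of the shape I mean (the tensor names MEL/AUDIO and the checkpoint name tts_hifigan are placeholders, not my exact code):
import torch
import triton_python_backend_utils as pb_utils
from nemo.collections.tts.models import HifiGanModel

class TritonPythonModel:
    def initialize(self, args):
        # Load the HiFi-GAN vocoder once per model instance
        # ("tts_hifigan" is a placeholder checkpoint name).
        self.model = HifiGanModel.from_pretrained(model_name="tts_hifigan")
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # Placeholder input name "MEL": a mel spectrogram, e.g. from FastPitch.
            mel = pb_utils.get_input_tensor_by_name(request, "MEL").as_numpy()
            with torch.no_grad():
                audio = self.model.convert_spectrogram_to_audio(
                    spec=torch.from_numpy(mel)
                )
            out = pb_utils.Tensor("AUDIO", audio.cpu().numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses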
Can anyone help?