triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Stub process is not healthy. #7186

Closed NDNM1408 closed 6 months ago

NDNM1408 commented 6 months ago

I want to use the Python backend with Triton to deploy a TTS model using HiFi-GAN and FastPitch. When I run inference on the hifigan model, I get this error:

tritonclient.utils.InferenceServerException: [400] Failed to process the request(s) for model instance 'hifigan_0', message: Stub process is not healthy.
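For context, I call the model from the client roughly like this (a minimal sketch; the URL, model name, and spectrogram shape are placeholders for my actual setup):

import numpy as np
import tritonclient.http as httpclient

# Rough sketch of the client-side call; localhost:8000, the model name,
# and the (1, 80, 256) spectrogram shape are placeholders.
client = httpclient.InferenceServerClient(url="localhost:8000")

spec = np.random.rand(1, 80, 256).astype(np.float32)
inp = httpclient.InferInput("spec", list(spec.shape), "FP32")
inp.set_data_from_numpy(spec)

result = client.infer(model_name="hifigan", inputs=[inp])
audio = result.as_numpy("audio")
print(audio.shape)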

This is the content of the model.py file:

import json

import torch
import triton_python_backend_utils as pb_utils
from nemo.collections.tts.models import HifiGanModel


class TritonPythonModel:

    def initialize(self, args):
        self.model_config = model_config = json.loads(args['model_config'])

        output_config = pb_utils.get_output_config_by_name(
            model_config, "audio")

        # Convert Triton types to numpy types
        self.output_dtype = pb_utils.triton_string_to_numpy(
            output_config['data_type'])

        # Load the HiFi-GAN vocoder checkpoint
        self.model = HifiGanModel.restore_from("hifigan.nemo")

    def execute(self, requests):
        output_dtype = self.output_dtype

        responses = []

        # Every Python backend must iterate over every one of the requests
        # and create a pb_utils.InferenceResponse for each of them.
        for request in requests:
            spec = pb_utils.get_input_tensor_by_name(request, "spec").as_numpy()
            audio = self.model.convert_spectrogram_to_audio(spec=torch.from_numpy(spec))
            print(audio)
            print(audio.shape)
            # convert_spectrogram_to_audio returns a torch tensor, so move it
            # to the CPU and convert it to numpy before building the output tensor
            audio = audio.detach().cpu().numpy()
            out_tensor = pb_utils.Tensor("audio",
                                         audio.astype(output_dtype))
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor])
            responses.append(inference_response)

        # You should return a list of pb_utils.InferenceResponse. Length
        # of this list must match the length of `requests` list.
        return responses

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is OPTIONAL. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print('Cleaning up...')
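For reference, this is roughly how the model can be exercised outside Triton (a quick sanity-check sketch; the checkpoint path and spectrogram shape are placeholders):

import torch
from nemo.collections.tts.models import HifiGanModel

# Standalone sanity check outside Triton; "hifigan.nemo" and the
# (1, 80, 256) spectrogram shape are placeholder values.
model = HifiGanModel.restore_from("hifigan.nemo")
spec = torch.rand(1, 80, 256)
with torch.no_grad():
    audio = model.convert_spectrogram_to_audio(spec=spec)
print(audio.shape)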

Can anyone help?

krishung5 commented 6 months ago

Hi @NDNM1408, which Triton version are you using? Could you also provide the model config and any steps required for us to reproduce the issue?

victorsoda commented 6 months ago

Hi @krishung5, I ran into the same problem, "Stub process 'add_sub_0_0' is not healthy.", while trying to build triton_python_backend_stub for Python 3.8. The error looked like this:

I0509 08:28:58.312840 1635 grpc_server.cc:2466] Started GRPCInferenceService at 0.0.0.0:8001
I0509 08:28:58.313128 1635 http_server.cc:4636] Started HTTPService at 0.0.0.0:8000
I0509 08:28:58.354728 1635 http_server.cc:320] Started Metrics Service at 0.0.0.0:8002
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
  what():  boost::interprocess::lock_exception
Signal (6) received.
 0# 0x00005648F7FA23BD in tritonserver
 1# 0x00007F57FBD1E520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill in /lib/x86_64-linux-gnu/libc.so.6
 3# raise in /lib/x86_64-linux-gnu/libc.so.6
 4# abort in /lib/x86_64-linux-gnu/libc.so.6
 5# 0x00007F57FBFA7B9E in /lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F57FBFB320C in /lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F57FBFB21E9 in /lib/x86_64-linux-gnu/libstdc++.so.6
 8# __gxx_personality_v0 in /lib/x86_64-linux-gnu/libstdc++.so.6
 9# 0x00007F57FE17F884 in /lib/x86_64-linux-gnu/libgcc_s.so.1
10# _Unwind_RaiseException in /lib/x86_64-linux-gnu/libgcc_s.so.1
11# __cxa_throw in /lib/x86_64-linux-gnu/libstdc++.so.6
12# 0x00007F57E8FD8E1A in /opt/tritonserver/backends/python/libtriton_python.so
13# 0x00007F57E8F94EB0 in /opt/tritonserver/backends/python/libtriton_python.so
14# 0x00007F57E8FA2BBA in /opt/tritonserver/backends/python/libtriton_python.so
15# 0x00007F57E8F8E193 in /opt/tritonserver/backends/python/libtriton_python.so
16# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/python/libtriton_python.so
17# 0x00007F57FC71DD74 in /opt/tritonserver/bin/../lib/libtritonserver.so
18# 0x00007F57FC71E0DB in /opt/tritonserver/bin/../lib/libtritonserver.so
19# 0x00007F57FC8329BD in /opt/tritonserver/bin/../lib/libtritonserver.so
20# 0x00007F57FC721D64 in /opt/tritonserver/bin/../lib/libtritonserver.so
21# 0x00007F57FBFE1253 in /lib/x86_64-linux-gnu/libstdc++.so.6
22# 0x00007F57FBD70AC3 in /lib/x86_64-linux-gnu/libc.so.6
23# clone in /lib/x86_64-linux-gnu/libc.so.6

To Reproduce

Below is my process of building triton_python_backend_stub and running the official add_sub example, following the README (https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#building-custom-python-backend-stub):

  1. My Docker image for compilation: qic_ubuntu_1804_gcc7:1.5.0.3, on machine A (which is fast for compilation).
  2. git clone https://github.com/triton-inference-server/python_backend -b main
    cd python_backend
    curl -O https://archives.boost.io/release/1.79.0/source/boost_1_79_0.tar.gz
    sudo apt-get install libarchive-dev
    cd ..
    git clone https://github.com/Tencent/rapidjson.git  # to install rapidjson
    cd rapidjson
    git submodule update --init
    mkdir build && cd build
    cmake ..
    make
    make install
  3. Change all the "std::filesystem" into "std::experimental::filesystem" and "<filesystem>" into "<experimental/filesystem>" under the directory python_backend/src/
  4. cd python_backend
    mkdir build && cd build
    cmake -DTRITON_ENABLE_GPU=OFF -DTRITON_BACKEND_REPO_TAG=main -DTRITON_COMMON_REPO_TAG=main -DTRITON_CORE_REPO_TAG=main -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DPYTHON_EXECUTABLE=$(which python3.8) ..   # my python3.8 version is python3.8.12
    make triton-python-backend-stub

    When I executed "ldd triton_python_backend_stub", I saw the output below, as expected per the README:

    linux-vdso.so.1 (0x00007fff66d6c000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fea9ce70000)
        libpython3.8.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 (0x00007fea9c71b000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fea9c392000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fea9c17a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fea9bf5b000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fea9bb6a000)
        libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007fea9b938000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fea9b71b000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fea9b517000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fea9b314000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fea9af76000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fea9d078000)
  5. Then I started an official triton-server docker on another machine B (which is fast for running model inference):
    docker run -itd --privileged --network host --name victor.chen_triton -e HOME=/home/victor.chen -v /root:/root -v /home:/home -v /data:/data  --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864  --ulimit nofile=65536 nvcr.io/nvidia/tritonserver:24.03-py3
  6. And I installed python3.8 in the server docker:
    add-apt-repository ppa:deadsnakes/ppa
    apt update
    apt install python3.8  # Here I could only install python3.8.19 inside the server docker. 
    apt install python3.8-distutils
    apt install python3.8-dev
    python3.8 -m pip install setuptools
  7. Afterwards, I followed the README (https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#quick-start) to start the server with the official add_sub model:
    cd python_backend
    mkdir -p models/add_sub/1/
    cp examples/add_sub/model.py models/add_sub/1/model.py
    cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
    cp triton_python_backend_stub models/add_sub/  # My py3.8 stub is expected to be used by this line.
    tritonserver --model-repository `pwd`/models

    The server was successfully started.

  8. Finally, I started another docker (qic_ubuntu_1804_gcc7:latest) as the client docker, and tried to run client.py following the README:
    python3.8 -m pip install tritonclient[http] opencv-python-headless  (I have python3.8.12)
    python3.8 python_backend/examples/add_sub/client.py

    Then my client.py got an "unhealthy" exception: tritonclient.utils.InferenceServerException: [500] Failed to process the request(s) for model instance 'add_sub_0_0', message: Stub process 'add_sub_0_0' is not healthy. And my triton server crashed with the same boost::interprocess::lock_exception stack trace shown above.

  9. If I remove models/add_sub/triton_python_backend_stub, restart the server (this time it uses the default Python 3.10 in the Triton server docker), and run python3.8 python_backend/examples/add_sub/client.py again, everything works fine:
    root@vir-115-46-001:~/triton# python3.8 python_backend/examples/add_sub/client.py
    INPUT0 ([0.1482776  0.73882675 0.68110615 0.46473113]) + INPUT1 ([0.7229217  0.18495801 0.7215026  0.2987663 ]) = OUTPUT0 ([0.8711993  0.92378473 1.4026088  0.7634974 ])
    INPUT0 ([0.1482776  0.73882675 0.68110615 0.46473113]) - INPUT1 ([0.7229217  0.18495801 0.7215026  0.2987663 ]) = OUTPUT1 ([-0.57464415  0.5538688  -0.04039645  0.16596484])
    PASS: add_sub

    I think step 9 confirms that my triton_python_backend_stub (built in steps 1-4) is the main cause of the problem.

Could you please tell me what I was doing wrong? How can I make my python3.8 stub healthier?

Tabrizian commented 6 months ago

@victorsoda You need to compile the same branch of the repo as the server. I think the issue is that you're using the main branch with the 24.03 version of the server. Could you try building the Python backend from the r24.03 branch and let us know if you're still running into an error?
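If you're not sure which release your container corresponds to, one way to check is to read the version from the server metadata and pick the matching r<version> branch. A minimal sketch using the HTTP client (the localhost URL is an assumption about your setup):

import tritonclient.http as httpclient

# Query the running server for its core version; localhost:8000 is an
# assumed endpoint. The core version maps to a release branch
# (for example, the 24.03 container reports 2.44.0).
client = httpclient.InferenceServerClient(url="localhost:8000")
metadata = client.get_server_metadata()
print(metadata["version"])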

victorsoda commented 6 months ago

@victorsoda You need to compile the same branch of the repo as the server. I think the issue is that you're using the main branch with the 24.03 version of the server. Could you try building the Python backend from the r24.03 branch and let us know if you're still running into an error?

@Tabrizian Thanks a lot for your quick reply! After changing the python_backend repo branch from main to r24.03, the problem is solved. That's awesome!

sboudouk commented 4 months ago

@victorsoda You need to compile the same branch of the repo as the server. I think the issue is that you're using the main branch with the 24.03 version of the server. Could you try building the Python backend from the r24.03 branch and let us know if you're still running into an error?

Just want to be sure here: is manually upgrading python_backend also needed if I'm running the nvcr.io/nvidia/tritonserver:23.07-py3 Docker image?