triton-inference-server / pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://triton-inference-server.github.io/pytriton/
Apache License 2.0

ModuleNotFoundError: No module named '_ctypes' error when run pytriton server with 0.5.0 #56

Closed: lionsheep0724 closed this issue 5 months ago

lionsheep0724 commented 5 months ago

Description

I got a ModuleNotFoundError: No module named '_ctypes' error when I run the pytriton server. The code and environment are exactly the same as in my last run, which had no problem (with 0.4.2).

To reproduce

A minimal example to reproduce the error:

# server
import numpy as np
import torch

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# `whisper_model`, `logger`, `args` and `MAX_BATCH_SIZE` are defined elsewhere in my script.


@batch
def infer_fn(**inputs: np.ndarray):
    (audio_feature,) = inputs.values()

    # move the features to the GPU
    audio_feature_tensor: torch.Tensor = torch.from_numpy(audio_feature).to("cuda:0")
    # generate tokens and bring them back to the host as a numpy array
    token: torch.Tensor = whisper_model.generate_token(input_features=audio_feature_tensor)
    token: np.ndarray = token.cpu().numpy()

    return {"token": token}


with Triton(
    config=TritonConfig(http_port=args.http_port, grpc_port=args.grpc_port, metrics_port=args.metrics_port)
) as triton:
    logger.info(f"Loading STT model with batch size: {MAX_BATCH_SIZE}")
    triton.bind(
        model_name="Whisper",
        infer_func=infer_fn,
        inputs=[
            Tensor(name="audio_feature", dtype=np.float32, shape=(80, 3000)),
        ],
        outputs=[
            Tensor(name="token", dtype=np.int64, shape=(-1,)),
        ],
        config=ModelConfig(max_batch_size=MAX_BATCH_SIZE),
        strict=True,
    )
    logger.info("Serving inference")
    triton.serve()
    logger.info("Pytriton is ready")

Observed results and expected behavior

Refer to error log below.

2024-01-17 21:10:18 Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-01-17 21:10:34 2024-01-17 21:10:34,975 - INFO - pytriton.triton: Read more about configuring and serving models in documentation: https://triton-inference-server.github.io/pytriton.
2024-01-17 21:10:34 2024-01-17 21:10:34,975 - INFO - pytriton.triton: (Press CTRL+C or use the command `kill -SIGINT 1` to send a SIGINT signal and quit)
2024-01-17 21:10:34 2024-01-17 21:10:34,975 - INFO - pytriton_single_pipeline: Loading STT model with batch size : 1
2024-01-17 21:10:38 I0117 12:10:34.336323 38 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x304600000' with size 268435456
2024-01-17 21:10:38 I0117 12:10:34.336439 38 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
2024-01-17 21:10:38 I0117 12:10:34.341477 38 server.cc:606] 
2024-01-17 21:10:38 +------------------+------+
2024-01-17 21:10:38 | Repository Agent | Path |
2024-01-17 21:10:38 +------------------+------+
2024-01-17 21:10:38 +------------------+------+
2024-01-17 21:10:38 
2024-01-17 21:10:38 I0117 12:10:34.341506 38 server.cc:633] 
2024-01-17 21:10:38 +---------+------+--------+
2024-01-17 21:10:38 | Backend | Path | Config |
2024-01-17 21:10:38 +---------+------+--------+
2024-01-17 21:10:38 +---------+------+--------+
2024-01-17 21:10:38 
2024-01-17 21:10:38 I0117 12:10:34.341527 38 server.cc:676] 
2024-01-17 21:10:38 +-------+---------+--------+
2024-01-17 21:10:38 | Model | Version | Status |
2024-01-17 21:10:38 +-------+---------+--------+
2024-01-17 21:10:38 +-------+---------+--------+
2024-01-17 21:10:38 
2024-01-17 21:10:38 I0117 12:10:34.379326 38 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA RTX A2000 8GB Laptop GPU
2024-01-17 21:10:38 I0117 12:10:34.385456 38 metrics.cc:710] Collecting CPU metrics
2024-01-17 21:10:38 I0117 12:10:34.385609 38 tritonserver.cc:2483] 
2024-01-17 21:10:38 +----------------------------------+------------------------------------------+
2024-01-17 21:10:38 | Option                           | Value                                    |
2024-01-17 21:10:38 +----------------------------------+------------------------------------------+
2024-01-17 21:10:38 | server_id                        | triton                                   |
2024-01-17 21:10:38 | server_version                   | 2.41.0                                   |
2024-01-17 21:10:38 | server_extensions                | classification sequence model_repository |
2024-01-17 21:10:38 |                                  |  model_repository(unload_dependents) sch |
2024-01-17 21:10:38 |                                  | edule_policy model_configuration system_ |
2024-01-17 21:10:38 |                                  | shared_memory cuda_shared_memory binary_ |
2024-01-17 21:10:38 |                                  | tensor_data parameters statistics trace  |
2024-01-17 21:10:38 |                                  | logging                                  |
2024-01-17 21:10:38 | model_repository_path[0]         | /root/.cache/pytriton/workspace_ofhp5n85 |
2024-01-17 21:10:38 | model_control_mode               | MODE_EXPLICIT                            |
2024-01-17 21:10:38 | strict_model_config              | 0                                        |
2024-01-17 21:10:38 | rate_limit                       | OFF                                      |
2024-01-17 21:10:38 | pinned_memory_pool_byte_size     | 268435456                                |
2024-01-17 21:10:38 | cuda_memory_pool_byte_size{0}    | 67108864                                 |
2024-01-17 21:10:38 | min_supported_compute_capability | 6.0                                      |
2024-01-17 21:10:38 | strict_readiness                 | 1                                        |
2024-01-17 21:10:38 | exit_timeout                     | 30                                       |
2024-01-17 21:10:38 | cache_enabled                    | 0                                        |
2024-01-17 21:10:38 +----------------------------------+------------------------------------------+
2024-01-17 21:10:38 
2024-01-17 21:10:38 I0117 12:10:34.387048 38 grpc_server.cc:2495] Started GRPCInferenceService at 0.0.0.0:8001
2024-01-17 21:10:38 I0117 12:10:34.387252 38 http_server.cc:4619] Started HTTPService at 0.0.0.0:10100
2024-01-17 21:10:38 I0117 12:10:34.429670 38 http_server.cc:282] Started Metrics Service at 0.0.0.0:8002
2024-01-17 21:10:38 I0117 12:10:34.987339 38 model_lifecycle.cc:461] loading: Whisper:1
2024-01-17 21:10:38 I0117 12:10:36.247956 38 python_be.cc:2363] TRITONBACKEND_ModelInstanceInitialize: Whisper_0_0 (CPU device 0)
2024-01-17 21:10:38 Traceback (most recent call last):
2024-01-17 21:10:38   File "<string>", line 1, in <module>
2024-01-17 21:10:38   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
2024-01-17 21:10:38     exitcode = _main(fd, parent_sentinel)
2024-01-17 21:10:38   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
2024-01-17 21:10:38     self = reduction.pickle.load(from_parent)
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 21, in <module>
2024-01-17 21:10:38     import ctypes
2024-01-17 21:10:38   File "/usr/lib/python3.8/ctypes/__init__.py", line 7, in <module>
2024-01-17 21:10:38     from _ctypes import Union, Structure, Array
2024-01-17 21:10:38 ModuleNotFoundError: No module named '_ctypes'
2024-01-17 21:10:38 I0117 12:10:36.426031 38 pb_stub.cc:346] Failed to initialize Python stub: TritonModelException: Model initialize error: Traceback (most recent call last):
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/model.py", line 392, in initialize
2024-01-17 21:10:38     self._serializer_deserializer.start(data_socket)
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 952, in start
2024-01-17 21:10:38     self._tensor_store.start()
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 555, in start
2024-01-17 21:10:38     self._remote_blocks_store_manager.start()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/managers.py", line 583, in start
2024-01-17 21:10:38     self._address = reader.recv()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
2024-01-17 21:10:38     buf = self._recv_bytes()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
2024-01-17 21:10:38     buf = self._recv(4)
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
2024-01-17 21:10:38     raise EOFError
2024-01-17 21:10:38 EOFError
2024-01-17 21:10:38 
2024-01-17 21:10:38 
2024-01-17 21:10:38 At:
2024-01-17 21:10:38   /tmp/folderSrr2Ia/1/model.py(426): initialize
2024-01-17 21:10:38 
2024-01-17 21:10:38 E0117 12:10:36.600638 38 backend_model.cc:635] ERROR: Failed to create instance: TritonModelException: Model initialize error: Traceback (most recent call last):
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/model.py", line 392, in initialize
2024-01-17 21:10:38     self._serializer_deserializer.start(data_socket)
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 952, in start
2024-01-17 21:10:38     self._tensor_store.start()
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 555, in start
2024-01-17 21:10:38     self._remote_blocks_store_manager.start()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/managers.py", line 583, in start
2024-01-17 21:10:38     self._address = reader.recv()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
2024-01-17 21:10:38     buf = self._recv_bytes()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
2024-01-17 21:10:38     buf = self._recv(4)
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
2024-01-17 21:10:38     raise EOFError
2024-01-17 21:10:38 EOFError
2024-01-17 21:10:38 
2024-01-17 21:10:38 
2024-01-17 21:10:38 At:
2024-01-17 21:10:38   /tmp/folderSrr2Ia/1/model.py(426): initialize
2024-01-17 21:10:38 
2024-01-17 21:10:38 E0117 12:10:36.600727 38 model_lifecycle.cc:621] failed to load 'Whisper' version 1: Internal: TritonModelException: Model initialize error: Traceback (most recent call last):
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/model.py", line 392, in initialize
2024-01-17 21:10:38     self._serializer_deserializer.start(data_socket)
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 952, in start
2024-01-17 21:10:38     self._tensor_store.start()
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 555, in start
2024-01-17 21:10:38     self._remote_blocks_store_manager.start()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/managers.py", line 583, in start
2024-01-17 21:10:38     self._address = reader.recv()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
2024-01-17 21:10:38     buf = self._recv_bytes()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
2024-01-17 21:10:38     buf = self._recv(4)
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
2024-01-17 21:10:38     raise EOFError
2024-01-17 21:10:38 EOFError
2024-01-17 21:10:38 
2024-01-17 21:10:38 
2024-01-17 21:10:38 At:
2024-01-17 21:10:38   /tmp/folderSrr2Ia/1/model.py(426): initialize
2024-01-17 21:10:38 
2024-01-17 21:10:38 I0117 12:10:36.600752 38 model_lifecycle.cc:756] failed to load 'Whisper'
2024-01-17 21:10:38 Signal (2) received.
2024-01-17 21:10:38 I0117 12:10:36.601623 38 server.cc:307] Waiting for in-flight requests to complete.
2024-01-17 21:10:38 Traceback (most recent call last):
2024-01-17 21:10:38   File "inference_pipeline.py", line 126, in <module>
2024-01-17 21:10:38     main()
2024-01-17 21:10:38   File "inference_pipeline.py", line 102, in main
2024-01-17 21:10:38     triton.bind(
2024-01-17 21:10:38   File "/usr/local/lib/python3.8/dist-packages/pytriton/triton.py", line 389, in bind
2024-01-17 21:10:38     self._model_manager.add_model(model, self.is_connected())
2024-01-17 21:10:38   File "/usr/local/lib/python3.8/dist-packages/pytriton/models/manager.py", line 78, in add_model
2024-01-17 21:10:38     self._load_model(model)
2024-01-17 21:10:38   File "/usr/local/lib/python3.8/dist-packages/pytriton/models/manager.py", line 127, in _load_model
2024-01-17 21:10:38     client.load_model(config=config, files=files)
2024-01-17 21:10:38   File "/usr/local/lib/python3.8/dist-packages/pytriton/client/client.py", line 397, in load_model
2024-01-17 21:10:38     self._general_client.load_model(self._model_name, config=config, files=files)
2024-01-17 21:10:38   File "/usr/local/lib/python3.8/dist-packages/tritonclient/http/_client.py", line 663, in load_model
2024-01-17 21:10:38     _raise_if_error(response)
2024-01-17 21:10:38   File "/usr/local/lib/python3.8/dist-packages/tritonclient/http/_utils.py", line 69, in _raise_if_error
2024-01-17 21:10:38     raise error
2024-01-17 21:10:38 tritonclient.utils.InferenceServerException: [400] load failed for model 'Whisper': version 1 is at UNAVAILABLE state: Internal: TritonModelException: Model initialize error: Traceback (most recent call last):
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/model.py", line 392, in initialize
2024-01-17 21:10:38     self._serializer_deserializer.start(data_socket)
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 952, in start
2024-01-17 21:10:38     self._tensor_store.start()
2024-01-17 21:10:38   File "/tmp/folderSrr2Ia/1/data.py", line 555, in start
2024-01-17 21:10:38     self._remote_blocks_store_manager.start()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/managers.py", line 583, in start
2024-01-17 21:10:38     self._address = reader.recv()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 250, in recv
2024-01-17 21:10:38     buf = self._recv_bytes()
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
2024-01-17 21:10:38     buf = self._recv(4)
2024-01-17 21:10:38   File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
2024-01-17 21:10:38     raise EOFError
2024-01-17 21:10:38 EOFError
2024-01-17 21:10:38 
2024-01-17 21:10:38 
2024-01-17 21:10:38 At:
2024-01-17 21:10:38   /tmp/folderSrr2Ia/1/model.py(426): initialize
2024-01-17 21:10:38 ;
2024-01-17 21:10:38 
2024-01-17 21:10:38 I0117 12:10:36.601632 38 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
2024-01-17 21:10:38 I0117 12:10:36.601635 38 server.cc:338] All models are stopped, unloading models
2024-01-17 21:10:38 I0117 12:10:36.601636 38 server.cc:345] Timeout 30: Found 0 live models and 0 in-flight non-inference requests

Environment

Please refer to my Dockerfile below.

# Use Ubuntu 22.04 as base image
FROM ubuntu:22.04

# Install necessary packages
RUN apt update -y && apt install -y --fix-missing software-properties-common

# Add repository with various Python versions
RUN add-apt-repository ppa:deadsnakes/ppa -y

#Set timezone
RUN apt-get update && \
    apt-get install -yq tzdata && \
    ln -fs /usr/share/zoneinfo/Asia/Seoul /etc/localtime && \
    dpkg-reconfigure -f noninteractive tzdata

# Install Python 3.8 and required libraries
RUN apt install -y --fix-missing python3.8 libpython3.8 python3.8-distutils python3-pip \
     build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev \
     libffi-dev curl libbz2-dev pkg-config make

# install dependencies
COPY ./requirements.txt /workspace/requirements.txt
RUN pip install -r  /workspace/requirements.txt

# Install nvidia-pytriton using pip
RUN python3.8 -m pip install nvidia-pytriton
RUN python3.8 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

WORKDIR /workspace

# copy model
COPY ./models /workspace

Additional context

I don't understand why the error log says failed to load 'Whisper' version 1. The previous version (0.4.2) loaded the model as version 2, since the folder name is 2.

piotrm-nvidia commented 5 months ago

Introduction

It's evident that there is an issue with supporting Python versions other than 3.10 in your environment, leading to the ModuleNotFoundError. Further investigation is needed to address this compatibility problem.

You can use Python 3.10 as a workaround for now. We will investigate the issue and provide a fix as soon as possible. Should you require any further assistance or information, please don't hesitate to reach out.

Background

The error message you've encountered indicates an issue with importing the _ctypes module, which is a part of the standard library in Python. This issue is typically related to the Python environment or the way Python was installed or is being used. The _ctypes module is used by the ctypes library to provide C-compatible data types and allows calling functions in DLLs or shared libraries. It depends on libffi, which is a library that provides a portable interface for calling C functions.
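As a quick sanity check (my own snippet, not part of PyTriton), you can run something like the following with each interpreter you have installed to see whether its standard library provides a working _ctypes:

# check_ctypes.py -- run with each installed interpreter, e.g. python3.8 check_ctypes.py
import sys

try:
    import ctypes  # the high-level wrapper; importing it pulls in the _ctypes extension
    print(f"Python {sys.version.split()[0]}: ctypes OK ({ctypes.__file__})")
except ModuleNotFoundError as exc:
    print(f"Python {sys.version.split()[0]}: broken ctypes: {exc}")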

In your specific case, it appears you are working with the Triton Python backend for Python 3.8, and that the wheels are built with Python 3.8.18. However, the error message references both Python 3.10 (/usr/lib/python3.10/multiprocessing/spawn.py) and Python 3.8 (/usr/lib/python3.8/ctypes/__init__.py) paths, indicating a potential conflict or misconfiguration between different Python versions in your environment.
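Because the traceback shows the helper process being started through the multiprocessing spawn machinery (/usr/lib/python3.10/multiprocessing/spawn.py) while ctypes is imported from the 3.8 standard library, a small diagnostic like the one below, run with each interpreter, shows which executable and standard-library paths a spawned child actually resolves. This is only an illustration of the check, not part of PyTriton:

# spawn_check.py -- diagnostic sketch: which interpreter and stdlib does a spawned child use?
import multiprocessing as mp
import sys


def report():
    # Runs in the child process created with the "spawn" start method.
    print("child executable:", sys.executable)
    print("child stdlib entries:", [p for p in sys.path if "/python3" in p][:3])


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    print("parent executable:", sys.executable)
    child = mp.Process(target=report)
    child.start()
    child.join()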

Reproduction

I reduced your Dockerfile to just include the bare minimum to reproduce the issue:

# Use Ubuntu 22.04 as base image
FROM ubuntu:22.04

# Install necessary packages
RUN apt update -y && apt install -y --fix-missing software-properties-common

# Add repository with various Python versions
RUN add-apt-repository ppa:deadsnakes/ppa -y

#Set timezone
RUN apt-get update && \
    apt-get install -yq tzdata && \
    ln -fs /usr/share/zoneinfo/Asia/Seoul /etc/localtime && \
    dpkg-reconfigure -f noninteractive tzdata

# Install Python 3.8 and required libraries
RUN apt install -y --fix-missing python3.8 libpython3.8 python3.8-distutils python3-pip \
     build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev \
     libffi-dev curl libbz2-dev pkg-config make

# Install nvidia-pytriton using pip
RUN python3.8 -m pip install nvidia-pytriton==0.5.0
RUN python3.8 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

WORKDIR /workspace

I also reduced your Python script to just the bare minimum to reproduce the issue:


from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig
import logging
import numpy as np

logging.basicConfig(level=logging.DEBUG)

@batch
def infer_fn(**inputs: np.ndarray):
    return {"token": np.zeros((1,1), dtype=np.int64) }

MAX_BATCH_SIZE = 10

with Triton(
        config=TritonConfig(http_port=1111, grpc_port=2222, metrics_port=3333)
    ) as triton:
        triton.bind(
            model_name="Whisper",
            infer_func=infer_fn,
            inputs=[
                Tensor(name="audio_feature", dtype=np.float32, shape=(80, 3000)),
            ],
            outputs=[
                Tensor(name="token", dtype=np.int64, shape=(-1,)),
            ],
            config=ModelConfig(max_batch_size=MAX_BATCH_SIZE),
            strict=True,
        )
        triton.serve()

Running in Python 3.8 and 3.9

I ran this script with Python 3.8.18 using the following command:

python3.8 server.py

My server.py enables more logs, so I can see that Triton Inference Server is running with LD_LIBRARY_PATH set to:

"LD_LIBRARY_PATH": "/usr/local/lib/python3.8/dist-packages/nvidia_pytriton.libs"

It fails in a very similar way to your case.

I also tried to run it using Python 3.9.

Python 3.9 is not available in your environment, so I had to install it:

apt-get install python3.9 libpython3.9 python3.9-distutils

I also had to install PyTriton:

python3.9 -m pip install nvidia-pytriton==0.5.0

I ran the script using Python 3.9:

python3.9 server.py

Here the libraries are loaded from a different location:

"LD_LIBRARY_PATH": "/usr/local/lib/python3.9/dist-packages/nvidia_pytriton.libs"

The failure is very similar to the one you're encountering:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/tmp/folder5qpGxk/1/data.py", line 21, in <module>
    import ctypes
  File "/usr/lib/python3.9/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'

Running in Python 3.10

I also tried running server.py with Python 3.10.12, which is also installed in your environment.

I only needed to install PyTriton for Python 3.10:

python3.10 -m pip install nvidia-pytriton==0.5.0

Run server:

python3.10 server.py

It worked fine, and the model was able to respond to a client request (sent from python3.10):

from pytriton.client import ModelClient
client = ModelClient("grpc://localhost:2222", "Whisper")
import numpy as np
client.infer_sample(np.zeros((80, 3000), dtype=np.float32))

Output:

{'token': array([0])}

It is clear that there is an issue with supporting any Python version other than 3.10 in your environment. I will investigate this issue further to see if we can avoid it.
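To see which Python backend stubs ship with the installed wheel, a quick helper like the one below can be used; the directory layout is taken from the "Obtained pytriton stubs path" debug messages and may change between releases:

# list_stubs.py -- sketch only: list the per-version stubs bundled with nvidia-pytriton
import pathlib

import pytriton

stubs_dir = pathlib.Path(pytriton.__file__).parent / "tritonserver" / "python_backend_stubs"
for version_dir in sorted(stubs_dir.iterdir()):
    print(version_dir.name, "->", [p.name for p in version_dir.iterdir()])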

Full logs

Full log from a successful run in Python 3.10:

DEBUG:pytriton.utils.workspace:Workspace path /root/.cache/pytriton/workspace_zixh0qhu
DEBUG:pytriton.triton:Preparing Triton Inference Server binaries and libs for execution.
DEBUG:pytriton.triton:Triton Inference Server binaries copied to /root/.cache/pytriton/workspace_zixh0qhu/tritonserver without stubs.
DEBUG:pytriton.utils.distribution:Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
DEBUG:pytriton.utils.distribution:Obtained pytriton stubs path for 3.10: /usr/local/lib/python3.10/dist-packages/pytriton/tritonserver/python_backend_stubs/3.10/triton_python_backend_stub
DEBUG:pytriton.triton:Copying stub for version 3.10 from /usr/local/lib/python3.10/dist-packages/pytriton/tritonserver/python_backend_stubs/3.10/triton_python_backend_stub to /root/.cache/pytriton/workspace_zixh0qhu/tritonserver/backends/python/triton_python_backend_stub
DEBUG:pytriton.triton:Triton Inference Server binaries ready in /root/.cache/pytriton/workspace_zixh0qhu/tritonserver
DEBUG:pytriton.utils.distribution:Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
DEBUG:pytriton.utils.distribution:Obtained pytriton module path: /usr/local/lib/python3.10/dist-packages/pytriton
DEBUG:pytriton.utils.distribution:pytriton is installed in editable mode: False
DEBUG:pytriton.utils.distribution:Obtained nvidia_pytriton.libs path: /usr/local/lib/python3.10/dist-packages/nvidia_pytriton.libs
DEBUG:pytriton.triton:Starting Triton Inference
DEBUG:pytriton.server.triton_server:Triton Server binary /root/.cache/pytriton/workspace_zixh0qhu/tritonserver/bin/tritonserver. Environment:
{
    "NVIDIA_VISIBLE_DEVICES": "all",
    "HOSTNAME": "piotrmubuntu2204",
    "PWD": "/root",
    "HOME": "/root",
    "LS_COLORS": "rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:",
    "TERM": "xterm",
    "SHLVL": "1",
    "PATH": "/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "OLDPWD": "/workspace",
    "_": "/usr/bin/python3.10",
    "LC_CTYPE": "C.UTF-8",
    "LD_LIBRARY_PATH": "/usr/local/lib/python3.10/dist-packages/nvidia_pytriton.libs"
}
DEBUG:pytriton.client.utils:Creating InferenceServerClient for http://127.0.0.1:1111 with {}
DEBUG:pytriton.client.utils:Waiting for server to be ready (timeout=119.99998903274536)
I0117 20:03:00.378866 1049 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f17d0000000' with size 268435456
I0117 20:03:00.379116 1049 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0117 20:03:00.379643 1049 server.cc:592] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0117 20:03:00.379659 1049 server.cc:619] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0117 20:03:00.379669 1049 server.cc:662] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0117 20:03:00.422781 1049 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA RTX A6000
I0117 20:03:00.422992 1049 metrics.cc:710] Collecting CPU metrics
I0117 20:03:00.423130 1049 tritonserver.cc:2458] 
+----------------------------------+------------------------------------------+
| Option                           | Value                                    |
+----------------------------------+------------------------------------------+
| server_id                        | triton                                   |
| server_version                   | 2.39.0                                   |
| server_extensions                | classification sequence model_repository |
|                                  |  model_repository(unload_dependents) sch |
|                                  | edule_policy model_configuration system_ |
|                                  | shared_memory cuda_shared_memory binary_ |
|                                  | tensor_data parameters statistics trace  |
|                                  | logging                                  |
| model_repository_path[0]         | /root/.cache/pytriton/workspace_zixh0qhu |
| model_control_mode               | MODE_EXPLICIT                            |
| strict_model_config              | 0                                        |
| rate_limit                       | OFF                                      |
| pinned_memory_pool_byte_size     | 268435456                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                 |
| min_supported_compute_capability | 6.0                                      |
| strict_readiness                 | 1                                        |
| exit_timeout                     | 30                                       |
| cache_enabled                    | 0                                        |
+----------------------------------+------------------------------------------+

I0117 20:03:00.424784 1049 grpc_server.cc:2513] Started GRPCInferenceService at 0.0.0.0:2222
I0117 20:03:00.424972 1049 http_server.cc:4497] Started HTTPService at 0.0.0.0:1111
I0117 20:03:00.468354 1049 http_server.cc:270] Started Metrics Service at 0.0.0.0:3333
DEBUG:pytriton.client.utils:Creating InferenceServerClient for http://127.0.0.1:1111 with {}
DEBUG:pytriton.client.utils:Waiting for server to be ready (timeout=119.99999213218689)
DEBUG:pytriton.proxy.communication:Started remote block store at /root/.cache/pytriton/workspace_zixh0qhu/data_store.sock (pid=1086)
INFO:pytriton.triton:Read more about configuring and serving models in documentation: https://triton-inference-server.github.io/pytriton.
INFO:pytriton.triton:(Press CTRL+C or use the command `kill -SIGINT 1037` to send a SIGINT signal and quit)
DEBUG:pytriton.models.manager:Adding Whisper (1) to registry under ('whisper', 1).
DEBUG:pytriton.models.manager:Crating model Whisper with version 1.
DEBUG:pytriton.proxy.inference_handler:Binding IPC socket at ipc:///root/.cache/pytriton/workspace_zixh0qhu/ipc_proxy_backend_Whisper_0.
DEBUG:pytriton.proxy.communication:Already connectd to remote block store at /root/.cache/pytriton/workspace_zixh0qhu/data_store.sock
DEBUG:pytriton.proxy.inference_handler:Waiting for requests from proxy model for Whisper.
DEBUG:pytriton.client.client:Creating InferenceServerClient for http://127.0.0.1:1111 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
DEBUG:pytriton.client.client:Creating InferenceServerClient for http://127.0.0.1:1111 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
DEBUG:pytriton.client.utils:Waiting for server to be ready (timeout=119.9999930858612)
I0117 20:03:01.558045 1049 model_lifecycle.cc:461] loading: Whisper:1
I0117 20:03:02.866596 1049 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: Whisper_0_0 (CPU device 0)
DEBUG:pytriton.models.model:Closing handshake socket
I0117 20:03:03.089951 1049 model_lifecycle.cc:818] successfully loaded 'Whisper'
DEBUG:pytriton.client.client:Closing ModelClient
DEBUG:pytriton.models.manager:Done.
DEBUG:pytriton.triton:Triton Inference already connected.

Verbose log from processing request:

DEBUG:pytriton.proxy.communication.client:00000001 received requests
DEBUG:pytriton.proxy.inference:Preprocessing requests for 00000001
DEBUG:pytriton.proxy.inference:Performing inference on requests=00000001
DEBUG:pytriton.proxy.validators:Number of responses: 1
DEBUG:pytriton.proxy.validators:Response #0
DEBUG:pytriton.proxy.validators:    token: [[0]] shape=(1, 1) dtype=int64
DEBUG:pytriton.proxy.inference:Pushing responses for 00000001 into responses queue (0, [{'token': array([[0]])}])
DEBUG:pytriton.proxy.inference:Pushing responses for 00000001 into responses queue (1, None)
DEBUG:pytriton.proxy.inference:Postprocessing responses for 00000001
DEBUG:pytriton.proxy.inference:Finished inference on requests=00000001
DEBUG:pytriton.proxy.inference:Postprocessing responses for 00000001
DEBUG:pytriton.proxy.data:Releasing shared memory block for tensor psm_72cfe70d:0
DEBUG:pytriton.proxy.inference:Finished handling requests for 00000001
DEBUG:pytriton.proxy.communication.client:Finished handling requests 00000001

Full log from python3.8:

DEBUG:pytriton.utils.workspace:Workspace path /root/.cache/pytriton/workspace_mmceomm_
DEBUG:pytriton.triton:Preparing Triton Inference Server binaries and libs for execution.
DEBUG:pytriton.triton:Triton Inference Server binaries copied to /root/.cache/pytriton/workspace_mmceomm_/tritonserver without stubs.
DEBUG:pytriton.utils.distribution:Obtained pytriton module path: /usr/local/lib/python3.8/dist-packages/pytriton
DEBUG:pytriton.utils.distribution:Obtained pytriton stubs path for 3.8: /usr/local/lib/python3.8/dist-packages/pytriton/tritonserver/python_backend_stubs/3.8/triton_python_backend_stub
DEBUG:pytriton.triton:Copying stub for version 3.8 from /usr/local/lib/python3.8/dist-packages/pytriton/tritonserver/python_backend_stubs/3.8/triton_python_backend_stub to /root/.cache/pytriton/workspace_mmceomm_/tritonserver/backends/python/triton_python_backend_stub
DEBUG:pytriton.triton:Triton Inference Server binaries ready in /root/.cache/pytriton/workspace_mmceomm_/tritonserver
DEBUG:pytriton.utils.distribution:Obtained pytriton module path: /usr/local/lib/python3.8/dist-packages/pytriton
DEBUG:pytriton.utils.distribution:Obtained pytriton module path: /usr/local/lib/python3.8/dist-packages/pytriton
DEBUG:pytriton.utils.distribution:pytriton is installed in editable mode: False
DEBUG:pytriton.utils.distribution:Obtained nvidia_pytriton.libs path: /usr/local/lib/python3.8/dist-packages/nvidia_pytriton.libs
DEBUG:pytriton.triton:Starting Triton Inference
DEBUG:pytriton.server.triton_server:Triton Server binary /root/.cache/pytriton/workspace_mmceomm_/tritonserver/bin/tritonserver. Environment:
{
    "NVIDIA_VISIBLE_DEVICES": "all",
    "HOSTNAME": "piotrmubuntu2204",
    "PWD": "/root",
    "HOME": "/root",
    "LS_COLORS": "rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:",
    "TERM": "xterm",
    "SHLVL": "1",
    "PATH": "/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "OLDPWD": "/workspace",
    "_": "/usr/bin/python3.8",
    "LC_CTYPE": "C.UTF-8",
    "LD_LIBRARY_PATH": "/usr/local/lib/python3.8/dist-packages/nvidia_pytriton.libs"
}
DEBUG:pytriton.client.utils:Creating InferenceServerClient for http://127.0.0.1:1111 with {}
DEBUG:pytriton.client.utils:Waiting for server to be ready (timeout=119.99998927116394)
I0117 20:04:38.068349 1198 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f11ee000000' with size 268435456
I0117 20:04:38.068604 1198 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0117 20:04:38.069474 1198 server.cc:606] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0117 20:04:38.069490 1198 server.cc:633] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0117 20:04:38.069500 1198 server.cc:676] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0117 20:04:38.116246 1198 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA RTX A6000
I0117 20:04:38.118015 1198 metrics.cc:710] Collecting CPU metrics
I0117 20:04:38.118145 1198 tritonserver.cc:2483] 
+----------------------------------+------------------------------------------+
| Option                           | Value                                    |
+----------------------------------+------------------------------------------+
| server_id                        | triton                                   |
| server_version                   | 2.41.0                                   |
| server_extensions                | classification sequence model_repository |
|                                  |  model_repository(unload_dependents) sch |
|                                  | edule_policy model_configuration system_ |
|                                  | shared_memory cuda_shared_memory binary_ |
|                                  | tensor_data parameters statistics trace  |
|                                  | logging                                  |
| model_repository_path[0]         | /root/.cache/pytriton/workspace_mmceomm_ |
| model_control_mode               | MODE_EXPLICIT                            |
| strict_model_config              | 0                                        |
| rate_limit                       | OFF                                      |
| pinned_memory_pool_byte_size     | 268435456                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                 |
| min_supported_compute_capability | 6.0                                      |
| strict_readiness                 | 1                                        |
| exit_timeout                     | 30                                       |
| cache_enabled                    | 0                                        |
+----------------------------------+------------------------------------------+

I0117 20:04:38.119865 1198 grpc_server.cc:2495] Started GRPCInferenceService at 0.0.0.0:2222
I0117 20:04:38.120083 1198 http_server.cc:4619] Started HTTPService at 0.0.0.0:1111
I0117 20:04:38.161658 1198 http_server.cc:282] Started Metrics Service at 0.0.0.0:3333
DEBUG:pytriton.client.utils:Creating InferenceServerClient for http://127.0.0.1:1111 with {}
DEBUG:pytriton.client.utils:Waiting for server to be ready (timeout=119.99998950958252)
INFO:pytriton.triton:Read more about configuring and serving models in documentation: https://triton-inference-server.github.io/pytriton.
INFO:pytriton.triton:(Press CTRL+C or use the command `kill -SIGINT 1185` to send a SIGINT signal and quit)
DEBUG:pytriton.models.manager:Adding Whisper (1) to registry under ('whisper', 1).
DEBUG:pytriton.models.manager:Creating model Whisper with version 1.
DEBUG:pytriton.client.client:Creating InferenceServerClient for http://127.0.0.1:1111 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
DEBUG:pytriton.client.client:Creating InferenceServerClient for http://127.0.0.1:1111 with {'network_timeout': 60.0, 'connection_timeout': 60.0}
DEBUG:pytriton.client.utils:Waiting for server to be ready (timeout=119.99998760223389)
I0117 20:04:38.900731 1198 model_lifecycle.cc:461] loading: Whisper:1
I0117 20:04:40.226306 1198 python_be.cc:2363] TRITONBACKEND_ModelInstanceInitialize: Whisper_0_0 (CPU device 0)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/tmp/folderlulWH6/1/data.py", line 21, in <module>
    import ctypes
  File "/usr/lib/python3.8/ctypes/__init__.py", line 7, in <module>
    from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'
lionsheep0724 commented 5 months ago

Many thanks for your quick reply. It sounds like there is a compatibility issue with Python 3.8 in 0.5.0. I'm not sure whether my environment is okay with anything other than 3.8, but I'll try.

P.S.: I have a question about decoupled models in 0.5.0. The documentation says they are specifically useful for Automated Speech Recognition (ASR), but I don't understand why. Here are my questions.

  1. In my case, real-time audio packets are transmitted to a FastAPI server (over a keep-alive HTTP connection), and the packets are converted into audio features once enough audio has accumulated for ASR (maybe a few seconds of packets or more).
  2. The FastAPI server then sends a request to the PyTriton server with the extracted features and gets the ASR response (text) back; see my rough sketch after this list.
  3. In my scenario, how would I implement decoupled models, and what is their advantage? I wonder whether they guarantee that inference happens in order with respect to each packet channel. I guess the audio packet source and the FastAPI server should be 1:1, while the PyTriton server to FastAPI servers would be 1:N, so that multiple audio packet sources are handled without mixing them.
  4. Can PyTriton with a decoupled model handle streaming data? I.e., can we feed audio packets (bytes) to the server directly?
  5. How can we control the response length? (The docs say the server delivers a response whenever it deems fit.)
  6. How can we control parallelism (number of workers, etc.)? The docs say it can receive many requests in parallel and perform inference on each request independently.
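To make points 1-2 concrete, this is roughly what my current (non-decoupled) FastAPI-to-PyTriton flow looks like; extract_features and MIN_BYTES_FOR_ASR are placeholders standing in for my real feature extraction and buffering threshold:

# Simplified sketch of my current FastAPI -> PyTriton flow (non-decoupled).
# extract_features() and MIN_BYTES_FOR_ASR stand in for my real code.
import numpy as np
from fastapi import FastAPI, Request
from pytriton.client import ModelClient

MIN_BYTES_FOR_ASR = 16000 * 2 * 3  # placeholder: roughly 3 seconds of 16 kHz 16-bit audio

app = FastAPI()
client = ModelClient("grpc://localhost:2222", "Whisper")
buffer = bytearray()


def extract_features(audio_bytes: bytes) -> np.ndarray:
    # Placeholder: convert accumulated PCM bytes into an (80, 3000) log-mel feature.
    raise NotImplementedError


@app.post("/audio")
async def receive_packet(request: Request):
    buffer.extend(await request.body())
    if len(buffer) < MIN_BYTES_FOR_ASR:
        return {"token": None}  # keep accumulating packets
    features = extract_features(bytes(buffer)).astype(np.float32)
    buffer.clear()
    result = client.infer_sample(features)
    return {"token": result["token"].tolist()}

My questions above are essentially about whether switching to a decoupled model would let PyTriton push partial results back instead of one response per request.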

If my questions are too broad, it's okay to open a new thread for them. Feel free to ask me questions, as my explanation is probably unclear.

piotrm-nvidia commented 5 months ago

Please be so kind as to create a new thread about decoupled models and ASR. Let's focus on Python 3.8 support here.

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 5 months ago

This issue was closed because it has been stalled for 7 days with no activity.