triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Failed to initialize Python stub + ModuleNotFoundError: No module named 'nvtabular', 'merlin' #7158

Open zwei2016 opened 2 months ago

zwei2016 commented 2 months ago

Description

I am following this online tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb

After creating the "executor_model" model, I tried to start the Triton Inference Server with:

docker run --gpus=1 --rm --net=host -v /home/***/workspace/data/models:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0425 00:38:39.764967 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x205000000' with size 268435456
I0425 00:38:39.765025 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0425 00:38:39.770824 1 model_lifecycle.cc:469] loading: 1_predictpytorchtriton:1
I0425 00:38:39.770870 1 model_lifecycle.cc:469] loading: executor_model:1
I0425 00:38:39.770891 1 model_lifecycle.cc:469] loading: 0_transformworkflowtriton:1
I0425 00:38:40.340567 1 libtorch.cc:2467] TRITONBACKEND_Initialize: pytorch
I0425 00:38:40.340593 1 libtorch.cc:2477] Triton TRITONBACKEND API version: 1.19
I0425 00:38:40.340603 1 libtorch.cc:2483] 'pytorch' TRITONBACKEND API version: 1.19
I0425 00:38:40.340635 1 libtorch.cc:2516] TRITONBACKEND_ModelInitialize: 1_predictpytorchtriton (version 1)
W0425 00:38:40.342571 1 libtorch.cc:318] skipping model configuration auto-complete for '1_predictpytorchtriton': not supported for pytorch backend
I0425 00:38:40.343279 1 libtorch.cc:347] Optimized execution is enabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343301 1 libtorch.cc:366] Cache Cleaning is disabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343304 1 libtorch.cc:383] Inference Mode is enabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343350 1 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: 1_predictpytorchtriton_0_0 (GPU device 0)
I0425 00:38:40.431502 1 model_lifecycle.cc:835] successfully loaded '1_predictpytorchtriton'
I0425 00:38:40.653469 157 pb_stub.cc:290] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'nvtabular'

At:
  /models/0_transformworkflowtriton/1/model.py(32): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load
I0425 00:38:40.653469 156 pb_stub.cc:290] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'merlin'

At:
  /models/executor_model/1/model.py(31): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

E0425 00:38:40.655660 1 model_lifecycle.cc:638] failed to load 'executor_model' version 1: Internal: ModuleNotFoundError: No module named 'merlin'

At:
  /models/executor_model/1/model.py(31): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

E0425 00:38:40.655687 1 model_lifecycle.cc:638] failed to load '0_transformworkflowtriton' version 1: Internal: ModuleNotFoundError: No module named 'nvtabular'

At:
  /models/0_transformworkflowtriton/1/model.py(32): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

I0425 00:38:40.655692 1 model_lifecycle.cc:773] failed to load 'executor_model'
I0425 00:38:40.655723 1 model_lifecycle.cc:773] failed to load '0_transformworkflowtriton'
I0425 00:38:40.655779 1 server.cc:607]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0425 00:38:40.655820 1 server.cc:634]
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                    | Config                                                                                                                                                        |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| python  | /opt/tritonserver/backends/python/libtriton_python.so   | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0425 00:38:40.655868 1 server.cc:677]
+---------------------------+---------+-------------------------------------------------------------------------+
| Model                     | Version | Status                                                                  |
+---------------------------+---------+-------------------------------------------------------------------------+
| 0_transformworkflowtriton | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'nvtabular' |
|                           |         |                                                                         |
|                           |         | At:                                                                     |
|                           |         |   /models/0_transformworkflowtriton/1/model.py(32): <module>            |
|                           |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed         |
|                           |         |   <frozen importlib._bootstrap_external>(883): exec_module              |
|                           |         |   <frozen importlib._bootstrap>(703): _load_unlocked                    |
|                           |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked          |
|                           |         |   <frozen importlib._bootstrap>(1027): _find_and_load                   |
| 1_predictpytorchtriton    | 1       | READY                                                                   |
| executor_model            | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'merlin'    |
|                           |         |                                                                         |
|                           |         | At:                                                                     |
|                           |         |   /models/executor_model/1/model.py(31): <module>                       |
|                           |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed         |
|                           |         |   <frozen importlib._bootstrap_external>(883): exec_module              |
|                           |         |   <frozen importlib._bootstrap>(703): _load_unlocked                    |
|                           |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked          |
|                           |         |   <frozen importlib._bootstrap>(1027): _find_and_load                   |
+---------------------------+---------+-------------------------------------------------------------------------+

I0425 00:38:40.679675 1 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070 Ti Laptop GPU
I0425 00:38:40.689920 1 metrics.cc:770] Collecting CPU metrics
I0425 00:38:40.690196 1 tritonserver.cc:2538]
+----------------------------------+--------------------------------------------------------------+
| Option                           | Value                                                        |
+----------------------------------+--------------------------------------------------------------+
| server_id                        | triton                                                       |
| server_version                   | 2.44.0                                                       |
| server_extensions                | classification sequence model_repository                     |
|                                  | model_repository(unload_dependents) schedule_policy         |
|                                  | model_configuration system_shared_memory cuda_shared_memory |
|                                  | binary_tensor_data parameters statistics trace logging      |
| model_repository_path[0]         | /models                                                      |
| model_control_mode               | MODE_NONE                                                    |
| strict_model_config              | 0                                                            |
| rate_limit                       | OFF                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                    |
| cuda_memory_pool_byte_size{0}    | 67108864                                                     |
| min_supported_compute_capability | 6.0                                                          |
| strict_readiness                 | 1                                                            |
| exit_timeout                     | 30                                                           |
| cache_enabled                    | 0                                                            |
+----------------------------------+--------------------------------------------------------------+

I0425 00:38:40.690214 1 server.cc:307] Waiting for in-flight requests to complete.
I0425 00:38:40.690219 1 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0425 00:38:40.690350 1 server.cc:338] All models are stopped, unloading models
I0425 00:38:40.690365 1 server.cc:347] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0425 00:38:40.690430 1 libtorch.cc:2594] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0425 00:38:40.691390 1 libtorch.cc:2539] TRITONBACKEND_ModelFinalize: delete model state
I0425 00:38:40.691732 1 model_lifecycle.cc:620] successfully unloaded '1_predictpytorchtriton' version 1
I0425 00:38:41.690694 1 server.cc:347] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
W0425 00:38:41.693252 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
error: creating server: Internal - failed to load all models
W0425 00:38:42.701892 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000

Triton Information

What version of Triton are you using? tritonserver:24.03-py3

Are you using the Triton container or did you build it yourself? I am using the Docker container nvcr.io/nvidia/tritonserver:24.03-py3.

To Reproduce

I followed this online tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb

Expected behavior

The server should reply to the client with the following message:

<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '188'}>
bytearray(b'[{"name":"0_transformworkflowtriton","version":"1","state":"READY"},{"name":"1_predictpytorchtriton","version":"1","state":"READY"},{"name":"executor_model","version":"1","state":"READY"}]')
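
For reference, those model states come from the server's model repository index; a minimal way to check them directly (a sketch, assuming the HTTP endpoint is on the default localhost:8000):

import tritonclient.http as httpclient

# Ask the server for its repository index; each entry carries the
# model name, version, and READY/UNAVAILABLE state.
client = httpclient.InferenceServerClient(url="localhost:8000")
for model in client.get_model_repository_index():
    print(model["name"], model.get("version"), model.get("state"))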

rmccorm4 commented 2 months ago

Hi @zwei2016,

You'll need to install any Python dependencies required by your Python model inside the container before starting the server, for example via pip install ....
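
For a one-off run, that could look like this (a sketch; I am assuming the missing nvtabular and merlin modules come from the nvtabular and merlin-systems pip packages):

docker run --gpus=1 --rm --net=host -v /home/***/workspace/data/models:/models \
  nvcr.io/nvidia/tritonserver:24.03-py3 \
  bash -c "pip install nvtabular merlin-systems && tritonserver --model-repository=/models"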

You can also prepare a custom Docker image so you can reuse it across runs:

FROM nvcr.io/nvidia/tritonserver:24.03-py3
RUN pip install ...
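
You can then build and run it, for example (the tritonserver-merlin tag is just an illustrative name):

docker build -t tritonserver-merlin .
docker run --gpus=1 --rm --net=host -v /home/***/workspace/data/models:/models \
  tritonserver-merlin tritonserver --model-repository=/models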

You can also look into packaging the dependencies along with your Python model through custom execution environments: https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#creating-custom-execution-environments
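
In rough outline, that approach looks like this (a sketch based on that README; the environment name and package list are assumptions, and the packed environment's Python version needs to match the Python in the container):

conda create -y -n merlin_env python=3.10
conda activate merlin_env
pip install nvtabular merlin-systems conda-pack
conda-pack -o merlin_env.tar.gz   # produces a relocatable tarball of the whole environment

Then place merlin_env.tar.gz next to the model and point the model's config.pbtxt at it:

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/merlin_env.tar.gz"}
}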

zwei2016 commented 2 months ago

Thanks Ryan @rmccorm4. I customized the Docker image nvcr.io/nvidia/tritonserver:24.03-py3 by installing the necessary libraries and committed it as a new Docker image. It works. Thank you.

By the way, when I try to use the server as described in the tutorial:

from merlin.systems.triton.utils import send_triton_request
response = send_triton_request(workflow.input_schema, df, output_schema.column_names, endpoint="localhost:8001")

I get another error: "Failed to open the cudaIpcHandle". After searching around, I found that the cause might be that CUDA shared memory is not supported on Windows. Since I deployed the server in WSL2 on Windows 11, will it always hit this error? Is there any solution now?

Best,
Wei