triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Facing import error in python backend #7722

Open TheMightyRaider opened 3 days ago

TheMightyRaider commented 3 days ago

Description I'm trying to serve an embedding model [FastText] in Triton Server using Python as its backend. The only external dependency is the fasttext module, which in turn depends on numpy. I have created a custom execution environment as mentioned here.
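For context, a minimal sketch of how I packaged the execution environment with conda-pack (the env name and Python version below are illustrative; the dependency pins match my requirements.txt):

```shell
# Create and activate a clean conda env matching the container's Python (3.10 here)
conda create -y -n fasttext-server python=3.10
conda activate fasttext-server

# Keep user site-packages out of the packed environment
export PYTHONNOUSERSITE=True

# Install the model's dependencies
pip install fasttext==0.9.3 numpy==2.1.2 pybind11==2.13.6 setuptools==75.2.0

# Pack the environment into the tarball referenced by EXECUTION_ENV_PATH
conda install -y -c conda-forge conda-pack
conda-pack -o fasttext-server.tar.gz
```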

The problem is that I'm hitting the following error when running the Triton server as a Docker container:

+-------------------+---------+----------------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                             |
+-------------------+---------+----------------------------------------------------------------------------------------------------+
| fast-text-service | 1       | UNAVAILABLE: Internal: ImportError: Error importing numpy: you should not try to import numpy from |
|                   |         |         its source directory; please exit the numpy source tree, and relaunch                      |
|                   |         |         your python interpreter from there.                                                        |
|                   |         |                                                                                                    |
|                   |         | At:                                                                                                |
|                   |         |   /tmp/python_env_3RJ5YZ/0/lib/python3.10/site-packages/numpy/__init__.py(119): <module>           |
|                   |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                    |
|                   |         |   <frozen importlib._bootstrap_external>(883): exec_module                                         |
|                   |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                               |
|                   |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                     |
|                   |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                              |
|                   |         |   /opt/tritonserver/backends/python/triton_python_backend_utils.py(30): <module>                   |
|                   |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                    |
|                   |         |   <frozen importlib._bootstrap_external>(883): exec_module                                         |
|                   |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                               |
|                   |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                     |
|                   |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                              |
|                   |         |   /mnt/data/model_repository/fast-text-service/1/model.py(1): <module>                             |
|                   |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                    |
|                   |         |   <frozen importlib._bootstrap_external>(883): exec_module                                         |
|                   |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                               |
|                   |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                     |
|                   |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                              |
+-------------------+---------+----------------------------------------------------------------------------------------------------+

Triton Information I'm running the Triton container on an M2 chip; the image is nvcr.io/nvidia/tritonserver:24.09-pyt-python-py3.

To Reproduce

requirements.txt:

fasttext==0.9.3
numpy==2.1.2
pybind11==2.13.6
setuptools==75.2.0

config.pbtxt:

name: "fast-text-service"
backend: "python"
max_batch_size: 8 

dynamic_batching { }

input [
    {
        name: "TEXT"
        data_type: TYPE_STRING
        dims: [ 1 ]
    }
]

output [
    {
        name: "Status"
        data_type: TYPE_FP32
        dims: [ 1 ]
    },
    {
        name: "Embedding"
        data_type: TYPE_FP32
        dims: [ 300 ]
    }
]

parameters: {
    key: "EXECUTION_ENV_PATH"
    value: {string_value: "/mnt/data/model_repository/fast-text-service/fasttext-server.tar.gz"}
}

instance_group [
    {
        count: 1
        kind: KIND_CPU
    }
]
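For reference, the model repository is laid out as follows (sketch inferred from EXECUTION_ENV_PATH and the paths in the traceback):

```
/mnt/data/model_repository/
└── fast-text-service/
    ├── config.pbtxt
    ├── fasttext-server.tar.gz    # packed execution environment (EXECUTION_ENV_PATH)
    └── 1/
        └── model.py              # Python backend model
```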

Expected behavior The model should load and serve successfully. Instead, the container exits with error: creating server: Internal - failed to load all models. Below is a segment of the log generated by the Triton container:

....
....
....
....
I1020 05:12:57.580370 1 model_lifecycle.cc:472] "loading: fast-text-service:1"
I1020 05:12:57.580948 1 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I1020 05:12:57.580969 1 shared_library.cc:112] "OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so"
I1020 05:12:57.585539 1 python_be.cc:1618] "'python' TRITONBACKEND API version: 1.19"
I1020 05:12:57.585553 1 python_be.cc:1640] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I1020 05:12:57.585837 1 python_be.cc:1778] "Shared memory configuration is shm-default-byte-size=1048576,shm-growth-byte-size=1048576,stub-timeout-seconds=30"
I1020 05:12:57.586479 1 python_be.cc:2075] "TRITONBACKEND_GetBackendAttribute: setting attributes"
I1020 05:12:57.586515 1 python_be.cc:1879] "TRITONBACKEND_ModelInitialize: fast-text-service (version 1)"
I1020 05:12:57.587722 1 model_config_utils.cc:1941] "ModelConfig 64-bit fields:"
I1020 05:12:57.587729 1 model_config_utils.cc:1943] "\tModelConfig::dynamic_batching::default_priority_level"
I1020 05:12:57.587730 1 model_config_utils.cc:1943] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I1020 05:12:57.587732 1 model_config_utils.cc:1943] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I1020 05:12:57.587734 1 model_config_utils.cc:1943] "\tModelConfig::dynamic_batching::priority_levels"
I1020 05:12:57.587735 1 model_config_utils.cc:1943] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I1020 05:12:57.587736 1 model_config_utils.cc:1943] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I1020 05:12:57.587738 1 model_config_utils.cc:1943] "\tModelConfig::ensemble_scheduling::step::model_version"
I1020 05:12:57.587740 1 model_config_utils.cc:1943] "\tModelConfig::input::dims"
I1020 05:12:57.587741 1 model_config_utils.cc:1943] "\tModelConfig::input::reshape::shape"
I1020 05:12:57.587743 1 model_config_utils.cc:1943] "\tModelConfig::instance_group::secondary_devices::device_id"
I1020 05:12:57.587744 1 model_config_utils.cc:1943] "\tModelConfig::model_warmup::inputs::value::dims"
I1020 05:12:57.587746 1 model_config_utils.cc:1943] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I1020 05:12:57.587748 1 model_config_utils.cc:1943] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I1020 05:12:57.587749 1 model_config_utils.cc:1943] "\tModelConfig::output::dims"
I1020 05:12:57.587751 1 model_config_utils.cc:1943] "\tModelConfig::output::reshape::shape"
I1020 05:12:57.587752 1 model_config_utils.cc:1943] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I1020 05:12:57.587754 1 model_config_utils.cc:1943] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I1020 05:12:57.587755 1 model_config_utils.cc:1943] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I1020 05:12:57.587757 1 model_config_utils.cc:1943] "\tModelConfig::sequence_batching::state::dims"
I1020 05:12:57.587759 1 model_config_utils.cc:1943] "\tModelConfig::sequence_batching::state::initial_state::dims"
I1020 05:12:57.587760 1 model_config_utils.cc:1943] "\tModelConfig::version_policy::specific::versions"
I1020 05:12:57.588199 1 python_be.cc:1485] "Using Python execution env /mnt/data/model_repository/fast-text-service/fasttext-server.tar.gz"
I1020 05:12:57.588459 1 pb_env.cc:292] "Extracting Python execution env /mnt/data/model_repository/fast-text-service/fasttext-server.tar.gz"
I1020 05:12:58.119991 1 stub_launcher.cc:385] "Starting Python backend stub: source /tmp/python_env_V8xPqb/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_V8xPqb/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /mnt/data/model_repository/fast-text-service/1/model.py triton_python_backend_shm_region_a2fedd57-ea76-415c-9ec4-31883a32f342 1048576 1048576 1 /opt/tritonserver/backends/python 336 fast-text-service DEFAULT"
I1020 05:12:58.157720 98 pb_stub.cc:298]  Failed to initialize Python stub for auto-complete: ImportError: Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there.

At:
  /tmp/python_env_V8xPqb/0/lib/python3.10/site-packages/numpy/__init__.py(119): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load
  /opt/tritonserver/backends/python/triton_python_backend_utils.py(30): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load
  /mnt/data/model_repository/fast-text-service/1/model.py(1): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

I1020 05:12:58.158159 1 python_be.cc:1902] "TRITONBACKEND_ModelFinalize: delete model state"
E1020 05:12:58.158182 1 model_lifecycle.cc:642] "failed to load 'fast-text-service' version 1: Internal: ImportError: Error importing numpy: you should not try to import numpy from\n        its source directory; please exit the numpy source tree, and relaunch\n        your python interpreter from there.\n\nAt:\n  /tmp/python_env_V8xPqb/0/lib/python3.10/site-packages/numpy/__init__.py(119): <module>\n  <frozen importlib._bootstrap>(241): _call_with_frames_removed\n  <frozen importlib._bootstrap_external>(883): exec_module\n  <frozen importlib._bootstrap>(703): _load_unlocked\n  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked\n  <frozen importlib._bootstrap>(1027): _find_and_load\n  /opt/tritonserver/backends/python/triton_python_backend_utils.py(30): <module>\n  <frozen importlib._bootstrap>(241): _call_with_frames_removed\n  <frozen importlib._bootstrap_external>(883): exec_module\n  <frozen importlib._bootstrap>(703): _load_unlocked\n  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked\n  <frozen importlib._bootstrap>(1027): _find_and_load\n  /mnt/data/model_repository/fast-text-service/1/model.py(1): <module>\n  <frozen importlib._bootstrap>(241): _call_with_frames_removed\n  <frozen importlib._bootstrap_external>(883): exec_module\n  <frozen importlib._bootstrap>(703): _load_unlocked\n  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked\n  <frozen importlib._bootstrap>(1027): _find_and_load\n"
I1020 05:12:58.158204 1 model_lifecycle.cc:777] "failed to load 'fast-text-service'"
I1020 05:12:58.158289 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1020 05:12:58.158645 1 server.cc:631] 
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                                        |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1020 05:12:58.158680 1 server.cc:674] 
+-------------------+---------+----------------------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                             |
+-------------------+---------+----------------------------------------------------------------------------------------------------+
| fast-text-service | 1       | UNAVAILABLE: Internal: ImportError: Error importing numpy: you should not try to import numpy from |
|                   |         |         its source directory; please exit the numpy source tree, and relaunch                      |
|                   |         |         your python interpreter from there.                                                        |
|                   |         |                                                                                                    |
|                   |         | At:                                                                                                |
|                   |         |   /tmp/python_env_V8xPqb/0/lib/python3.10/site-packages/numpy/__init__.py(119): <module>           |
|                   |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                    |
|                   |         |   <frozen importlib._bootstrap_external>(883): exec_module                                         |
|                   |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                               |
|                   |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                     |
|                   |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                              |
|                   |         |   /opt/tritonserver/backends/python/triton_python_backend_utils.py(30): <module>                   |
|                   |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                    |
|                   |         |   <frozen importlib._bootstrap_external>(883): exec_module                                         |
|                   |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                               |
|                   |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                     |
|                   |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                              |
|                   |         |   /mnt/data/model_repository/fast-text-service/1/model.py(1): <module>                             |
|                   |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                    |
|                   |         |   <frozen importlib._bootstrap_external>(883): exec_module                                         |
|                   |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                               |
|                   |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                     |
|                   |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                              |
+-------------------+---------+----------------------------------------------------------------------------------------------------+

I1020 05:12:58.158773 1 metrics.cc:770] "Collecting CPU metrics"
I1020 05:12:58.158837 1 tritonserver.cc:2598] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.50.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /mnt/data/model_repository                                                                                                                                                                                      |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1020 05:12:58.158866 1 server.cc:305] "Waiting for in-flight requests to complete."
I1020 05:12:58.158868 1 server.cc:321] "Timeout 30: Found 0 model versions that have in-flight inferences"
I1020 05:12:58.158926 1 server.cc:336] "All models are stopped, unloading models"
I1020 05:12:58.158934 1 server.cc:345] "Timeout 30: Found 0 live models and 0 in-flight non-inference requests"
I1020 05:12:58.158947 1 backend_manager.cc:138] "unloading backend 'python'"
I1020 05:12:58.158950 1 python_be.cc:1859] "TRITONBACKEND_Finalize: Start"
I1020 05:12:58.221014 1 python_be.cc:1864] "TRITONBACKEND_Finalize: End"
error: creating server: Internal - failed to load all models
TheMightyRaider commented 1 day ago

Update on this: I was able to spin up the service on a Linux machine with an AMD processor, so I'm assuming this issue is specific to M2 chips.
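In case it helps others hitting this, my guess at the root cause is an architecture mismatch between the machine that built the conda-pack tarball and the container runtime. A quick way to check (the image name below is the one from this issue) is to compare the two architectures:

```shell
# Architecture of the host that built the conda-pack tarball
uname -m    # arm64 on an M2 Mac

# Architecture the Triton container actually runs as
docker run --rm nvcr.io/nvidia/tritonserver:24.09-pyt-python-py3 uname -m

# If they differ, rebuild the execution environment inside the same
# (possibly emulated) container so the packed wheels match the runtime:
docker run --rm -it --platform linux/amd64 \
  nvcr.io/nvidia/tritonserver:24.09-pyt-python-py3 bash
```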