triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Python Backend: UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'model' #7410

Closed jlewi closed 1 month ago

jlewi commented 1 month ago

Description

Triton fails to load the Python backend add_sub example model. I get the error:

I0704 00:33:26.738428 168 server.cc:676] 
+----------------+---------+---------------------------------------------------------------------+
| Model          | Version | Status                                                              |
+----------------+---------+---------------------------------------------------------------------+
| add_sub        | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'model' |
| identity_model | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'model' |
+----------------+---------+---------------------------------------------------------------------+

I believe the error indicates that the model.py file can't be found on the python path. Can anyone provide any information on why this might happen and how to fix it?

Here are the full logs.

3ecaa3fc739c:/usr# /usr/local/triton/bin/tritonserver --model-repository=/work/packages/models  --backend-directory=/usr/local/triton/backends/ --log-verbose 20
I0704 00:33:26.717550 168 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
W0704 00:33:26.717676 168 pinned_memory_manager.cc:271] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0704 00:33:26.717697 168 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0704 00:33:26.717729 168 server.cc:243] "CudaDriverHelper has not been initialized."
I0704 00:33:26.718693 168 model_config_utils.cc:681] "Server side auto-completed config: "
name: "add_sub"
input {
  name: "INPUT0"
  data_type: TYPE_FP32
  dims: 4
}
input {
  name: "INPUT1"
  data_type: TYPE_FP32
  dims: 4
}
output {
  name: "OUTPUT0"
  data_type: TYPE_FP32
  dims: 4
}
output {
  name: "OUTPUT1"
  data_type: TYPE_FP32
  dims: 4
}
instance_group {
  kind: KIND_CPU
}
default_model_filename: "model.py"
backend: "python"

I0704 00:33:26.718796 168 model_config_utils.cc:681] "Server side auto-completed config: "
name: "identity_model"
max_batch_size: 8
input {
  name: "INPUT0"
  data_type: TYPE_FP32
  dims: 16
}
output {
  name: "OUTPUT0"
  data_type: TYPE_FP32
  dims: 16
}
default_model_filename: "model.py"
backend: "python"

I0704 00:33:26.718825 168 model_lifecycle.cc:441] "AsyncLoad() 'add_sub'"
I0704 00:33:26.718846 168 model_lifecycle.cc:472] "loading: add_sub:1"
I0704 00:33:26.718860 168 model_lifecycle.cc:441] "AsyncLoad() 'identity_model'"
I0704 00:33:26.718876 168 model_lifecycle.cc:472] "loading: identity_model:1"
I0704 00:33:26.718944 168 model_lifecycle.cc:550] "CreateModel() 'identity_model' version 1"
I0704 00:33:26.718945 168 model_lifecycle.cc:550] "CreateModel() 'add_sub' version 1"
I0704 00:33:26.719004 168 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I0704 00:33:26.719022 168 shared_library.cc:112] "OpenLibraryHandle: /usr/local/triton/backends/python/libtriton_python.so"
I0704 00:33:26.719022 168 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I0704 00:33:26.719943 168 python_be.cc:2099] "'python' TRITONBACKEND API version: 1.19"
I0704 00:33:26.719958 168 python_be.cc:2121] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/usr/local/triton/backends/\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0704 00:33:26.719974 168 python_be.cc:2259] "Shared memory configuration is shm-default-byte-size=1048576,shm-growth-byte-size=1048576,stub-timeout-seconds=30"
I0704 00:33:26.720068 168 python_be.cc:2582] "TRITONBACKEND_GetBackendAttribute: setting attributes"
I0704 00:33:26.720094 168 python_be.cc:2360] "TRITONBACKEND_ModelInitialize: identity_model (version 1)"
I0704 00:33:26.720113 168 python_be.cc:2360] "TRITONBACKEND_ModelInitialize: add_sub (version 1)"
I0704 00:33:26.720434 168 model_config_utils.cc:1902] "ModelConfig 64-bit fields:"
I0704 00:33:26.720445 168 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::default_priority_level"
I0704 00:33:26.720448 168 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I0704 00:33:26.720470 168 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I0704 00:33:26.720473 168 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_levels"
I0704 00:33:26.720476 168 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I0704 00:33:26.720479 168 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I0704 00:33:26.720482 168 model_config_utils.cc:1904] "\tModelConfig::ensemble_scheduling::step::model_version"
I0704 00:33:26.720485 168 model_config_utils.cc:1904] "\tModelConfig::input::dims"
I0704 00:33:26.720488 168 model_config_utils.cc:1904] "\tModelConfig::input::reshape::shape"
I0704 00:33:26.720491 168 model_config_utils.cc:1904] "\tModelConfig::instance_group::secondary_devices::device_id"
I0704 00:33:26.720494 168 model_config_utils.cc:1904] "\tModelConfig::model_warmup::inputs::value::dims"
I0704 00:33:26.720497 168 model_config_utils.cc:1904] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I0704 00:33:26.720500 168 model_config_utils.cc:1904] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I0704 00:33:26.720503 168 model_config_utils.cc:1904] "\tModelConfig::output::dims"
I0704 00:33:26.720512 168 model_config_utils.cc:1904] "\tModelConfig::output::reshape::shape"
I0704 00:33:26.720515 168 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I0704 00:33:26.720519 168 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I0704 00:33:26.720522 168 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I0704 00:33:26.720529 168 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::state::dims"
I0704 00:33:26.720537 168 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::state::initial_state::dims"
I0704 00:33:26.720543 168 model_config_utils.cc:1904] "\tModelConfig::version_policy::specific::versions"
I0704 00:33:26.720896 168 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/identity_model/1/model.py triton_python_backend_shm_region_92549c78-1875-47aa-82c4-9a4b97dfe997 1048576 1048576 168 /usr/local/triton/backends/python 336 identity_model /usr/local/triton/backends/python"
I0704 00:33:26.720940 168 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/add_sub/1/model.py triton_python_backend_shm_region_3bf73098-8ef4-45ef-bb8f-bbfe030d8c42 1048576 1048576 168 /usr/local/triton/backends/python 336 add_sub /usr/local/triton/backends/python"
I0704 00:33:26.737228 177 pb_stub.cc:317]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'model'
I0704 00:33:26.737440 178 pb_stub.cc:317]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'model'
I0704 00:33:26.738161 168 python_be.cc:2383] "TRITONBACKEND_ModelFinalize: delete model state"
E0704 00:33:26.738193 168 model_lifecycle.cc:641] "failed to load 'identity_model' version 1: Internal: ModuleNotFoundError: No module named 'model'"
I0704 00:33:26.738210 168 model_lifecycle.cc:695] "OnLoadComplete() 'identity_model' version 1"
I0704 00:33:26.738206 168 python_be.cc:2383] "TRITONBACKEND_ModelFinalize: delete model state"
I0704 00:33:26.738224 168 model_lifecycle.cc:733] "OnLoadFinal() 'identity_model' for all version(s)"
I0704 00:33:26.738242 168 model_lifecycle.cc:776] "failed to load 'identity_model'"
E0704 00:33:26.738245 168 model_lifecycle.cc:641] "failed to load 'add_sub' version 1: Internal: ModuleNotFoundError: No module named 'model'"
I0704 00:33:26.738274 168 model_lifecycle.cc:695] "OnLoadComplete() 'add_sub' version 1"
I0704 00:33:26.738284 168 model_lifecycle.cc:733] "OnLoadFinal() 'add_sub' for all version(s)"
I0704 00:33:26.738287 168 model_lifecycle.cc:776] "failed to load 'add_sub'"
I0704 00:33:26.738311 168 model_lifecycle.cc:297] "VersionStates() 'add_sub'"
I0704 00:33:26.738329 168 model_lifecycle.cc:297] "VersionStates() 'identity_model'"
I0704 00:33:26.738341 168 model_lifecycle.cc:297] "VersionStates() 'identity_model'"
I0704 00:33:26.738356 168 server.cc:606] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0704 00:33:26.738379 168 server.cc:633] 
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                                         |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /usr/local/triton/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/usr/local/triton/backends/","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0704 00:33:26.738413 168 model_lifecycle.cc:276] "ModelStates()"
I0704 00:33:26.738428 168 server.cc:676] 
+----------------+---------+---------------------------------------------------------------------+
| Model          | Version | Status                                                              |
+----------------+---------+---------------------------------------------------------------------+
| add_sub        | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'model' |
| identity_model | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'model' |
+----------------+---------+---------------------------------------------------------------------+

I0704 00:33:26.738478 168 tritonserver.cc:2557] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.46.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /work/packages/models                                                                                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0704 00:33:26.738540 168 server.cc:307] "Waiting for in-flight requests to complete."
I0704 00:33:26.738546 168 model_lifecycle.cc:226] "StopAllModels()"
I0704 00:33:26.738553 168 model_lifecycle.cc:244] "InflightStatus()"
I0704 00:33:26.738559 168 server.cc:323] "Timeout 30: Found 0 model versions that have in-flight inferences"
I0704 00:33:26.738596 168 model_lifecycle.cc:393] "AsyncUnload() 'add_sub'"
I0704 00:33:26.738603 168 model_lifecycle.cc:393] "AsyncUnload() 'identity_model'"
I0704 00:33:26.738616 168 server.cc:338] "All models are stopped, unloading models"
I0704 00:33:26.738621 168 model_lifecycle.cc:193] "LiveModelStates()"
I0704 00:33:26.738628 168 model_lifecycle.cc:268] "BackgroundModelsSize()"
I0704 00:33:26.738634 168 server.cc:347] "Timeout 30: Found 0 live models and 0 in-flight non-inference requests"
I0704 00:33:26.738642 168 backend_manager.cc:138] "unloading backend 'python'"
I0704 00:33:26.738649 168 python_be.cc:2340] "TRITONBACKEND_Finalize: Start"
I0704 00:33:26.738719 168 python_be.cc:2345] "TRITONBACKEND_Finalize: End"
error: creating server: Internal - failed to load all models

My models directory looks like the following:

3ecaa3fc739c:/usr# tree  /work/packages/models  
/work/packages/models
├── add_sub
│   ├── 1
│   │   └── model.py
│   └── config.pbtxt
└── identity_model
    ├── 1
    │   ├── __pycache__
    │   │   └── model.cpython-312.pyc
    │   └── model.py
    └── config.pbtxt

5 directories, 5 files

Triton Information

What version of Triton are you using? 2.46.0
Are you using the Triton container or did you build it yourself? I built it myself.

sourabh-burnwal commented 1 month ago

@jlewi

  1. Are you importing any model inside your model.py?
  2. How was this image built?

jlewi commented 1 month ago

@sourabh-burnwal Appreciate you looking into this.

I'm using this model from the examples directory. https://github.com/triton-inference-server/python_backend/blob/main/examples/add_sub/model.py

It doesn't import "model", at least not directly. My assumption was that the ModuleNotFoundError: No module named 'model' was referring to model.py.
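
For reference, the example boils down to roughly the following (a trimmed-down sketch of the linked model.py, not the verbatim file, which has more config and dtype handling). Note that it defines a TritonPythonModel class and never imports a module named 'model' itself:

import json

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] holds the serialized model configuration.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1").as_numpy()
            # The sketch hardcodes float32; the upstream example derives
            # the output dtypes from the model config.
            out0 = pb_utils.Tensor("OUTPUT0", (in0 + in1).astype(np.float32))
            out1 = pb_utils.Tensor("OUTPUT1", (in0 - in1).astype(np.float32))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out0, out1]))
        return responses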

The error UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'model' looks like a gRPC error.

Looking at the logs, it looks like it tries to start the backend stub for both models and then errors out while initializing the stubs for auto-complete.

I0704 00:33:26.720896 168 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/identity_model/1/model.py triton_python_backend_shm_region_92549c78-1875-47aa-82c4-9a4b97dfe997 1048576 1048576 168 /usr/local/triton/backends/python 336 identity_model /usr/local/triton/backends/python"
I0704 00:33:26.720940 168 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/add_sub/1/model.py triton_python_backend_shm_region_3bf73098-8ef4-45ef-bb8f-bbfe030d8c42 1048576 1048576 168 /usr/local/triton/backends/python 336 add_sub /usr/local/triton/backends/python"
I0704 00:33:26.737228 177 pb_stub.cc:317]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'model'
I0704 00:33:26.737440 178 pb_stub.cc:317]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'model'

It looks like a failure to load the auto-complete config causes the stub process to terminate: https://github.com/triton-inference-server/python_backend/blob/c848884d24e71b20a1636e7e63435ca8daba097b/src/pb_stub.cc#L330

So it looks like the stub terminates before the model is ever loaded.

Do you have any idea why "model.py" would fail to load with a ModuleNotFoundError? Do you have any pointers to the code where that happens?

It looks like the model is referred to here: https://github.com/triton-inference-server/python_backend/blob/c8b188f26a4e80c7204baaf73e27f11c33f52f57/src/pb_stub.cc#L499C29-L499C46

But I can't see where the attribute TritonPythonModel gets set.
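
My rough mental model, sketched in Python as an approximation of the C++ in pb_stub.cc (the function name and signature here are mine, not the actual code): the stub puts the model's directory on sys.path, imports the module named 'model', and then fetches the TritonPythonModel attribute from it. If the directory on the path is wrong, the import itself fails with ModuleNotFoundError.

import importlib
import sys


def load_triton_python_model(model_dir, module_name="model"):
    # Approximation (not the actual C++ code): put the model's directory on
    # the import path so that "import model" resolves to <model_dir>/model.py.
    sys.path.insert(0, model_dir)
    # If model_dir does not actually contain model.py, this raises
    # ModuleNotFoundError: No module named 'model'.
    module = importlib.import_module(module_name)
    # model.py must define a class literally named TritonPythonModel;
    # the stub looks it up by name.
    return getattr(module, "TritonPythonModel")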

How I Am Building It

I'm building it on Wolfi OS, using the CMake invocations generated by running build.py.

Here's my CMake script for Triton:

      export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-12.3/lib:/usr/local/cuda-12.3/lib64:/usr/lib"
      export PATH="$PATH:/usr/local/cuda-12.3/bin"      
      export BUILDDIR=/tmp/tritonbuild/tritonservver/build
      export CMAKE_PREFIX_PATH=${BUILDDIR}/third-party/absl
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/cares
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/curl
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/cnmem
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/googletest
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/google-cloud-cpp
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/grpc
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/nlohmann_json
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/opentelemetry-cpp      
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/protobuf
      export CMAKE_PREFIX_PATH=${CMAKE_PREFIX_PATH}:${BUILDDIR}/third-party/re2
      mkdir -p "${BUILDDIR}"
      cd ${BUILDDIR}
      export TRT_VERSION=10.0.1.6
      cmake "-DTRT_VERSION=${TRT_VERSION}" \
      "-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE}" \
      "-DVCPKG_TARGET_TRIPLET=${VCPKG_TARGET_TRIPLET}" \
      "-DCMAKE_BUILD_TYPE=Release" \
      "-DCMAKE_INSTALL_PREFIX:PATH=/tmp/tritonbuild/tritonserver/install" \
      "-DTRITON_VERSION:STRING=2.46.0" \
      "-DTRITON_REPO_ORGANIZATION:STRING=https://github.com/triton-inference-server" \
      "-DTRITON_COMMON_REPO_TAG:STRING=r24.05" \
      "-DTRITON_CORE_REPO_TAG:STRING=r24.05" \
      "-DTRITON_BACKEND_REPO_TAG:STRING=r24.05" \
      "-DTRITON_THIRD_PARTY_REPO_TAG:STRING=r24.05" \
      "-DTRITON_ENABLE_LOGGING:BOOL=ON" \
      "-DTRITON_ENABLE_STATS:BOOL=ON" \
      "-DTRITON_ENABLE_METRICS:BOOL=OFF" \
      "-DTRITON_ENABLE_METRICS_GPU:BOOL=OFF" \
      "-DTRITON_ENABLE_METRICS_CPU:BOOL=OFF" \
      "-DTRITON_ENABLE_TRACING:BOOL=ON" \
      "-DTRITON_ENABLE_NVTX:BOOL=OFF" \
      "-DTRITON_ENABLE_GPU:BOOL=ON" \
      "-DTRITON_MIN_COMPUTE_CAPABILITY=6.0" \
      "-DTRITON_ENABLE_MALI_GPU:BOOL=OFF" \
      "-DTRITON_ENABLE_GRPC:BOOL=ON" \
      "-DTRITON_ENABLE_HTTP:BOOL=OFF" \
      "-DTRITON_ENABLE_SAGEMAKER:BOOL=OFF" \
      "-DTRITON_ENABLE_VERTEX_AI:BOOL=OFF" \
      "-DTRITON_ENABLE_GCS:BOOL=OFF" \
      "-DTRITON_ENABLE_S3:BOOL=OFF" \
      "-DTRITON_ENABLE_AZURE_STORAGE:BOOL=OFF" \
      "-DTRITON_ENABLE_ENSEMBLE:BOOL=ON" \
      "-DTRITON_ENABLE_TENSORRT:BOOL=OFF" \
      "-DEVENT__HAVE_ARC4RANDOM=0" \
      "-DEVENT__HAVE_ARC4RANDOM_BUF=0" \
      /home/build/server

And for the Python backend:

     export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda-12.3/lib:/usr/local/cuda-12.3/lib64:/usr/lib"

      export PATH="$PATH:/usr/local/cuda-12.3/bin"      
      export BUILDDIR=/tmp/tritonbuild/pythonbackend/build
      export SRCDIR=/home/build/backend

      mkdir -p "${BUILDDIR}"
      cd ${BUILDDIR}
      export TRT_VERSION=10.0.1.6

      mkdir -p ${BUILDDIR}

      export INSTALLDIR=/tmp/tritonbuild/python/install

      cd ${BUILDDIR}
      cmake "-DTRT_VERSION=${TRT_VERSION}" \
      "-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE}" \
      "-DVCPKG_TARGET_TRIPLET=${VCPKG_TARGET_TRIPLET}" \
      "-DCMAKE_BUILD_TYPE=Release" \
      "-DCMAKE_INSTALL_PREFIX:PATH=${INSTALLDIR}" \
      "-DTRITON_REPO_ORGANIZATION:STRING=https://github.com/triton-inference-server" \
      "-DTRITON_COMMON_REPO_TAG:STRING=r24.05" \
      "-DTRITON_CORE_REPO_TAG:STRING=r24.05" \
      "-DTRITON_BACKEND_REPO_TAG:STRING=r24.05" \
      "-DTRITON_ENABLE_GPU:BOOL=ON" \
      "-DTRITON_ENABLE_MALI_GPU:BOOL=OFF" \
      "-DTRITON_ENABLE_STATS:BOOL=ON" \
      "-DTRITON_ENABLE_METRICS:BOOL=ON" \
      "-DTRITON_ENABLE_MEMORY_TRACKER:BOOL=OFF" \
      "-DCMAKE_CXX_FLAGS=\"-Wno-error=deprecated-declarations\""  \
      ${SRCDIR}

      cmake --build . --config Release -j20  -t install

sourabh-burnwal commented 1 month ago

Thanks for the detailed reply @jlewi. I tried loading the models with the official image, and it worked fine. Your error may have something to do with the build itself. Can you share the steps you followed to build the image? I will try to reproduce it.

jlewi commented 1 month ago

It looks like the following command is used to start up the stub:

I0704 00:33:26.720940 168 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/add_sub/1/model.py triton_python_backend_shm_region_3bf73098-8ef4-45ef-bb8f-bbfe030d8c42 1048576 1048576 168 /usr/local/triton/backends/python 336 add_sub /usr/local/triton/backends/python"

It looks like the final argument, /usr/local/triton/backends/python, ends up being used here to locate the model:

https://github.com/triton-inference-server/python_backend/blob/c8b188f26a4e80c7204baaf73e27f11c33f52f57/src/pb_stub.cc#L1999
https://github.com/triton-inference-server/python_backend/blob/c8b188f26a4e80c7204baaf73e27f11c33f52f57/src/pb_stub.cc#L1883

In that case, the final argument isn't correct: the model file is not located in /usr/local/triton/backends/python; it's located at /work/packages/models/add_sub/1/model.py.

So one hypothesis is that the final argument isn't getting set correctly, and that's why it can't locate the model. I'm not sure where that parameter gets set.

jlewi commented 1 month ago

So copying model.py into the backends directory fixed the error and the gRPC server started.

cp /work/packages/models/add_sub/1/model.py  /usr/local/triton/backends/python/

So it looks like it's looking in the backends directory for model.py. I think this indicates it's not a build problem; I suspect I'm misconfiguring something and not using the Python backend correctly.

Logs of server startup:

d9617b071a74:/work/packages# cp /work/packages/models/add_sub/1/model.py  /usr/local/triton/backends/python/
d9617b071a74:/work/packages# /usr/local/triton/bin/tritonserver --model-repository=/work/packages/models  --backend-directory=/usr/local/triton/backends/ --log-verbose 20
I0709 21:42:36.708812 93 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
W0709 21:42:36.708987 93 pinned_memory_manager.cc:271] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0709 21:42:36.709015 93 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0709 21:42:36.709062 93 server.cc:243] "CudaDriverHelper has not been initialized."
I0709 21:42:36.710320 93 model_config_utils.cc:681] "Server side auto-completed config: "
name: "add_sub"
input {
  name: "INPUT0"
  data_type: TYPE_FP32
  dims: 4
}
input {
  name: "INPUT1"
  data_type: TYPE_FP32
  dims: 4
}
output {
  name: "OUTPUT0"
  data_type: TYPE_FP32
  dims: 4
}
output {
  name: "OUTPUT1"
  data_type: TYPE_FP32
  dims: 4
}
instance_group {
  kind: KIND_CPU
}
default_model_filename: "model.py"
backend: "python"

I0709 21:42:36.710386 93 model_lifecycle.cc:441] "AsyncLoad() 'add_sub'"
I0709 21:42:36.710414 93 model_lifecycle.cc:472] "loading: add_sub:1"
I0709 21:42:36.710489 93 model_lifecycle.cc:550] "CreateModel() 'add_sub' version 1"
I0709 21:42:36.710570 93 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I0709 21:42:36.710613 93 shared_library.cc:112] "OpenLibraryHandle: /usr/local/triton/backends/python/libtriton_python.so"
I0709 21:42:36.711900 93 python_be.cc:2099] "'python' TRITONBACKEND API version: 1.19"
I0709 21:42:36.711919 93 python_be.cc:2121] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/usr/local/triton/backends/\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0709 21:42:36.711947 93 python_be.cc:2259] "Shared memory configuration is shm-default-byte-size=1048576,shm-growth-byte-size=1048576,stub-timeout-seconds=30"
I0709 21:42:36.712061 93 python_be.cc:2582] "TRITONBACKEND_GetBackendAttribute: setting attributes"
I0709 21:42:36.712092 93 python_be.cc:2360] "TRITONBACKEND_ModelInitialize: add_sub (version 1)"
I0709 21:42:36.712517 93 model_config_utils.cc:1902] "ModelConfig 64-bit fields:"
I0709 21:42:36.712529 93 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::default_priority_level"
I0709 21:42:36.712534 93 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I0709 21:42:36.712538 93 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I0709 21:42:36.712546 93 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_levels"
I0709 21:42:36.712551 93 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I0709 21:42:36.712557 93 model_config_utils.cc:1904] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I0709 21:42:36.712563 93 model_config_utils.cc:1904] "\tModelConfig::ensemble_scheduling::step::model_version"
I0709 21:42:36.712567 93 model_config_utils.cc:1904] "\tModelConfig::input::dims"
I0709 21:42:36.712573 93 model_config_utils.cc:1904] "\tModelConfig::input::reshape::shape"
I0709 21:42:36.712577 93 model_config_utils.cc:1904] "\tModelConfig::instance_group::secondary_devices::device_id"
I0709 21:42:36.712584 93 model_config_utils.cc:1904] "\tModelConfig::model_warmup::inputs::value::dims"
I0709 21:42:36.712591 93 model_config_utils.cc:1904] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I0709 21:42:36.712596 93 model_config_utils.cc:1904] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I0709 21:42:36.712616 93 model_config_utils.cc:1904] "\tModelConfig::output::dims"
I0709 21:42:36.712619 93 model_config_utils.cc:1904] "\tModelConfig::output::reshape::shape"
I0709 21:42:36.712623 93 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I0709 21:42:36.712628 93 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I0709 21:42:36.712635 93 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I0709 21:42:36.712640 93 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::state::dims"
I0709 21:42:36.712645 93 model_config_utils.cc:1904] "\tModelConfig::sequence_batching::state::initial_state::dims"
I0709 21:42:36.712652 93 model_config_utils.cc:1904] "\tModelConfig::version_policy::specific::versions"
I0709 21:42:36.713100 93 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/add_sub/1/model.py triton_python_backend_shm_region_db2ad874-7b2e-49e8-bdd0-3d79878e01cf 1048576 1048576 93 /usr/local/triton/backends/python 336 add_sub /usr/local/triton/backends/python"
I0709 21:42:37.949203 93 python_be.cc:2055] "model configuration:\n{\n    \"name\": \"add_sub\",\n    \"platform\": \"\",\n    \"backend\": \"python\",\n    \"runtime\": \"\",\n    \"version_policy\": {\n        \"latest\": {\n            \"num_versions\": 1\n        }\n    },\n    \"max_batch_size\": 0,\n    \"input\": [\n        {\n            \"name\": \"INPUT0\",\n            \"data_type\": \"TYPE_FP32\",\n            \"format\": \"FORMAT_NONE\",\n            \"dims\": [\n                4\n            ],\n            \"is_shape_tensor\": false,\n            \"allow_ragged_batch\": false,\n            \"optional\": false\n        },\n        {\n            \"name\": \"INPUT1\",\n            \"data_type\": \"TYPE_FP32\",\n            \"format\": \"FORMAT_NONE\",\n            \"dims\": [\n                4\n            ],\n            \"is_shape_tensor\": false,\n            \"allow_ragged_batch\": false,\n            \"optional\": false\n        }\n    ],\n    \"output\": [\n        {\n            \"name\": \"OUTPUT0\",\n            \"data_type\": \"TYPE_FP32\",\n            \"dims\": [\n                4\n            ],\n            \"label_filename\": \"\",\n            \"is_shape_tensor\": false\n        },\n        {\n            \"name\": \"OUTPUT1\",\n            \"data_type\": \"TYPE_FP32\",\n            \"dims\": [\n                4\n            ],\n            \"label_filename\": \"\",\n            \"is_shape_tensor\": false\n        }\n    ],\n    \"batch_input\": [],\n    \"batch_output\": [],\n    \"optimization\": {\n        \"priority\": \"PRIORITY_DEFAULT\",\n        \"input_pinned_memory\": {\n            \"enable\": true\n        },\n        \"output_pinned_memory\": {\n            \"enable\": true\n        },\n        \"gather_kernel_buffer_threshold\": 0,\n        \"eager_batching\": false\n    },\n    \"instance_group\": [\n        {\n            \"name\": \"add_sub_0\",\n            \"kind\": \"KIND_CPU\",\n            \"count\": 1,\n            \"gpus\": [],\n            \"secondary_devices\": [],\n            \"profile\": [],\n            \"passive\": false,\n            \"host_policy\": \"\"\n        }\n    ],\n    \"default_model_filename\": \"model.py\",\n    \"cc_model_filenames\": {},\n    \"metric_tags\": {},\n    \"parameters\": {},\n    \"model_warmup\": []\n}"
I0709 21:42:37.949518 93 python_be.cc:2404] "TRITONBACKEND_ModelInstanceInitialize: add_sub_0_0 (CPU device 0)"
I0709 21:42:37.949546 93 backend_model_instance.cc:69] "Creating instance add_sub_0_0 on CPU using artifact 'model.py'"
I0709 21:42:37.950000 93 stub_launcher.cc:385] "Starting Python backend stub:  exec /usr/local/triton/backends/python/triton_python_backend_stub /work/packages/models/add_sub/1/model.py triton_python_backend_shm_region_f757b7a2-012d-4215-92d0-b8ed12f3cd9a 1048576 1048576 93 /usr/local/triton/backends/python 336 add_sub_0_0 /usr/local/triton/backends/python"
I0709 21:42:38.075827 93 python_be.cc:2425] "TRITONBACKEND_ModelInstanceInitialize: instance initialization successful add_sub_0_0 (device 0)"
I0709 21:42:38.076033 93 backend_model_instance.cc:772] "Starting backend thread for add_sub_0_0 at nice 0 on device 0..."
I0709 21:42:38.076130 93 backend_model.cc:675] "Created model instance named 'add_sub_0_0' with device id '0'"
I0709 21:42:38.076280 93 model_lifecycle.cc:695] "OnLoadComplete() 'add_sub' version 1"
I0709 21:42:38.076308 93 model_lifecycle.cc:733] "OnLoadFinal() 'add_sub' for all version(s)"
I0709 21:42:38.076334 93 model_lifecycle.cc:838] "successfully loaded 'add_sub'"
I0709 21:42:38.076399 93 model_lifecycle.cc:297] "VersionStates() 'add_sub'"
I0709 21:42:38.076451 93 model_lifecycle.cc:297] "VersionStates() 'add_sub'"
I0709 21:42:38.076499 93 server.cc:606] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0709 21:42:38.076541 93 server.cc:633] 
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                           |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /usr/local/triton/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/usr/local/triton/backends/","min-compute-capability":"6.000000","default-max-bat |
|         |                                                       | ch-size":"4"}}                                                                                                                                   |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+

I0709 21:42:38.076609 93 model_lifecycle.cc:276] "ModelStates()"
I0709 21:42:38.076632 93 server.cc:676] 
+---------+---------+--------+
| Model   | Version | Status |
+---------+---------+--------+
| add_sub | 1       | READY  |
+---------+---------+--------+

I0709 21:42:38.076703 93 tritonserver.cc:2557] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                            |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                           |
| server_version                   | 2.46.0                                                                                                                                                                           |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data para |
|                                  | meters statistics trace logging                                                                                                                                                  |
| model_repository_path[0]         | /work/packages/models                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                        |
| strict_model_config              | 0                                                                                                                                                                                |
| model_config_name                |                                                                                                                                                                                  |
| rate_limit                       | OFF                                                                                                                                                                              |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                              |
| strict_readiness                 | 1                                                                                                                                                                                |
| exit_timeout                     | 30                                                                                                                                                                               |
| cache_enabled                    | 0                                                                                                                                                                                |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0709 21:42:38.078140 93 grpc_server.cc:2370] 
+----------------------------------------------+---------+
| GRPC KeepAlive Option                        | Value   |
+----------------------------------------------+---------+
| keepalive_time_ms                            | 7200000 |
| keepalive_timeout_ms                         | 20000   |
| keepalive_permit_without_calls               | 0       |
| http2_max_pings_without_data                 | 2       |
| http2_min_recv_ping_interval_without_data_ms | 300000  |
| http2_max_ping_strikes                       | 2       |
+----------------------------------------------+---------+

I0709 21:42:38.079837 93 grpc_server.cc:102] "Ready for RPC 'Check', 0"
I0709 21:42:38.079903 93 grpc_server.cc:102] "Ready for RPC 'ServerLive', 0"
I0709 21:42:38.079941 93 grpc_server.cc:102] "Ready for RPC 'ServerReady', 0"
I0709 21:42:38.079967 93 grpc_server.cc:102] "Ready for RPC 'ModelReady', 0"
I0709 21:42:38.079992 93 grpc_server.cc:102] "Ready for RPC 'ServerMetadata', 0"
I0709 21:42:38.080033 93 grpc_server.cc:102] "Ready for RPC 'ModelMetadata', 0"
I0709 21:42:38.080056 93 grpc_server.cc:102] "Ready for RPC 'ModelConfig', 0"
I0709 21:42:38.080109 93 grpc_server.cc:102] "Ready for RPC 'SystemSharedMemoryStatus', 0"
I0709 21:42:38.080136 93 grpc_server.cc:102] "Ready for RPC 'SystemSharedMemoryRegister', 0"
I0709 21:42:38.080163 93 grpc_server.cc:102] "Ready for RPC 'SystemSharedMemoryUnregister', 0"
I0709 21:42:38.080197 93 grpc_server.cc:102] "Ready for RPC 'CudaSharedMemoryStatus', 0"
I0709 21:42:38.080223 93 grpc_server.cc:102] "Ready for RPC 'CudaSharedMemoryRegister', 0"
I0709 21:42:38.080248 93 grpc_server.cc:102] "Ready for RPC 'CudaSharedMemoryUnregister', 0"
I0709 21:42:38.080276 93 grpc_server.cc:102] "Ready for RPC 'RepositoryIndex', 0"
I0709 21:42:38.080307 93 grpc_server.cc:102] "Ready for RPC 'RepositoryModelLoad', 0"
I0709 21:42:38.080337 93 grpc_server.cc:102] "Ready for RPC 'RepositoryModelUnload', 0"
I0709 21:42:38.080372 93 grpc_server.cc:102] "Ready for RPC 'ModelStatistics', 0"
I0709 21:42:38.080403 93 grpc_server.cc:102] "Ready for RPC 'Trace', 0"
I0709 21:42:38.080440 93 grpc_server.cc:102] "Ready for RPC 'Logging', 0"
I0709 21:42:38.080469 93 grpc_server.cc:366] "Thread started for CommonHandler"
I0709 21:42:38.080702 93 infer_handler.h:1198] "StateNew, 0 Step START"
I0709 21:42:38.080752 93 infer_handler.cc:680] "New request handler for ModelInferHandler, 0"
I0709 21:42:38.080798 93 infer_handler.h:1322] "Thread started for ModelInferHandler"
I0709 21:42:38.081150 93 infer_handler.h:1198] "StateNew, 0 Step START"
I0709 21:42:38.081197 93 infer_handler.cc:680] "New request handler for ModelInferHandler, 0"
I0709 21:42:38.081244 93 infer_handler.h:1322] "Thread started for ModelInferHandler"
I0709 21:42:38.081415 93 infer_handler.h:1198] "StateNew, 0 Step START"
I0709 21:42:38.081457 93 stream_infer_handler.cc:128] "New request handler for ModelStreamInferHandler, 0"
I0709 21:42:38.081474 93 infer_handler.h:1322] "Thread started for ModelStreamInferHandler"
I0709 21:42:38.081482 93 grpc_server.cc:2463] "Started GRPCInferenceService at 0.0.0.0:8001"

jlewi commented 1 month ago

It looks like the default installation directory for backends is /opt/tritonserver/backends, whereas I'm installing the backends into /usr/local/triton/backends.

I was using the flag

  --backend-directory <string>
    The global directory searched for backend shared libraries.
    Default is '/opt/tritonserver/backends'.

to change the directory to /usr/local/triton/backends. It looks like using a non-default directory changes how the stub gets invoked. When we use the default directory, the stub gets invoked as:

I0709 22:18:52.330980 302 stub_launcher.cc:385] "Starting Python backend stub:  exec /models/add_sub/triton_python_backend_stub /models/add_sub/1/model.py triton_python_backend_shm_region_ad4e5300-5abd-462b-9f90-ec385d3a113e 1048576 1048576 302 /opt/tritonserver/backends/python 336 add_sub DEFAULT"

Note the final argument is now "DEFAULT" and not the backend directory.

Here's the relevant if statement: https://github.com/triton-inference-server/python_backend/blob/c8b188f26a4e80c7204baaf73e27f11c33f52f57/src/pb_stub.cc#L1880

So it looks like the logic is the following (see the sketch after this list):

  1. If the final argument ("modeldir") is not DEFAULT, use it to look for model.py.
    • So in this case, even though the model path is specified, it is ignored.
  2. If the final argument is DEFAULT, use the model path argument to locate model.py.
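
A rough Python paraphrase of that branch (my own sketch of the linked C++, not the actual implementation; the argument names are illustrative):

import os


def resolve_model_dir(model_path, runtime_modeldir):
    # Sketch of the branch in pb_stub.cc linked above.
    if runtime_modeldir != "DEFAULT":
        # Non-default backend directory: look for model.py there and
        # ignore the model_path argument entirely.
        return runtime_modeldir
    # Default case: derive the directory from model_path, e.g.
    # /work/packages/models/add_sub/1/model.py -> /work/packages/models/add_sub/1
    return os.path.dirname(model_path)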

This seems like very confusing semantics. I suspect it has something to do with being able to use a custom Python environment for a particular model: https://github.com/triton-inference-server/python_backend/tree/c8b188f26a4e80c7204baaf73e27f11c33f52f57?tab=readme-ov-file#managing-python-runtime-and-libraries

In this case, it looks like the triton_python_backend_stub is expected to be colocated with the model.py file.

It looks like, in my case, I can solve this problem by putting the stub in the default location, /opt/tritonserver/backends/python, and not specifying --backend-directory.