triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

CPU-only mode unable to load Models got CUDA error #3980

Closed Tamannaverma1912 closed 2 years ago

Tamannaverma1912 commented 2 years ago

Problem Description
I was trying to follow the official example, starting the server on a CPU-only device with the command:

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/Users/tamannaverma/triton-inference-server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.01-py3 tritonserver --model-repository=/models

Here are the logs:

> =============================
> == Triton Inference Server ==
> =============================
> 
> NVIDIA Release 22.01 (build 31237564)
> 
> Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> 
> Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
> 
> This container image and its contents are governed by the NVIDIA Deep Learning Container License.
> By pulling and using the container, you accept the terms and conditions of this license:
> https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11.6/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11.6/compat/lib'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda-11/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda-11/compat/lib'.
> find: File system loop detected; '/usr/local/cuda/compat/lib.real/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib.real'.
> find: File system loop detected; '/usr/local/cuda/compat/lib/lib.real' is part of the same file system loop as '/usr/local/cuda/compat/lib'.
> 
> WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
>    Use Docker with NVIDIA Container Toolkit to start this container; see
>    https://github.com/NVIDIA/nvidia-docker.
> 
> WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 999
> I0224 09:20:10.194531 1 libtorch.cc:1227] TRITONBACKEND_Initialize: pytorch
> I0224 09:20:10.194635 1 libtorch.cc:1237] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.194639 1 libtorch.cc:1243] 'pytorch' TRITONBACKEND API version: 1.7
> 2022-02-24 09:20:10.482327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> 2022-02-24 09:20:10.533967: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
> I0224 09:20:10.534722 1 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
> I0224 09:20:10.534746 1 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.534749 1 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.7
> I0224 09:20:10.534752 1 tensorflow.cc:2216] backend configuration:
> {}
> I0224 09:20:10.546856 1 onnxruntime.cc:2232] TRITONBACKEND_Initialize: onnxruntime
> I0224 09:20:10.546921 1 onnxruntime.cc:2242] Triton TRITONBACKEND API version: 1.7
> I0224 09:20:10.546924 1 onnxruntime.cc:2248] 'onnxruntime' TRITONBACKEND API version: 1.7
> I0224 09:20:10.546927 1 onnxruntime.cc:2278] backend configuration:
> {}
> W0224 09:20:10.563170 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: unknown error
> E0224 09:20:10.563244 1 server.cc:198] Failed to initialize CUDA memory manager: unable to get number of CUDA devices: unknown error
> W0224 09:20:10.563249 1 server.cc:205] failed to enable peer access for some device pairs
> E0224 09:20:10.584340 1 model_repository_manager.cc:1844] Poll failed for model directory 'densenet_onnx': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.596656 1 model_repository_manager.cc:1844] Poll failed for model directory 'inception_graphdef': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.607955 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.619405 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_dyna_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.632553 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_identity': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.640729 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_int8': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.649843 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_sequence': unable to get number of CUDA devices: unknown error
> E0224 09:20:10.661630 1 model_repository_manager.cc:1844] Poll failed for model directory 'simple_string': unable to get number of CUDA devices: unknown error
> I0224 09:20:10.661776 1 server.cc:519] 
> +------------------+------+
> | Repository Agent | Path |
> +------------------+------+
> +------------------+------+
> 
> I0224 09:20:10.661800 1 server.cc:546] 
> +-------------+-----------------------------------------------------------------+--------+
> | Backend     | Path                                                            | Config |
> +-------------+-----------------------------------------------------------------+--------+
> | pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
> | tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
> | onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
> +-------------+-----------------------------------------------------------------+--------+
> 
> I0224 09:20:10.661807 1 server.cc:589] 
> +-------+---------+--------+
> | Model | Version | Status |
> +-------+---------+--------+
> +-------+---------+--------+
> 
> I0224 09:20:10.661952 1 tritonserver.cc:1865] 
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Option                           | Value                                                                                                                                                                                  |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | server_id                        | triton                                                                                                                                                                                 |
> | server_version                   | 2.18.0                                                                                                                                                                                 |
> | server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
> | model_repository_path[0]         | /models                                                                                                                                                                                |
> | model_control_mode               | MODE_NONE                                                                                                                                                                              |
> | strict_model_config              | 1                                                                                                                                                                                      |
> | rate_limit                       | OFF                                                                                                                                                                                    |
> | pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
> | response_cache_byte_size         | 0                                                                                                                                                                                      |
> | min_supported_compute_capability | 6.0                                                                                                                                                                                    |
> | strict_readiness                 | 1                                                                                                                                                                                      |
> | exit_timeout                     | 30                                                                                                                                                                                     |
> +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> 
> I0224 09:20:10.662202 1 server.cc:249] Waiting for in-flight requests to complete.
> I0224 09:20:10.662208 1 server.cc:264] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
> error: creating server: Internal - failed to load all models

Triton Information
Version: 22.01. I am using a Mac M1 Pro for the local setup.

CoderHam commented 2 years ago

@Tamannaverma1912 you must use the docker --gpus=all flag when launching the Triton container to ensure it has access to the GPU. To run on a CPU-only system, you would want to use the CPU-only container. @jbkyang-nvi can you provide the steps for the same?
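For reference, a GPU launch would look roughly like this (same paths and ports as the original command; this is a sketch and assumes the host has an NVIDIA driver and the NVIDIA Container Toolkit installed):

```shell
# Pass --gpus=all so the container can see the host GPU.
docker run --rm --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
  -v/Users/tamannaverma/triton-inference-server/docs/examples/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.01-py3 \
  tritonserver --model-repository=/models
```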

jbkyang-nvi commented 2 years ago

Hi @Tamannaverma1912, as noted in the instructions here, Triton is unable to load any model configuration that requires a GPU.

Additionally, the instructions are a bit outdated, since triton-inference-server/docs/examples/model_repository does not work as is. You need to copy an actual CPU-only model into the model repository. I recommend:

mkdir models
cp -r /Users/tamannaverma/triton-inference-server/docs/examples/model_repository/simple models/
docker run -it --rm -v$PWD/models:/models nvcr.io/nvidia/tritonserver:22.01-py3
tritonserver --model-repository=/models
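Once the server comes up, you can confirm it is serving via its HTTP readiness endpoint, e.g. curl http://localhost:8000/v2/health/ready (assuming port 8000 is published). A minimal stdlib sketch of the same check; the helper names here are mine, but /v2/health/ready is Triton's standard readiness endpoint:

```python
from urllib.request import urlopen
from urllib.error import URLError

def ready_url(host="localhost", port=8000):
    # Triton exposes a KServe-style HTTP readiness endpoint on its HTTP port.
    return f"http://{host}:{port}/v2/health/ready"

def server_ready(host="localhost", port=8000, timeout=2.0):
    # True when the server answers HTTP 200 on the readiness endpoint.
    try:
        return urlopen(ready_url(host, port), timeout=timeout).status == 200
    except (URLError, OSError):
        return False

if __name__ == "__main__":
    print("ready" if server_ready() else "not ready")
```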

fabiofumarola commented 2 years ago

Hi, I have the same error. Here is how to replicate the scenario:

  1. run the Triton docker container nvcr.io/nvidia/tritonserver:22.01-py3 on a Mac M1
  2. try to load an ONNX-format model on CPU, with this configuration:
name: "damage_onnx_batching"
platform: "onnxruntime_onnx"
max_batch_size: 10
dynamic_batching {
  # we group batches at least to 100 ms
  max_queue_delay_microseconds: 100000
}

input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [3, 640, 640 ]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [9, -1, -1 ]
  }
]

model_warmup {
  name: "warmup"
  batch_size: 2
  inputs: {
    key: "input_0"
    value: {
      data_type: TYPE_FP32
      dims: 3
      dims: 640
      dims: 640
      random_data: true
    }
  }
}
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
  3. then I get this error:
triton-playground-triton-1  | E0302 16:50:00.503077 1 model_repository_manager.cc:1844] Poll failed for model directory 'damage_onnx': unable to get number of CUDA devices: unknown error
triton-playground-triton-1  | I0302 16:50:00.503099 1 model_repository_manager.cc:546] VersionStates() 'damage_onnx'
fabiofumarola commented 2 years ago

But I think the problem is the architecture used when building the image.

jbkyang-nvi commented 2 years ago

Hi @fabiofumarola, can you share the model you are using, so I can try to repro?

xiaoFine commented 2 years ago

Same issue. If you just want to run the demo, try the command below, which works for me:

docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v $(pwd)/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver  --model-control-mode=explicit --load-model simple --model-repository=/models

You can pass --load-model multiple times to load additional specific models.
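For example, to load two of the example models in explicit mode (a sketch using model names from the example repository, run as the container command or inside the container):

```shell
tritonserver --model-repository=/models --model-control-mode=explicit \
  --load-model simple --load-model simple_identity
```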

fabiofumarola commented 2 years ago

Hi @jbkyang-nvi, I'll share the whole repository so that you can take a look. Anyway, I've solved it by building a docker image with compose.py on my M1 Mac and specifying --platform=linux/amd64 in the docker build command.
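For anyone trying to reproduce this workaround, the build looked roughly like this (a sketch: flag names for compose.py may differ across releases, so check python3 compose.py --help; the image and backend names here are illustrative):

```shell
# From a checkout of triton-inference-server/server on the matching release branch.
# Generate the compose Dockerfile without building, so the build step can be
# pointed at a specific platform:
python3 compose.py --backend onnxruntime --output-name tritonserver_cpu --dry-run

# Build the generated Dockerfile for amd64 so the image runs under emulation
# on Apple silicon:
docker build --platform=linux/amd64 -f Dockerfile.compose -t tritonserver_cpu .
```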

dyastremsky commented 2 years ago

Closing due to lack of activity. Please re-open if you would like to follow up on this issue.

amarflybot commented 2 years ago

I have the same issue with M1. I am not able to build it either.

fabiofumarola commented 2 years ago

Hi Amarendra,

I’ll send you the link of the built image tomorrow. Best, Fabio


hitcxz commented 2 years ago

I have the same issue with M1. How do I deal with it?

jbkyang-nvi commented 2 years ago

Currently, Triton does not officially support M1 builds. @fabiofumarola can you share your compose.py command so future users can try it?

hitcxz commented 2 years ago

> I'll send you the link of the built image tomorrow. Best, Fabio

can you send me the link of the built image?

blackhathedgehog commented 2 years ago

i would also appreciate a link to the cpu-only M1 build

fabiofumarola commented 2 years ago

Sorry for the delay. Here is the release I've built: https://hub.docker.com/repository/docker/prometeiads/tritonserver. I'll update it with the latest version.

ashrafguitoni commented 1 year ago

> Here you have the release I've built https://hub.docker.com/repository/docker/prometeiads/tritonserver I'll update it with the latest version

I can't access the repository... I get a 404 :)

talbarda commented 5 months ago

Do we have a resolution for this? What docker image should we use?