triton-inference-server / dali_backend

The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's python API.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
MIT License

dlopen libcuda.so failed!. Please install GPU dirverTraceback (most recent call last): #217

Closed aj2622 closed 12 months ago

aj2622 commented 12 months ago

logs

(base) ➜  counterfeit-model-triton-server git:(feature/grpc) ✗ docker run --rm -it -p8000:8000 -p8001:8001 -p8002:8002 \
-v /Users/aj/counterfeit-model-triton-server/model_repository:/models \
tritonserver:dali-latest \
tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.03 (build 56086596)
Triton Server Version 2.32.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

W1011 03:30:55.066908 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1011 03:30:55.067115 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
I1011 03:30:55.113690 1 model_lifecycle.cc:459] loading: dali_decoder:1
I1011 03:30:55.113929 1 model_lifecycle.cc:459] loading: counterfeit_model:1
I1011 03:30:55.472872 1 tensorflow.cc:2565] TRITONBACKEND_Initialize: tensorflow
I1011 03:30:55.472940 1 tensorflow.cc:2575] Triton TRITONBACKEND API version: 1.12
I1011 03:30:55.472981 1 tensorflow.cc:2581] 'tensorflow' TRITONBACKEND API version: 1.12
I1011 03:30:55.473013 1 tensorflow.cc:2605] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I1011 03:30:55.473096 1 tensorflow.cc:2671] TRITONBACKEND_ModelInitialize: counterfeit_model (version 1)
2023-10-11 03:30:55.473728: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:30:55.677617: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2023-10-11 03:30:55.677781: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:30:55.680399: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-11 03:30:55.682824: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:66] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/tritonserver/lib:/opt/tritonserver/backends/onnxruntime:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda-11/lib64
2023-10-11 03:30:55.682877: W tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-10-11 03:30:55.682906: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (59403030d9ba): /proc/driver/nvidia/version does not exist
2023-10-11 03:30:55.932107: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:357] MLIR V1 optimization pass is not enabled
2023-10-11 03:30:55.984790: I tensorflow/cc/saved_model/loader.cc:231] Restoring SavedModel bundle.
2023-10-11 03:30:57.536148: I tensorflow/cc/saved_model/loader.cc:215] Running initialization op on SavedModel bundle at path: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:30:57.860810: I tensorflow/cc/saved_model/loader.cc:325] SavedModel load for tags { serve }; Status: success: OK. Took 2387087 microseconds.
I1011 03:30:58.022766 1 dali_backend.cc:43] TRITONBACKEND_Initialize: dali
I1011 03:30:58.022829 1 dali_backend.cc:50] Triton TRITONBACKEND API version: 1.12
I1011 03:30:58.022837 1 dali_backend.cc:54] 'dali' TRITONBACKEND API version: 1.10
I1011 03:30:58.022843 1 dali_backend.cc:71] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I1011 03:30:58.023005 1 dali_backend.cc:119] TRITONBACKEND_ModelInitialize: dali_decoder (version 1)
I1011 03:30:58.023053 1 dali_backend.cc:131] Repository location: /models/dali_decoder
I1011 03:30:58.023060 1 dali_backend.cc:142] backend state is 'backend state'
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "<frozen importlib._bootstrap>", line 568, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'
dlopen libcuda.so failed!. Please install GPU dirverTraceback (most recent call last):
  File "<string>", line 8, in <module>
  File "/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/_utils/autoserialize.py", line 77, in invoke_autoserialize
    dali_pipeline().serialize(filename=filename)
  File "/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 1228, in serialize
    self._init_pipeline_backend()
  File "/opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/pipeline.py", line 698, in _init_pipeline_backend
    self._pipe = b.Pipeline(self._max_batch_size,
RuntimeError: [/opt/dali/dali/core/device_guard.cc:31] Assert on "cuInitChecked()" failed: Failed to load libcuda.so. Check your library paths and if the driver is installed correctly.
Stacktrace (30 entries):
[frame 0]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/libdali_core.so(+0x233fb) [0x7fce6b0783fb]
[frame 1]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/libdali_core.so(dali::DeviceGuard::DeviceGuard(int)+0x1a8) [0x7fce6b09b548]
[frame 2]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Init(int, int, int, long, bool, bool, bool, unsigned long, bool, int, int, dali::QueueSizes)+0x50) [0x7fce70cd78a0]
[frame 3]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/backend_impl.cpython-310-x86_64-linux-gnu.so(dali::Pipeline::Pipeline(int, int, int, long, bool, int, bool, unsigned long, bool, int, int)+0x361) [0x7fce660cb3b1]
[frame 4]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/backend_impl.cpython-310-x86_64-linux-gnu.so(+0x60763) [0x7fce6609f763]
[frame 5]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/nvidia/dali/backend_impl.cpython-310-x86_64-linux-gnu.so(+0xbfa0a) [0x7fce660fea0a]
[frame 6]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x144516) [0x5594861e4516]
[frame 7]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyObject_MakeTpCall+0x26b) [0x5594861dda6b]
[frame 8]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x15093b) [0x5594861f093b]
[frame 9]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x14e5a3) [0x5594861ee5a3]
[frame 10]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x13ddbb) [0x5594861dddbb]
[frame 11]: /opt/tritonserver/backends/dali/conda/envs/dalienv/lib/python3.10/site-packages/tree/_tree.cpython-310-x86_64-linux-gnu.so(+0x1b539) [0x7fce719d9539]
[frame 12]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyObject_MakeTpCall+0x26b) [0x5594861dda6b]
[frame 13]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyEval_EvalFrameDefault+0x54a6) [0x5594861d99d6]
[frame 14]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyFunction_Vectorcall+0x6c) [0x5594861e499c]
[frame 15]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyEval_EvalFrameDefault+0x72c) [0x5594861d4c5c]
[frame 16]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x1504f2) [0x5594861f04f2]
[frame 17]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyEval_EvalFrameDefault+0x13ca) [0x5594861d58fa]
[frame 18]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyFunction_Vectorcall+0x6c) [0x5594861e499c]
[frame 19]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(_PyEval_EvalFrameDefault+0x320) [0x5594861d4850]
[frame 20]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x1d7f90) [0x559486277f90]
[frame 21]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(PyEval_EvalCode+0x87) [0x559486277ed7]
[frame 22]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x20842a) [0x5594862a842a]
[frame 23]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x203833) [0x5594862a3833]
[frame 24]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(PyRun_StringFlags+0x7d) [0x55948629bc3d]
[frame 25]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(PyRun_SimpleStringFlags+0x3c) [0x55948629ba7c]
[frame 26]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(Py_RunMain+0x26b) [0x55948629a98b]
[frame 27]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(Py_BytesMain+0x37) [0x55948626b527]
[frame 28]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fce71e86083]
[frame 29]: /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python3(+0x1cb421) [0x55948626b421]

I1011 03:30:58.617971 1 dali_backend.cc:170] TRITONBACKEND_ModelFinalize: delete model state
E1011 03:30:58.618035 1 model_lifecycle.cc:597] failed to load 'dali_decoder' version 1: Unknown: DALI Backend error: Failed to load model file. The program looked in the following locations: /models/dali_decoder/1/dali.py, /models/dali_decoder/1/dali.py. Please make sure that the model exists in any of the locations and is properly serialized or can be properly serialized.
I1011 03:30:58.618039 1 tensorflow.cc:2720] TRITONBACKEND_ModelInstanceInitialize: counterfeit_model_0 (CPU device 0)
2023-10-11 03:30:58.618129: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:30:58.667634: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2023-10-11 03:30:58.667725: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:30:58.814250: I tensorflow/cc/saved_model/loader.cc:231] Restoring SavedModel bundle.
2023-10-11 03:31:00.158854: I tensorflow/cc/saved_model/loader.cc:215] Running initialization op on SavedModel bundle at path: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:31:00.533156: I tensorflow/cc/saved_model/loader.cc:325] SavedModel load for tags { serve }; Status: success: OK. Took 1915031 microseconds.
I1011 03:31:00.533869 1 tensorflow.cc:2720] TRITONBACKEND_ModelInstanceInitialize: counterfeit_model_1 (CPU device 0)
2023-10-11 03:31:00.534036: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:31:00.586897: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2023-10-11 03:31:00.586996: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:31:00.839475: I tensorflow/cc/saved_model/loader.cc:231] Restoring SavedModel bundle.
2023-10-11 03:31:02.145701: I tensorflow/cc/saved_model/loader.cc:215] Running initialization op on SavedModel bundle at path: /models/counterfeit_model/1/model.savedmodel
2023-10-11 03:31:02.491258: I tensorflow/cc/saved_model/loader.cc:325] SavedModel load for tags { serve }; Status: success: OK. Took 1957231 microseconds.
I1011 03:31:02.492090 1 model_lifecycle.cc:694] successfully loaded 'counterfeit_model' version 1
E1011 03:31:02.492164 1 model_repository_manager.cc:526] Invalid argument: ensemble 'counterfeit_pipeline' depends on 'dali_decoder' which has no loaded version. Model 'dali_decoder' loading failed with error: version 1 is at UNAVAILABLE state: Unknown: DALI Backend error: Failed to load model file. The program looked in the following locations: /models/dali_decoder/1/dali.py, /models/dali_decoder/1/dali.py. Please make sure that the model exists in any of the locations and is properly serialized or can be properly serialized.;
I1011 03:31:02.492374 1 server.cc:583] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1011 03:31:02.492424 1 server.cc:610] 
+------------+----------------------------------------------------+----------------------------------------------------+
| Backend    | Path                                               | Config                                             |
+------------+----------------------------------------------------+----------------------------------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_t | {"cmdline":{"auto-complete-config":"true","min-com |
|            | ensorflow2.so                                      | pute-capability":"6.000000","backend-directory":"/ |
|            |                                                    | opt/tritonserver/backends","default-max-batch-size |
|            |                                                    | ":"4"}}                                            |
|            |                                                    |                                                    |
| dali       | /opt/tritonserver/backends/dali/libtriton_dali.so  | {"cmdline":{"auto-complete-config":"true","min-com |
|            |                                                    | pute-capability":"6.000000","backend-directory":"/ |
|            |                                                    | opt/tritonserver/backends","default-max-batch-size |
|            |                                                    | ":"4"}}                                            |
+------------+----------------------------------------------------+----------------------------------------------------+

I1011 03:31:02.492535 1 server.cc:653] 
+-------------------+---------+-----------------------------------------------------------------------------------------+
| Model             | Version | Status                                                                                  |
+-------------------+---------+-----------------------------------------------------------------------------------------+
| counterfeit_model | 1       | READY                                                                                   |
| dali_decoder      | 1       | UNAVAILABLE: Unknown: DALI Backend error: Failed to load model file. The program looked |
|                   |         |  in the following locations: /models/dali_decoder/1/dali.py, /models/dali_decoder/1/dal |
|                   |         | i.py. Please make sure that the model exists in any of the locations and is properly se |
|                   |         | rialized or can be properly serialized.                                                 |
+-------------------+---------+-----------------------------------------------------------------------------------------+

I1011 03:31:02.492796 1 metrics.cc:640] Collecting CPU metrics
I1011 03:31:02.492974 1 tritonserver.cc:2364] 
+----------------------------------+------------------------------------------------------------------------------------+
| Option                           | Value                                                                              |
+----------------------------------+------------------------------------------------------------------------------------+
| server_id                        | triton                                                                             |
| server_version                   | 2.32.0                                                                             |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) sched |
|                                  | ule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
|                                  | or_data parameters statistics trace logging                                        |
| model_repository_path[0]         | /models                                                                            |
| model_control_mode               | MODE_NONE                                                                          |
| strict_model_config              | 0                                                                                  |
| rate_limit                       | OFF                                                                                |
| pinned_memory_pool_byte_size     | 268435456                                                                          |
| min_supported_compute_capability | 6.0                                                                                |
| strict_readiness                 | 1                                                                                  |
| exit_timeout                     | 30                                                                                 |
| cache_enabled                    | 0                                                                                  |
+----------------------------------+------------------------------------------------------------------------------------+

I1011 03:31:02.493043 1 server.cc:284] Waiting for in-flight requests to complete.
I1011 03:31:02.493053 1 server.cc:300] Timeout 30: Found 0 model versions that have in-flight inferences
I1011 03:31:02.493132 1 server.cc:315] All models are stopped, unloading models
I1011 03:31:02.493287 1 server.cc:322] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I1011 03:31:02.493658 1 tensorflow.cc:2758] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 03:31:02.626863 1 tensorflow.cc:2758] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 03:31:02.626970 1 tensorflow.cc:2697] TRITONBACKEND_ModelFinalize: delete model state
I1011 03:31:02.763119 1 model_lifecycle.cc:579] successfully unloaded 'counterfeit_model' version 1
I1011 03:31:03.493507 1 server.cc:322] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
I1011 03:31:03.493608 1 dali_backend.cc:99] TRITONBACKEND_Finalize: state is 'backend state'
error: creating server: Internal - failed to load all models

I built my image as follows:

git clone --recursive https://github.com/triton-inference-server/dali_backend.git
cd dali_backend
docker build -f docker/Dockerfile.release -t tritonserver:dali-latest .

I am using a MacBook without a GPU.

My file model_repository/dali_decoder/1/dali.py is the following:

from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
from nvidia.dali.plugin.triton import autoserialize

@autoserialize
@pipeline_def(batch_size=0, num_threads=1, device_id=0)
def pipe():
    jpegs = fn.external_source(name='input', device='cpu', layout='HWC')
    images = fn.decoders.image(jpegs, device='cpu')
    return images

My model_repository/dali_decoder/config.pbtxt is:

name: "dali_decoder"
backend: "dali"
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ -1, -1, 3 ]
  }
]

JanuszL commented 12 months ago

Hi @aj2622,

Thank you for reaching out. Please run the container with GPU support enabled, for example by passing --gpus all. The very first message in the log indicates that the GPU is not detected:

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

You can learn more about it, for example, [here](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html).

aj2622 commented 12 months ago

Thanks for the prompt response. I am using a Mac, which has no NVIDIA GPU.

Running the container with GPU support enabled (--gpus all) just gives me a similar error:

(base) ➜  counterfeit-model-triton-server git:(feature/grpc) ✗ docker run --rm -it --gpus all  -p8000:8000 -p8001:8001 -p8002:8002 \
-v /Users/aj/counterfeit-model-triton-server/model_repository:/models \
tritonserver:dali-latest \
tritonserver --model-repository=/models
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

Do I need to run this on a GPU-enabled device? My code will eventually run on an EC2 instance with a GPU, but I am hoping for a workaround so I can test it locally.

JanuszL commented 12 months ago

Hi @aj2622,

Now I understand that you intend to run Triton and DALI on the CPU only. To achieve that, set device_id in the pipeline definition to None, which avoids any interaction with the GPU:

@pipeline_def(batch_size=0, num_threads=1, device_id=None)

aj2622 commented 12 months ago

Thanks! That resolved my issue.
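
The fix in this thread hard-codes device_id=None, which is all a CPU-only pipeline like the one above needs. If the same dali.py should also be able to use the GPU on a machine that has one, one possible approach is to pick the value when the file is loaded. The following is a minimal sketch under stated assumptions, not part of DALI or dali_backend: the helper names cuda_driver_available and pick_device_id are hypothetical, and the check only mirrors the two steps the log shows failing (loading libcuda.so.1 and calling cuInit). It does not verify that the GPU meets Triton's minimum compute capability.

```python
import ctypes


def cuda_driver_available() -> bool:
    """Return True if the CUDA driver library loads and cuInit(0) succeeds."""
    # Hypothetical helper: tries the same library names that appear in the log.
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue  # library not found under this name; try the next one
        try:
            return lib.cuInit(0) == 0  # 0 is CUDA_SUCCESS in the driver API
        except AttributeError:
            return False  # library loaded but does not export cuInit
    return False


def pick_device_id(preferred: int = 0):
    """Return `preferred` if a CUDA driver is usable, otherwise None (CPU-only)."""
    return preferred if cuda_driver_available() else None


if __name__ == "__main__":
    # Prints None on a machine without an NVIDIA driver.
    print(pick_device_id())
```

Assuming the helper is defined in dali.py itself, the decorator from the fix would then receive device_id=pick_device_id() instead of a literal None. On the MacBook described in this thread the helper should return None, which matches the working configuration.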