triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Unable to use PyTorch library with libtorch backend when using Triton Inference Server In-Process Python API #7222

Open sivanantha321 opened 3 months ago

sivanantha321 commented 3 months ago

Description

I am trying to use the newly introduced Triton Inference Server In-Process Python API to serve PyTorch models using the libtorch backend. I use the torch and torchvision libraries for some pre- and post-processing of the input data before sending it to the Triton server for prediction. But when I import torch or torchvision, I get the following error:

failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
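(For reference, that mangled name demangles to c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&), a libtorch-internal function, so the backend's libraries appear to be resolving symbols against a different libtorch build than the one they were compiled for.)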

Triton Server logs:

I0515 09:22:40.092038 265 cache_manager.cc:480] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
W0515 09:22:40.092110 265 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0515 09:22:40.092129 265 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0515 09:22:40.092267 265 server.cc:243] CudaDriverHelper has not been initialized.
I0515 09:22:40.093620 265 model_config_utils.cc:680] Server side auto-completed config: name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input {
  name: "INPUT__0"
  data_type: TYPE_FP32
  dims: 3
  dims: 32
  dims: 32
}
output {
  name: "OUTPUT__0"
  data_type: TYPE_FP32
  dims: 10
}
default_model_filename: "model.pt"
backend: "pytorch"

I0515 09:22:40.093699 265 model_lifecycle.cc:469] loading: cifar10:1
I0515 09:22:40.093820 265 backend_model.cc:502] Adding default backend config setting: default-max-batch-size,4
I0515 09:22:40.093847 265 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I0515 09:22:40.098713 265 backend_manager.cc:138] unloading backend 'pytorch'
E0515 09:22:40.098758 265 model_lifecycle.cc:638] failed to load 'cifar10' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKcS2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I0515 09:22:40.098775 265 model_lifecycle.cc:773] failed to load 'cifar10'
I0515 09:22:40.098860 265 server.cc:607] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0515 09:22:40.098880 265 server.cc:634] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0515 09:22:40.098907 265 server.cc:677] 
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model   | Version | Status                                                                                                                                                                 |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cifar10 | 1       | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/pytorch/libtorchtrt_runtime.so: undefined symbol: _ZN3c106detail14torchCheckFailEPKc |
|         |         | S2_jRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE                                                                                                             |
+---------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099027 265 metrics.cc:770] Collecting CPU metrics
I0515 09:22:40.099151 265 tritonserver.cc:2538] 
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                  |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                 |
| server_version                   | 2.45.0                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memo |
|                                  | ry binary_tensor_data parameters statistics trace logging                                                                                              |
| model_repository_path[0]         | models_dir                                                                                                                                             |
| model_control_mode               | MODE_NONE                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                      |
| rate_limit                       | OFF                                                                                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                     |
| cache_enabled                    | 0                                                                                                                                                      |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+

I0515 09:22:40.099172 265 server.cc:307] Waiting for in-flight requests to complete.
I0515 09:22:40.099176 265 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0515 09:22:40.099204 265 server.cc:338] All models are stopped, unloading models
I0515 09:22:40.099210 265 server.cc:347] Timeout 30: Found 0 live models and 0 in-flight non-inference requests

Triton Information

What version of Triton are you using?

$ pip show tritonserver

Name: tritonserver
Version: 2.45.0
Summary: Triton Inference Server In-Process Python API
Home-page: https://developer.nvidia.com/nvidia-triton-inference-server
Author: NVIDIA Inc.
Author-email: sw-dl-triton@nvidia.com
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy
Required-by: 
$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.10/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: torchvision
$ pip show torchvision
Name: torchvision
Version: 0.18.0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, pillow, torch
Required-by: 

Are you using the Triton container or did you build it yourself? I am using the nvcr.io/nvidia/tritonserver:24.04-py3 container to serve the model using the In-Process Python API.

To Reproduce

A simple script to reproduce the error:

import tritonserver
from torchvision import transforms  # importing this leads to errors
import torch  # importing this leads to errors

def start():
    server = tritonserver.Server(model_repository="python/models",
                                 log_error=True,
                                 log_info=True,
                                 log_verbose=True,
                                 )
    print("tritonserver version : ", tritonserver.__version__)
    server.start()
    print("server started")
    model = server.model("cifar10")

if __name__ == "__main__":
    start()
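
For context, once the model loads, a minimal inference call through the In-Process Python API would look roughly like the sketch below. This is not part of the original report: the infer()/outputs usage follows the published Python API examples, the input name and shape come from the cifar10 config, and it is untested here, so treat it as an assumption.

import numpy
import tritonserver

server = tritonserver.Server(model_repository="python/models")
server.start()
model = server.model("cifar10")

# One CHW float32 image, batched to match max_batch_size: 1.
image = numpy.random.rand(1, 3, 32, 32).astype(numpy.float32)

# infer() returns an iterable of responses; outputs are keyed by tensor name.
for response in model.infer(inputs={"INPUT__0": image}):
    scores = numpy.from_dlpack(response.outputs["OUTPUT__0"])
    print(scores)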

Model description: a PyTorch model served via the libtorch backend, with the following configuration:

name: "cifar10"
platform: "pytorch_libtorch"
max_batch_size: 1
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [3,32,32]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [10]
  }
]

Expected behavior A clear and concise description of what you expected to happen. Pytorch and torchvision should work with tritonserver in-process python API

sivanantha321 commented 3 months ago

/CC @yuzisun

nnshah1 commented 3 months ago

@sivanantha321 - is it possible to provide the .pt file / instructions on recreating it?

nnshah1 commented 3 months ago

nvcr.io/nvidia/tritonserver:24.04-py3

Never mind - I was able to find a model to reproduce it locally.

I believe the issue is that the latest public PyTorch version, as installed via pip, conflicts with the torch libraries used by the libtorch backend.

I will experiment with some potential workarounds.
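
A minimal diagnostic sketch for this (assuming the same container and model repository as above; untested in this thread): list every mapped copy of the torch runtime libraries after both imports have happened. The same library appearing under both site-packages and /opt/tritonserver/backends/pytorch would confirm two different builds loaded into one process.

import torch  # loads the pip-installed libtorch from site-packages
import tritonserver

server = tritonserver.Server(model_repository="python/models")
server.start()  # attempts to load the backend's bundled libtorch as well

# Linux-only: scan the process memory map for torch runtime libraries.
paths = set()
with open("/proc/self/maps") as maps:
    for line in maps:
        path = line.split()[-1]
        if "libtorch" in path or "libc10" in path:
            paths.add(path)

for path in sorted(paths):
    print(path)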

sivanantha321 commented 3 months ago


Thanks for looking into this.

nnshah1 commented 3 months ago

I made some progress by using the NGC PyTorch image as the base and then copying the tritonserver binaries into it:

https://github.com/triton-inference-server/tutorials/blob/nnshah1-meetup-04-2024/Triton_Inference_Server_Python_API/docker/Dockerfile.pytorch

However, when doing that with pre-built libraries I still ran into an issue with torchvision, as the shared library was loaded twice and that caused conflicts (I think that is a fundamental issue with libtorchvision.so).

I then rebuilt the Triton PyTorch backend without torchvision support (see the Dockerfile above).

However, I haven't been able to confirm it with a full use case; I was testing a ResNet-50 model but didn't get to the stage where the results looked correct to me.

I'm posting this as an update here in case you have time to try/test it on your end.

nnshah1 commented 3 months ago

@sivanantha321 - were you able to try the workaround?

sivanantha321 commented 3 months ago

@nnshah1 Thanks for the big help! Yes, I tried the workaround and it worked successfully. There is one more thing I would like to know: is there a way to use a custom PyTorch version other than the one that comes with the NGC PyTorch image?

nnshah1 commented 3 months ago

@sivanantha321 - I believe you would just need to rebuild the PyTorch backend with the custom version of PyTorch you want to use:

https://github.com/triton-inference-server/pytorch_backend?tab=readme-ov-file#build-the-pytorch-backend-with-custom-pytorch

nnshah1 commented 3 months ago

@Tabrizian, @rmccorm4, @tanmayv25 for visibility.

In this workaround I searched for the backend's PyTorch libraries and replaced them with symlinks to the system ones.

That may be a simple recipe for installing PyTorch and the PyTorch backend in the same container without duplicating the libraries, but it needs further review and testing.
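
A rough sketch of that symlink recipe, assuming the standard container layout (/opt/tritonserver/backends/pytorch for the backend, the pip-installed torch providing the system copies); untested, review before use:

from pathlib import Path

import torch  # used only to locate the pip-installed libtorch

backend_dir = Path("/opt/tritonserver/backends/pytorch")
torch_lib_dir = Path(torch.__file__).parent / "lib"

# Replace each torch library bundled with the backend by a symlink to the
# matching pip-installed copy; libraries with no match in torch/lib (e.g.
# the backend's own libtriton_pytorch.so) are left alone.
for lib in backend_dir.glob("lib*.so*"):
    system_copy = torch_lib_dir / lib.name
    if system_copy.exists():
        lib.unlink()
        lib.symlink_to(system_copy)
        print(f"{lib} -> {system_copy}")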

sivanantha321 commented 2 months ago

@nnshah1 It looks like this problem is also true for the TensorFlow backend. If I try to import tensorflow, I get an undefined symbol error from libtriton_tensorflow.so.