triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Can we include commonly used data pre-processing library in triton server docker image? #7107

Open HQ01 opened 7 months ago

HQ01 commented 7 months ago

Is your feature request related to a problem? Please describe.

I find that the current docker image xx.yy-py3 doesn't include commonly used data preprocessing libraries, for example huggingface transformers for accessing a tokenizer. This single missing package greatly limits our ability to use triton-inference-server with its ensemble model feature.

In our specific use case, pip install at runtime or using conda-pack are highly discouraged for various reasons. This is somewhat similar to https://github.com/triton-inference-server/server/issues/6467 and I believe it is likely common in many other industrial scenarios as well.
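For context, the preprocessing step we have in mind is a Python backend model roughly like the sketch below (tensor names such as TEXT / INPUT_IDS and the bert-base-uncased checkpoint are only placeholders); it cannot load on the stock xx.yy-py3 image because transformers is not importable:

# Hypothetical tokenizer model for the first step of an ensemble.
# Assumes string input "TEXT" and int64 outputs "INPUT_IDS" / "ATTENTION_MASK".
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Load the tokenizer once when the model is loaded.
        self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def execute(self, requests):
        responses = []
        for request in requests:
            # Raw text coming into the ensemble.
            text = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            text = [t.decode("utf-8") for t in text.flatten()]
            enc = self.tokenizer(text, padding=True, return_tensors="np")
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[
                        pb_utils.Tensor("INPUT_IDS", enc["input_ids"].astype(np.int64)),
                        pb_utils.Tensor("ATTENTION_MASK", enc["attention_mask"].astype(np.int64)),
                    ]
                )
            )
        return responses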

Describe the solution you'd like

Given the prevalence of using triton server for NLP-related workloads, I would suggest including the transformers library in the pre-built docker image if possible.

Describe alternatives you've considered

There are other images like 24.03-trtllm-python-py3 that do come with transformers pre-installed; however, we need to serve bert-like models and, according to https://github.com/triton-inference-server/tensorrtllm_backend/issues/368, there is no clear timeline to support these. So we have to rely on another backend (like ORT) to execute our model.

Additional context

Any thoughts / suggestions will be greatly appreciated!

MatthieuToulemont commented 7 months ago

In our specific use case, pip install at runtime

How about building your own image on top of xx.yy-py3? That way you will not run pip at runtime or require conda-pack.

Given the prevalence of using triton server for NLP-related workload

In our case, we use triton for computer vision models and would not need transformers installed.

# Start from the stock Triton image and layer in the extra packages you need
FROM nvcr.io/nvidia/tritonserver:XX.YY-py3
RUN pip install transformers --no-cache-dir

This Dockerfile will do what you need without requiring everyone to have transformers installed by default. Maybe this could work?

Tabrizian commented 6 months ago

Unfortunately, we cannot install these libraries as they can increase the container size significantly, and there are many other customers asking for different libraries to be included. If we accommodated all these requests, the container would be much larger than it already is. Creating conda-pack environments or custom images is our only recommendation at this point. Let us know if you have any other suggestions that might help with this issue.
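For reference, the conda-pack route looks roughly like this (environment name, Python version, package list, and file names are only illustrative):

# Build and pack an environment that contains the extra dependencies
conda create -n preprocess_env python=3.10 -y
conda activate preprocess_env
export PYTHONNOUSERSITE=True        # keep user site-packages out of the pack
pip install transformers conda-pack
conda-pack -o preprocess_env.tar.gz

# Then, in the Python model's config.pbtxt, point the backend at the packed environment:
# parameters: {
#   key: "EXECUTION_ENV_PATH"
#   value: {string_value: "$$TRITON_MODEL_DIRECTORY/preprocess_env.tar.gz"}
# }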