triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Very large images due to multistage Dockerfile error #132

Open mtaron opened 12 months ago

mtaron commented 12 months ago

Hello,

I noticed that images for 23.10-trtllm-python-py3 are about 10 GB larger than other Triton server images. This is due to a bug in your Dockerfile here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/47b609b670d6bb33a5ff113d98ad8a44d961c5c6/dockerfile/Dockerfile.trt_llm_backend#L51

```diff
-FROM trt_llm_backend_builder as final
+FROM base as final
```

You are accidentally including all the builder stages, defeating the point of having a multi-stage Dockerfile.
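For context, the point of a multi-stage build is that the final stage starts from a slim base and copies forward only the built artifacts, so builder-only layers (compilers, headers, build caches) never ship. A minimal sketch of the intended pattern (stage names follow the issue; the base tag and copy paths are illustrative, not the actual contents of Dockerfile.trt_llm_backend):

```dockerfile
# Illustrative sketch only; image tag and paths are assumptions.
FROM nvcr.io/nvidia/tritonserver:23.10-py3 AS base

FROM base AS trt_llm_backend_builder
# ... build steps that pull in toolchains and intermediate artifacts ...

# Final stage restarts from the lean base instead of the builder,
# then copies in only what the backend needs at runtime.
FROM base AS final
COPY --from=trt_llm_backend_builder /opt/build/out /opt/tritonserver/backends/tensorrtllm
```

Because the buggy line uses `FROM trt_llm_backend_builder as final`, every layer of the builder stage is inherited into the published image, which is where the extra ~10 GB comes from.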

kaiyux commented 11 months ago

@mtaron Thanks for pointing that out, we already have a fix in the internal codebase, which will be included in the next update to GitHub.

Until then, please apply the modification you shared as a workaround, thanks!

kaiyux commented 11 months ago

Sorry, the changes will not be included in today's update to GitHub. We realized that if the line is changed to `FROM base as final`, the latest version of TensorRT will not be included in the container, which leads to an issue when using it. However, if we instead install TensorRT and PyTorch in the last stage of the Dockerfile, the container does not end up much smaller than what we have now.
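To make the tradeoff concrete, the alternative would look roughly like the sketch below: start the final stage from the lean base, then reinstall the heavyweight dependencies there. This is a hedged illustration, not the actual internal fix; package names and the unpinned versions are assumptions.

```dockerfile
# Hypothetical alternative final stage; package set and versions are illustrative.
FROM base AS final
# Reinstalling PyTorch and the TensorRT wheels here avoids shipping the
# builder stage, but these packages are themselves multi-GB, so the
# overall image shrinks far less than the naive fix suggests.
RUN pip3 install --no-cache-dir torch tensorrt
COPY --from=trt_llm_backend_builder /opt/build/out /opt/tritonserver/backends/tensorrtllm
```

In other words, most of the size is in dependencies the final image genuinely needs, not only in builder debris.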

We'll keep the Dockerfile as-is for a while until we find a better solution; does that make sense to you? @mtaron Thanks again for reporting the issue.

mtaron commented 11 months ago

Ah, yeah - you'll always end up with two copies of TensorRT in the container (as far as image size is concerned) unless you start from a base image that either omits TensorRT entirely or already ships the exact version you want.
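One way to confirm where the size actually goes is to inspect the per-layer history of the published image; duplicated TensorRT installs and accidentally shipped builder layers show up as multi-GB entries. A sketch using the standard Docker CLI (the tag is the one from this issue; running this requires a local Docker daemon and pulling the image):

```shell
# Compare overall image sizes across tags.
docker image ls nvcr.io/nvidia/tritonserver

# Show each layer's size and the instruction that created it;
# large RUN/COPY layers inherited from builder stages stand out here.
docker history --no-trunc nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3
```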