replicate / cog-triton

A Cog implementation of NVIDIA's Triton server
Apache License 2.0

Unify cog-triton Dockerfile LANG-213 #23

Closed: joehoover closed this 4 months ago

joehoover commented 4 months ago

Currently, our build process is awful: the cog-triton Dockerfile uses a locally built image as its base, and that image is produced by the tensorrtllm_backend image build process. This is bad because the build is slow, and it has repeatedly hindered reproducibility and collaboration.

Now that TensorRT-LLM (trt-llm) can be installed with pip and NVIDIA is publishing nightly builds, we should unify the cog-triton Dockerfile so that it starts from a Triton base image and pip-installs TensorRT-LLM.
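A minimal sketch of what the unified Dockerfile could look like follows. It is not the PR's actual Dockerfile. The Triton image tag (`24.02-py3`) is an assumption: pick the Triton release whose CUDA version matches the TensorRT-LLM wheel you target. The MPI packages are listed because TensorRT-LLM's pip install instructions call for them.

```dockerfile
# Assumed tag: choose the Triton release that matches your target TensorRT-LLM/CUDA version.
FROM nvcr.io/nvidia/tritonserver:24.02-py3

# TensorRT-LLM's pip package needs MPI at runtime; python3-pip is installed defensively.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-pip openmpi-bin libopenmpi-dev && \
    rm -rf /var/lib/apt/lists/*

# --pre allows pre-release (nightly) wheels; NVIDIA hosts them on its own package index.
RUN pip3 install --no-cache-dir --pre --upgrade tensorrt_llm \
    --extra-index-url https://pypi.nvidia.com
```

Because everything comes from a published base image and a package index, anyone can rebuild it with a plain `docker build`, with no locally built intermediate image. Pinning an exact `tensorrt_llm` version instead of `--pre --upgrade` would make builds more reproducible still.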

This will make it easier for us to collaborate and will make our builds and development process more reproducible. It is a high priority because it should remove collaboration bottlenecks and, we hope, reduce the frequency of regressions.

This PR: