Currently, our build process is painful: the cog-triton Dockerfile uses a locally built image as its base, and that image is produced via the tensorrtllm_backend image build process. This is slow and has repeatedly hindered reproducibility and collaboration.
Now that trt-llm can be pip installed and NVIDIA is publishing nightlies, we should unify the cog-triton Dockerfile so it uses a Triton base image and pip installs TensorRT-LLM.
This will make collaboration easier and our builds and development process more reproducible. For these reasons, this is a high priority: it should remove collaboration bottlenecks and hopefully reduce the frequency of regressions.
This PR (see the Dockerfile sketch below):

- Updates the cog-triton Dockerfile to use a tritonserver base image
- pip installs TensorRT-LLM==0.8.0
- Aliases python3 to python to maintain backwards compatibility and, presumably, compatibility with cog
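A minimal sketch of what the updated Dockerfile could look like, assuming a Triton base image that ships the TensorRT-LLM backend; the base image tag and the symlink location are illustrative and should be matched to the Triton release that pairs with TensorRT-LLM 0.8.0:

```dockerfile
# Assumed base tag; use the Triton release that corresponds to TensorRT-LLM 0.8.0
FROM nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3

# Install TensorRT-LLM from NVIDIA's pip index instead of building it locally
RUN pip install tensorrt_llm==0.8.0 --extra-index-url https://pypi.nvidia.com

# Alias python3 to python for backwards compatibility (and, presumably, for cog)
RUN ln -sf "$(which python3)" /usr/local/bin/python
```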