triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

GPT Engine Builder #24

Closed: fpetrini15 closed this issue 8 months ago

fpetrini15 commented 9 months ago

Goal: Add automatic TRT-LLM engine building to the CLI (for the hf:gpt2 source).

Steps (collected into a single runnable sketch after this list):

  1. docker pull nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
  2. Run the image and either clone triton_cli inside the container or mount it into the container
  3. pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com/ tensorrt-llm==0.7.0
  4. cd triton_cli && pip install .
  5. triton repo add -m gpt --source hf:gpt2 --backend tensorrtllm
  6. triton server start
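
For convenience, here is the sequence above as one annotated shell sketch. The image tag, pip index URL, `tensorrt-llm==0.7.0` pin, and `triton` commands are taken verbatim from the steps; the `docker run` flags (`--gpus all`, `--net host`), the mount path, and the assumption of a local `./triton_cli` checkout are mine, so adjust them to your setup.

```bash
# Minimal sketch of the workflow above. Assumptions: a local clone of
# triton_cli at ./triton_cli, GPU access via --gpus all, and host
# networking so Triton's default ports (8000/8001/8002) are reachable.

# 1. Pull the TRT-LLM-enabled Triton image (tag from the steps above).
docker pull nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3

# 2. Launch the container with the local triton_cli checkout mounted in.
docker run --rm -it --gpus all --net host \
  -v "$(pwd)/triton_cli:/workspace/triton_cli" \
  nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 bash

# --- The remaining commands run inside the container ---

# 3. Install the pinned TensorRT-LLM release from NVIDIA's package index.
pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com/ tensorrt-llm==0.7.0

# 4. Install the CLI from the mounted checkout.
cd /workspace/triton_cli && pip install .

# 5. Register GPT-2 with the TRT-LLM backend; this is the step that
#    should trigger the automatic engine build this issue proposes.
triton repo add -m gpt --source hf:gpt2 --backend tensorrtllm

# 6. Start the server against that model repository.
triton server start
```

Once the server is up, a standard readiness probe such as `curl -v localhost:8000/v2/health/ready` (part of Triton's HTTP/REST health API) can confirm it is serving.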

Notes:

Current status:

rmccorm4 commented 8 months ago

Nice work!