triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

chore: Update TRT-LLM checkpoint scripts to v0.10 and Fix Github Actions Pipeline #78

Closed KrishnanPrash closed 3 months ago

KrishnanPrash commented 3 months ago

Two Sets of Changes in this PR:

1. Upgrading TensorRT-LLM Checkpoint Scripts

Upgrading the convert_checkpoint.py scripts for gpt2, llama, and opt to support tensorrt_llm v0.10.0. Source for the code changes: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.10.0/examples

2. GitHub Actions Workflow Fix

(Credit to @rmccorm4 for figuring this out.) The Triton CLI's GitHub Actions pipeline is currently failing on the test case test_non_llm[http]. During testing, starting a mock server (ScopedTritonServer) calls Popen("triton start"), creating a new process. Then, when an individual triton command is tested, another Popen("triton ...") call creates a sub-process. When that sub-process fails or errors out, it returns an error code to the parent, but the parent process does not terminate: it ends up hanging in a zombie state, no longer doing any work yet still present as a valid process. This is what causes test_non_llm[http] to hang indefinitely. One potential reason for the sub-process hanging is a port conflict: a previous tritonserver instance that did not terminate correctly may still be occupying ports 8000-8002.
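In POSIX terms, a zombie is a process that has exited but whose exit status has not yet been reaped by its parent. A minimal sketch of that mechanism with Popen (illustrative only, not the actual test code from this repo):

```python
import subprocess
import sys
import time

# Spawn a short-lived child that exits immediately with an error code,
# standing in for a failing "triton ..." sub-process.
child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])

# Give the child time to exit. Until the parent reaps it, the kernel
# keeps a zombie ("defunct") entry for it in the process table.
time.sleep(0.5)

# poll() reaps the child if it has exited and returns its exit code;
# a parent that never calls poll()/wait() leaves the zombie behind.
code = child.poll()
if code is None:  # very slow machine: fall back to a blocking wait
    code = child.wait()
print(code)
```

If the parent instead blocks forever waiting on a child that never makes progress (for example, because the child cannot bind its ports), the test run hangs rather than failing cleanly.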

For now, these hanging tests will be skipped in GitHub Actions and will still run in GitLab. This will be further investigated/fixed in a follow-up PR.
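The suspected port conflict can be checked up front with a simple bind test. A minimal sketch (the function name is hypothetical; the port list matches Triton's default HTTP/gRPC/metrics ports mentioned above):

```python
import socket

def free_ports(ports):
    """Return the subset of `ports` that can currently be bound on localhost."""
    free = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                s.bind(("127.0.0.1", port))
                free.append(port)  # bind succeeded, so nothing holds this port
            except OSError:
                pass  # something (e.g. a stale tritonserver) still owns it
    return free

# Triton's defaults: 8000 (HTTP), 8001 (gRPC), 8002 (metrics).
print(free_ports([8000, 8001, 8002]))
```

A check like this could run before starting the mock server, so a leftover tritonserver instance produces an immediate, explicit failure instead of an indefinite hang.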

rmccorm4 commented 3 months ago

FYI, if the GitHub trigger check is blocking this, the runner is currently down; feel free to manually start the GitLab pipeline, and we can merge if that looks good.