FYI: if the GitHub trigger check is blocking this because the runner is currently down, feel free to manually start the GitLab pipeline, and we can merge if that looks good.
Two Sets of Changes in this PR:
1. Upgrading TensorRT-LLM Checkpoint Scripts
Upgrading the `convert_checkpoint.py` scripts for gpt2, llama, and opt to support `tensorrt_llm` v0.10.0. Source for code changes: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.10.0/examples
2. GitHub Actions Workflow Fix
(Credit to @rmccorm4 for figuring this out.) Currently, the GitHub Actions pipeline of the Triton CLI is failing the test case `test_non_llm[http]`. During testing, when a mock server (`ScopedTritonServer`) is started, `Popen("triton start")` is called and a new process is created. After this, when an individual `triton` command is tested, another `Popen("triton ...")` call creates a new sub-process. When that sub-process fails or errors out, it returns an error code to the parent, but the parent process never terminates: it hangs in a zombie-like state, doing no further work while still counting as a valid process. This is what causes the indefinite hanging of `test_non_llm[http]`. One potential reason for the sub-process hanging is a port conflict with an existing `tritonserver` instance that did not terminate correctly and is still occupying ports 8000-8002.

For now, these hanging tests will be skipped in GitHub Actions and still run in GitLab. This will be further investigated/fixed in a follow-up PR.
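As a minimal sketch of the two failure modes described above (plain stdlib only, with a trivial stand-in child rather than the real `triton` CLI — none of this is the actual test harness code), the harness could probe ports 8000-8002 for a leftover listener before starting the server, and reap every spawned sub-process with a bounded `wait()` instead of leaving it un-reaped:

```python
import socket
import subprocess
import sys

def busy_ports(ports, host="127.0.0.1"):
    """Return the subset of `ports` that already have a listener
    (e.g. a leftover tritonserver instance that never terminated)."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            # connect_ex returns 0 when something is already listening there.
            if sock.connect_ex((host, port)) == 0:
                busy.append(port)
    return busy

# Pre-flight check on Triton's default HTTP/gRPC/metrics ports, so a port
# conflict fails fast instead of surfacing as an indefinite hang later.
leftover = busy_ports([8000, 8001, 8002])
if leftover:
    raise RuntimeError(f"ports already in use: {leftover}")

# Spawn a child the way the harness spawns `triton ...` commands.
# An exited child that is never wait()ed/poll()ed lingers as a zombie on
# POSIX; wait(timeout=...) both reaps it and bounds how long we can hang.
child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(1)"])
try:
    returncode = child.wait(timeout=30)
except subprocess.TimeoutExpired:
    child.kill()           # escalate instead of hanging forever
    returncode = child.wait()
print(f"child exited with {returncode}")
```

The key design point is that the parent must always consume the child's exit status: reaping clears the zombie and surfaces the error code, so a failing sub-process fails the test immediately rather than stalling the pipeline.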