triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0
581 stars 81 forks

[Bugfix] Launch Triton server without waiting for a signal #470

Closed michaelnny closed 3 weeks ago

michaelnny commented 1 month ago

Hi,

Problem: This PR fixes a silent bug in the scripts/launch_triton_server.py module. The issue only occurs when we try to launch the Triton server automatically inside a container, using either CMD in a Dockerfile or command in a docker-compose.yaml file.

For example in a Dockerfile:

CMD ["python3", "scripts/launch_triton_server.py", "--model_repo", "/workspace/model_repos/llama3_ifb", "--world_size", "1"]

Cause: The root cause is that we never wait for the child process after calling subprocess.Popen(cmd, env=env), so the launcher script exits immediately and the container shuts down with it.

Before the change:

    subprocess.Popen(cmd, env=env) 

After the change:


    # Start the subprocess and block until it exits (or we are interrupted)
    with subprocess.Popen(cmd, env=env) as proc:
        try:
            return proc.wait()
        except KeyboardInterrupt:
            proc.kill()
            return 0
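To illustrate why the `with`/`wait()` pattern matters, here is a small standalone sketch (using `sleep 1` as a stand-in for the real Triton launch command, which is not reproduced here): `Popen` returns as soon as the child is spawned, while `wait()` blocks for the child's lifetime.

```python
import subprocess
import time

# `sleep 1` is a placeholder for the real Triton server command.
start = time.monotonic()
proc = subprocess.Popen(["sleep", "1"])

# Popen returns as soon as the child is spawned; as PID 1 in a
# container, the launcher would exit here and take the server with it.
spawned_after = time.monotonic() - start
assert spawned_after < 0.5

# wait() is what actually blocks until the child process exits.
retcode = proc.wait()
finished_after = time.monotonic() - start
assert finished_after >= 1.0
print(retcode)
```

Running this prints `0` only after roughly one second, confirming that the blocking happens in `wait()`, not in `Popen`.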
schetlur-nv commented 3 weeks ago

Hey @michaelnny - we are planning to modify this script to use subprocess.run which should do what you intend. Hope that's OK with your use-case?
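For reference, `subprocess.run` spawns the child and blocks until it exits, so swapping it in would also keep the launcher (and the container) alive. A minimal sketch, using `echo` as a placeholder for the real command and omitting the actual `cmd`/`env` arguments:

```python
import subprocess

# subprocess.run blocks until the child exits and returns a
# CompletedProcess carrying the exit code and captured output.
result = subprocess.run(["echo", "triton"], capture_output=True, text=True)
print(result.returncode, result.stdout.strip())
```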

michaelnny commented 3 weeks ago

Hi @schetlur-nv

That's cool, thanks for the update. Looking forward to the new release, which will hopefully fix this problem.