Open vonchenplus opened 3 months ago
@vonchenplus would it be possible to confirm this with 24.03 release?
Hello @nnshah1, I still have the same problem with 24.02.
The following is the error log triton_server.log
Thanks for the confirmation - will try to reproduce
Description When using tritonserver with explicit model control (--model-control-mode=explicit) to defer loading of the llava-mixtral-8x7b model, a load_model call from my client intermittently causes the server to load the same model multiple times.
Triton Information nvcr.io/nvidia/tritonserver:23.08-py3
To Reproduce
Use the Python backend and load llava-mixtral-8x7b in the initialize method.
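For anyone trying to reproduce this, a minimal model.py along these lines may help confirm how many times initialize actually runs. This is only a sketch, not the reporter's code: the marker file path and the `LOAD_LOG` attribute are hypothetical names introduced here, and the real llava-mixtral-8x7b load is replaced by a placeholder. It assumes the usual Python backend `args` keys `model_name` and `model_instance_name`.

```python
import json
import os
import time


class TritonPythonModel:
    """Sketch of a Python backend model that records every initialize() call."""

    # Hypothetical marker file used only for this diagnostic; not part of Triton.
    LOAD_LOG = "/tmp/model_load_events.jsonl"

    def initialize(self, args):
        # Append one JSON line per initialize() call so repeated loads
        # show up as extra entries with their own timestamp and pid.
        event = {
            "time": time.time(),
            "pid": os.getpid(),
            "model_name": args.get("model_name"),
            "model_instance_name": args.get("model_instance_name"),
        }
        with open(self.LOAD_LOG, "a") as f:
            f.write(json.dumps(event) + "\n")

        # Placeholder: the expensive model load would go here.
        self.model = None

    def execute(self, requests):
        # Inference is intentionally omitted from this sketch.
        raise NotImplementedError("inference is not part of this reproduction sketch")

    def finalize(self):
        # Release the (placeholder) model reference on unload.
        self.model = None
```

After a single load_model call from the client, more than one entry in the marker file would suggest initialize ran more than once. Entries with different instance names may simply reflect the instance count configured for the model, so that setting is worth checking before treating the extra entries as duplicate loads.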
Expected behavior The model should be loaded only once per load_model request.
The following is the error log triton_server.log