triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Tritonserver may load a model multiple times #7058

Open vonchenplus opened 3 months ago

vonchenplus commented 3 months ago

Description: When tritonserver is run with lazy model loading (--model-control-mode=explicit) and serves the llava-mixtral-8x7b model, a client call to load_model sometimes triggers the server to load the same model multiple times.
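For context, the concurrent-load scenario from the client side looks roughly like the sketch below. This is a minimal illustration, not the reporter's actual script: the server URL, the process count, and the readiness check are assumptions; `tritonclient.grpc` is the official Triton Python client package.

```python
# Minimal repro sketch: several processes race to load the same model.
import multiprocessing

import tritonclient.grpc as grpcclient

MODEL_NAME = "llava-mixtral-8x7b"  # model name taken from this issue


def load_from_one_process(proc_id):
    # URL is an assumption; 8001 is Triton's default gRPC port.
    client = grpcclient.InferenceServerClient(url="localhost:8001")
    # With --model-control-mode=explicit, each process asks the server
    # to load the model; the expectation is that concurrent requests
    # for the same model are deduplicated into a single load.
    client.load_model(MODEL_NAME)
    print(f"process {proc_id}: model ready = {client.is_model_ready(MODEL_NAME)}")


if __name__ == "__main__":
    procs = [
        multiprocessing.Process(target=load_from_one_process, args=(i,))
        for i in range(4)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```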

Triton Information nvcr.io/nvidia/tritonserver:23.08-py3

To Reproduce

  1. Start tritonserver with --model-control-mode=explicit.
  2. Create gRPC clients and call load_model concurrently from multiple processes (see the client sketch above).

The model uses the Python backend and loads llava-mixtral-8x7b in its initialize method; a sketch of that shape follows below.
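For reference, a Python-backend `model.py` that does its heavy load in `initialize` has roughly this shape. This is a hedged sketch, not the reporter's code: the weight-loading call is hypothetical, and `triton_python_backend_utils` is only importable inside the Triton server process.

```python
import json

# Provided by the Triton Python backend at runtime; not pip-installable.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Called once per model instance when the server loads the model.
        # If the server erroneously issues the load twice, this heavy
        # initialization runs twice and the weights are loaded twice.
        self.model_config = json.loads(args["model_config"])
        # self.model = load_llava_mixtral_weights()  # hypothetical heavy load

    def execute(self, requests):
        # Inference logic omitted; this issue concerns initialize() only.
        return [pb_utils.InferenceResponse(output_tensors=[]) for _ in requests]
```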

Expected behavior: the model should be loaded only once.

The error log is attached: triton_server.log

nnshah1 commented 3 months ago

@vonchenplus would it be possible to confirm this with 24.03 release?

vonchenplus commented 3 months ago

> @vonchenplus would it be possible to confirm this with 24.03 release?

Hello @nnshah1, I still have the same problem with 24.02.

Again, the error log is attached: triton_server.log

nnshah1 commented 3 months ago

Thanks for the confirmation - will try to reproduce