triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Sending two "load" requests to the server makes it load twice #7018

Open ShuaiShao93 opened 6 months ago

ShuaiShao93 commented 6 months ago

Description
When I use two clients to send /v2/repository/models/MODEL/load requests to the same server at the same time, the model is loaded twice.

Triton Information
What version of Triton are you using? 23.11

Are you using the Triton container or did you build it yourself? Container nvcr.io/nvidia/tritonserver:23.11-py3

To Reproduce
Start the server in explicit model-control mode with no models loaded.
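
For reference, a typical launch command for explicit mode looks like this (assuming the model repository lives at /models):

tritonserver --model-repository=/models --model-control-mode=explicit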

Open two terminals and run curl -X POST "http://localhost:8000/v2/repository/models/MODEL/load" -d "{}" in both at the same time. You can see logs like:

successfully loaded MODEL
loading: MODEL
successfully loaded MODEL
successfully unloaded MODEL
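
To make the race easier to hit than with two terminals, here is a minimal sketch of the same reproduction in Python (assuming the requests package is installed; MODEL is a placeholder for the actual model name):

import concurrent.futures
import requests

URL = "http://localhost:8000/v2/repository/models/MODEL/load"

def load():
    # Same request as the curl command above: POST with an empty JSON body
    return requests.post(URL, json={})

# Submit both load requests at (nearly) the same instant
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(load) for _ in range(2)]
    for fut in futures:
        print(fut.result().status_code)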

Expected behavior
The model should be loaded only once, and the successfully unloaded MODEL log line should appear before successfully loaded MODEL.

indrajit96 commented 5 months ago

Hi @ShuaiShao93, thanks a lot for reaching out. Can you provide the following details:

  1. What type of model/backend?
  2. Can you reproduce this behavior with other types of models/backends? Or is it specific to this one?
  3. Not sure how you are getting the unloaded log. Are you making an unload request?

I am unable to reproduce this. When I try to load a model simultaneously, it just gets loaded once.

ShuaiShao93 commented 5 months ago

Hi @ShuaiShao93, thanks a lot for reaching out. Can you provide the following details:

  1. What type of model/backend?

Ensemble pipeline with Python & ONNX backends

  2. Can you reproduce this behavior with other types of models/backends? Or is it specific to this one?

Sorry, I didn't get a chance to test more.

  3. Not sure how you are getting the unloaded log. Are you making an unload request?

No, I just made load requests simultaneously from two clients, and I saw the unloaded logs.

I am unable to reproduce this. When I try to load a model simultaneously, it just gets loaded once.

sourabh-burnwal commented 2 months ago

@ShuaiShao93 I guess this is expected behavior in the case of explicit model control. If you want to check whether that particular model is already loaded before sending the load request, you can always hit the repository index endpoint (POST /v2/repository/index) to get the list of loaded models.
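
For example, a minimal sketch with the tritonclient package (pip install tritonclient[http]); MODEL is a placeholder for the actual model name:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# The repository index lists every model with its current state (e.g. READY)
for entry in client.get_model_repository_index():
    print(entry)

# Only send the load request if the model is not already marked ready
if not client.is_model_ready("MODEL"):
    client.load_model("MODEL")

Note that this check-then-load is itself racy: two clients can both see the model as not ready and both send a load request, so it reduces redundant loads rather than eliminating them.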