triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Version folder named -1 makes the Triton Inference Server Python backend freeze #7052

Open Kanupriyagoyal opened 6 months ago

Kanupriyagoyal commented 6 months ago

Description

r23.04:

I0718 11:39:24.385839 1 server.cc:653]
+-----------+---------+------------------------------------------------------------------------------------------------------------------------------+
| Model     | Version | Status                                                                                                                       |
+-----------+---------+------------------------------------------------------------------------------------------------------------------------------+
| model_1   | -1      | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /models/model_1/18446744073709551615/model.py   |
+-----------+---------+------------------------------------------------------------------------------------------------------------------------------+

r23.12:

I0327 12:09:02.167324 1 stub_launcher.cc:253] Starting Python backend stub:  exec /opt/tritonserver/backends/python/triton_python_backend_stub /models/model_1/18446744073709551615/model.py triton_python_backend_shm_region_1 1048576 1048576 1 /opt/tritonserver/backends/python 312 model_1 DEFAULT
I0327 12:09:02.182305 14 pb_stub.cc:1926]  Failed to preinitialize Python stub: Python model file not found in '/models/model_1/18446744073709551615/model.py'

The server goes into a frozen state; killing the container is the only way to recover.
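
For context, the 18446744073709551615 in the log paths above is -1 reinterpreted as an unsigned 64-bit integer, which suggests the version folder name is parsed into a uint64 and wraps around. A quick sanity check in Python:

    # -1 stored in an unsigned 64-bit integer wraps to 2**64 - 1,
    # matching the directory name in the log paths above.
    assert (-1) % 2**64 == 18446744073709551615
    assert 2**64 - 1 == 18446744073709551615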

Triton Information

What version of Triton are you using? r23.12

Are you using the Triton container or did you build it yourself? Triton container

To Reproduce

Create a model repository containing a version directory named -1:

|-- model_1
|   |-- -1
|   |   |-- model.py
|   |   `-- model.txt
|   |-- config.pbtxt
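
For reference, the Python backend expects a model.py at <model-repository>/<model>/<version>/model.py that defines a TritonPythonModel class. A minimal sketch (INPUT0/OUTPUT0 are placeholder tensor names and must match config.pbtxt):

    import numpy as np
    import triton_python_backend_utils as pb_utils


    class TritonPythonModel:
        """Minimal Python backend model; this file must live at
        <model-repository>/<model>/<version>/model.py."""

        def initialize(self, args):
            # args includes model_config, model_name, model_version, etc.
            self.model_name = args["model_name"]

        def execute(self, requests):
            responses = []
            for request in requests:
                # Placeholder tensor names; they must match config.pbtxt.
                in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
                out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy().astype(np.float32))
                responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
            return responses

        def finalize(self):
            pass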

Expected behavior

The server should exit gracefully with the message "Failed to load all the models" instead of freezing.

rmccorm4 commented 6 months ago

Hi @Kanupriyagoyal, thanks for raising this issue!

-1 isn't meant to be used as the actual version folder name; version folders should be named with positive integers, like 1, 2, 3.

On the client/API side, requesting version -1 refers to the latest version: the server will look for the highest version number in the model repository.
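
For example, with the tritonclient Python package you normally leave model_version empty and let the server resolve the latest ready version; a sketch assuming an HTTP endpoint on localhost:8000 and a placeholder FP32 input named INPUT0:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Placeholder input; name, shape, and dtype must match config.pbtxt.
    inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
    inp.set_data_from_numpy(np.zeros((1, 4), dtype=np.float32))

    # model_version="" (the default) asks the server for the latest
    # version per the model's version_policy; no -1 directory is
    # needed on disk.
    result = client.infer("model_1", [inp], model_version="")
    print(result.as_numpy("OUTPUT0"))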

Hopefully that resolves your issue. In parallel, I filed a ticket (DLIS-6399) to handle some edge cases around this more gracefully.
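
Until then, one workaround is a pre-flight check of the model repository before starting the container. A minimal sketch, assuming every model in the repository uses the Python backend (the rules below are inferred from this issue, not an official validator):

    import sys
    from pathlib import Path


    def validate_model_repository(repo: Path) -> list[str]:
        """Flag version directories that would trip the Python backend:
        non-positive-integer names, or versions missing model.py."""
        problems = []
        for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
            for vdir in (d for d in model_dir.iterdir() if d.is_dir()):
                if not (vdir.name.isdigit() and int(vdir.name) > 0):
                    problems.append(f"{vdir}: version folder must be a positive integer")
                elif not (vdir / "model.py").is_file():
                    problems.append(f"{vdir}: model.py missing (Python backend)")
        return problems


    if __name__ == "__main__":
        issues = validate_model_repository(Path(sys.argv[1] if len(sys.argv) > 1 else "/models"))
        for line in issues:
            print(line)
        sys.exit(1 if issues else 0)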

Kanupriyagoyal commented 6 months ago

@rmccorm4 Another scenario: model.py isn't present in the version folder.

r23.04:

| model_test | 1       | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /models/model_test/1/model.py |

r23.12: the server freezes and the Triton container has to be stopped.

I0403 11:36:04.965052 17 pb_stub.cc:1926]  Failed to preinitialize Python stub: Python model file not found in '/models/model_xgb_rgs_15num/1/model.py'

Expected behavior: the server should exit gracefully with the message "Failed to load all the models".