triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

/v2/health/ready endpoint does not work as expected #7588

Open · beratturan opened this issue 2 months ago

beratturan commented 2 months ago

When the Triton server is run with the --strict-readiness flag set to true, the /v2/health/ready endpoint is expected to return a failure status code if any model is not loaded. However, after unloading a model via the /v2/repository/models/<model>/unload endpoint, /v2/health/ready still returns 200 OK. According to the documentation, the server should no longer report ready once a model has been unloaded, so this behavior appears to be incorrect.
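
A minimal reproduction sketch (the host, port, and model name below are hypothetical; it also assumes the server was launched with --strict-readiness=true as described above, and with explicit model control so that unload requests are accepted):

```python
import requests

TRITON_URL = "http://localhost:8000"  # hypothetical Triton HTTP endpoint
MODEL_NAME = "my_model"               # hypothetical model name

# Unload the model through the repository API.
resp = requests.post(f"{TRITON_URL}/v2/repository/models/{MODEL_NAME}/unload")
print("unload status:", resp.status_code)

# Check readiness. With --strict-readiness=true the expectation is a
# failure status here, but the server still answers 200 OK.
resp = requests.get(f"{TRITON_URL}/v2/health/ready")
print("ready status:", resp.status_code)  # observed: 200
```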

Triton Server Version: 28.03
Deployment Method: KServe
KServe Version: 13.0.1