🐛 Describe the bug
A worker died, and yet describe_model gives me

worker.status: Ready

Error logs
This is what describe_model returned to me:

And then when I predict, it gives me this error:

debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-06-13T14:04:08.580558509+00:00", grpc_status:13, grpc_message:"Model \"myModel\" has no worker to serve inference request. Please use scale workers API to add workers. If this is a sequence inference, please check if it is closed, or expired; or exceeds maxSequenceJobQueueSize\nInternalServerException.()"}"
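To cross-check the management API's answer, here is a minimal sketch of the check I run on the describe_model response (the model name myModel and the "READY" status string are taken from the report above; the exact payload shape is an assumption):

```python
import json

def workers_ready(describe_model_response: str) -> bool:
    """Return True only if the model has at least one worker and
    every worker reports status "READY"."""
    models = json.loads(describe_model_response)
    for model in models:            # one entry per model version
        workers = model.get("workers", [])
        if not workers:             # no workers at all -> cannot serve
            return False
        if any(w.get("status") != "READY" for w in workers):
            return False
    return True

# Payload mirroring the describe_model output quoted above
sample = json.dumps([{
    "modelName": "myModel",
    "workers": [{"id": "9000", "status": "READY"}],
}])
```

A check like this still reports healthy in my case, because describe_model itself returns the stale "Ready" status.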
Installation instructions
Docker: torchserve 0.10.0
Model Packaging
torch-model-archiver with a custom handler
config.properties
Versions
I am using the docker torchserve:0.10.0 image, and this is the output of pip freeze:
Repro instructions
I stress tested a model until it gave me an IllegalStateException and all my workers died. Then I sent a management request and a ping, and both reported that everything was fine while it was not.
Possible Solution
The best solution I can think of is to check the error message whenever an error like this happens and, if it tells me to scale the workers, simply do so. But this shouldn't be the correct behaviour; describe_model should show me the correct state of the workers.
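For completeness, this is the client-side workaround I mean, as a sketch (the management address, the min_worker parameter, and the matched substring are my assumptions; matching on an error string is exactly the fragility I would like to avoid):

```python
import urllib.request

MANAGEMENT = "http://localhost:8081"  # assumed management address

def needs_rescale(grpc_message: str) -> bool:
    """Heuristic: the server's own error message tells us to scale workers."""
    return "has no worker to serve inference request" in grpc_message

def scale_workers(model_name: str, min_worker: int = 1) -> None:
    """Call the scale-workers API: PUT /models/{name}?min_worker=N."""
    req = urllib.request.Request(
        f"{MANAGEMENT}/models/{model_name}?min_worker={min_worker}",
        method="PUT",
    )
    urllib.request.urlopen(req)
```

So on every failed prediction I would call needs_rescale on the gRPC message and, if it returns True, call scale_workers — instead of describe_model simply telling me the workers are dead.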