tuxedocat opened 1 year ago
I have the same error on k8s . Any solution?
This issue seems to have been treated as a backend-specific error, but I think it is more of a design discussion about error handling in the server:
As I wrote in the description, TL;DR: the server should handle backend internal errors.
In the meantime, we need to detect such errors somehow, e.g. via log output, and trigger a restart.
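As a stopgap along those lines, a small log watcher could scan the server's output for fatal backend errors and exit nonzero so the orchestrator (e.g. a Kubernetes restart policy) recycles the pod. This is a minimal sketch; the error patterns and the pipe-the-logs-through-it setup are assumptions to adapt to your deployment:

```python
import re
import sys

# Hypothetical patterns for unrecoverable backend errors; tune these
# to the exact log lines your backend emits.
FATAL_PATTERNS = [
    re.compile(r"CUDA error"),
    re.compile(r"an illegal memory access was encountered"),
]

def is_fatal(line: str) -> bool:
    """Return True if a log line matches a known-fatal backend error."""
    return any(p.search(line) for p in FATAL_PATTERNS)

def watch(stream) -> None:
    """Read log lines; exit nonzero on the first fatal error so the
    surrounding orchestrator restarts the container."""
    for line in stream:
        if is_fatal(line):
            sys.stderr.write(f"fatal backend error detected: {line}")
            sys.exit(1)

if __name__ == "__main__":
    watch(sys.stdin)  # e.g. tritonserver ... 2>&1 | python watcher.py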
I agree 100%, Triton should handle this.
I am currently having issues where a CUDA kernel error is triggered in a BLS model that uses torch internally. The model is still considered READY by Triton and can still accept requests, but all subsequent requests time out :/
@tuxedocat does unloading and then reloading the model fix your issue, or do you need to restart the full server?
Technically yes, reloading the model is sufficient in my case, which uses explicit loading mode. However, for production use we would need error handling on both the model-manager side and the server side.
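For the model-manager side, the unload/reload step can be driven through Triton's model-repository extension (`POST /v2/repository/models/<name>/load` and `/unload`), which is available when the server runs with `--model-control-mode=explicit`. A sketch, assuming the default HTTP endpoint on `localhost:8000` and a placeholder model name:

```python
import urllib.request

# Assumed server address; adjust for your deployment.
TRITON_HTTP = "http://localhost:8000"

def control_url(base_url: str, model: str, action: str) -> str:
    """Build a model-control URL for Triton's model-repository
    extension; action is "load" or "unload"."""
    return f"{base_url}/v2/repository/models/{model}/{action}"

def reload_model(model: str, base_url: str = TRITON_HTTP) -> None:
    """Unload and then load a model. Requires the server to run
    with --model-control-mode=explicit."""
    for action in ("unload", "load"):
        req = urllib.request.Request(
            control_url(base_url, model, action), data=b"", method="POST")
        urllib.request.urlopen(req)  # raises on a non-2xx response
```

A manager process could call `reload_model("my_model")` (name is a placeholder) whenever the watcher detects a backend error, instead of restarting the whole pod.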
Has anybody solved it? Maybe upgrading the Triton version would help eliminate this error?
We still see this in version 24.07.
Description
The model is not reloaded when the underlying backend runtime (pytorch_backend and libtorch in this case) causes errors.
In such cases, it would be useful in a production environment if either:
Here's an actual case that occurred:
First, I got the following error from libtorch.cc, which is from the underlying CUDA runtime:
Then, the same log entry repeatedly occurred until the Triton pod on k8s was manually restarted.
NOTES:
nvcr.io/nvidia/tritonserver:23.03-py3, with some packages added via a Dockerfile for the Python backend models.

TL;DR: The server should handle backend internal errors.
Triton Information
nvcr.io/nvidia/tritonserver:23.03-py3
To Reproduce
This issue may not be specific to the model, but the settings below are what we use in our case:
Steps to reproduce the behavior:
Expected behavior
In the case above, the desired behavior of the tritonserver would be to "exit if unrecoverable." Then, the liveness probe would detect that the pod is unhealthy, and the pod would be restarted automatically.
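Under that "exit if unrecoverable" behavior, a standard Kubernetes liveness probe against Triton's health endpoint would complete the loop: once the server exits (or stops answering), the probe fails and the kubelet restarts the container. A sketch of the probe fragment, assuming Triton's default HTTP port 8000; the thresholds are illustrative:

```yaml
# Fragment of a container spec for the Triton container.
livenessProbe:
  httpGet:
    path: /v2/health/live   # Triton's liveness endpoint (KServe v2 protocol)
    port: 8000              # default HTTP service port
  periodSeconds: 10
  failureThreshold: 3       # restart after ~30s of failed checks
```

Note that today this probe alone is not enough: as described above, the server keeps reporting itself live (and the model READY) even after the backend error, which is exactly why the server-side "exit on unrecoverable error" behavior is needed.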
As a side note, in this pytorch backend-specific case it seems we also need to handle errors in the backend itself.