taoye114 opened 1 year ago
Currently we can't find a way to fix it; we want to know how to mitigate this error (see the description and stack trace at the end of this thread).
This issue looks similar to #5765. Frameworks like PyTorch do not validate tensor values before dispatching them for execution. For a model that does not handle unexpected values (out-of-bounds indices, for example), this can corrupt the CUDA context and leave a sticky error that fails other inference executions as well. There is a single CUDA context per Triton process. The solution is to ensure the model properly handles tensor data that might contain out-of-range indices.
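For illustration only, a minimal sketch of that kind of guard around an embedding lookup; the vocabulary size, layer, and function name below are assumptions and not part of the model in this issue:

```python
import torch
import torch.nn as nn

# Hypothetical embedding lookup; VOCAB_SIZE and the layer are assumptions
# made purely for illustration.
VOCAB_SIZE = 30000
embedding = nn.Embedding(VOCAB_SIZE, 768).cuda()

def safe_lookup(token_ids: torch.Tensor) -> torch.Tensor:
    # Validate indices before launching the CUDA kernel. An out-of-range
    # index inside the kernel can trigger an illegal memory access and
    # corrupt the whole CUDA context, so reject (or clamp) the input instead.
    if token_ids.numel() == 0:
        raise ValueError("empty token id tensor")
    lo, hi = int(token_ids.min()), int(token_ids.max())
    if lo < 0 or hi >= VOCAB_SIZE:
        # Fail fast with a recoverable Python error ...
        raise ValueError(
            f"token ids out of range [0, {VOCAB_SIZE}): min={lo}, max={hi}")
        # ... or, alternatively, clamp: token_ids = token_ids.clamp(0, VOCAB_SIZE - 1)
    return embedding(token_ids.cuda())
```

A Python-level `ValueError` like this is recoverable per request, whereas an illegal access inside the kernel poisons the context for every later request.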
We can investigate whether we can detect a corrupted CUDA context and mark the server down.
Any updates on this? We really need this.
We have discovered that our tritonserver has this non-deterministic bug.
@taoye114 Can you help us get a minimal reproducer? Most likely the issue appears when the model is fed invalid data. Can you check whether the bug can be made deterministic by providing the exact same set of data to the model?

@Jack47 If my hunch is correct, then the model should be modified to handle the invalid data so that it maps to a proper index. This would allow all the operations in PyTorch to run without an issue. As part of a longer-term goal, Triton may add a CUDA context health check to update the model readiness.
@tanmayv25 Hi, it seems there is a long way to go before Triton can officially handle this CUDA context issue.
Could you give us some hints, or perhaps some beta code for us to try?
- A best practice for checking invalid data? Our online model is rather complex, so it is impractical to validate the input line by line.
You can catch the CUDA/cuDNN exception and dump the request input data to a file within the model.py implementation itself. Then you can try running the same scenario outside Triton in a separate Python script. What counts as invalid data depends highly on the model architecture.
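A rough sketch of what that could look like inside a Python-backend model.py; the input/output tensor names, dump directory, and the `run_model` placeholder are assumptions, not the actual model:

```python
# model.py sketch for the Triton Python backend.
import os
import time
import numpy as np
import triton_python_backend_utils as pb_utils

DUMP_DIR = "/tmp/bad_requests"  # assumption: a writable path inside the container

class TritonPythonModel:
    def initialize(self, args):
        os.makedirs(DUMP_DIR, exist_ok=True)
        # self.model = ...  # load the actual PyTorch model here

    def execute(self, requests):
        responses = []
        for request in requests:
            inp = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            try:
                out = self.run_model(inp)  # placeholder for the real forward pass
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("OUTPUT0", out)]))
            except RuntimeError as e:
                # Dump the offending input so it can be replayed outside Triton.
                dump_path = os.path.join(
                    DUMP_DIR, f"input_{int(time.time() * 1e6)}.npy")
                np.save(dump_path, inp)
                responses.append(pb_utils.InferenceResponse(
                    error=pb_utils.TritonError(f"{e} (input dumped to {dump_path})")))
        return responses

    def run_model(self, inp: np.ndarray) -> np.ndarray:
        raise NotImplementedError("placeholder for the actual PyTorch inference")
```

Note that once the CUDA context is in a sticky error state, subsequent requests will keep failing anyway, but the dumped file still lets you replay the exact offending input outside Triton.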
- Any way to detect a corrupted CUDA context, and possibly replace it?
The only way to recover from such a sticky CUDA context error is to restart the application (Triton, in this case).
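One way to automate that restart is an external canary probe that periodically sends a known-good request to the affected model and exits non-zero when it fails, so a container liveness probe (or a supervisor) can restart Triton. A rough sketch, where the model name, input name, shape, and datatype are assumptions:

```python
# Canary probe sketch: run a known-good inference and exit non-zero on failure.
import sys
import numpy as np
import tritonclient.http as httpclient

def main() -> int:
    try:
        client = httpclient.InferenceServerClient(url="localhost:8000")
        if not client.is_server_ready():
            return 1
        # Known-good input for the affected model; names/shapes are placeholders.
        inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
        inp.set_data_from_numpy(np.zeros((1, 4), dtype=np.float32))
        client.infer("my_model", inputs=[inp])
        return 0
    except Exception:
        # A sticky CUDA error typically surfaces here as a failed inference.
        return 1

if __name__ == "__main__":
    sys.exit(main())
```

The readiness endpoint alone may not catch this, since the server can still report ready while the model's CUDA context is corrupted.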
Additionally, I found a similar issue reported against PyTorch: https://github.com/pytorch/pytorch/issues/27588. It might be resolved by using matching versions of PyTorch and CUDA?
I have recently hit the same issue; proper management of sticky errors / CUDA errors would be tremendous.
We still have this problem in version 24.06: the server does not exit when it hits an unrecoverable backend error.
Description
We use tritonserver with the Python backend to deploy a customized Stable Diffusion model running on PyTorch.
We have discovered that our tritonserver has a non-deterministic bug:
```
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
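Following the hint in the error message, a minimal standalone replay sketch: setting CUDA_LAUNCH_BLOCKING=1 before the first CUDA call makes kernel launches synchronous, so the stack trace points at the actual failing op. The dump path and TorchScript loading below are assumptions:

```python
# Replay a dumped request outside Triton with synchronous kernel launches.
# The env var must be set before the CUDA context is created, i.e. before the
# first CUDA call, so it is set at the very top of the script.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import numpy as np
import torch

# Assumptions: the input was dumped by the model.py sketch above, and the model
# can be loaded as a TorchScript module; adapt both to the real setup.
inp = torch.from_numpy(np.load("/tmp/bad_requests/input_example.npy")).cuda()
model = torch.jit.load("model.pt").cuda().eval()

with torch.no_grad():
    out = model(inp)  # with blocking launches, the failure is raised right here
print(out.shape)
```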