Closed: kbramhendra closed this issue 3 months ago
Would need much more information. Presumably a process is dying: what code is it running? How is it terminating, e.g. what signal? If it is Python, there should be ways to catch the failure at the outer level of the code, with a try/except for exceptions or a handler for signals, and report it before dying.
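As a rough illustration of that suggestion, here is a minimal sketch of an outer-level guard in Python. The names `install_signal_logging` and `run_guarded` are hypothetical helpers, not part of k2, Triton, or any library mentioned in this thread, and the sketch assumes the entry point of the process can be wrapped. Note that `SIGKILL` cannot be caught, so a process killed that way will log nothing here.

```python
import signal
import sys
import traceback


def install_signal_logging(signums=(signal.SIGTERM, signal.SIGINT), stream=sys.stderr):
    """Log catchable signals and the current stack before exiting.

    Hypothetical helper. Must be called from the main thread.
    SIGKILL and SIGSTOP cannot be handled, so they are not covered.
    """
    def handler(signum, frame):
        name = signal.Signals(signum).name
        print(f"received {name}, stack at time of signal:", file=stream)
        traceback.print_stack(frame, file=stream)
        stream.flush()
        # Turn the signal into a normal exit so cleanup code still runs.
        raise SystemExit(128 + signum)

    for signum in signums:
        signal.signal(signum, handler)


def run_guarded(main, stream=sys.stderr):
    """Run main() and report how it ended; returns an exit status."""
    try:
        main()
        return 0
    except SystemExit as exc:
        print(f"exiting via SystemExit({exc.code!r})", file=stream)
        if exc.code is None:
            return 0
        return exc.code if isinstance(exc.code, int) else 1
    except BaseException:
        # Catches ordinary exceptions and KeyboardInterrupt alike.
        print("unhandled exception at top level:", file=stream)
        traceback.print_exc(file=stream)
        return 1
```

A typical use would be to call `install_signal_logging()` once at startup and then `sys.exit(run_guarded(main))`, so that whatever ends the process leaves at least one line in the container logs.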
Hi, thanks for replying. It's running on Triton Inference Server with Python, and the overall setup is on Kubernetes. How can I stop this process from dying? There aren't any signals per se. Memory is fine on both GPU and CPU, and the process is in an idle state.
Inference-server stuff would normally be a sherpa issue. Did you build that with sherpa? If so, you should probably open an issue on the associated repo. IDK why you think this is specifically about the FST. But when a process dies, either it exits or it dies by a signal. I'm not an expert on how to debug such things, and haven't used Triton, but there should always be a way to track it down, e.g. get a stack trace. Perhaps there is some debug setting.
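To make the "exits or dies by a signal" distinction concrete, here is a small Python sketch. It uses the standard-library `faulthandler` module, which dumps Python tracebacks on fatal signals such as `SIGSEGV` and `SIGABRT`, plus a hypothetical helper, `describe_exit`, that interprets an exit status. The sketch assumes a Linux host and the common conventions that Python's `subprocess` reports death by signal N as `-N`, and that shells and container runtimes report it as `128 + N`. Whether Triton exposes a hook for enabling this in its Python backend is not something this thread confirms.

```python
import faulthandler
import signal
import sys


def enable_crash_traces(stream=sys.stderr):
    """Dump Python tracebacks for all threads on fatal signals.

    The stream must be a real file (it needs a file descriptor).
    """
    faulthandler.enable(file=stream, all_threads=True)


def describe_exit(returncode):
    """Hypothetical helper: describe an exit status in words."""
    if returncode < 0:
        # subprocess convention: -N means "killed by signal N".
        signum = -returncode
        template = "killed by {name}"
    elif returncode > 128:
        # Shell/container convention: 128 + N usually means signal N.
        signum = returncode - 128
        template = "likely killed by {name} (status {code})"
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number, so treat it as a plain exit status.
        return f"exited with status {returncode}"
    return template.format(name=name, code=returncode)
```

For example, a container status of 137 would map to `SIGKILL`, which in Kubernetes is often, though not always, the result of the pod exceeding its memory limit.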
It's not built with sherpa. I have 3 processes running on the GPU: the encoder, CTC, and FST modules. The encoder and CTC are ONNX modules, and those processes are still running; only the FST process is dying. All of these are in a Docker setup, so it is difficult for me to track down. It exits suddenly. Can we prevent this from happening?
Hm, that's tricky, but in principle it should be possible to reproduce it without Docker for debugging purposes.
Yeah, I have been trying to reproduce it but haven't succeeded. I will share logs if I find any. If you come across any such cases or a solution in the future, please let me know. Thank you.
Hi, the issue was found to be in Triton's memory management. Thanks for helping.
Hi, I am using the FST in a production-like setup. I have built the FST using the #1218 branch and torch 1.14. The FST process goes down abruptly without any apparent reason. It is not because of an OOM issue, nor is any particular utterance triggering it. @pkufool, can you please suggest any ways to mitigate this?