Desmond819 closed this issue 2 weeks ago.
Did pm2 kill the original engine? The "cancelled" error in the engine occurs when a request gets cancelled. For example, you send a chat completion request but do not iterate over the whole response, so the server side decides to cancel the request. The original engine should keep functioning after that.
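As a rough, stdlib-only analogy (not MLC LLM's actual server code), a streamed response behaves like a Python generator. If the consumer stops iterating and closes the stream early, the producer sees the abandonment and can cancel its remaining work. The names `stream_tokens` and `consume` are hypothetical and exist only for illustration.

```python
from typing import Iterator, List


def stream_tokens(tokens: List[str], log: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a streamed chat completion."""
    try:
        for tok in tokens:
            yield tok
        log.append("finished")
    except GeneratorExit:
        # Raised inside the generator when the consumer closes it before it
        # is exhausted; this is where a real server would cancel the request.
        log.append("cancelled")
        raise


def consume(n: int, tokens: List[str]) -> List[str]:
    """Read at most n tokens, then hang up, and report what the producer saw."""
    log: List[str] = []
    stream = stream_tokens(tokens, log)
    for i, _ in enumerate(stream):
        if i + 1 >= n:
            break
    stream.close()  # no-op if the stream was already fully consumed
    return log
```

Reading two of three tokens and hanging up is recorded as a cancellation, while reading past the end lets the producer finish normally. Neither case should take the producer down, which matches the point that a cancelled request should not stop the engine.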
If you can find a way to reproduce the error, we would be happy to dig deeper.
I run 2 GPUs, so there were literally 2 Python processes running. After the engine restarted, pm2 killed only 1 of them, which overloaded the GPU. I can't reproduce the error myself because my completion requests come from the blockchain, but it happens occasionally after the server has been running for some time. Here is what it looks like when the issue happens: PID 2161 is the idle process, which holds a huge amount of memory and never releases it.
I think that when memory usage goes very high, errors start to appear, and pm2 restarts the process without killing one of the two processes (the one created for tensor parallelism).
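The failure mode described above, where a restart stops the main process but leaves the tensor-parallel worker alive, can be illustrated with a POSIX-only, stdlib sketch. It assumes (not confirmed for MLC LLM or pm2) that only the main PID is killed, and with SIGKILL, so the main process never gets a chance to shut down its worker. The script and helper names are hypothetical.

```python
import os
import signal
import subprocess
import sys

# Hypothetical "main engine" process: it starts one extra worker (a stand-in
# for the second tensor-parallel process), reports the worker's PID, then idles.
MAIN_SCRIPT = (
    "import subprocess, sys, time\n"
    "w = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(30)'],"
    " stdout=subprocess.DEVNULL)\n"
    "print(w.pid, flush=True)\n"
    "time.sleep(30)\n"
)


def is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def worker_survives_main_kill() -> bool:
    """Kill only the main process and check whether its worker is still running."""
    main = subprocess.Popen([sys.executable, "-c", MAIN_SCRIPT],
                            stdout=subprocess.PIPE, text=True)
    worker_pid = int(main.stdout.readline())
    main.kill()  # SIGKILL on POSIX: the main process cannot clean up its child
    main.wait()
    main.stdout.close()
    survived = is_alive(worker_pid)
    try:
        os.kill(worker_pid, signal.SIGKILL)  # tidy up the orphaned worker
    except ProcessLookupError:
        pass
    return survived
```

The orphaned worker is simply reparented and keeps running, and in the real setup it would keep holding its GPU memory. A graceful shutdown of the main process may behave differently, which could explain why a manual kill and a pm2 restart give different results.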
It would be nice if you could confirm a local command that reproduces the issue. On the latest MLC engine, I tried starting the engine with tp=2 and manually running kill on the main engine process, and the extra process was released as well. I am not sure whether this is related to how pm2 killed the process.
I use pm2 to run the mlc-llm server, and after it has been running for 2 days I start getting this error and the server restarts. But after the restart, an idle Python process is left occupying 100% of the GPU, which makes generation very slow. Is there any way to resolve this?
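One possible mitigation is sketched below under assumptions that have not been verified against MLC LLM or pm2. Instead of starting the server directly, pm2 would start a small launcher. The launcher runs the existing server command in a fresh process group (via `start_new_session=True`), forwards SIGTERM/SIGINT to that group, and sends SIGKILL to anything left in the group once the main process exits for any reason, whether it crashed or was signalled. This only helps if the tensor-parallel workers stay in the launcher-created group, meaning they do not call `setsid()` themselves. `run_in_group` is a hypothetical helper, not part of MLC LLM or pm2.

```python
import os
import signal
import subprocess
from typing import List


def run_in_group(cmd: List[str]) -> int:
    """Run cmd in its own process group and make sure the group dies with it."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    pgid = proc.pid  # with a new session, the child leads its own process group

    def forward(signum, frame):
        # Pass termination requests from the process manager to the whole group.
        try:
            os.killpg(pgid, signum)
        except ProcessLookupError:
            pass

    old_term = signal.signal(signal.SIGTERM, forward)
    old_int = signal.signal(signal.SIGINT, forward)
    try:
        code = proc.wait()
    finally:
        signal.signal(signal.SIGTERM, old_term)
        signal.signal(signal.SIGINT, old_int)
        # However the main process ended, kill any leftover workers in its group.
        try:
            os.killpg(pgid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group is already empty
    return code
```

The server command itself is left to the reader: the launcher would be given the same command currently passed to pm2, and would return that command's exit code so pm2 still sees crashes and restarts. Using SIGKILL as the final sweep is deliberately blunt, since the goal is only to free the GPU before the next start.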