Open apolloFER opened 10 years ago
I'm having exactly the same issue, and it is a blocker for me. I used the same debugging method as apolloFER and got the same stack trace.
This issue has stopped appearing on our end. I don't know whether one of the updates we made fixed it or whether Tornado 4.0 solved the problem.
I also managed to fix it. The project started on brukva, and after migrating to tornado-redis, some changes in the unsubscribe callback triggered this error. However, it still points to a possible deadlock in the IOLoop.
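For anyone trying to picture the kind of lock being discussed, here is a minimal sketch of a destructor re-entering a function that already holds a non-reentrant lock. It is not Tornado or tornado-redis code: `ToyLoop`, `ToyClient`, `on_enter` and `run` are hypothetical names made up for illustration, and the lock acquisition uses a timeout so the sketch finishes instead of hanging. It also relies on CPython running `__del__` as soon as the last reference is dropped.

```python
import threading


class ToyLoop:
    """Hypothetical stand-in for an event loop; not Tornado's IOLoop."""

    def __init__(self, lock):
        self._lock = lock
        self.callbacks = []
        self.blocked = 0  # re-entrant calls that could not take the lock

    def add_callback(self, callback, on_enter=None):
        # A timeout replaces the indefinite wait so the sketch terminates.
        if not self._lock.acquire(timeout=0.1):
            self.blocked += 1
            return
        try:
            if on_enter is not None:
                on_enter()  # stands in for work that happens to trigger __del__
            self.callbacks.append(callback)
        finally:
            self._lock.release()


class ToyClient:
    """Hypothetical client whose destructor schedules a callback."""

    def __init__(self, loop):
        self._loop = loop

    def __del__(self):
        self._loop.add_callback(lambda: None)


def run(lock):
    loop = ToyLoop(lock)
    holder = [ToyClient(loop)]
    # Dropping the last reference while the lock is held runs __del__
    # inside add_callback (CPython reference counting).
    loop.add_callback(lambda: None, on_enter=holder.clear)
    return loop.blocked, len(loop.callbacks)


if __name__ == "__main__":
    print(run(threading.Lock()))   # nested call cannot take the lock
    print(run(threading.RLock()))  # nested call re-enters on the same thread
```

With a plain `threading.Lock` the nested call from `__del__` cannot acquire the lock, which without the timeout would be a permanent hang. With `threading.RLock` the same thread is allowed to re-enter. The `RLock` variant is only there to show the contrast; I am not claiming this is how Tornado addressed it.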
Issue: Tornado server no longer responds to HTTP requests.
Backtrace: Debugged the Tornado process with gdb and traced the problem to a mutex deadlock in IOLoop's add_callback. It happens when a Redis command (in my case a BLPOP) receives its response and adds the callback to the IOLoop. While add_callback is wrapping the callback in a partial, the tornado-redis client gets deleted (__del__ is called), which disconnects the Connection object. The disconnecting Connection object then tries to add another callback to the IOLoop. Since the first add_callback call is still in progress, the mutex is already locked, and the second call to add_callback blocks on it indefinitely. Here is the gdb backtrace: