taoensso / sente

Realtime web comms library for Clojure/Script
https://www.taoensso.com/sente
Eclipse Public License 1.0

Undertow worker pool thread leak caused by ajax fallback requests #409

Closed kajism closed 2 years ago

kajism commented 2 years ago

We were experiencing Undertow worker pool starvation in production. It was caused by Ajax requests blocking on a core.async/<!! channel read that never timed out. It works better after replacing the core.async channel with a promise that is dereferenced with a timeout.
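A minimal sketch of the idea, using only clojure.core: the 3-arity `deref` returns a fallback value once the timeout elapses, so a waiting worker thread is always released. The helper name `await-reply`, the `:timed-out` sentinel, and the timeout values are illustrative assumptions, not Sente's actual code.

```clojure
;; Hypothetical helper for illustration; not taken from Sente.
(defn await-reply
  "Blocks the calling thread until `reply-promise` is delivered, or returns
  `:timed-out` once `timeout-ms` milliseconds have passed."
  [reply-promise timeout-ms]
  (deref reply-promise timeout-ms :timed-out))

;; A promise that has been delivered returns its value immediately.
(def answered (promise))
(deliver answered {:status 200})
(def r1 (await-reply answered 1000)) ; => {:status 200}

;; A promise that is never delivered no longer pins the thread forever.
(def unanswered (promise))
(def r2 (await-reply unanswered 100)) ; => :timed-out
```

By contrast, a plain blocking channel read with no timeout holds its thread until a value arrives, which is how a fixed-size worker pool can run dry.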

ptaoussanis commented 2 years ago

@kajism Hi Karel, thanks for pinging about this, and for the PR. I'll add some specific comments shortly.

kajism commented 2 years ago

@ptaoussanis Hello Peter, I have committed the suggested backward-compatible changes and will verify them in production.

ptaoussanis commented 2 years ago

@kajism Thanks! The content of your PR has been merged on the 2022-09-20-pr-409 branch and pushed to [com.taoensso/sente "1.18.0-SNAPSHOT"] on Clojars.

I don't use Undertow, so it'd be helpful to get your confirmation that this version:

  1. Solves the problem
  2. Still works correctly otherwise

Once I've got your confirmation, I'll merge with master.

Cheers

kajism commented 2 years ago

@ptaoussanis Peter, there seems to be a problem on line 44. I don't see a 3-arity version of clojure.core/deliver ...

kajism commented 2 years ago

@ptaoussanis We moved to Undertow from http-kit after experiencing huge memory leaks in allocated direct memory buffers. After a few days, direct memory grew to the max heap size (4GB), and then some short videos on our page stopped playing. The problem is that direct memory is not allocated on the heap, so there was 4GB of heap plus 4GB in those buffers.

At first, it was hard to find the problem, because almost everybody talks only about the heap when discussing JVM memory. But at the system level, the process was using almost twice as much memory. Also, jconsole doesn't show this memory. It can be found using JMX beans (java.nio:name=direct,type=BufferPool MemoryUsed).
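For anyone wanting to check this from a REPL, here is a rough sketch that reads the same buffer pool figures through the standard `java.lang.management` API. The function name `buffer-pool-usage` is made up for this example, and the 1 MiB allocation is only there so the "direct" pool shows a non-zero value.

```clojure
(import '(java.lang.management ManagementFactory BufferPoolMXBean))

;; Hypothetical helper for illustration.
(defn buffer-pool-usage
  "Returns a map of buffer pool name -> bytes currently used, as reported
  by the JVM's BufferPoolMXBeans (these live outside the Java heap)."
  []
  (into {}
        (map (fn [^BufferPoolMXBean pool]
               [(.getName pool) (.getMemoryUsed pool)]))
        (ManagementFactory/getPlatformMXBeans BufferPoolMXBean)))

;; Allocate 1 MiB off-heap and keep a reference so it is not collected.
(def buf (java.nio.ByteBuffer/allocateDirect (* 1024 1024)))

;; Prints the pool names with their usage in bytes.
(println (buffer-pool-usage))
```

Logging the "direct" value periodically alongside heap usage makes this kind of growth visible long before the process hits its limit.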

We are sending quite large WebSocket messages (sometimes more than 1MB), so I suspected this might be the reason. After the move to Undertow, this problem disappeared. In our case, Undertow allocates only up to 90MB in direct memory buffers. But then we experienced the worker thread starvation...

I have investigated this further and created this http-kit issue: https://github.com/http-kit/http-kit/issues/496

ptaoussanis commented 2 years ago

> Peter, there seems to be a problem on line 44. I don't see a 3-arity version of clojure.core/deliver ...

@kajism Apologies, fixed. And updated SNAPSHOT.

kajism commented 2 years ago

> > Peter, there seems to be a problem on line 44. I don't see a 3-arity version of clojure.core/deliver ...
>
> @kajism Apologies, fixed. And updated SNAPSHOT.

Thanks. It works fine locally. It will be deployed in a few days, and I will confirm then.

kajism commented 2 years ago

Works fine. Thanks!

ptaoussanis commented 2 years ago

Great, appreciate the confirmation 👍 Cheers