Closed — userlaojie closed this issue 3 weeks ago
These are the jstat GC statistics; the CGC and YGC counts are almost the same.
I believe we have the same issue.
reactor-netty 1.1.22
Netty 4.1.112
Spring Boot 3.3.3
uname -a: Linux batch-service-794ddfb76-bqnlb 6.5.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 15 16:40:02 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
java -version:
openjdk version "21.0.4" 2024-07-16 LTS
OpenJDK Runtime Environment Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.4+7 (build 21.0.4+7-LTS, mixed mode, sharing)
EpollSocketChannel objects are not being garbage-collected:
I looked into some of these instances and they all seem to be referenced by invalidated pooled connections:
Have recent releases changed anything w.r.t. pool entry invalidation? Note that we have not configured the connection pool in any way (reactor.netty.pool.maxIdleTime and friends), so all the defaults should apply.
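For context, here is a minimal sketch (my illustration, not configuration taken from this report; the pool name and all values are placeholders) of what setting `maxIdleTime` "and friends" explicitly through a `ConnectionProvider` would look like, instead of relying on the defaults:

```java
import java.time.Duration;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

// Minimal sketch of overriding the pool defaults mentioned above; the pool name
// and all durations/limits are placeholders, not values from this report.
public class PoolConfigExample {

    public static HttpClient pooledClient() {
        ConnectionProvider provider = ConnectionProvider.builder("custom-pool")
                .maxConnections(500)                        // cap on pooled connections
                .maxIdleTime(Duration.ofSeconds(30))        // close connections idle longer than this
                .maxLifeTime(Duration.ofMinutes(5))         // recycle connections after this total lifetime
                .evictInBackground(Duration.ofSeconds(60))  // periodically evict expired entries
                .build();
        return HttpClient.create(provider);
    }
}
```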
We have the same issue. Reactor version(s) used: 1.1.20. Spring Boot: 3.2.7. JVM version (java -version): OpenJDK 17. EpollSocketChannel instances retain more than 1.27 GB but are not being garbage-collected.
All, please try to provide a reproducible example.
OK, we will try to reproduce the issue locally by load-testing the interface with JMeter; this will take about a day.
@userlaojie any luck so far? I myself have been unable to reliably reproduce it. The tricky thing is that even in my production application the leak doesn't always happen. Sometimes it leaks until a crash; then, after the restart, everything is fine for many days.
Considering that in my heap dump the pool refs are all in STATE_INVALIDATED, maybe it's related to connections being closed abnormally?
Hi @userlaojie, I strongly believe you came across the same problem as I did, please check my issue if you observe the same behavior. I spent a lot of time trying to simulate it locally, but never managed to produce a reliable reproducer.
Sorry, again, we can't replicate this locally. Our latest progress is removing as many factors as possible that could keep HTTP connections from being released, such as eliminating Micrometer usage and not using a custom MeterRegistry. The following is the latest monitoring data; memory in some pods is still too high: channel-qrcode-pay-7686b6d777-5pjgj, channel-qrcode-pay-6959cf9bb4-b5zst
Hi, I managed to replicate part of the problem and am currently discussing it on Gitter. If you have the same problem, there are two ways to mitigate it at the moment: either replace reactor-netty with a different WebClient-supported library (we used Apache HttpClient 5, which works well), or, if you can handle it, disable connection keep-alive. In our case, both options eliminate the leak. Of course, disabling keep-alive is not a great long-term solution, but you can at least verify whether it's the same problem; the performance hit will depend on your use case.
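For anyone who wants to try the keep-alive workaround, here is a minimal sketch (my own illustration, not code from this thread; the class and method names are made up) of disabling it on the Reactor Netty client that backs WebClient:

```java
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

// Minimal sketch of the keep-alive workaround described above: every connection
// is closed after its exchange, so invalidated pool entries cannot accumulate.
// Expect extra connection-setup overhead; names here are illustrative only.
public class NoKeepAliveWebClient {

    public static WebClient build() {
        HttpClient httpClient = HttpClient.create()
                .keepAlive(false); // disable HTTP keep-alive / connection reuse
        return WebClient.builder()
                .clientConnector(new ReactorClientHttpConnector(httpClient))
                .build();
    }
}
```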
I'm working on a fix for an issue that I found with the reproducer @vitjouda provided on Gitter.
@userlaojie @vitjouda #3459 should address this issue. If you are able to test the snapshot version, that would be great!
Hi, I am going to deploy the snapshot and let it sit for a day or 2 and report back. Thank you for the fix.
Hi again, I tested the snapshot and it looks good! Thank you for the fix :)
Hello, we are revamping our system with Spring WebFlux. After the service was started in a Linux environment, we found that memory kept increasing and was never reclaimed by the JVM. After pulling a heap dump of the service, we suspect that the WebClient connection pool holds cross-references that prevent EpollSocketChannel objects from being reclaimed. Please help check whether there is a problem with the WebClient configuration, or investigate from other angles. Thank you.
This is the MAT (Memory Analyzer) view after analyzing a single object that retains over 80 MB.
This is the JVM memory usage from our monitoring.
Steps to Reproduce
The WebClient configuration parameters are as follows:
Possible Solution
I have two hypotheses. The first is that ByteBuf references are not released in the epoll model; the second is that the WebClient connection pool has configuration problems.
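A common first step for checking the first hypothesis (my suggestion, not part of the original report) is Netty's built-in leak detector, which logs buffers that are garbage-collected without having been released:

```java
import io.netty.util.ResourceLeakDetector;

// Sketch for checking the ByteBuf hypothesis: at PARANOID level, Netty samples
// every reference-counted buffer and logs a "LEAK" warning if one is GC'd without
// release() having been called. PARANOID is expensive; use it only while debugging.
public class LeakDetectionSetup {

    public static void main(String[] args) {
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        // Equivalent JVM flag: -Dio.netty.leakDetection.level=paranoid
    }
}
```

Note that the detector only reports buffers that are actually collected; buffers kept alive by strong references (as in the heap dumps above) still have to be traced in MAT.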
Your Environment
Other relevant libraries versions (e.g. netty, ...):
JVM version (java -version): OpenJDK
OS and version (e.g. uname -a): CentOS Linux 7 (Core)