akoufa opened this issue 1 year ago
We keep seeing the issue but have not been able to track it down yet.
@mp911de You can find the Spring Boot project that reproduces this issue here: https://github.com/arconsis/server_benchmarks/tree/main/bookstore-springboot . The issue only pops up when the service is under load, which is part of why we have not been able to pin-point it: debugging under high load is practically impossible.
We have started seeing this issue reliably in an application that is regularly under high load. I hope the following logs, captured with ResourceLeakDetector.setLevel(ADVANCED), can help localize the cause.
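For anyone who wants to capture the same kind of diagnostics, here is a minimal sketch of enabling that leak-detection level programmatically; it has to run before Netty allocates its first buffers.

```java
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionSetup {

    public static void main(String[] args) {
        // ADVANCED records recent access points for leaked buffers at moderate
        // overhead; PARANOID samples every buffer and is meant for tests only.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.ADVANCED);

        // ... bootstrap the actual application here ...
    }
}
```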
We also encountered the same issue. Is there any solution or workaround for this (besides increasing memory)?
Is it the only error in the logs? We do have a leak in BackendMessageSubscriber.addConversation on the if (!isConnected.get()) branch, since we release encoded parameters only in the encoding path. There may be other places like that, but all of them would require some kind of connection error to occur first.
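For readers not familiar with the pattern described above, here is a simplified, hedged sketch (not the actual r2dbc-postgresql code; class and method names are only illustrative) of the kind of release that has to happen on the disconnected branch so the encoded parameters are not leaked:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

import io.netty.util.ReferenceCountUtil;

// Illustrative sketch only -- not the real BackendMessageSubscriber.
class ConversationSketch {

    private final AtomicBoolean isConnected = new AtomicBoolean(true);

    void addConversation(List<Object> encodedParameters) {
        if (!isConnected.get()) {
            // If the conversation can never reach the connection, the already
            // encoded, reference-counted parameters must be released here,
            // otherwise the leak detector reports them later.
            encodedParameters.forEach(ReferenceCountUtil::safeRelease);
            throw new IllegalStateException("Connection closed");
        }
        // ... hand the conversation over to the connection (omitted) ...
    }
}
```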
Wanted to drop some info here in case it's helpful in tracking this down. I maintain a Spring Boot WebFlux service (3.0.0) and am also seeing these leaks from r2dbc-postgresql, but more specifically when requests to the service are cancelled.
A client of my service will make a burst of requests all at once and will sometimes cancel a big batch of them upon some failure on their end. Every time that happens, I get a burst of kotlinx.coroutines.JobCancellationException: MonoCoroutine was cancelled from my own error handling, and it is reliably accompanied by a handful of Netty leak logs as well. The logs are slightly different from what has been posted above, so I'll include them here.
I've also been able to reproduce this synthetically with a load test in which requests start to time out on the client side under enough load; a sketch of that approach is shown below. Hope this helps!
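A minimal, hedged sketch of that kind of load test follows: a client fires a burst of requests and cancels the slow ones through a timeout. The endpoint, host, and numbers are hypothetical, not taken from the service above.

```java
import java.time.Duration;

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class CancellationLoadSketch {

    public static void main(String[] args) {
        WebClient client = WebClient.create("http://localhost:8080");

        Flux.range(0, 500)
            .flatMap(i -> client.get()
                .uri("/books/{id}", i)
                .retrieve()
                .bodyToMono(String.class)
                // An aggressive timeout cancels the in-flight request, which
                // propagates a cancellation into the server-side pipeline.
                .timeout(Duration.ofMillis(50))
                .onErrorResume(e -> Mono.empty()))
            .blockLast();
    }
}
```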
Getting the same issue. Are we sure this is an r2dbc issue and not a reactor-netty one? Perhaps it is related to https://github.com/reactor/reactor-netty/issues/881?
Getting the same "DataRow.Release() was not called", which under paranoid
debugging shows up as "ByteBuf.release() was not called". Happens under load usually when a subscriber closes the connection prematurely, but not necessarily under that specific condition.
With the limited testing and debugging we could do, it doesn't seem to actually affect the service's performance or stability for now, but the errors do pile up quite a bit and it does scare anyone who looks at the logs.
Edit. The connections to the database remain open, unfortunately. We decided to abandon r2dbc until this issue is resolved.
Hi,
we are running into a similar problem using Spring 3.1.3 and r2dbc 1.0.1. Our service logs this every now and then:
LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
io.netty.buffer.SimpleLeakAwareByteBuf.unwrappedDerived(SimpleLeakAwareByteBuf.java:144)
io.netty.buffer.SimpleLeakAwareByteBuf.readRetainedSlice(SimpleLeakAwareByteBuf.java:67)
io.r2dbc.postgresql.message.backend.DataRow.decodeColumn(DataRow.java:149)
io.r2dbc.postgresql.message.backend.DataRow.decode(DataRow.java:139)
io.r2dbc.postgresql.message.backend.BackendMessageDecoder.decodeBody(BackendMessageDecoder.java:65)
io.r2dbc.postgresql.message.backend.BackendMessageDecoder.decode(BackendMessageDecoder.java:39)
reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:208)
reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:224)
reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:292)
reactor.netty.channel.FluxReceive.onInboundNext(FluxReceive.java:401)
reactor.netty.channel.ChannelOperations.onInboundNext(ChannelOperations.java:404)
reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:114)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:333)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:454)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Unknown Source)
Hello,
I use io.r2dbc:r2dbc-pool:1.0.1.RELEASE and io.r2dbc:r2dbc-postgresql:0.8.13.RELEASE with org.springframework.data:spring-data-r2dbc:3.0.2.
I've got a similar problem, but ~~before the LEAK: ByteBuf.release(), I usually get a LEAK: DataRow.release() first.~~
EDIT: I resolved this in my app by refactoring some stupid code that caused very high access rates to the database.
2023-09-26T15:14:08.473+02:00 ERROR 1 --- [tor-tcp-epoll-4] io.netty.util.ResourceLeakDetector : LEAK: DataRow.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: Created at:
io.r2dbc.postgresql.message.backend.DataRow.<init>(DataRow.java:37)
io.r2dbc.postgresql.message.backend.DataRow.decode(DataRow.java:134)
io.r2dbc.postgresql.message.backend.BackendMessageDecoder.decodeBody(BackendMessageDecoder.java:65)
io.r2dbc.postgresql.message.backend.BackendMessageDecoder.decode(BackendMessageDecoder.java:39)
reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:208)
reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:224)
reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:294)
reactor.netty.channel.FluxReceive.onInboundNext(FluxReceive.java:403)
reactor.netty.channel.ChannelOperations.onInboundNext(ChannelOperations.java:411)
reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:113)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:333)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:454)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:833)

2023-09-26T15:17:50.169+02:00 ERROR 1 --- [tor-tcp-epoll-2] io.netty.util.ResourceLeakDetector : LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: Created at:
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:403)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53)
io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75)
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:785)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:833)

2023-09-26T15:20:36.134+02:00 ERROR 1 --- [or-http-epoll-4] io.netty.util.ResourceLeakDetector : LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: Created at:
io.netty.buffer.SimpleLeakAwareByteBuf.unwrappedDerived(SimpleLeakAwareByteBuf.java:143)
io.netty.buffer.SimpleLeakAwareByteBuf.retainedSlice(SimpleLeakAwareByteBuf.java:57)
io.netty.handler.codec.LengthFieldBasedFrameDecoder.extractFrame(LengthFieldBasedFrameDecoder.java:502)
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:440)
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:333)
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.base/java.lang.Thread.run(Thread.java:833)
@siegfried-chaisson Can you post a bit more detail? Knowing what caused the issue would give us some insight into where we need to investigate.
This issue remains a consistent problem in several of our applications. Since no resolution is in sight, we intend to switch over to JDBC in the meantime and wrap queries in Mono.fromCallable(() -> repository.query(id)).subscribeOn(Schedulers.boundedElastic()), as sketched below. Are there any other workarounds?
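For reference, a hedged sketch of that workaround (the repository, entity, and method names are hypothetical): the blocking JDBC call is wrapped in Mono.fromCallable and shifted onto the bounded-elastic scheduler so it never blocks a Netty event-loop thread.

```java
import java.util.Optional;

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BlockingJdbcAdapter {

    // Hypothetical blocking repository, e.g. a Spring Data JDBC repository.
    interface BookRepository {
        Optional<Book> findById(long id);
    }

    record Book(long id, String title) {}

    private final BookRepository repository;

    public BlockingJdbcAdapter(BookRepository repository) {
        this.repository = repository;
    }

    public Mono<Book> findById(long id) {
        // fromCallable defers the blocking call; subscribeOn moves it off the
        // event loop. Returning null from the callable completes the Mono empty.
        return Mono.fromCallable(() -> repository.findById(id).orElse(null))
                   .subscribeOn(Schedulers.boundedElastic());
    }
}
```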
> @siegfried-chaisson Can you post a bit more detail? Knowing what caused the issue would give us some insight into where we need to investigate.
I'm sorry, but the bug is still occurring 😓. I switched to an H2 database with no issues, so perhaps the problem isn't related to my code. As mentioned earlier, it seems to happen under heavy load. I've tried various connection pool parameters, such as max-idle-time: -1 and max-life-time: -1, with no success. I initially thought the issue might be related to connection timeouts, but at this point, I don't have any concrete information.
Hello everyone, I wanted to share a workaround I implemented to address this, and I hope it helps with the investigation.
The problem was related to the timing of updates, which were occurring too rapidly. This resulted in a situation where an update was attempted before a create operation had been completed. In my case, I'm using a UUID as the primary key (@Id).
In the initial approach: the entity.isNew() method generated a random UUID. During the first 'update', the findByName method returned an empty result because the first entity hadn't been saved yet, so the save created a second id for the same entity.name. On the second update, findByName returned multiple entities with the same name, leading to an error because a single result was expected. This situation ultimately caused the leak (I don't know why).
In the second approach: similar to the first, except entity.isNew() generated a deterministic UUID based on the name field. During the first 'update', the findById method again returned an empty result; however, the save operation then returned an error because the key already existed in the database, again causing the leak.
Finally, to resolve this, I decided not to merge entities so aggressively. I went back to random UUIDs and performed a simple save operation, and I added a grouping step when reading the data. This approach eliminated the leak for me. I'll probably push those too-quickly-updated entities through a Kafka topic and stream the group-by operation, but I haven't decided yet.
Here's a code snippet that shows the change I made to the save method:
```java
public Mono<Void> save(MyEntity entity) {
    return repository.findByName(entity.getName())
        .defaultIfEmpty(new MyEntity())
        .flatMap(currentEntity -> {
            entity.setId(currentEntity.getId());
            return repository.save(entity);
        })
        .then();
}
```
changed to:
```java
public Mono<Void> save(MyEntity entity) {
    return repository.save(entity).then();
}
```
Leaks do occur in integration tests as well, albeit at random locations. :sweat_smile:
Hello, this isn't happening on my side anymore since I read the Netty docs and found un-released ByteBufs in my code, as well as subscribers that never completed (map/flatMap/then/subscribe and more) 😅
Special thanks to Netty's PARANOID leak detector; sorry, but in my case it wasn't an r2dbc-postgresql issue.
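As a side note, the same leak-detection levels can also be enabled without code changes via Netty's io.netty.leakDetection.level system property; a minimal sketch (the property must be set before Netty initialises its leak detector):

```java
public class ParanoidLeakDetection {

    public static void main(String[] args) {
        // Equivalent to passing -Dio.netty.leakDetection.level=paranoid on the
        // command line; PARANOID tracks every buffer, so use it only in tests.
        System.setProperty("io.netty.leakDetection.level", "paranoid");

        // ... start the application after the property is set ...
    }
}
```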
Bug Report
Hello, we are using Spring Boot 3.0.2, Spring Data R2DBC, and r2dbc-postgresql. Everything works fine under normal conditions, but when stress testing our service we observe Netty leaks which force our service to restart. I have attached an example stacktrace below.
Stacktrace:
Our code is very simple; we are just using Spring Data repositories:
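A repository of the kind described here typically looks roughly like the following sketch (entity, table, and query names are hypothetical and only illustrative):

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.relational.core.mapping.Table;
import org.springframework.data.repository.reactive.ReactiveCrudRepository;
import reactor.core.publisher.Flux;

// Hypothetical entity and repository; the actual classes in the report differ.
@Table("books")
class Book {
    @Id
    private Long id;
    private String title;
    private String author;
    // getters and setters omitted for brevity
}

interface BookRepository extends ReactiveCrudRepository<Book, Long> {

    // Derived query; Spring Data R2DBC generates the SQL from the method name.
    Flux<Book> findByAuthor(String author);
}
```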
and then just using the above repository like this:
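A hedged sketch of how such a repository is typically consumed from a WebFlux endpoint (again with hypothetical names):

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

// Hypothetical controller showing the usual call pattern.
@RestController
class BookController {

    private final BookRepository repository;

    BookController(BookRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/books/{author}")
    Flux<Book> byAuthor(@PathVariable String author) {
        // WebFlux subscribes to this Flux when writing the HTTP response;
        // cancelling the request also cancels the database query.
        return repository.findByAuthor(author);
    }
}
```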
Is there anything we configured wrongly, or anything we have to watch out for?