vleushin opened 6 years ago

Lettuce version: 5.0.2.RELEASE. Reproducible on Linux (Kubernetes), Windows (my local machine), likely everywhere.

I've started testing a Redis cluster in Kubernetes. So far, not bad: all failover scenarios worked fine, but there was one big problem: memory leaks. It was not evident to me at first (because it was a direct memory leak and I was looking at heap charts), but I think I tracked down two cases.

The first one is topology refresh. I have a single-node Redis cluster (redis-cluster) in Docker Compose for local testing. With periodic topology refresh enabled (options along the lines of the sketch below) and a small direct memory size, e.g. -XX:MaxDirectMemorySize=100M or 200M, I can get an OOM exception in 1-2 minutes. The exception is the same one shown further down in this thread; it looks like netty is out of direct memory.

About the second case I'm less sure, since I did not do extensive testing, but I think they are connected. I have a 7-node Redis cluster in our Kubernetes environment. We killed one master to see if it would fail over. It did, the topology refreshed, everything seemed OK. But in the background Lettuce kept pinging and trying to reconnect to the dead node (only visible with Lettuce debug logging turned on), and direct memory quickly dried up and the node died.

Any thoughts?
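For context, periodic topology refresh is typically enabled with options along these lines (a sketch, not the reporter's exact settings; the refresh interval is an assumption, not taken from the report):

RedisClusterClient clusterClient = RedisClusterClient.create("redis://127.0.0.1:7000");

ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(5)) // assumed interval, not from the report
        .enableAllAdaptiveRefreshTriggers()
        .build();

clusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(refreshOptions)
        .build());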
To clarify:

Can you please provide the stack trace showing the OOM? Getting a heap dump with heap buffers would be ideal. You can set -Dio.netty.noPreferDirect=true to prefer heap buffers, and adjust the memory size with -Xmx… to limit the heap size through JVM settings.
I've created a minimal project where it is easy to reproduce: https://github.com/vleushin/lettuce-oom , it takes less than a minute. It sort of illustrates what I have in production: tons of microservices (gRPC), many of them with a Redis backend, so I end up with a bunch of Redis clients.
Exception:
Feb 19, 2018 9:16:38 AM io.lettuce.core.protocol.CommandHandler exceptionCaught
WARNING: null Unexpected exception during request: java.lang.NullPointerException
java.lang.NullPointerException
Feb 19, 2018 9:16:38 AM io.lettuce.core.protocol.CommandHandler exceptionCaught
WARNING: null Unexpected exception during request: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 100663296, max: 104857600)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 100663296, max: 104857600)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:214)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:125)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
BTW, in some setups I get OutOfDirectMemoryError, in others OutOfMemoryError.
The heap dump is here: https://github.com/vleushin/lettuce-oom/releases/download/0.1.0/heapdump-1519024984196.zip
Awesome, thanks a lot. I'll take a deeper look as soon as I can.
Also, sometimes I get this exception:
Feb 19, 2018 10:31:30 AM io.lettuce.core.cluster.ClusterTopologyRefreshScheduler$ClusterTopologyRefreshTask run
WARNING: Cannot refresh Redis Cluster topology
io.lettuce.core.RedisException: Cannot retrieve initial cluster partitions from initial URIs [RedisURI [host='127.0.0.1', port=7000]]
at io.lettuce.core.cluster.RedisClusterClient.loadPartitions(RedisClusterClient.java:808)
at io.lettuce.core.cluster.RedisClusterClient.reloadPartitions(RedisClusterClient.java:732)
at io.lettuce.core.cluster.ClusterTopologyRefreshScheduler$ClusterTopologyRefreshTask.run(ClusterTopologyRefreshScheduler.java:197)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:73)
at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.lettuce.core.RedisConnectionException: Unable to establish a connection to Redis Cluster
at io.lettuce.core.cluster.topology.AsyncConnections.get(AsyncConnections.java:84)
at io.lettuce.core.cluster.topology.ClusterTopologyRefresh.loadViews(ClusterTopologyRefresh.java:68)
at io.lettuce.core.cluster.RedisClusterClient.loadPartitions(RedisClusterClient.java:787)
... 8 more
Suppressed: io.lettuce.core.RedisConnectionException: Unable to connect to 127.0.0.1:7000
at io.lettuce.core.cluster.topology.ClusterTopologyRefresh.lambda$getConnections$2(ClusterTopologyRefresh.java:226)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at io.lettuce.core.AbstractRedisClient.lambda$initializeChannelAsync$1(AbstractRedisClient.java:275)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:512)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:486)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:425)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
at io.netty.channel.AbstractChannel$AbstractUnsafe.ensureOpen(AbstractChannel.java:970)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:243)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1346)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:545)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:530)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:512)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:1004)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:259)
at io.netty.bootstrap.Bootstrap$3.run(Bootstrap.java:252)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute$$$capture(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
... 3 more
Caused by: java.util.concurrent.CompletionException: java.nio.channels.ClosedChannelException
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
... 24 more
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.ensureOpen(...)(Unknown Source)
This is the one I saw initially in the logs, and it confused me a lot, because the RedisURIs were valid and working. But this one is a consequence of the OOM.
@mp911de Hey! Any news? Have you managed to reproduce the problem?
I'm busy with other things right now. This one gets my attention as one of the next issues.
Sure, no rush. I turned off periodic refresh for now. There seem to be no memory leaks, at least on a couple-of-days scale.
I took a look at the heap dump and I have the slight feeling that the memory consumption is a consequence of the runtime environment. Netty creates tons of threads/event loops, too many if you consider the memory limit of 100 MB. Your heap dump shows the event loops. Looking at your code, you create lots of RedisClusterClient instances, which create lots of event loops. Please reuse ClientResources as much as possible, ideally a single instance per JVM.

On the one hand, this will reduce the number of lingering threads. On the other hand, every netty thread has a thread-local memory buffer pool cache (io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache). The more threads you spin up, the more memory caches are allocated, the more memory buffers are contained within such a cache, and the more memory you require.

In short, the more threads you use, the more (direct) memory you require. I'm not fully aware since which JVM version cgroup limits (especially the processor count) are respected. By default, Lettuce creates 2 * Runtime.getRuntime().availableProcessors() threads. You can limit the number either by setting -Dio.netty.eventLoopThreads=… or via the API: DefaultClientResources.builder().ioThreadPoolSize(4).computationThreadPoolSize(4).
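A minimal sketch of sharing a single ClientResources instance across clients; the pool sizes and URIs are illustrative, not prescribed values:

ClientResources sharedResources = DefaultClientResources.builder()
        .ioThreadPoolSize(4)          // illustrative values, tune for your workload
        .computationThreadPoolSize(4)
        .build();

RedisClusterClient clusterClient =
        RedisClusterClient.create(sharedResources, RedisURI.create("redis://127.0.0.1:7000"));
RedisClient standaloneClient =
        RedisClient.create(sharedResources, RedisURI.create("redis://127.0.0.1:6379"));

// On shutdown, close the clients first, then the shared resources.
clusterClient.shutdown();
standaloneClient.shutdown();
sharedResources.shutdown();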
@mp911de Point taken. I will be using as few ClientResources instances as possible. For the cluster this will work, but for non-cluster use I will still need at least one client for each DB in Redis, right?

But I still suspect there is a problem. My production nodes (-Xmx1g) eventually get OutOfDirectMemoryError anyway, so I turned off periodic refresh. Should I increase memory, to check whether I still get OutOfDirectMemoryError?
Also, this: https://github.com/vleushin/lettuce-oom/commit/49cabc59c4e99b75f3c557d71d06ab9d5b5ad41e#r27648655 . I'm getting this error with a single RedisClusterClient and a single-node Redis cluster.
ClientResources encapsulate thread pools, basically. You can share them amongst RedisClient and RedisClusterClient instances within a single JVM. In general, you need only a single RedisClient for multiple servers/databases/connections (a singleton in your JVM is fine, too). For clusters, you need one RedisClusterClient instance per cluster.
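A minimal sketch of the single-client setup, reusing the shared ClientResources from the sketch above; hosts and database numbers are placeholders:

RedisClient client = RedisClient.create(sharedResources); // client without a default URI

StatefulRedisConnection<String, String> db0 =
        client.connect(RedisURI.create("redis://127.0.0.1:6379/0"));
StatefulRedisConnection<String, String> db1 =
        client.connect(RedisURI.create("redis://127.0.0.1:6379/1"));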
If you still run into issues, please use heap buffers and provide a heap dump. If there's a problem, we will find and fix it!
I think we found the cause. The OOME looks related to a connection leak reported in #721.
@mp911de Great! You can close this ticket; I will follow up on #721.
Let's keep that one open until we can be sure #721 will fix the issue.
A snapshot build of Lettuce 5.0.3 is available now. Care to give it a spin?
@mp911de Tried my test in https://github.com/vleushin/lettuce-oom . It worked without problems.
Tried with 5.0.2: indeed, the connection count to Redis was growing. With 5.0.3 it was stable.
Thanks a lot for your response. I'm closing this one as a duplicate of #721.
@mp911de I am getting the same error message, but only when hitting my Redis cluster hard via my Micronaut microservice (netty-based). I am not sure whether it's a Lettuce error, but it seems like it is, based on the stack trace below. Any feedback would be appreciated.

I am creating the StatefulRedisClusterConnection using the code below:
@Bean(preDestroy = "close")
@Singleton
public StatefulRedisClusterConnection<String, String> getRedisClusterConnection() {
    final String clusterUris = getRedisClusterUris();
    logger.info("Redis cluster URIs: {}", clusterUris);
    final List<RedisURI> redisURIS = Arrays.stream(clusterUris.split(","))
            .map(RedisURI::create)
            .collect(Collectors.toList());
    final RedisClusterClient redisClusterClient = RedisClusterClient.create(redisURIS);
    return redisClusterClient.connect();
}
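For what it's worth, a variant of this factory method that follows the ClientResources advice from earlier in the thread could look like the sketch below; injecting a shared ClientResources bean is an assumption about the surrounding Micronaut setup:

@Bean(preDestroy = "close")
@Singleton
public StatefulRedisClusterConnection<String, String> getRedisClusterConnection(ClientResources clientResources) {
    final List<RedisURI> redisURIs = Arrays.stream(getRedisClusterUris().split(","))
            .map(RedisURI::create)
            .collect(Collectors.toList());
    // Reusing shared ClientResources avoids creating a fresh set of event loops per client.
    return RedisClusterClient.create(clientResources, redisURIs).connect();
}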
Stacktrace:
13:06:46.265 [nioEventLoopGroup-1-15] ERROR i.m.h.s.netty.RoutingInBoundHandler - Unexpected error occurred: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 457179136 byte(s) of direct memory (used: 620756999, max: 1073741824)
java.util.concurrent.CompletionException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 457179136 byte(s) of direct memory (used: 620756999, max: 1073741824)
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at io.lettuce.core.protocol.AsyncCommand.doCompleteExceptionally(AsyncCommand.java:137)
at io.lettuce.core.protocol.AsyncCommand.completeExceptionally(AsyncCommand.java:130)
at io.lettuce.core.cluster.ClusterCommand.completeExceptionally(ClusterCommand.java:97)
at io.lettuce.core.protocol.CommandHandler.exceptionCaught(CommandHandler.java:216)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 457179136 byte(s) of direct memory (used: 620756999, max: 1073741824)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:754)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:260)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:231)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:397)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:285)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:265)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1079)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1072)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1062)
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:506)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
... 22 common frames omitted
OOME can happen due to high memory requirements. Lettuce uses the shared memory allocator, which means that it potentially shares direct memory with other infrastructure components. Please check your memory requirements, ideally switch to heap buffers, and collect a heap dump. If you can't solve the issue yourself, please file an issue along with some debug context.
> Please check your memory requirements, ideally switch to heap buffers, and collect a heap dump.

Can you please advise how to do the above? Is there a best practices page, a list of common pitfalls, or any other documentation, so that I can understand how to tweak the configuration and get the best out of Lettuce? Thanks in advance @mp911de
@RaviH To switch to heap-based buffers, the JVM option below needs to be provided. Make sure your -Xmx setting accounts for this additional heap space.

-Dio.netty.noPreferDirect=true
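For illustration, a start command could look like this; the heap size is a placeholder, size it to your workload:

java -Xmx2g -Dio.netty.noPreferDirect=true -jar app.jar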
Caused by: io.lettuce.core.RedisException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 1367343104 byte(s) of direct memory (used: 1396703232, max: 2762997760)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:129) ~[lettuce-core-5.1.3.RELEASE.jar!/:na]
at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:69) ~[lettuce-core-5.1.3.RELEASE.jar!/:na]
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80) ~[lettuce-core-5.1.3.RELEASE.jar!/:na]
at com.sun.proxy.$Proxy175.get(Unknown Source) ~[na:na]
at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:66) ~[spring-data-redis-2.1.3.RELEASE.jar!/:2.1.3.RELEASE]
... 90 common frames omitted
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 1367343104 byte(s) of direct memory (used: 1396703232, max: 2762997760)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:656) ~[netty-common-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610) ~[netty-common-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:754) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:260) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:231) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:397) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:299) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:278) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1096) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1087) ~[netty-buffer-4.1.31.Final.jar!/:4.1.31.Final]
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:554) ~[lettuce-core-5.1.3.RELEASE.jar!/:na]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:648) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:583) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:500) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) ~[netty-transport-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897) ~[netty-common-4.1.31.Final.jar!/:4.1.31.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.31.Final.jar!/:4.1.31.Final]
I have a similar problem.
@gezhiwei8899 Can you try updating to 5.2.0 snapshots to check whether the changes in 5.2 address your issue?
It is not easy for me to update the code and do the test; we have millions of users. But I see the differences. I see the changes in 5.1.8.RELEASE. So did Lettuce update the netty version?
@mp911de I tested the latest 5.2.0 snapshot build of Lettuce. Still getting direct memory OOM.
I got the same problem:
failed to allocate 16777216 byte(s) of direct memory (used: 2147483648, max: 2147483648)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2147483648, max: 2147483648)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754) ~[netty-common-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709) ~[netty-common-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:755) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:731) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:215) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:147) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:356) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) ~[netty-buffer-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53) ~[netty-all-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[netty-transport-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75) ~[netty-all-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:777) [netty-all-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) [netty-all-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [netty-all-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.51.Final.jar!/:4.1.51.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.51.Final.jar!/:4.1.51.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Trying -Dio.netty.noPreferDirect=true.
Using lettuce 5.3.2.RELEASE.
I had the same problem using 5.2.0. Has the problem been solved?
> I had the same problem using 5.2.0. Has the problem been solved?
I changed to the Redisson client in my Spring Boot projects.
2021-02-23 06:53:38,798 ERROR --- [pool-12-thread-1] org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler : Unexpected error occurred in scheduled task.
org.springframework.data.redis.RedisSystemException: Redis exception; nested exception is io.lettuce.core.RedisException: io.netty.handler.codec.EncoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1056964612, max: 1073741824)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:74)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:257)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.convertLettuceAccessException(LettuceStringCommands.java:718)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.set(LettuceStringCommands.java:143)
    at org.springframework.data.redis.connection.DefaultedRedisConnection.set(DefaultedRedisConnection.java:231)
    at org.springframework.data.redis.core.DefaultValueOperations$3.inRedis(DefaultValueOperations.java:202)
    at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:59)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:184)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:95)
    at org.springframework.data.redis.core.DefaultValueOperations.set(DefaultValueOperations.java:198)
    at com.qdcares.mobile.service.web.cache.FlightCacheImpl.cacheAllFlight(FlightCacheImpl.java:112)
    at com.qdcares.mobile.service.web.cache.FlightCacheImpl.fullRefreshFlightCache(FlightCacheImpl.java:108)
    at com.qdcares.mobile.service.data.task.MessageHandleSchedule.executeHandleFullFlightMessage(MessageHandleSchedule.java:56)
    at com.qdcares.mobile.service.data.task.MessageHandleSchedule.executeFullFlightTask(MessageHandleSchedule.java:39)
    at sun.reflect.GeneratedMethodAccessor764.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: io.lettuce.core.RedisException: io.netty.handler.codec.EncoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1056964612, max: 1073741824)
    at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:129)
    at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:123)
    at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
    at com.sun.proxy.$Proxy192.set(Unknown Source)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.set(LettuceStringCommands.java:141)
    ... 23 common frames omitted
Caused by: io.netty.handler.codec.EncoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1056964612, max: 1073741824)
    at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:125)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
    at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
    at io.lettuce.core.protocol.CommandHandler.writeSingleCommand(CommandHandler.java:421)
    at io.lettuce.core.protocol.CommandHandler.write(CommandHandler.java:353)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
    at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
    at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 common frames omitted
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1056964612, max: 1073741824)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:397)
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
    at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:285)
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:265)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1079)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1072)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1062)
    at io.lettuce.core.protocol.CommandArgs$ByteBufferArgument.writeByteBuf(CommandArgs.java:473)
    at io.lettuce.core.protocol.CommandArgs$ValueArgument.encode(CommandArgs.java:703)
    at io.lettuce.core.protocol.CommandArgs.encode(CommandArgs.java:346)
    at io.lettuce.core.protocol.Command.encode(Command.java:119)
    at io.lettuce.core.protocol.AsyncCommand.encode(AsyncCommand.java:185)
    at io.lettuce.core.cluster.ClusterCommand.encode(ClusterCommand.java:92)
    at io.lettuce.core.protocol.CommandEncoder.encode(CommandEncoder.java:90)
    at io.lettuce.core.protocol.CommandEncoder.encode(CommandEncoder.java:75)
    at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:107)
    ... 18 common frames omitted
Using lettuce 5.2.0.RELEASE.
> I changed to the Redisson client in my Spring Boot projects.

Is Redisson OK?
> > I changed to the Redisson client in my Spring Boot projects.
>
> Is Redisson OK?

Yep.
Instead of changing the client, configure -Dio.netty.noPreferDirect=true to use heap buffers if you run out of configured direct memory.
> Instead of changing the client, configure -Dio.netty.noPreferDirect=true to use heap buffers if you run out of configured direct memory.

Thanks.
Setting -XX:MaxDirectMemorySize helped us with this issue. We discovered that our platform was capping direct memory (related to https://github.com/cloudfoundry/java-buildpack-memory-calculator/issues/17).
Not sure if anything can be done about it from the Lettuce side, but one scenario that grows direct memory usage linearly (and hits the 10 MB cap set by the CF memory calculator quite easily) can be reproduced using only spring-boot-starter-data-redis-reactive and something like this:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.redis.core.ReactiveStringRedisTemplate;

@SpringBootApplication
public class SampleApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication.run(SampleApplication.class, args);
    }

    @Autowired
    private ReactiveStringRedisTemplate redisTemplate;

    @Override
    public void run(String... args) throws Exception {
        // Each listenToChannel(...) call opens its own pub/sub connection,
        // so direct memory usage grows with every iteration.
        for (int i = 0; i < 200; ++i) {
            Thread.sleep(100L);
            this.redisTemplate.listenToChannel("test").log().subscribe();
        }
    }
}
Starting the JVM with -Dio.netty.noPreferDirect=true naturally addresses the issue.
redisTemplate.listenToChannel("test") creates a new ReactiveRedisMessageListenerContainer that allocates a new connection. Each connection allocates two buffers for response processing (64 KB + 1 KB), which maps to roughly 13 MB of memory for your code example (200 connections × 65 KB).
Yep, that's why I opened spring-projects/spring-data-redis#2229 to consider documentation improvements for the #listenTo variants, as they somewhat set up users for failure.
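A workaround sketch (assuming a ReactiveRedisConnectionFactory named connectionFactory is at hand): share one ReactiveRedisMessageListenerContainer, and therefore one connection, across all subscribers instead of opening a connection per listenToChannel call.

ReactiveRedisMessageListenerContainer container =
        new ReactiveRedisMessageListenerContainer(connectionFactory);

// receive(...) attaches subscribers to the container's single pub/sub connection.
Flux<ReactiveSubscription.Message<String, String>> messages =
        container.receive(new ChannelTopic("test"));

for (int i = 0; i < 200; ++i) {
    messages.log().subscribe(); // many subscribers, one connection
}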
@gezhiwei8899 Hello, have you solved this problem?
@mp911de Hello, have you solved this problem? Using heap memory, the performance will drop.
@mp911de We have the same issue here:
<netty.version>4.1.109.Final</netty.version>
<lettuce.version>6.3.2.RELEASE</lettuce.version>
and the configs below:
useSsl: true
maxTotal: 400
subscriptionsMaxTotal: 400
timeout: 5s
connectTimeout: 500ms
nettyThreads: 64
We use AWS Redis serverless. We have 12 tasks running, and for some reason one task becomes faulty and we start to see continuous "Redis command timed out" errors on this single task.
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 2 second(s)
at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59)
at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:246)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74)
When this goes on for long with "CachePut", it exhausts direct memory and we get an OOM.
org.springframework.data.redis.RedisSystemException: Redis exception; nested exception is io.lettuce.core.RedisException: java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 786254691, limit: 786432000)
....
Caused by: io.lettuce.core.RedisException: java.lang.OutOfMemoryError: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 786254691, limit: 786432000)
at io.lettuce.core.internal.Exceptions.bubble(Exceptions.java:83)
at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:250)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74)
Any idea how this can be fixed?

Also, this is a side issue and on the client side. I will open another issue for the Lettuce <---> Redis serverless problem; I guess this is an edge case, since it does not happen continuously.
> @mp911de We have the same issue here:
Hey @mshahmoradi87,

The issue is that the shared memory could be exhausted for a number of (outside) reasons. The suggestion @mp911de made tries to address that by confining the memory to the JVM that is running the Lettuce driver, thus proving whether the memory leak is in the driver or in some other component.

If, with heap memory, the problem remains, then we can take a heap dump and see which are the offending objects and their parents, to try to address the issue.

Without it we can't trace the issue. Do you think you can do that?
> @mp911de Hello, have you solved this problem? Using heap memory, the performance will drop.
Hey @DourFeng,

That is assuming we have an issue to begin with. Could you help us trace this with a heap dump? It might not be clear, but the suggestion to use the heap is not meant to "fix" the issue, but to trace it.
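For example, starting the JVM with heap buffers and an automatic heap dump on OOM; the flags are standard JVM/netty options, and the path is a placeholder:

java -Dio.netty.noPreferDirect=true -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/lettuce-oom.hprof -jar app.jar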
> @gezhiwei8899 Hello, have you solved this problem?
Same here. We need to trace the problem to see whether there is a problem in the driver to begin with. Direct memory is shared, and any other component running in the same process could be causing an OOM issue.
> > @mp911de We have the same issue here:
>
> Hey @mshahmoradi87,
>
> The issue is that the shared memory could be exhausted for a number of (outside) reasons. The suggestion @mp911de made tries to address that by confining the memory to the JVM that is running the Lettuce driver, thus proving whether the memory leak is in the driver or in some other component.
>
> If, with heap memory, the problem remains, then we can take a heap dump and see which are the offending objects and their parents, to try to address the issue.
>
> Without it we can't trace the issue. Do you think you can do that?
Hi, I have tried to simulate this in a test environment, but no luck yet! What I could do was enable direct memory profiling in the production environment; here is a method trace from when I had the timeout with the CachePut error. I hope this helps.
@tishun Does the above help? Would you reopen the issue, or should we create a new one?
> @tishun Does the above help? Would you reopen the issue, or should we create a new one?
Let's reopen the issue and I will try to take another look when I have the time.