jchambers / pushy

A Java library for sending APNs (iOS/macOS/Safari) push notifications
https://pushy-apns.org/
MIT License
1.81k stars 451 forks source link

Memory leak? #478

Closed flozano closed 7 years ago

flozano commented 7 years ago

We have found this in one of our components which uses latest pushy:

LEAK: ReferenceCountedOpenSslContext.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.

Stacktrace (most recent call first):

    at Slf4JLogger.java.error(Slf4JLogger.java:176)
    at ResourceLeakDetector.java.reportUntracedLeak(ResourceLeakDetector.java:313)
    at ResourceLeakDetector.java.reportLeak(ResourceLeakDetector.java:289)
    at ResourceLeakDetector.java.track0(ResourceLeakDetector.java:248)
    at ResourceLeakDetector.java.track(ResourceLeakDetector.java:237)
    at ReferenceCountedOpenSslContext.java.<init>(ReferenceCountedOpenSslContext.java:239)
    at ReferenceCountedOpenSslContext.java.<init>(ReferenceCountedOpenSslContext.java:219)
    at ReferenceCountedOpenSslClientContext.java.<init>(ReferenceCountedOpenSslClientContext.java:59)

We'll try to enable the advanced leak detection, but anyhow... anyone else has seen this in latest pushy?

flozano commented 7 years ago
-r--r-----. 1 root wheel  269731 May 16 13:53 netty-buffer-4.1.11.Final.jar
-r--r-----. 1 root wheel  311824 May 16 13:53 netty-codec-4.1.11.Final.jar
-r--r-----. 1 root wheel  367915 May 16 13:52 netty-codec-http2-4.1.11.Final.jar
-r--r-----. 1 root wheel  550549 May 16 13:53 netty-codec-http-4.1.11.Final.jar
-r--r-----. 1 root wheel  119730 May 16 15:12 netty-codec-socks-4.1.11.Final.jar
-r--r-----. 1 root wheel  532341 May 16 13:53 netty-common-4.1.11.Final.jar
-r--r-----. 1 root wheel  353876 May 16 13:53 netty-handler-4.1.11.Final.jar
-r--r-----. 1 root wheel   20743 May 16 15:16 netty-handler-proxy-4.1.11.Final.jar
-r--r-----. 1 root wheel   32233 May 16 13:53 netty-resolver-4.1.11.Final.jar
-r--r-----. 1 root wheel 2393039 Apr 26 15:35 netty-tcnative-boringssl-static-2.0.1.Final.jar
-r--r-----. 1 root wheel  915772 Apr 26 15:35 netty-tcnative-boringssl-static-2.0.1.Final-linux-x86_64.jar
-r--r-----. 1 root wheel  432522 May 16 13:53 netty-transport-4.1.11.Final.jar
-r--r-----. 1 root wheel  102351 May 25 08:00 pushy-0.10.jar
 uname -a
Linux myserver.com 2.6.32-573.18.1.el6.x86_64 #1 SMP Tue Feb 9 22:46:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
jorcasso commented 7 years ago

We have lot of handshake errors, maybe that could be related:

[2017-06-09T04:08:27.309Z] [WARN] [nioEventLoopGroup-2-3] io.netty.handler.ssl.ApplicationProtocolNegotiationHandler:? [id: 0x869dade6, L:/172.31.17.75:43440 - R:api.development.push.apple.com/17.188.138.73:443] 
TLS handshake failed:
javax.net.ssl.SSLHandshakeException: error:10000438:SSL routines:OPENSSL_internal:TLSV1_ALERT_INTERNAL_ERROR
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.shutdownWithError(ReferenceCountedOpenSslEngine.java:751)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.sslReadErrorResult(ReferenceCountedOpenSslEngine.java:959)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:914)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:978)
        at io.netty.handler.ssl.ReferenceCountedOpenSslEngine.unwrap(ReferenceCountedOpenSslEngine.java:1021)
        at io.netty.handler.ssl.SslHandler$SslEngineType$1.unwrap(SslHandler.java:205)
        at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1156)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1078)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:745)
[2017-06-09T04:08:27.310Z] [INFO] [nioEventLoopGroup-2-3] com.turo.pushy.apns.ApnsClient:? Failed to connect.
java.lang.IllegalStateException: Channel closed before HTTP/2 preface completed.
        at com.turo.pushy.apns.ApnsClient$3.operationComplete(ApnsClient.java:395)
        at com.turo.pushy.apns.ApnsClient$3.operationComplete(ApnsClient.java:387)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
        at io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1058)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:686)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:664)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:607)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1276)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.handler.ssl.SslHandler$8.operationComplete(SslHandler.java:1657)
        at io.netty.handler.ssl.SslHandler$8.operationComplete(SslHandler.java:1646)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:163)
        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:93)
        at io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:28)
        at io.netty.handler.ssl.SslHandler.safeClose(SslHandler.java:1646)
        at io.netty.handler.ssl.SslHandler.closeOutboundAndChannel(SslHandler.java:1457)
        at io.netty.handler.ssl.SslHandler.close(SslHandler.java:613)
        at io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:624)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:608)
        at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465)
        at io.netty.handler.ssl.ApplicationProtocolNegotiationHandler.handshakeFailure(ApplicationProtocolNegotiationHandler.java:116)
        at io.netty.handler.ssl.ApplicationProtocolNegotiationHandler.userEventTriggered(ApplicationProtocolNegotiationHandler.java:94)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at io.netty.handler.ssl.SslUtils.notifyHandshakeFailure(SslUtils.java:246)
        at io.netty.handler.ssl.SslHandler.notifyHandshakeFailure(SslHandler.java:1415)
        at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1406)
        at io.netty.handler.ssl.SslHandler.setHandshakeFailure(SslHandler.java:1380)
        at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1088)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:745)
jchambers commented 7 years ago

This sounds a little like #425, but more conveniently, it also it sounds a LOT like this problem for which a fix was released about an hour ago:

These releases update to netty-tcnative 2.0.3.Final which fixes a small native memory leak which happened when you destroy an OpenSSLContext. So if you create a lot and destroy a lot of these its important to update.

I'll put out a new version of Pushy shortly, but in the meantime, I suspect that using the latest version of Netty/netty-tcnative should solve your problem.

flozano commented 7 years ago

OK, thanks for the info! we're going to rebuild/redeploy with the newer netty and see how it goes.

flozano commented 7 years ago

with netty 4.1.12 and tcnative 2.0.3, we reproduced:

io.netty.util.ResourceLeakDetector:? LEAK: ReferenceCountedOpenSslContext.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 0
Created at:
    io.netty.util.ResourceLeakDetector.track(ResourceLeakDetector.java:237)
    io.netty.handler.ssl.ReferenceCountedOpenSslContext.<init>(ReferenceCountedOpenSslContext.java:239)
    io.netty.handler.ssl.ReferenceCountedOpenSslContext.<init>(ReferenceCountedOpenSslContext.java:219)
    io.netty.handler.ssl.ReferenceCountedOpenSslClientContext.<init>(ReferenceCountedOpenSslClientContext.java:59)
    io.netty.handler.ssl.SslContext.newClientContextInternal(SslContext.java:781)
    io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:446)
    com.turo.pushy.apns.ApnsClientBuilder.build(ApnsClientBuilder.java:417)
jchambers commented 7 years ago

To be clear, is this happening as soon as you build a client?

flozano commented 7 years ago

I'm not sure, as in the same thread and in almost same time, we shutdown the 'old' client and create a new one. This is an edited 'log' from the component that shows the leak alert:

[2017-06-12T06:36:06.429Z] [INFO] [EventProcessing-8-push] com.mycompany.MyAPNSClientCode:? Sending push message to APNS server
[2017-06-12T06:36:06.430Z] [INFO] [EventProcessing-8-push] com.mycompany.MyAPNSClientCode:? Certificate changed (development=true, oldVersion=26176, newVersion=26186)
[2017-06-12T06:36:06.430Z] [INFO] [EventProcessing-8-push] com.mycompany.MyAPNSClientCode:? Releasing APNS client (development=true, version=26176, hadProblems=false)
[2017-06-12T06:36:06.430Z] [INFO] [EventProcessing-8-push] com.mycompany.MyAPNSClientCode:? Shutting down APNS client (development=true)
[2017-06-12T06:36:06.430Z] [INFO] [EventProcessing-8-push] com.turo.pushy.apns.ApnsClient:? Disconnecting.
[2017-06-12T06:36:06.430Z] [INFO] [EventProcessing-8-push] com.mycompany.MyAPNSClientCode:? Initializing APNS client (development=true)
[2017-06-12T06:36:06.432Z] [INFO] [nioEventLoopGroup-2-4] com.mycompany.MyAPNSClientCode:? ApnsClient shutdown properly (development=true)
[2017-06-12T06:36:06.437Z] [INFO] [EventProcessing-8-push] com.turo.pushy.apns.SslUtil:? Native SSL provider is available and supports ALPN; will use native provider.
[2017-06-12T06:36:06.437Z] [ERROR] [EventProcessing-8-push] io.netty.util.ResourceLeakDetector:? LEAK: ReferenceCountedOpenSslContext.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.

(context: the certificate configuration is updated, so we close the old client and create a new one).

jchambers commented 7 years ago

@flozano I'm struggling to reproduce this in production, and have not been able to find a minimal reproduction case. I wrote a simple test app:

public static void main(final String[] args) throws Exception {
    final ApnsClient client = new ApnsClientBuilder()
            .setSigningKey(ApnsSigningKey.loadFromPkcs8File(new File("REDACTED"), "REDACTED", "REDACTED"))
            .build();

    client.connect(ApnsClient.DEVELOPMENT_APNS_HOST).await();

    Thread.sleep(5000);

    client.disconnect().await();
}

…but that doesn't turn anything up even with paranoid leak detection. How long are clients in service before you rotate them out? Does this happen every time you "retire" a client?

I emphasize that I absolutely believe this is a Real Thing™, but am trying to figure out what you're doing that I'm not. Thanks for your patience!

EDIT: To be clear, the most likely thing you're doing that I'm not doing in this test app is sending a bazillion notifications.

jchambers commented 7 years ago

Oh boy! Just kidding! I cranked the log output on unit tests and applied paranoid leak detection there and now have a leak detection volcano. I think it's safe to say I can reproduce the problem.

jchambers commented 7 years ago

I think the problem here is very simple: I just misunderstood how/when reference-counted SSL contexts need to be released. This will require more thought during daylight hours, but I'm confident we can fix the problem quickly.

dozer47528 commented 6 years ago

@jorcasso still have this issue in v0.11.3

LEAK: ReferenceCountedOpenSslContext.release() was not called before it's garbage-collected.

    at io.netty.util.internal.logging.Slf4JLogger.error(Slf4JLogger.java:176)
    at io.netty.util.ResourceLeakDetector.reportUntracedLeak(ResourceLeakDetector.java:318)
    at io.netty.util.ResourceLeakDetector.reportLeak(ResourceLeakDetector.java:294)
    at io.netty.util.ResourceLeakDetector.track0(ResourceLeakDetector.java:253)
    at io.netty.util.ResourceLeakDetector.track(ResourceLeakDetector.java:242)
    at io.netty.handler.ssl.ReferenceCountedOpenSslContext.<init>(ReferenceCountedOpenSslContext.java:227)
    at io.netty.handler.ssl.ReferenceCountedOpenSslContext.<init>(ReferenceCountedOpenSslContext.java:207)
    at io.netty.handler.ssl.ReferenceCountedOpenSslClientContext.<init>(ReferenceCountedOpenSslClientContext.java:59)