Closed matzew closed 8 years ago
The '2' error code corresponds to this:
INTERNAL_ERROR (0x2): The endpoint encountered an unexpected internal error.
I do find it a little strange that netty doesn't give you some sort of 'black box' http2 API where you just send POST requests or whatever and it handles pooling/GOAWAY/reconnection/etc by itself.
I do wonder if it's worth looking into https://github.com/square/okhttp instead of spending the time reinventing the wheel with netty... but maybe there's not that much work involved with pooling etc.
My read of those logs is that we're actually generating that on our end and we're sending the GOAWAY
to the server, not the other way around. Will investigate more carefully soon.
I do wonder if it's worth looking into https://github.com/square/okhttp
I think all approaches have tradeoffs. It seems like a good backup option if we just can't work out what's going wrong here, but I don't think it's that bad yet.
And the messages that are attempted when the connection isn't connected are just dropped.
Can you elaborate? Does the Future
report success, failure, or does it just never complete?
Sorry, what I meant was:
DEBUG [2016-01-06 21:37:25,503] [default-akka.actor.default-dispatcher-5] com.relayrides.pushy.apns.ApnsClient - Failed to send push notification because client is not connected: SimpleApnsPushNotification [token=X, payload={"aps":{"content-available":1},"rtm":{"ts":"1452116244000"}}, invalidationTime=Thu Jan 07 21:37:25 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]
That notification is not resent by pushy as it's in a reconnection phase. I need to code around that stuff myself (vs old pushy handling it).
I'm running again with more debug logging. :)
Here we go:
java.lang.NullPointerException
at com.relayrides.pushy.apns.ApnsClient.handlePushNotificationResponse(ApnsClient.java:427)
at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:117)
at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:101)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:316)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:265)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:466)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:474)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:243)
at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:147)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:102)
at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:515)
at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:575)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:360)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:163)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:155)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1110)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:963)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:360)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:163)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:155)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:950)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:125)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:510)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:467)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:381)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
I've added more logging. Will report soon. :tada:
So this is pretty strange but...
I added this line to ApnsClient.java:
protected void handlePushNotificationResponse(final PushNotificationResponse<T> response) {
log.debug("Received response from APNs gateway: {}", response);
+ log.trace("handlePushNotificationResponse: responsePromises = {}", this.responsePromises);
this.responsePromises.remove(response.getPushNotification()).setSuccess(response);
}
And now I'm well into stream id 30,000 with no issues.
WTF?
Ok, so the issue is that the response promise doesn't exist at the point handlePushNotificationResponse is called:
TRACE [2016-01-07 02:28:36,951] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClientHandler - Received headers from APNs gateway on stream 42065: DefaultHttp2Headers[:status: 200, apns-id: B9BE075B-DEFE-9AD7-3A42-3A802AB2B07C]
TRACE [2016-01-07 02:28:36,951] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClientHandler - pushNotification for stream 42065: SimpleApnsPushNotification [token=X1, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715894"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]
DEBUG [2016-01-07 02:28:36,951] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Received response from APNs gateway: SimplePushNotificationResponse [pushNotification=SimpleApnsPushNotification [token=X1, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715894"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm], success=true, rejectionReason=null, tokenExpirationTimestamp=null]
TRACE [2016-01-07 02:28:36,952] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - handlePushNotificationResponse: responsePromises = {SimpleApnsPushNotification [token=X2, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715926"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]=DefaultPromise@53ac138b(incomplete), SimpleApnsPushNotification [token=X3, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715926"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]=DefaultPromise@30e9c16e(incomplete), SimpleApnsPushNotification [token=X4, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715894"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]=DefaultPromise@4c1e7977(incomplete)}
DEBUG [2016-01-07 02:28:36,961] [nioEventLoopGroup-2-1] io.netty.handler.codec.http2.Http2ConnectionHandler - Sent GOAWAY: lastStreamId '0', errorCode '2', debugData ''. Forcing shutdown of the connection.
DEBUG [2016-01-07 02:28:36,966] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Disconnected. Next automatic reconnection attempt in 1 seconds.
TRACE [2016-01-07 02:28:36,969] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClientHandler - onError received with cause: {}
java.lang.NullPointerException
at com.relayrides.pushy.apns.ApnsClient.handlePushNotificationResponse(ApnsClient.java:428)
at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:118)
at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:101)
A notification for token X1 is not present in responsePromises so setSuccess is being called on null.
I'll try to figure out why.
So doing this:
--- a/src/main/java/com/relayrides/pushy/apns/ApnsClient.java
+++ b/src/main/java/com/relayrides/pushy/apns/ApnsClient.java
@@ -424,7 +424,12 @@ public class ApnsClient<T extends ApnsPushNotification> {
protected void handlePushNotificationResponse(final PushNotificationResponse<T> response) {
log.debug("Received response from APNs gateway: {}", response);
- this.responsePromises.remove(response.getPushNotification()).setSuccess(response);
+ Promise<PushNotificationResponse<T>> promise = this.responsePromises.remove(response.getPushNotification());
+ if (promise != null) {
+ promise.setSuccess(response);
+ } else {
+ log.trace("Would have tried to setSuccess on null promise");
+ }
}
Does this:
TRACE [2016-01-07 04:15:25,894] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,896] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,897] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,898] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,899] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,992] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,993] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,994] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,996] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,996] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,159] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,165] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,178] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,193] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,735] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
A fair bit. I don't really understand why yet, though.
WTF?
Ah—a Heisenbug! ;)
I'll try to figure out why.
Smells like a race condition to me, though I'm having a hard time cooking up a scenario where that would happen. I'll have to pull out That Scary Chapter About Memory Consistency Effects, but the scenario I can come up with so far is:
writeAndFlush
the notification from Thread A, which just puts the write operation in the event loop's to-do list.If that's true, we should be able to solve the problem just by wrapping the body of the if
block in sendNotification
with:
connectionReadyPromise.channel().eventLoop().submit(new Runnable() {
// Create the promise, add it to the map, and write the notification.
});
If that doesn't solve the problem, I'll have to ponder some more.
I guess I can think of a few other scenarios, too.
handlePushNotificationResponse
trigger too early (or twice). I'll look into this a bit more.Or:
3a. The gateway is somehow sending multiple end-of-stream packets on the same stream. We could learn about that by (figuratively) reverting a5f240692868b87f9d3a7baf271b669e7e8b8aee and adding a frame logger back into the mix.
Oh. Duh. We're modifying a not-thread-safe map from two different threads. Of COURSE weird things are going to happen. We're probably trying to read while the map is redistributing keys or something. Making sure everything happens inside the same event loop should do the job. I'll do that first thing tomorrow morning.
fyi, we're using OkHttp in production to talk HTTP/2 to google's servers and it's working very nicely, but I'd love to see Netty's HTTP/2 support up to speed too. One caveat is that OkHttp is threaded, not async.
I'll [fix the threading issue] first thing tomorrow morning.
Done in 1f9db44.
@jchambers I hacked my own workaround last night and we're up to 650k requests without issue.
@jmason Not much real advantage to async vs threads on a single connection client... :) Even if you ran a thread per CPU, you're probably looking at like 32 threads with OkHttp. Still nice to see netty etc.
I hacked my own workaround last night and we're up to 650k requests without issue.
Can you elaborate? Is it substantially similar to 1f9db44 (i.e. making sure that the map only gets touched by one thread)?
Now that we're not dropping connections all the time, are you more comfortable without connection pools and retry queues?
Yeah it's a similar thing (I switched to a concurrent structure instead, but whatever).
Re pools and retry: I'm not really sure. It's your project so I don't know what the goal is -- is it a low level APNS implementation over netty, or is it a more full featured 'drop in' thing where you call 'sendNotification' and magic happens. :)
I'm also experimenting with an OkHttp implementation -- the entire thing is like 20 lines of Scala. I'm cheating by using SimpleApnsPushNotifcation and ApnsPayloadBuilder. The nice thing about it is that it doesn't require ALPN/netty-tcnative and/or bootpath hacks. It's also sort of a black box which takes care of pooling and retries. It also seems to "load balance" somewhat over multiple connections:
tcp 0 0 ::ffff:108.168.199.X:54063 ::ffff:17.110.226.29:443 ESTABLISHED 26130/java
tcp 0 0 ::ffff:108.168.199.X:54064 ::ffff:17.110.226.29:443 ESTABLISHED 26130/java
tcp 0 0 ::ffff:108.168.199.X:54065 ::ffff:17.110.226.29:443 ESTABLISHED 26130/java
tcp 0 0 ::ffff:108.168.199.X:54072 ::ffff:17.110.226.29:443 ESTABLISHED 26130/java
tcp 0 0 ::ffff:108.168.199.X:35725 ::ffff:17.143.164.139:443 ESTABLISHED 26130/java
tcp 0 0 ::ffff:108.168.199.X:43368 ::ffff:17.172.234.15:443 ESTABLISHED 26130/java
tcp 0 0 ::ffff:108.168.199.X:54062 ::ffff:17.110.226.29:443 ESTABLISHED 26130/java
OTOH it's using 8% of a core and I'm not entirely clear on why yet. The netty version (albeit with a single connection) seems to use less CPU.
…is it a low level APNS implementation over netty, or is it a more full featured 'drop in' thing where you call 'sendNotification' and magic happens.
I think we're looking for "magic enough" ;)
For now, the idea is to get out something that's at least as reliable as what we had before, and I think we're there. We'll learn a lot after the initial release, and will add features to make things more robust as necessary. I'd rather not hold up the immediate parade by adding more complex error-proofing, but I'm very open to adding it later if there's a demonstrated need.
it doesn't require ALPN/netty-tcnative and/or bootpath hacks
Huh… that wasn't my impression when I read about it earlier. I'll have to dig deeper to learn how they're pulling that off.
EDIT: OkHttp defiintely depends on alpn-boot
being on the bootclasspath: https://github.com/square/okhttp/blob/master/pom.xml#L290. Is it possible that you just happened to have it there already because you're working with Pushy?
Yeah, I can confirm that OkHttp still requires alpn-boot and all that (unfortunately).
Hi. I'm sorry for disturbance :) Could I ask when one can expect support of the new APNS HTTP/2 API in master branch and in a more or less production-ready state?
Apple will have HTTP/2 API to send notification requests to APNs.
YAY! :tada:
Here is a little summary of more details:
The new HTTP/2 API for APNs will be available in “Summer 2015″ for the development environment and will be made available for production “later this year”. No exact dates were given.
Details: https://developer.apple.com/videos/wwdc/2015/?id=720