jchambers / pushy

A Java library for sending APNs (iOS/macOS/Safari) push notifications
https://pushy-apns.org/
MIT License
1.8k stars 451 forks source link

HTTP/2 support #184

Closed matzew closed 8 years ago

matzew commented 9 years ago

Apple will have HTTP/2 API to send notification requests to APNs.

YAY! :tada:

Here is a little summary of more details:

The new HTTP/2 API for APNs will be available in “Summer 2015″ for the development environment and will be made available for production “later this year”. No exact dates were given.

Details: https://developer.apple.com/videos/wwdc/2015/?id=720

omarkilani commented 8 years ago

The '2' error code corresponds to this:

INTERNAL_ERROR (0x2): The endpoint encountered an unexpected internal error.

omarkilani commented 8 years ago

I do find it a little strange that netty doesn't give you some sort of 'black box' http2 API where you just send POST requests or whatever and it handles pooling/GOAWAY/reconnection/etc by itself.

omarkilani commented 8 years ago

I do wonder if it's worth looking into https://github.com/square/okhttp instead of spending the time reinventing the wheel with netty... but maybe there's not that much work involved with pooling etc.

jchambers commented 8 years ago

My read of those logs is that we're actually generating that on our end and we're sending the GOAWAY to the server, not the other way around. Will investigate more carefully soon.

I do wonder if it's worth looking into https://github.com/square/okhttp

I think all approaches have tradeoffs. It seems like a good backup option if we just can't work out what's going wrong here, but I don't think it's that bad yet.

jchambers commented 8 years ago

And the messages that are attempted when the connection isn't connected are just dropped.

Can you elaborate? Does the Future report success, failure, or does it just never complete?

omarkilani commented 8 years ago

Sorry, what I meant was:

DEBUG [2016-01-06 21:37:25,503] [default-akka.actor.default-dispatcher-5] com.relayrides.pushy.apns.ApnsClient - Failed to send push notification because client is not connected: SimpleApnsPushNotification [token=X, payload={"aps":{"content-available":1},"rtm":{"ts":"1452116244000"}}, invalidationTime=Thu Jan 07 21:37:25 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]

That notification is not resent by pushy as it's in a reconnection phase. I need to code around that stuff myself (vs old pushy handling it).

omarkilani commented 8 years ago

I'm running again with more debug logging. :)

omarkilani commented 8 years ago

Here we go:

java.lang.NullPointerException
    at com.relayrides.pushy.apns.ApnsClient.handlePushNotificationResponse(ApnsClient.java:427)
    at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:117)
    at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:101)
    at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:316)
    at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:265)
    at io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:466)
    at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:474)
    at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:243)
    at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:147)
    at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:102)
    at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:515)
    at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:575)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:360)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
    at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
    at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:163)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:155)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1110)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:963)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:360)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
    at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83)
    at io.netty.channel.DefaultChannelHandlerInvoker.invokeChannelRead(DefaultChannelHandlerInvoker.java:163)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:155)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:950)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:125)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:510)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:467)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:381)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
    at java.lang.Thread.run(Thread.java:745)

I've added more logging. Will report soon. :tada:

omarkilani commented 8 years ago

So this is pretty strange but...

I added this line to ApnsClient.java:

     protected void handlePushNotificationResponse(final PushNotificationResponse<T> response) {
         log.debug("Received response from APNs gateway: {}", response);
+        log.trace("handlePushNotificationResponse: responsePromises = {}", this.responsePromises);
         this.responsePromises.remove(response.getPushNotification()).setSuccess(response);
     }

And now I'm well into stream id 30,000 with no issues.

WTF?

omarkilani commented 8 years ago

Ok, so the issue is that the response promise doesn't exist at the point handlePushNotificationResponse is called:

TRACE [2016-01-07 02:28:36,951] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClientHandler - Received headers from APNs gateway on stream 42065: DefaultHttp2Headers[:status: 200, apns-id: B9BE075B-DEFE-9AD7-3A42-3A802AB2B07C]
TRACE [2016-01-07 02:28:36,951] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClientHandler - pushNotification for stream 42065: SimpleApnsPushNotification [token=X1, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715894"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]
DEBUG [2016-01-07 02:28:36,951] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Received response from APNs gateway: SimplePushNotificationResponse [pushNotification=SimpleApnsPushNotification [token=X1, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715894"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm], success=true, rejectionReason=null, tokenExpirationTimestamp=null]
TRACE [2016-01-07 02:28:36,952] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - handlePushNotificationResponse: responsePromises = {SimpleApnsPushNotification [token=X2, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715926"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]=DefaultPromise@53ac138b(incomplete), SimpleApnsPushNotification [token=X3, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715926"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]=DefaultPromise@30e9c16e(incomplete), SimpleApnsPushNotification [token=X4, payload={"aps":{"content-available":1},"rtm":{"ts":"1452133715894"}}, invalidationTime=Fri Jan 08 02:28:36 UTC 2016, priority=IMMEDIATE, topic=com.rememberthemilk.mobilertm]=DefaultPromise@4c1e7977(incomplete)}
DEBUG [2016-01-07 02:28:36,961] [nioEventLoopGroup-2-1] io.netty.handler.codec.http2.Http2ConnectionHandler - Sent GOAWAY: lastStreamId '0', errorCode '2', debugData ''. Forcing shutdown of the connection.
DEBUG [2016-01-07 02:28:36,966] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Disconnected. Next automatic reconnection attempt in 1 seconds.
TRACE [2016-01-07 02:28:36,969] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClientHandler - onError received with cause: {}
java.lang.NullPointerException
        at com.relayrides.pushy.apns.ApnsClient.handlePushNotificationResponse(ApnsClient.java:428)
        at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:118)
        at com.relayrides.pushy.apns.ApnsClientHandler$ApnsClientHandlerFrameAdapter.onHeadersRead(ApnsClientHandler.java:101)

A notification for token X1 is not present in responsePromises so setSuccess is being called on null.

I'll try to figure out why.

omarkilani commented 8 years ago

So doing this:

--- a/src/main/java/com/relayrides/pushy/apns/ApnsClient.java
+++ b/src/main/java/com/relayrides/pushy/apns/ApnsClient.java
@@ -424,7 +424,12 @@ public class ApnsClient<T extends ApnsPushNotification> {

     protected void handlePushNotificationResponse(final PushNotificationResponse<T> response) {
         log.debug("Received response from APNs gateway: {}", response);
-        this.responsePromises.remove(response.getPushNotification()).setSuccess(response);
+        Promise<PushNotificationResponse<T>> promise = this.responsePromises.remove(response.getPushNotification());
+        if (promise != null) {
+          promise.setSuccess(response);
+        } else {
+          log.trace("Would have tried to setSuccess on null promise");
+        }
     }

Does this:

TRACE [2016-01-07 04:15:25,894] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,896] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,897] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,898] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,899] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,992] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,993] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,994] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,996] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:25,996] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,159] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,165] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,178] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,193] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise
TRACE [2016-01-07 04:15:26,735] [nioEventLoopGroup-2-1] com.relayrides.pushy.apns.ApnsClient - Would have tried to setSuccess on null promise

A fair bit. I don't really understand why yet, though.

jchambers commented 8 years ago

WTF?

Ah—a Heisenbug! ;)

I'll try to figure out why.

Smells like a race condition to me, though I'm having a hard time cooking up a scenario where that would happen. I'll have to pull out That Scary Chapter About Memory Consistency Effects, but the scenario I can come up with so far is:

  1. We add the promise to the map from Thread A.
  2. We writeAndFlush the notification from Thread A, which just puts the write operation in the event loop's to-do list.
  3. Thread B from the event loop grabs the notification and writes it to the wire.
  4. The APNs server responds really really quickly.
  5. Thread B gets the reply from the server and goes to notify the promise, but due to memory consistency voodoo, the change isn't visible to Thread B.

If that's true, we should be able to solve the problem just by wrapping the body of the if block in sendNotification with:

connectionReadyPromise.channel().eventLoop().submit(new Runnable() {
    // Create the promise, add it to the map, and write the notification.
});

If that doesn't solve the problem, I'll have to ponder some more.

jchambers commented 8 years ago

I guess I can think of a few other scenarios, too.

  1. We THINK we failed to write the push notification and remove the promise from the map, but it actually went through and the APNs gateway sends us a response. We should be able to spot that in the logs.
  2. The gateway is somehow double-responding to push notifications. We ought to be able to spot that in the logs, too.
  3. The gateway is responding appropriately, but we're making a mistake and pulling the handlePushNotificationResponse trigger too early (or twice). I'll look into this a bit more.
jchambers commented 8 years ago

Or:

3a. The gateway is somehow sending multiple end-of-stream packets on the same stream. We could learn about that by (figuratively) reverting a5f240692868b87f9d3a7baf271b669e7e8b8aee and adding a frame logger back into the mix.

jchambers commented 8 years ago

Oh. Duh. We're modifying a not-thread-safe map from two different threads. Of COURSE weird things are going to happen. We're probably trying to read while the map is redistributing keys or something. Making sure everything happens inside the same event loop should do the job. I'll do that first thing tomorrow morning.

jmason commented 8 years ago

fyi, we're using OkHttp in production to talk HTTP/2 to google's servers and it's working very nicely, but I'd love to see Netty's HTTP/2 support up to speed too. One caveat is that OkHttp is threaded, not async.

jchambers commented 8 years ago

I'll [fix the threading issue] first thing tomorrow morning.

Done in 1f9db44.

omarkilani commented 8 years ago

@jchambers I hacked my own workaround last night and we're up to 650k requests without issue.

@jmason Not much real advantage to async vs threads on a single connection client... :) Even if you ran a thread per CPU, you're probably looking at like 32 threads with OkHttp. Still nice to see netty etc.

jchambers commented 8 years ago

I hacked my own workaround last night and we're up to 650k requests without issue.

Can you elaborate? Is it substantially similar to 1f9db44 (i.e. making sure that the map only gets touched by one thread)?

Now that we're not dropping connections all the time, are you more comfortable without connection pools and retry queues?

omarkilani commented 8 years ago

Yeah it's a similar thing (I switched to a concurrent structure instead, but whatever).

Re pools and retry: I'm not really sure. It's your project so I don't know what the goal is -- is it a low level APNS implementation over netty, or is it a more full featured 'drop in' thing where you call 'sendNotification' and magic happens. :)

I'm also experimenting with an OkHttp implementation -- the entire thing is like 20 lines of Scala. I'm cheating by using SimpleApnsPushNotifcation and ApnsPayloadBuilder. The nice thing about it is that it doesn't require ALPN/netty-tcnative and/or bootpath hacks. It's also sort of a black box which takes care of pooling and retries. It also seems to "load balance" somewhat over multiple connections:

tcp        0      0 ::ffff:108.168.199.X:54063 ::ffff:17.110.226.29:443    ESTABLISHED 26130/java          
tcp        0      0 ::ffff:108.168.199.X:54064 ::ffff:17.110.226.29:443    ESTABLISHED 26130/java          
tcp        0      0 ::ffff:108.168.199.X:54065 ::ffff:17.110.226.29:443    ESTABLISHED 26130/java          
tcp        0      0 ::ffff:108.168.199.X:54072 ::ffff:17.110.226.29:443    ESTABLISHED 26130/java          
tcp        0      0 ::ffff:108.168.199.X:35725 ::ffff:17.143.164.139:443   ESTABLISHED 26130/java          
tcp        0      0 ::ffff:108.168.199.X:43368 ::ffff:17.172.234.15:443    ESTABLISHED 26130/java          
tcp        0      0 ::ffff:108.168.199.X:54062 ::ffff:17.110.226.29:443    ESTABLISHED 26130/java       

OTOH it's using 8% of a core and I'm not entirely clear on why yet. The netty version (albeit with a single connection) seems to use less CPU.

jchambers commented 8 years ago

…is it a low level APNS implementation over netty, or is it a more full featured 'drop in' thing where you call 'sendNotification' and magic happens.

I think we're looking for "magic enough" ;)

For now, the idea is to get out something that's at least as reliable as what we had before, and I think we're there. We'll learn a lot after the initial release, and will add features to make things more robust as necessary. I'd rather not hold up the immediate parade by adding more complex error-proofing, but I'm very open to adding it later if there's a demonstrated need.

it doesn't require ALPN/netty-tcnative and/or bootpath hacks

Huh… that wasn't my impression when I read about it earlier. I'll have to dig deeper to learn how they're pulling that off.

EDIT: OkHttp defiintely depends on alpn-boot being on the bootclasspath: https://github.com/square/okhttp/blob/master/pom.xml#L290. Is it possible that you just happened to have it there already because you're working with Pushy?

jmason commented 8 years ago

Yeah, I can confirm that OkHttp still requires alpn-boot and all that (unfortunately).

stIncMale commented 8 years ago

Hi. I'm sorry for disturbance :) Could I ask when one can expect support of the new APNS HTTP/2 API in master branch and in a more or less production-ready state?

jchambers commented 8 years ago

213 has been merged and I'm preparing a release to Maven Central now. Closing this issue.