Notifications sent via persistent connection don't arrive

joonty commented 9 years ago

I struggled with this issue for a long time, and finally discovered what was happening. I'm using version 2.2.1 of the gem.

I found that I could send notifications via the command line apn tool, but it was failing to send when integrating Houston into my code. After trying all sorts of different combinations I cut the code down until it was as simple as this:

apn = Houston::Client.development
apn.certificate = File.read("/path/to/apple_push_notification.pem")
token = "<ce8be627 2e43e855 16033e24 b4c28922 0eeda487 9c477160 b2545e95 b68b5969>"

notification = Houston::Notification.new(device: token)
notification.alert = "Hello, World!"

apn.push(notification)

I.e., just set a message and send. The notification would never arrive, and there was no error in notification.error. As a last attempt, I tried the Silent Notifications code in the README, meaning that I created a Notification with Houston::Notification.new(token: token, sound: ''), and it finally sent!

~~It seems like there's an issue with the default sound option. Maybe setting it to blank is the safest default?~~

Edit

Actually, I discovered in the end that it was down to invalid tokens in the database. I had registered some tokens from the iOS simulator, and in my Houston code (using the persistent connection format) was looping through all the devices sending out notifications, e.g.:

Device.each do |device|
  notification = Houston::Notification.new(device: device.token)
  notification.alert = "Hello, world"
  connection.write(notification.message)
end

It seems that all notifications sent after the one to the simulator would never arrive. After deleting the offending tokens from the database, the notifications would go through.

Is this an issue with the APN service? It seems like a bit of an issue though, as invalid tokens bring the whole thing to a halt. I don't know if you have any thoughts on this?

Thanks

cognitiveflux commented 9 years ago

That is how APNS works, from the APNS Documentation:

If you send a notification that is malformed or otherwise unintelligible, APNs returns an error-response packet and closes the connection. Any notifications that you sent after the malformed notification using the same connection are discarded, and must be resent.

In other words, as soon as an invalid frame is received, the connection is closed on Apple's systems. Unfortunately, connection.open? still returns true in Houston.

It works with the CLI because the connection is opened and closed, but in the case of persistent connections, the connection is broken. Retrying to send the same notification without re-opening the connection should yeild Errno::EPIPE from the SSL library.

As a result, if you receive an error response for a persistent connection, you must explicitly close and open the connection again. However, per #73, you may not get the error message immediately.

joonty commented 9 years ago

Ok, thanks, that makes sense.

However, I'm not really sure where to go from here. Reopening the connection is no trouble, but if an error message isn't received immediately, how can I know when to reopen the connection? All the push notifications sent after the one to the invalid device token apparently send without an error (no Errno::EPIPE error raised). If the error can't be retrieved immediately, is there some way of using the library to receive errors asynchronously? Not really sure how to design my code to accommodate for this.

Thanks

cognitiveflux commented 9 years ago

You can read and write from a socket simultaneously, so you can start a new (blocking read) thread that is always listening for errors while writing on a different thread (using the same socket); that way you don't have to write, check for errors, repeat (as in the example), you just keep writing until the reading thread receives an error. Since there can be a considerable delay between notifications that are sent and when an error is returned (up to 600ms!), the only way I've been able to address it is with queues that keep track of the last n messages sent so that if an error is received, the id parameter for the message with the error can be used to resend all messages in the queue after id. This is very similar and better explained here, although they don't mention the use of two separate IO threads (one for write, one for read), which eliminates any write,wait,read based approach, but I think it's implied in their article.

joonty commented 9 years ago

Hi,

I finally had a chance to try and implement your suggestion @cognitiveflux, but didn't have any luck receiving anything at all from the Apple feedback service. Here's what I did for testing purposes:

threads = []
message = "This is a test message"
certificate = File.read(certificate_path)
passphrase = nil

devices = [
  "<valid token 1>",
  "<invalid token>",
  "<valid token 2>",
]

threads << Thread.new do
  connection = Houston::Connection.new(HOUSTON_URL, certificate, passphrase)
  connection.open

  devices.each do |device_token|
    notification = Houston::Notification.new(device: device_token)
    notification.alert = message
    notification.sound = "sosumi.aiff"
    notification.category = "INVITE_CATEGORY"
    notification.content_available = true

    puts "Sending to device: #{device_token}"
    p connection.open?
    connection.write(notification.message)
    if notification.error
      puts "Error for device #{device_token}"
    end
  end
end

threads << Thread.new do
  Houston::Connection.open("apn://feedback.push.apple.com:2196", certificate, passphrase) do |connection|
    loop do
      while line = connection.read(38)
        p line      # Show me something, anything!!
      end
    end
  end
end

threads.each(&:join)

The first push notification would send correctly, the second one (to the invalid token) would silently fail, and the third, even though it's a valid token, would never be sent. You were right about connection.open? always returning true. If I removed the invalid token from the queue then both notifications would send.

However, the error reporting thread didn't receive anything from the feedback service - nothing is read from the socket, and nothing is printed out. Incidentally, sometimes (randomly) an Errno::EPIPE would be thrown, when Apple decided that they'd had enough. But it was rare enough that it didn't bother me as much as invalid tokens bringing down the whole queue!

I'm really hoping I'm doing something wrong here! I also tried using the unregistered_devices method in the Houston library, but nothing is received there either.

Any ideas? Thanks again.

lukasnagl commented 9 years ago

@joonty You could check out the suggestion I made over at https://github.com/nomad/houston/issues/85#issuecomment-71176096. It’s not as sophisticated as the solution in the blog post @cognitiveflux mentioned, but is based on what the houston client does as well as the assumption that invalid tokens don’t get encountered significantly often. It is able to catch and handle invalid tokens, though, and can be improved from there.

cognitiveflux commented 9 years ago

@joonty The issue is that you're using different connections on different threads, you need to use the same connection; two threads can simultaneously use the same connection as long one is strictly writing while the other is strictly reading. Errors are only sent back on the same connection that the invalid notification was sent over, the feedback service is not for realtime error reporting. You also don't need a separate thread for writing, in which case you do not want to join the reading threading.

High level example:

class ApnsRunner

 def initialize(...)
  @error_encountered = false
  @connection = Houston::Connection.new(HOUSTON_URL, certificate, passphrase)
  @connection.open
 end

 def start
  ... # make sure connection is open
  listen_for_errors
  send_notifications
 end

 def stop
  @thread.exit if @thread
  @connection.close
 end

 def listen_for_errors
  unless @thread && @thread.alive?
    @thread = Thread.new {
      read_socket = IO.select([@connection.ssl]) # Use the SAME CONNECTION
      if error = @connection.read(6) # Block indefinitely until and error is received, hence why it is on a new thread
      ... # read error, 
          # handle error (invalid token is just one type of error), 
          # capture ID of the failed notification, add to error queue, etc.
      @error_encountered = true # Let the writing thread know that it should stop sending
       ... # handle connection reset
      end
    end
   }  # Do not join on main thread!
 end

 def send_notifications
   @some_array_or_queue_with_notifications.each do |notification|
     break if @error_encountered
     ...
     @connection.write(notification.message) # Again, using the SAME CONNECTION
     ...
   end
 end
end

A couple things to note, first you need to make sure the notification objects have a unique identifier to be able to determine when an error was encountered. When an error is detected by the reading thread, it is possible for a couple notifications to still be sent by the writing thread, so that's where the notification IDs and queues come into play, you will need to "replay" the write from the point the notification failed. Second, the error thread will stay open as long as the connection is open and an error has not been received, so you're not spawning threads all the time. For many reasons it's in your best interest to use persistent long running connections (e.g. up to 24 hours).

@j4zz That will certainly work for a small number of notifications, although you can send a lot of notifications in one second. Just a word of caution, one connection per notification for a large number of notifications is handled like a DoS attack and all connection attempts from your server will be blocked until you contact Apple (yes they do track and monitor it); I don't know where they define the threshold, but as long as your volume is low I suppose you can slip under the radar.

lukasnagl commented 9 years ago

@cognitiveflux Just for clarification, my example is meant to send all notifications on a single connection, and afterwards checks the same connection for an error (that may have happened some time ago during sending). If so, resend all notifications after the erroneous one (since that one caused the connection to ignore all further notifications) on a new connection.

joonty commented 9 years ago

@cognitiveflux thanks very much for the detailed code - I'll try that. My first attempt was to use the same connection, but I must have been doing something wrong.

It seems that there are quite a few issues relating in some way to the inane way that Apple report on errors (and nothing to do with a problem with Houston), so feel free to close this issue. Then again, it may be helpful to others.

Thanks

ferrous commented 9 years ago

I'd got a solution (os dependent - linux) which validates connection status before sending the notification. It works great with Sidekiq, sharing one connection with all threads. Does not open the connection until needs to send something (sharing code with front and queue servers) https://gist.github.com/ferrous/d498971be300683c3bbd

nomad-cli / houston

Notifications sent via persistent connection don't arrive #83

Edit