node-apn / node-apn

:calling: Apple Push Notification module for Node.js
MIT License
4.38k stars 682 forks source link

read ETIMEDOUT after a few hours #100

Closed kartast closed 10 years ago

kartast commented 11 years ago

Hi,

I am encountering read ETIMEDOUT error after starting for a few hours. This happens all the time, im runing on node.js 0.8.x.

Any idea whats causing this?

kartast commented 11 years ago

Fri, 17 May 2013 12:11:18 GMT apn Socket drained [05/17 20:11:19 GMT+0800] Fri, 17 May 2013 12:11:18 GMT apn 0 left to send [05/17 20:11:19 GMT+0800] Fri, 17 May 2013 12:11:18 GMT apn Transmitting notification from buffer [05/17 20:11:19 GMT+0800] APNS SENT [05/17 20:16:46 GMT+0800] Fri, 17 May 2013 12:16:46 GMT apn Socket error occurred { [Error: read ETIMEDOUT] code: 'ETIMEDOUT', errno: 'ETIMEDOUT', syscall: 'read' } [05/17 20:16:47 GMT+0800] Fri, 17 May 2013 12:16:46 GMT apn Raising error: { [Error: read ETIMEDOUT] code: 'ETIMEDOUT', errno: 'ETIMEDOUT', syscall: 'read' } null undefined [05/17 20:16:47 GMT+0800] Fri, 17 May 2013 12:16:46 GMT apn Error occurred with trace: Error: read ETIMEDOUT [05/17 20:16:47 GMT+0800] at errnoException (net.js:770:11) [05/17 20:16:47 GMT+0800] at TCP.onread (net.js:425:21) [05/17 20:16:47 GMT+0800] Fri, 17 May 2013 12:16:46 GMT apn Destroying connection [05/17 20:16:47 GMT+0800] Fri, 17 May 2013 12:16:46 GMT apn Socket closed

argon commented 11 years ago

Does node-apn fail to reconnect when you send more notifications?

kartast commented 11 years ago

sometimes it does, sometimes it doesn't

On Fri, May 17, 2013 at 8:27 PM, Andrew Naylor notifications@github.comwrote:

Does node-apn fail to reconnect when you send more notifications?

— Reply to this email directly or view it on GitHubhttps://github.com/argon/node-apn/issues/100#issuecomment-18058544 .

kartast commented 11 years ago

It's very weird, when i start the app, it will send notifications very fine, but then after a few hours, it will stop sending notifications. some rare times it will send again.. it's a little unreliable for me, I am wondering if i missed any implementation.

argon commented 11 years ago

Are there any other apn messages after Socket closed?

kartast commented 11 years ago

no.. it ends there..

argon commented 11 years ago

Could you please let me know what output you see after the connection times out and you try to send another notification, this should indicate where it's tripping up.

I have yet to see my connection timeout, but if you have any sample code you're using that you could send I can take a look.

argon commented 11 years ago

Having said that, I literally just noticed the socket close. I'll setup a test to see what happens.

kartast commented 11 years ago

The this is I missed the previous outputs, and now I have to wait a few hours before this happens again. But when it happen again, I will try to get the outputs.

argon commented 11 years ago

I have only been able to replicate graceful socket closures after 2.0h, no timeouts, at which point messages were successfully sent as the connection re-established itself. If you have some sample code that can replicate the problem or a system you can give me access to debug on, I will be happy to provide assistance.

kartast commented 11 years ago

It just happened again, and this time, it will not succesfully send anymore..

The code that i use to send apns is here https://gist.github.com/5603118

kartast commented 11 years ago

oh wait, it succesfully sent on 4th try, will post logs soon

kartast commented 11 years ago

so after the app running a few hours, and try send apns, it fail and the log something like this:

[05/18 11:26:11 GMT+0800] Sat, 18 May 2013 03:26:11 GMT apn Transmitting notification from buffer [05/18 11:26:11 GMT+0800] Sat, 18 May 2013 03:26:11 GMT apn Socket drained [05/18 11:26:11 GMT+0800] Sat, 18 May 2013 03:26:11 GMT apn 0 left to send

Then I send again and the log is timeout..

Then I send again ,the log similar to first one...

Then 4th time i send again, it goes through..

Unfortunately I lost the logs, I will try again next time..

argon commented 11 years ago

I'll need to see the logs to understand what's going on because the code you sent me looks as thought it would work fine. You almost certainly don't need to use ucs2 as your encoding, however.

argon commented 11 years ago

I eventually saw an ETIMEDOUT error and node-apn continued sending notifications successfully

https://gist.github.com/argon/974e0fb0e17860d027c6

I will leave my test running to see if an unsuccessful send happens.

kartast commented 11 years ago

Okay, It just happened again, and a few notifications went missing. after a few times it went through again..

[05/19 21:38:18 GMT+0800] Sun, 19 May 2013 13:38:17 GMT apn Raising error: { [Error: read ETIMEDOUT] code: 'ETIMEDOUT', errno: 'ETIMEDOUT', syscall: 'read' } null undefined [05/19 21:38:18 GMT+0800] Sun, 19 May 2013 13:38:17 GMT apn Error occurred with trace: Error: read ETIMEDOUT [05/19 21:38:18 GMT+0800] at errnoException (net.js:770:11) [05/19 21:38:18 GMT+0800] at TCP.onread (net.js:425:21) [05/19 21:38:18 GMT+0800] Sun, 19 May 2013 13:38:17 GMT apn Destroying connection [05/19 21:38:18 GMT+0800] Sun, 19 May 2013 13:38:17 GMT apn Socket closed [05/19 21:38:18 GMT+0800] Sun, 19 May 2013 13:38:17 GMT apn Socket error occurred { [Error: read ETIMEDOUT] code: 'ETIMEDOUT', errno: 'ETIMEDOUT', syscall: 'read' }

This tells anything?

kartast commented 11 years ago

I think between the start of the app, there might be no notifications for a few hours (my app is in testing phase).. and after a few hours, try send notification, this happened..

argon commented 11 years ago

Unfortunately that log tells me very little because its not in context. You will see in the log I posted that it shows the point at which a message is passed to the library and then the stages it goes through, just seeing the socket error doesn't help as I need to see at which point in the process the error occurs. I have a theory that you may have found an edge case which I will try to replicate but it would be very helpful if you would post a full log of apn activity.

kartast commented 11 years ago

Sure, How can I print more detail logs? — Sent from Mailbox for iPhone

On Mon, May 20, 2013 at 12:41 AM, Andrew Naylor notifications@github.com wrote:

Unfortunately that log tells me very little because its not in context. You will see in the log I posted that it shows the point at which a message is passed to the library and then the stages it goes through, just seeing the socket error doesn't help as I need to see at which point in the process the error occurs. I have a theory that you may have found an edge case which I will try to replicate but it would be very helpful if you would post a full log of apn activity.

Reply to this email directly or view it on GitHub: https://github.com/argon/node-apn/issues/100#issuecomment-18120504

argon commented 11 years ago

That level of logging is ok, the problem is you're not posting enough, there will be lines of logging before (and probably after) which show the code path followed. There is at least one line I would expect to see before an error being shown which isn't there, even if the lines before are seconds, minutes or hours before.

kartast commented 11 years ago

I see, I will try to save the logs more..right now I use nodejitsu and can only see recent logs..  — Sent from Mailbox for iPhone

On Mon, May 20, 2013 at 12:48 AM, Andrew Naylor notifications@github.com wrote:

That level of logging is ok, the problem is you're not posting enough, there will be lines of logging before (and probably after) which show the code path followed. There is at least one line I would expect to see before an error being shown which isn't there, even if the lines before are seconds, minutes or hours before.

Reply to this email directly or view it on GitHub: https://github.com/argon/node-apn/issues/100#issuecomment-18120646

argon commented 11 years ago

I have pushed a possible fix here: https://github.com/argon/node-apn/tree/issue100. As I am unable to replicate the situation you are seeing, could you please try out the code and let me know whether it works. The only file changed is connection.js so you could just download that and replace the one in your installation.

kartast commented 11 years ago

I will try it, and update you on the result! thanks!

karta sutanto Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, 24 May, 2013 at 3:27 PM, Andrew Naylor wrote:

I have pushed a possible fix here: https://github.com/argon/node-apn/tree/issue100. As I am unable to replicate the situation you are seeing, could you please try out the code and let me know whether it works. The only file changed is connection.js so you could just download that and replace the one in your installation.

— Reply to this email directly or view it on GitHub (https://github.com/argon/node-apn/issues/100#issuecomment-18390339).

JiboStore commented 11 years ago

I am having the same problem as @kartast as well... but I'm new to nodejs and can't get as comprehensive log as his... I'll try @argon 's new connection.js and post my result. Thanks!

JiboStore commented 11 years ago

would anyone please advice how to get the #issue100 branch to my nodejitsu server? I've tried:

  1. modified the package.json "apn": "git@github.com:argon/node-apn.git#issue100"
  2. modified the package.json apn to point to the zipball: https://codeload.github.com/argon/node-apn/zip/issue100
  3. modified connection.js on my local node_module folder, and use jitsu deploy
  4. extract apn from node_module, modified connection.js and change all the require() script to point to the external files... all in vain...
JiboStore commented 11 years ago

I managed to get the branch in.., by modifying my package.json apn to point to: "apn" : "https://github.com/argon/node-apn/tarball/196f6882686da46e1eed1b90dc9096e87c1f1987" I'll post my result, thanks everyone!

JiboStore commented 11 years ago

negative... still ain't working... :(

argon commented 11 years ago

You mean you are seeing the error and unable to send any notifications after?

liebo commented 10 years ago

I met this problem, too. After some hours, sending notifications results in following error raised after several seconds, the nofication cannot be received by device. when the error raised, the connection will not reconnect util next notification is sent, then everything goes ok.

apn Socket error occurred +15.0m { [Error: read ETIMEDOUT] code: 'ETIMEDOUT', errno: 'ETIMEDOUT', syscall: 'read' } apn Raising error: +1ms { [Error: read ETIMEDOUT] code: 'ETIMEDOUT', errno: 'ETIMEDOUT', syscall: 'read' } null apn Error occurred with trace: +0ms Error: read ETIMEDOUT at errnoException (net.js:900:11) at TCP.onread (net.js:555:19)

numbcoder commented 10 years ago

I have the same problem. when I start the app, it will send notifications very fine, but then after a few hours, it will stop sending notifications.

I have tryed to add timeout option, but still not working. The apn module version is 1.3.8

reesemclean commented 10 years ago

I'm having the same problem, push notifications work fine after restarting server. Time passes, I get a disconnected message and then notifications won't be sent. A little time after the notifications aren't sent they start getting sent again... but time passes again and the problem starts again.

argon commented 10 years ago

Hi Reese,

I'm sorry to hear you're having this problem. Unfortunately I have not been able to replicate the problem and no one who is seeing this problem has supplied me with debug logs I can use to identify the problem. If you are able and willing to reproduce the problem and send me the entirety of the log output then I will try to help.

reesemclean commented 10 years ago

Got Debug installed and the environment variable set. Will check it in an hour or so and report back with what the logs show.

reesemclean commented 10 years ago

Alright here's what the logs show. First a breakdown of what was done to get these logs.

(Logs as a gist: https://gist.github.com/reesemclean/6617854 )

From a fresh restart, push notification sent and received. 3 more successfully sent between Wed. 23:35 and 23:53. At 00:41 another push notifications sent but this one is not received. After this, I attempted another 2 pushes (00:47) and they were received successfully:

Also maybe relevant is that I am running this on nodejitsu.

Wed, 18 Sep 2013 23:35:08 GMT apn Initialising connection
Wed, 18 Sep 2013 23:35:08 GMT apn Initialising module
Wed, 18 Sep 2013 23:35:08 GMT apn 1 left to send
Wed, 18 Sep 2013 23:35:09 GMT apn Connection established
Connected
Wed, 18 Sep 2013 23:35:09 GMT apn Transmitting notification from buffer
Wed, 18 Sep 2013 23:35:49 GMT apn 0 left to send
Wed, 18 Sep 2013 23:36:05 GMT apn 0 left to send
Wed, 18 Sep 2013 23:36:05 GMT apn Socket drained 0
Wed, 18 Sep 2013 23:36:26 GMT apn Transmitting notification from buffer
Wed, 18 Sep 2013 23:53:35 GMT apn 0 left to send
Thu, 19 Sep 2013 00:41:21 GMT apn 0 left to send
Thu, 19 Sep 2013 00:41:21 GMT apn Socket drained 0
Thu, 19 Sep 2013 00:46:45 GMT apn Error occurred with trace: Error: read ETIMEDOUT at errnoException (net.js:901:11) at TCP.onread (net.js:556:19)
Disconnected from APNS
Thu, 19 Sep 2013 00:46:45 GMT apn Destroying connection 0
Thu, 19 Sep 2013 00:46:45 GMT apn Socket closed 0
Thu, 19 Sep 2013 00:47:23 GMT apn 1 left to send
Thu, 19 Sep 2013 00:47:23 GMT apn Connection established
Thu, 19 Sep 2013 00:48:51 GMT apn 0 left to send
Thu, 19 Sep 2013 00:48:51 GMT apn Socket drained 1
argon commented 10 years ago

At this point I am quite confident in saying this is a problem with Apples' servers. I worked with some developers seeing a similar problem on a large scale deployment where a notification sent after a 15 minute break was lost but subsequent ones sent. Here is my best explanation of what I think is happening.

TL;DR: This appears to be a bug on Apple's end. I haven't filed a Radar but probably should. By setting a connectionTimeout of ~10 minutes the problem shouldn't manifest.

When a notification is sent the module will only receive data from Apple if there is a problem with it. It is this data which is passed back to your application as a transmissionError. Under-the-covers node has to check whether the socket has data available from Apple by attempting to read from it, this is why the error includes "read" in "Error: read ETIMEDOUT".

When Apple receives a notification I believe it gets put on a queue for processing and the read buffer is flushed so node-apn can continue to send further notifications, while a separate system verifies the notification (probably so it can happen in a distributed fashion). If there is a problem with the notification then the aforementioned error data gets sent back to node-apn, usually 200-500ms later.

In some of my testing I see an ECONNRESET after ~300ms, in other cases, as above, an ETIMEDOUT happens after ~5 minutes (though I have not been able to replicate directly) What appears to be happening is that something is happening on Apple's end which is causing the receiving system to drop the notification and abandon the connection.

Because nothing at all is received from Apple the module is not aware that a notification has been dropped. Even if I added logic to recognise these errors, there is no way of knowing whether it is only the most recent notification, or multiple, that have been lost so it's not possible to guarantee a notification will only be sent once. Plus I don't like this kind of guesswork.

It seems to me that after a connection has been open/idle for a long period of time the server on Apples end does not handle this well. People have only started reporting this problem quite recently, but I tried a much older version of the code and was able to replicate the same problem so I do not believe this is a node-apn regression.

The client I was working with solved the problem by timing out/recycling connections after 10 minutes, which is a fairly clean way of solving the problem. The best way of accomplishing this is probably by setting the connectionTimeout parameter (which is in milliseconds) when establishing the connection.

Incidentally in one of my production environments I actually see the connections get closed by Apple after 4 minutes and subsequent notifications are sent correctly.

2k13yr commented 10 years ago

I've set connectionTimeout to 10 minutes and 1 minute, but it doesn't work.

j0rd commented 9 years ago

I'm running into this problem and not using node-apn at all. If you search around for people doing many http requests in node, they eventually run into this problem.

For me, I'm doing roughly 160k connections in a loop. After 45 mins or so, the connections stop working. They're all to different servers, so has nothing to do with the server and most likely everything to do with node (or some library).

matteocontrini commented 8 years ago

This is a Node.js issue, it seems http://stackoverflow.com/questions/15011239/node-js-app-begins-timing-out-etimedout-when-http-connecting-to-other-sites-af