mwkirk / javapns

Test import of svn javapns repo from Google Code
3 stars 0 forks source link

The queue lost message sometimes #115

Open mwkirk opened 11 years ago

mwkirk commented 11 years ago

Original author: minkon1...@gmail.com (April 12, 2012 08:11:09)

What steps will reproduce the problem? 1.User push queue with two thread, the code is

PushQueue queue = Push.queue(keystore, password, production, MAX_POOL_SIZE); queue.start();

  1. when send message,use the code queue.add(payload, token);

3.most time it work fine, but some time it will lost some message.

What is the expected output? What do you see instead? some message was lost.

What version of the product are you using? On what operating system? 2.2,amazon linux

From the debug log, i fond for some message it was not send out.After adding sound it will begun to add next message. The current message is not send. This will recover after several message.

Below is the log

012-04-12 07:14:29,332 DEBUG [javapns.notification.Payload] - Adding custom Dictionary [msgbcast] = [***] 012-04-12 07:14:29,332 DEBUG [javapns.notification.Payload] - Adding badge [4] 012-04-12 07:14:29,332 DEBUG [javapns.notification.Payload] - Adding sound [default] 012-04-12 07:15:06,207 DEBUG [javapns.notification.Payload] - Adding custom Dictionary [msgbcast] = [******_] 012-04-12 07:15:06,207 DEBUG [javapns.notification.Payload] - Adding badge [5] 012-04-12 07:15:06,207 DEBUG [javapns.notification.Payload] - Adding sound [default] 012-04-12 07:15:23,666 DEBUG [javapns.notification.Payload] - Adding custom Dictionary [msgbcast] = [******_] 012-04-12 07:15:23,666 DEBUG [javapns.notification.Payload] - Adding badge [6] 012-04-12 07:15:23,666 DEBUG [javapns.notification.Payload] - Adding sound [default] 012-04-12 07:15:37,712 DEBUG [javapns.notification.Payload] - Adding custom Dictionary [msgbcast] = [*******] 012-04-12 07:15:37,712 DEBUG [javapns.notification.Payload] - Adding badge [7] 012-04-12 07:15:37,712 DEBUG [javapns.notification.Payload] - Adding sound [default]

012-04-12 07:16:20,771 DEBUG [javapns.notification.Payload] - Adding badge [1] 012-04-12 07:16:20,771 DEBUG [javapns.notification.Payload] - Adding sound [default] 012-04-12 07:16:20,772 DEBUG [javapns.notification.PushNotificationManager] - Building Raw message from deviceToken and payload 012-04-12 07:16:20,772 DEBUG [javapns.notification.PushNotificationManager] - Built raw message ID 16777260 of total length 191 012-04-12 07:16:20,772 DEBUG [javapns.notification.PushNotificationManager] - Attempting to send notification: {"aps":{"sound":"default","badge":1},"msgbcast":"***"} 012-04-12 07:16:20,772 DEBUG [javapns.notification.PushNotificationManager] - to device:_**** 012-04-12 07:16:20,772 DEBUG [javapns.notification.PushNotificationManager] - Flushing 012-04-12 07:16:20,772 DEBUG [javapns.notification.PushNotificationManager] - At this point, the entire 191-bytes message has been streamed out successfully through the SSL connection

Original issue: http://code.google.com/p/javapns/issues/detail?id=115

mwkirk commented 11 years ago

From sype...@gmail.com on April 13, 2012 17:25:46 Could you please be more specific about what you mean by "lost messages"? Do you mean that the messages added to the queue are never sent by the library, or that the library does send the messages but they never reach the devices?

mwkirk commented 11 years ago

From minkon1...@gmail.com on April 17, 2012 06:42:57 the messages added to the queue but never reach the devices.

I fixed this problem by close the connection(to apn server) after idle for more then 5 minutes. I found when the connection idle for a long time. the new message maybe can't reach the devices.

mwkirk commented 11 years ago

From sype...@gmail.com on April 21, 2012 16:47:18 I am unable to reproduce the issue. Could you please attach a stack trace dump for when the queue has messages to deliver but apparently is not sending them? This will help understand what the library is doing... Thank you!

mwkirk commented 11 years ago

From oli...@crowdplayce.com on April 26, 2012 07:50:09 I have the same problem (I am using a Queue. Sometimes my messages are lost with no explanation, and everything seems to be working fine.

mwkirk commented 11 years ago

From sype...@gmail.com on April 26, 2012 21:47:40 Unfortunately, without more details, I cannot reproduce the issue. Could you please attach a stack trace dump for when the queue has messages to deliver but apparently is not sending them?

mwkirk commented 11 years ago

From fredrik....@gmail.com on May 03, 2012 08:06:47 I have the exact same problem. The problem is NOT that the message gets stuck in the queue. The queue always send the message according to the log. That is, I get

DEBUG [javapns.notification.PushNotificationManager] - At this point, the entire 191-bytes message has been streamed out successfully through the SSL connection

but the message never reaches the device. No errors ever show up in the log.

This seems to happen after the connection has been idle for some time. A restart of the queue fixes the problem.

If I do

PushedNotifications notifications = queue.getPushedNotifications(); Iterator iter = notifications.iterator(); while(iter.hasNext()) { PushedNotification pn = iter.next(); log.debug("success: " + pn.isSuccessful()); log.debug("PN exception: ", pn.getException()); }

every PushedNotification gets success: true and no exceptions.

mwkirk commented 11 years ago

From oli...@crowdplayce.com on May 03, 2012 08:42:01

8 thats exactly what happens to me. The connection becomes idle and it stops working

mwkirk commented 11 years ago

From jcost...@gmail.com on May 07, 2012 08:55:42

Once I see this issue (where the message is reported delivered but never actually makes it) I get the following for successive notifications:

2012-05-07 08:14:04 PushNotificationManager [DEBUG] Building Raw message from deviceToken and payload 2012-05-07 08:14:04 PushNotificationManager [DEBUG] Built raw message ID 16777262 of total length 171 2012-05-07 08:14:04 PushNotificationManager [DEBUG] Attempting to send notification:

2012-05-07 08:14:04 PushNotificationManager [DEBUG] to device:

..and that's the last I hear from that push. No failures, nothing.

mwkirk commented 11 years ago

From sype...@gmail.com on May 07, 2012 13:29:18 As I asked a few times in this thread, a stack trace dump is needed to investigate the issue. Otherwise nothing is being done on this issue.

mwkirk commented 11 years ago

From fredrik....@gmail.com on May 15, 2012 20:15:11 hi sype!

i'm not sure i understand what you want me to provide. could you be more specific? the messages do get sent so the queue is empty and there are no error stack traces. in fact, i get a positive response saying that notification was sent on first attempt.

mwkirk commented 11 years ago

From sype...@gmail.com on May 15, 2012 20:41:24 a stack trace dump, which is a basic debugging operation for capturing the state of all threads in a Java program... (http://java.sun.com/developer/technicalArticles/Programming/Stacktrace/)

mwkirk commented 11 years ago

From hguve...@gmail.com on May 25, 2012 00:24:58 i have the exact same problem on the same server (amazon linux). i am also looking for a solution. @sype: i would have provided the dump stacktrace you ask if you could tell us at what moment you want it (before adding into queue, during the push -i don't know what to look for-, after the push) ??

mwkirk commented 11 years ago

From hguve...@gmail.com on May 25, 2012 00:46:30 I have found something about the dropping connections regarding servers on Amazon;

https://forums.aws.amazon.com/thread.jspa?threadID=33427&start=0&tstart=0

Long story short: We cannot use PushQueue as it is implemented at the moment. It needs to send heartbeat data if it is inactive for less then every 60 seconds.

mwkirk commented 11 years ago

From hguve...@gmail.com on May 25, 2012 01:14:48 i have looked into this connection drop issue and want to confirm it on my server Amazon Linux on micro instance EC2 ... this is the result

[ec2-user@XXX ~]$ lsof | grep TCP java 14893 ec2-user 375u IPv6 375102 0t0 TCP :8181 (LISTEN) java 14893 ec2-user 376u IPv6 375103 0t0 TCP :webcache (LISTEN) java 14893 ec2-user 381u IPv6 375107 0t0 TCP :appserv-http (LISTEN) java 14893 ec2-user 385u IPv6 375109 0t0 TCP :lrs-paging (LISTEN) java 14893 ec2-user 389u IPv6 375111 0t0 TCP :imqbrokerd (LISTEN) java 14893 ec2-user 462u IPv6 478553 0t0 TCP XXX.compute-1.internal:50020->st11p01st-interface003-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 465u IPv6 478554 0t0 TCP XXX.compute-1.internal:50021->st11p01st-interface003-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 468u IPv6 376504 0t0 TCP :sun-as-jmxrmi (LISTEN) java 14893 ec2-user 500u IPv6 442333 0t0 TCP XXX.compute-1.internal:52537->st11p01st-interface010-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 503u IPv6 442347 0t0 TCP XXX.compute-1.internal:36621->st11p01st-interface009-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 519u IPv6 455136 0t0 TCP XXX.compute-1.internal:40070->st11p01st-interface009-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 520u IPv6 477479 0t0 TCP XXX.compute-1.internal:42686->st11p01st-interface001-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 521u IPv6 477480 0t0 TCP XXX.compute-1.internal:42687->st11p01st-interface001-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 522u IPv6 477495 0t0 TCP XXX.compute-1.internal:51542->nk11p01st-interface003-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 527u IPv6 455120 0t0 TCP XXX.compute-1.internal:34270->st11p01st-interface015-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 530u IPv6 455121 0t0 TCP XXX.compute-1.internal:34271->st11p01st-interface015-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 534u IPv6 455135 0t0 TCP XXX.compute.compute-1.internal:40069->st11p01st-interface009-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 535u IPv6 478594 0t0 TCP XXX.compute-1.internal:34723->st11p01st-interface015-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 543u IPv6 478673 0t0 TCP XXX.compute-1.internal:57461->nk11p01st-interface014-bz.push.apple.com:2195 (CLOSE_WAIT) java 14893 ec2-user 544u IPv6 478686 0t0 TCP XXX.compute-1.internal:33197->st11p01st-interface012-bz.push.apple.com:2195 (CLOSE_WAIT)

mwkirk commented 11 years ago

From hguve...@gmail.com on May 25, 2012 01:38:33 this is really irritating to say that but the Queue implementation doesn't provide production requirements for any project...

this is the bottom line issue of the implementation of flushing data into CLOSE_WAIT connection; http://code.google.com/p/javapns/source/browse/tags/2.2/src/javapns/notification/PushNotificationManager.java#416

and this is the issue has been talked on stackoverflow http://stackoverflow.com/questions/2028620/java-sockets-and-dropped-connections

the connection has to be closed and open on every push notification is being released.

I hoped this can help anyone who suffered with this...

mwkirk commented 11 years ago

From hguve...@gmail.com on May 25, 2012 01:41:50 the stackoverflow link is about an issue for apns on github... different projects same issue...

mwkirk commented 11 years ago

From oli...@crowdplayce.com on June 20, 2012 14:37:09 My server just stopped sending.. Here is stack trace (parts of it).. No deadlocks

Thread 1207: (state = BLOCKED)

Thread 1206: (state = BLOCKED)

Thread 28448: (state = BLOCKED)

Thread 28447: (state = IN_NATIVE)

Thread 28446: (state = BLOCKED)

Thread 28441: (state = BLOCKED)

Thread 28436: (state = BLOCKED)

Thread 28435: (state = BLOCKED)

Thread 28434: (state = BLOCKED)

Thread 28433: (state = BLOCKED)

Thread 28426: (state = IN_NATIVE)

mwkirk commented 11 years ago

From jerome.b...@gmail.com on June 20, 2012 14:47:10 My little 5cents: I have the same problems on production servers. The problems typically arise after a given (long) uptime of the server.

My analysis: There is probably on the datacenter a network equipment that silently close the socket between my server and the apple servers. Because there is no ack on the push message sent to apple servers, it takes a reda on the socket or a close on the socket for the javapns queue to discover that the socket is actually closed. Alas, the messages is already sent and confirmed sent (xxx bytes has been flushed blablabla). This is confirmed because, while reading the logs while sending various messages, the loss never happens when the server connection is detected as down and renewed.

In order to solve this issue I have implemented a "reopen the socket for each messages" policy (however this must slow down the queue to a crawl, I don't know, I have no load).

    AppleNotificationServer server = new AppleNotificationServerBasicImpl(config.getKeystore(), config.getPassword(), ConnectionToAppleServer.KEYSTORE_TYPE_JKS , isProduction());
    List<NotificationThread> allThreads = new ArrayList<>();
    for (int i=0;i<config.getNumberOfThreads();i++) {
        PushNotificationManagerWithSocketReopenEachTime notifManager = new PushNotificationManagerWithSocketReopenEachTime(server);
        NotificationThread thread = new NotificationThread(null,notifManager,server);
        allThreads.add(thread);
    }
    queue = new NotificationThreads(server, null, allThreads);

config.isProduction(), config.getNumberOfThreads()); queue.start(); }

public static class PushNotificationManagerWithSocketReopenEachTime extends PushNotificationManager {

    AppleNotificationServer server;

    public PushNotificationManagerWithSocketReopenEachTime(AppleNotificationServer server) {
        super();
        this.server=server;
    }

    @Override
    public PushedNotification sendNotification(Device device, Payload payload, boolean closeAfter, int identifier) throws CommunicationException {
        try {
            restartConnection(server);
        } catch (KeystoreException e) {
            throw new FizRuntimeException("cannot connect to apns server with keystore",e);
        }
        return super.sendNotification(device, payload, closeAfter, identifier);
    }

}
mwkirk commented 11 years ago

From oli...@crowdplayce.com on June 21, 2012 06:29:50 Hey Jerome

Cool solution.. How about some time policy. Because it seems that this only happens when a connection is unused. So in order to preserve performance we reopen unused connections?. Opening a connections takes several seconds. I discovered that when I tried todo the simple one connection pr message

I have quite a bit of traffic and the connections stopped working one by one until I got no push notifications sent at all. But I need the queue in order to be able to scale my solution.

Btw Jerome... Whats this. I could not figure out where it belonged and there was a } too much? config.isProduction(), config.getNumberOfThreads());

public class PushNotificationManagerWithSocketReopenEachTime extends javapns.notification.PushNotificationManager {

 AppleNotificationServer server;
 public static final long unusedConnectionToleranceMs = 5000;
 private long lastUsage = 0;

 public PushNotificationManagerWithSocketReopenEachTime(AppleNotificationServer server) {
         super();
         this.server=server;
 }

 @Override
 public PushedNotification sendNotification(Device device, Payload payload, boolean closeAfter, int identifier) throws CommunicationException {
        if(System.currentTimeMillis() - lastUsage  > unusedConnectionToleranceMs) { 
            try {

                        restartConnection(server);
             } catch (KeystoreException e) {
                     throw new RuntimeException("cannot connect to apns server with keystore",e);
             }
        }
         lastUsage = System.currentTimeMillis();
         return super.sendNotification(device, payload, closeAfter, identifier);
 }

}

mwkirk commented 11 years ago

From manusd...@gmail.com on August 31, 2012 22:32:33 I have the same problem and I'm not sure when it would be neccesary to reopen a connection. To reopen connections it's a very expensive operation and depending the kind of the message sent, it's possible the time spent were unacceptable.

I don't need to send too many messages per hour, but I do need they arrive very fast to the device.

I'll try minkon1's solution.

mwkirk commented 11 years ago

From dksid...@gmail.com on March 01, 2013 13:11:50 Any update on this ?

We are facing similar issue in production too. Do let us if solution provided in threads, fixes the issue

mwkirk commented 11 years ago

From olive...@gmail.com on March 01, 2013 15:13:40 My fix worked, stable for 6 months