koush / AndroidAsync

Asynchronous socket, http(s) (client+server) and websocket library for android. Based on nio, not threads.
Other
7.52k stars 1.56k forks source link

Keep-Alive sockets closed by server are not cleaned up, and reused ceaselessly #132

Open markshiz opened 10 years ago

markshiz commented 10 years ago

After an Ice Cream Sandwich device goes to sleep and becomes active again, it appears as if the cached keep-alive sockets are no longer valid. This results in the app using them basically being non-functional. I'm not sure what the fix for this is, but there may be a way to detect the state of the socket before reuse. See verbose logs below:

 1750                    ION  D  (0 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28annou
                               ncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccomment
                               %2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15.
                               019%2B0000: preparing request
  1750                    ION  I  (0 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28annou
                               ncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccomment
                               %2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15.
                               019%2B0000: Using loader: com.koushikdutta.ion.loader.HttpLoader@4167e178
  1750                    ION  D  (0 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28annou
                               ncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccomment
                               %2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15.
                               019%2B0000: Executing request.
  1750                    ION  D  (2 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28annou
                               ncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccomment
                               %2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15.
                               019%2B0000: Reusing keep-alive socket
  1750                    ION  V  (3 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28annou
                               ncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccomment
                               %2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15.
                               019%2B0000: socket connected
  1750                    ION  V  (13 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28anno
                               uncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccommen
                               t%2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15
                               .019%2B0000:
  1750                    ION  V  GET /api/core/v3/inbox?fields=%40all&filter=type%28announcement%2Cdiscussion%2Cdocument%2
                               Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccomment%2Ctask%2Cshare%2Cdm%2Cidea%2Ceve
                               nt%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15.019%2B0000 HTTP/1.1
  1750                    ION  V  Host: redacted.com
  1750                    ION  V  User-Agent: Dalvik/1.6.0 (Linux; U; Android 4.0.4; Galaxy Nexus Build/IMM76I)
  1750                    ION  V  Accept-Encoding: gzip, deflate
  1750                    ION  V  Connection: keep-alive
  1750                    ION  V  Accept: */*
  1750                    ION  V  X-Jive-Client: eyJBcHAtSUQiOiJKaXZlIGZvciBBbmRyb2lkIFNESyIsIkFwcC1TcGVjIjp7ImNvbm5lY3Rpb2
                               5UeXBlIjoiV0lGSSIsImRldmljZVBsYXRmb3JtIjoic2Ftc3VuZyBHYWxheHkgTmV4dXMiLCJkZXZpY2VWZXJzaW9
                               uIjoiNC4wLjQiLCJyZXF1ZXN0T3JpZ2luIjoiQW5kcm9pZCBOYXRpdmUifSwiQXBwLVZlcnNpb24iOiIzLjAuMCJ9
  1750                    ION  V  Cookie: BIGipServerredacted-pool="1263536650.36895.0000";$Path="";$Domain="redacted.jiv
                               eland.com";$Port="443"; redactedRegularLoginUserCookie="";$Path="";$Domain="redacted.redactedlan
                               d.com";$Port="443"; JSESSIONID="77B48BE80532C9E14DE18FEBEFA0CB09.node03";$Path="";$Domain
                               ="redacted.com";$Port="443"; redacted.user.loggedIn="true";$Path="";$Domain="brewsp
                               ace.redactedland.com";$Port="443"; redacted.login.ts="1396906406570";$Path="";$Domain="redacted.
                               redactedland.com";$Port="443"; redacted.server.info="serverName=redacted.com:serverPort
                               =443:contextPath=:localName=localhost:localPort=9001:localAddr=127.0.0.1";$Path="";$Domai
                               n="redacted.com";$Port="443"; redactedSSOLoginUserCookie="true";$Path="";$Domain="b
                               rewspace.redactedland.com";$Port="443"; redacted.security.context="uZL2Gpbp2bQ9Lams1M3J2gAAAAAAT6
                               1jxZZZQKqcANTA3MDsUYIXqXHzsEqPGnzZ1eGZ";$Path="";$Domain="redacted.com";$Port="
                               443"; noredirect=true; JSESSIONID=FDCCBE774157964F2531553EE9FF6655.node03; X-JCAPI-Token=
                               2PiH7vzX; redacted.security.context=faA2JqQasEUBfmEqrRewNAAAAAAAT61jvWbyy4J5vIV1K+KyAoMtIcybQ
                               sLCbqGsgEKL; BIGipServerredacted-pool=1246759434.36895.0000
  1750                    ION  V  (20 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28anno
                               uncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccommen
                               t%2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3A15
                               .019%2B0000: request completed
  1750               dalvikvm  D  GC_CONCURRENT freed 952K, 17% free 15793K/18823K, paused 3ms+3ms
  1750                    ION  D  (0 ms) https://redacted.com/api/core/v3/activities: preparing request
  1750                    ION  I  (0 ms) https://redacted.com/api/core/v3/activities: Using loader: com.koushikdu
                               tta.ion.loader.HttpLoader@4167e178
  1750                    ION  D  (0 ms) https://redacted.com/api/core/v3/activities: Executing request.
  1750                    ION  D  (0 ms) https://redacted.com/api/core/v3/activities: Reusing keep-alive socket
  1750                    ION  V  (1 ms) https://redacted.com/api/core/v3/activities: socket connected
  1750                    ION  V  (4 ms) https://redacted.com/api/core/v3/activities:
  1750                    ION  V  GET /api/core/v3/activities HTTP/1.1
  1750                    ION  V  Host: redacted.com
  1750                    ION  V  User-Agent: Dalvik/1.6.0 (Linux; U; Android 4.0.4; Galaxy Nexus Build/IMM76I)
  1750                    ION  V  Accept-Encoding: gzip, deflate
  1750                    ION  V  Connection: keep-alive
  1750                    ION  V  Accept: */*
  1750                    ION  V  X-Jive-Client: eyJBcHAtSUQiOiJKaXZlIGZvciBBbmRyb2lkIFNESyIsIkFwcC1TcGVjIjp7ImNvbm5lY3Rpb2
                               5UeXBlIjoiV0lGSSIsImRldmljZVBsYXRmb3JtIjoic2Ftc3VuZyBHYWxheHkgTmV4dXMiLCJkZXZpY2VWZXJzaW9
                               uIjoiNC4wLjQiLCJyZXF1ZXN0T3JpZ2luIjoiQW5kcm9pZCBOYXRpdmUifSwiQXBwLVZlcnNpb24iOiIzLjAuMCJ9
  1750                    ION  V  Cookie: BIGipServerredacted-pool="1263536650.36895.0000";$Path="";$Domain="redacted.com
                               ";$Port="443"; redactedRegularLoginUserCookie="";$Path="";$Domain="redacted.redactedlan
                               d.com";$Port="443"; JSESSIONID="77B48BE80532C9E14DE18FEBEFA0CB09.node03";$Path="";$Domain
                               ="redacted.com";$Port="443"; redacted.user.loggedIn="true";$Path="";$Domain="brewsp
                               ace.redactedland.com";$Port="443"; redacted.login.ts="1396906406570";$Path="";$Domain="redacted.
                               redactedland.com";$Port="443"; redacted.server.info="serverName=redacted.com:serverPort
                               =443:contextPath=:localName=localhost:localPort=9001:localAddr=127.0.0.1";$Path="";$Domai
                               n="redacted.com";$Port="443"; redactedSSOLoginUserCookie="true";$Path="";$Domain="b
                               rewspace.redactedland.com";$Port="443"; redacted.security.context="uZL2Gpbp2bQ9Lams1M3J2gAAAAAAT6
                               1jxZZZQKqcANTA3MDsUYIXqXHzsEqPGnzZ1eGZ";$Path="";$Domain="redacted.com";$Port="
                               443"; noredirect=true; JSESSIONID=FDCCBE774157964F2531553EE9FF6655.node03; X-JCAPI-Token=
                               2PiH7vzX; redacted.security.context=faA2JqQasEUBfmEqrRewNAAAAAAAT61jvWbyy4J5vIV1K+KyAoMtIcybQ
                               sLCbqGsgEKL; BIGipServerredacted-pool=1246759434.36895.0000
  1750                    ION  V  (6 ms) https://redacted.com/api/core/v3/activities: request completed
  1216             MtpService  D  addStorageLocked 65537 /mnt/sdcard
  1216             MtpService  D  starting MTP server in PTP mode
  1216             MtpService  D  addStorageLocked 65537 /mnt/sdcard
  286                dalvikvm  D  GC_CONCURRENT freed 450K, 23% free 14677K/19015K, paused 2ms+6ms
 32025         iu.Environment  I  update battery state; isPlugged? true*
 32025      iu.UploadsManager  I  num queued entries: 0
 32025      iu.UploadsManager  I  num updated entries: 0
 32025         iu.SyncManager  I  NEXT; no task
 32025   iu.FingerprintMana..  I  Start processing all media
 32025   iu.FingerprintMana..  I  Start processing media store URI: content://media/external/images/media
 32025   iu.FingerprintMana..  I  Start processing media store URI: content://media/external/video/media
 32025   iu.FingerprintMana..  I  Start processing media store URI: content://media/phoneStorage/images/media
  286                 PowerUI  W  unknown intent: Intent { act=android.intent.action.ACTION_POWER_CONNECTED flg=0x10000010 
                               }
 32025   iu.FingerprintMana..  I  Start processing media store URI: content://media/phoneStorage/video/media
 32025   iu.FingerprintMana..  I  Finished generating fingerprints; 0.023 seconds
 32025   iu.FingerprintMana..  I  numSeen=1 numGenerated=0 numDeleted=0 numFailed=0
 32025               dalvikvm  D  GC_CONCURRENT freed 39K, 48% free 15116K/28551K, paused 3ms+4ms
  1750                    ION  E  (30036 ms) https://redacted.com/api/core/v3/inbox?fields=%40all&filter=type%28a
                               nnouncement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2Cmessage%2Ccom
                               ment%2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04-02T04%3A00%3
                               A15.019%2B0000: Connection error
  1750                    ION  E  null
  1750                    ION  E  java.util.concurrent.TimeoutException
  1750                    ION  E  at com.koushikdutta.async.http.AsyncHttpClient$2.run(AsyncHttpClient.java:186)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.lockAndRunQueue(AsyncServer.java:683)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.runLoop(AsyncServer.java:694)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.run(AsyncServer.java:601)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.access$700(AsyncServer.java:41)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer$14.run(AsyncServer.java:549)
  1750   com.redactedsoftware.a..  E  Error in GET request to https://redacted.com/api/core/v3/inbox?fields=%40all&fi
                               lter=type%28announcement%2Cdiscussion%2Cdocument%2Cpost%2Cpoll%2Cupdate%2Cvideo%2Cfile%2C
                               message%2Ccomment%2Ctask%2Cshare%2Cdm%2Cidea%2Cevent%2Cacclaim%29&count=25&before=2014-04
                               -02T04%3A00%3A15.019%2B0000 failed
  1750   com.redactedsoftware.a..  E  java.util.concurrent.TimeoutException
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.http.AsyncHttpClient$2.run(AsyncHttpClient.java:186)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.lockAndRunQueue(AsyncServer.java:683)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.runLoop(AsyncServer.java:694)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.run(AsyncServer.java:601)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.access$700(AsyncServer.java:41)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer$14.run(AsyncServer.java:549)
  1750                    ION  E  (30019 ms) https://redacted.com/api/core/v3/activities: Connection error
  1750                    ION  E  null
  1750                    ION  E  java.util.concurrent.TimeoutException
  1750                    ION  E  at com.koushikdutta.async.http.AsyncHttpClient$2.run(AsyncHttpClient.java:186)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.lockAndRunQueue(AsyncServer.java:683)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.runLoop(AsyncServer.java:694)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.run(AsyncServer.java:601)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer.access$700(AsyncServer.java:41)
  1750                    ION  E  at com.koushikdutta.async.AsyncServer$14.run(AsyncServer.java:549)
  1750   com.redactedsoftware.a..  E  Error in GET request to https://redacted.com/api/core/v3/activities failed
  1750   com.redactedsoftware.a..  E  java.util.concurrent.TimeoutException
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.http.AsyncHttpClient$2.run(AsyncHttpClient.java:186)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.lockAndRunQueue(AsyncServer.java:683)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.runLoop(AsyncServer.java:694)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.run(AsyncServer.java:601)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer.access$700(AsyncServer.java:41)
  1750   com.redactedsoftware.a..  E  at com.koushikdutta.async.AsyncServer$14.run(AsyncServer.java:549)
  1750               dalvikvm  D  GC_CONCURRENT freed 1487K, 19% free 15261K/18823K, paused 5ms+5ms
:%s/

Not sure if it's related or not, but if I scroll up far enough in the logs, I can see a connection being reset.

  1750                    NIO  E  Unhandled exception
  1750                    NIO  E  java.net.SocketException: recvfrom failed: ECONNRESET (Connection reset by peer)
  1750                    NIO  E  at libcore.io.IoBridge.maybeThrowAfterRecvfrom(IoBridge.java:542)
  1750                    NIO  E  at libcore.io.IoBridge.recvfrom(IoBridge.java:518)
  1750                    NIO  E  at java.nio.SocketChannelImpl.readImpl(SocketChannelImpl.java:305)
  1750                    NIO  E  at java.nio.SocketChannelImpl.read(SocketChannelImpl.java:267)
  1750                    NIO  E  at com.koushikdutta.async.SocketChannelWrapper.read(SocketChannelWrapper.java:24)
  1750                    NIO  E  at com.koushikdutta.async.AsyncNetworkSocket.onReadable(AsyncNetworkSocket.java:153)
  1750                    NIO  E  at com.koushikdutta.async.AsyncServer.runLoop(AsyncServer.java:739)
  1750                    NIO  E  at com.koushikdutta.async.AsyncServer.run(AsyncServer.java:601)
  1750                    NIO  E  at com.koushikdutta.async.AsyncServer.access$700(AsyncServer.java:41)
  1750                    NIO  E  at com.koushikdutta.async.AsyncServer$14.run(AsyncServer.java:549)
  1750                    NIO  E  Caused by: libcore.io.ErrnoException: recvfrom failed: ECONNRESET (Connection reset by pe
                               er)
  1750                    NIO  E  at libcore.io.Posix.recvfromBytes(Native Method)
  1750                    NIO  E  at libcore.io.Posix.recvfrom(Posix.java:126)
  1750                    NIO  E  at libcore.io.BlockGuardOs.recvfrom(BlockGuardOs.java:159)
  1750                    NIO  E  at libcore.io.IoBridge.recvfrom(IoBridge.java:515)
  1750                    NIO  E  ... 8 more
koush commented 10 years ago

Which device in particular? Are you on wifi or mobile network?

markshiz commented 10 years ago

This device was on Wi-Fi. Galaxy Nexus.

markshiz commented 10 years ago

Basically the case to reproduce is to let a device sit on Wi-Fi long enough for the radio to power down. Keep the existing app in memory. At some point later, perhaps the next day or so, pick up the device again and begin using it. My guess is AsyncSocketMiddleware is reusing a socket it shouldn't for some reason. I haven't exactly been able to pin it down yet.

koush commented 10 years ago

Alright. TCP is weird in that you can't actually see if a socket has died unless the other end sends a fin packet or you attempt to write to the socket and it fails. Socket.io has a ping built into the protocol to address this exact situation. I can work around this by forcing sockets that are too old to be force closed.

markshiz commented 10 years ago

Yes. Doing a little more research, it looks as if the code is not properly handling keep-alive sockets after they have been closed by the server. I'm not seeing a socket TimeoutException() ever removing a socket from consideration for being recycled. It's also not closing the socket out. Looks like you require a response for that currently:

   private void reportConnectedCompleted(FutureAsyncHttpResponse cancel, Exception ex, AsyncHttpResponseImpl response, AsyncHttpRequest request, final HttpConnectCallback callback) {
        assert callback != null;
        boolean complete;
        if (ex != null) {
            request.loge("Connection error", ex);
            complete = cancel.setComplete(ex);
        }
        else {
            request.logd("Connection successful");
            complete = cancel.setComplete(response);
        }
        if (complete) {
            callback.onConnectCompleted(ex, response);
            assert ex != null || response.getSocket() == null || response.getDataCallback() != null || response.isPaused();
            return;
        }

        if (response != null) {
            // the request was cancelled, so close up shop, and eat any pending data
            response.setDataCallback(new NullDataCallback());
            response.close();
        }
    }

There are options -

  1. You could close the socket here. That isn't really a solution for this problem, however. It's more complex than that. You'd probably have to do a shorter read request the first time to see if there's anything wrong when reusing the socket, as it's entirely unreasonable to have the client wait upwards of timeout * n (where n=the number of recycled sockets) before making a request.
  2. You can also force the sockets closed after a time period, but my guess is the timeout period might be device or at least OS dependent.
  3. You probably don't want to or can't keep them alive yourself given Android's app constraints. But there's the _TCPKEEPIDLE and _TCPKEEPINTVL socket options to trigger the OS to keep the sockets alive. Nevertheless, it seems like you might be draining people's battery doing that.
  4. Some combination of the above.
koush commented 10 years ago

The connect timeout itself will close the socket (whether or not if a response was received)

            // set connect timeout
            cancel.timeoutRunnable = new Runnable() {
                @Override
                public void run() {
                    // we've timed out, kill the connections
                    if (data.socketCancellable != null) {
                        data.socketCancellable.cancel();
                        if (data.socket != null)
                            data.socket.close();
                    }
                    reportConnectedCompleted(cancel, new TimeoutException(), null, request, callback);
                }
            };

Since it is nio, it is always "reading". A read event will come in and with a zero length data indicating failure. I've heard that another viable option is to do a 0 byte write. I can maybe try that.

Though, as I mentioned, the easiest fix is to just not hang onto keep-alive sockets indefinitely and time them out after some period. Holding onto them forever is not the best idea anyways. Furthermore, while I'm in there, I can respect server responses with a keep-alive header (which specifies connection reuse parameters).

markshiz commented 10 years ago

I agree that part of the solution would be to not hold onto the sockets forever. You certainly don't want to do that.

The tricky part will be determining the timeout period. The header will inform you as to the host's constraints, which may be different from the Android device's. If idle long enough, the Android OS may kill the socket, too. Determining what that period is may be possible or impossible from device to device. I really don't know. That is why I suggested the other socket options or initiating a socket read/write to see if the socket works before using it to mitigate that risk.

koush commented 10 years ago

Yeah, I agree that working a zero byte write in there is a good plan.

koush commented 10 years ago

Part 1 of the fix... time them out

3168f78

swatigoel commented 10 years ago

@koush can you please update on this issue also?

https://github.com/koush/AndroidAsync/issues/117

nschwermann commented 9 years ago

I think this may be related to my issue https://github.com/koush/ion/issues/374

kaizenx commented 7 years ago

This bug just re-surfaced in 2.1.8 whereby Reusing keep-alive socket is stuck forever.