koush / AndroidAsync

Asynchronous socket, http(s) (client+server) and websocket library for android. Based on nio, not threads.
Other
7.52k stars 1.56k forks source link

BufferedDataSink will not write out the contents of mPeningWrites when using WebSocket.send() over TLS. #315

Open vladislavdonchev opened 9 years ago

vladislavdonchev commented 9 years ago

Hello again,

After I managed to get the TLS encryption working I am now facing another issue - the connection is not very stable. Shortly after it is established and I begin sending messages it disconnects. Usually it takes about 15-20 seconds, during which everything appears to be working normally. When the disconnection occurs there are no errors or exceptions thrown, so I am not sure how to properly debug the issue.

The same setup works flawlessly without TLS (same server application / machine and client).

I will greatly appreciate it if someone could suggest how to debug this on the client side, as I am fairly certain that the root cause for the issue lies there.

Also, please let me know if I need to post any code or additional information.

Regards, Vlad

vladislavdonchev commented 9 years ago

Hello,

This is definitely an issue with the library as I've tested the connection extensively with other frameworks (Grizzly and Netty) and the issue does not reproduce at all.

I've identified two problems after debugging with Wireshark:

  1. When the TLS connection is negotiated I can see that TLS 1.0 is used (I have specifically requested 'TLSv1.2' on both the server and client sides. Again, this does not reproduce with other libraries. From the server side it seems like there's a fallback to 1.0 as the client reports that 1.2 is not supported.
  2. After about 30 seconds following the initial handshake, the client (running AndroidAsync) stops sending messages. There are no exception or error messages thrown from the Java code and everything seems to be in working order, but the messages are no longer coming out of the device (I have set up tcpdump on Android as well and I can clearly see the message flow being interrupted).

The latter is true for text, binary and ping messages. As a result, after some time the connection times out and the server closes the socket.

The interesting part is that the inbound communication continues normally - although messages are no longer sent, data can still be received over the WebSocket connection on the client device...

Please suggest how to further debug and correct this issue, as it renders the TLS encryption functionality of the library useless.

Best Regards, Vlad

vladislavdonchev commented 9 years ago

I've finally identified the cause for this issue - when using TLS encryption for some reason the BufferedDataSink will become clogged. Here's what is happening:

protected void write(ByteBufferList bb, boolean ignoreBuffer) {
    if (!mPendingWrites.hasRemaining())
        mDataSink.write(bb);

    if (bb.remaining() > 0) {
        int toRead = Math.min(bb.remaining(), mMaxBuffer);
        if (ignoreBuffer)
            toRead = bb.remaining();
        if (toRead > 0) {
            bb.get(mPendingWrites, toRead);
        }
    }
}

If there are indeed bytes remaining in the mPendingWrites buffer, the write() method will simply continue adding to it instead of flushing it's contents. Could someone please explain why this doesn't happen when using a non-encrypted connection?

I've applied a quick and dirty fix to the BufferedDataSink in order to resolve the issue (I am pretty sure that it is not the correct solution, as I do not fully understand why this is happening in the first place):

protected void write(ByteBufferList bb, boolean ignoreBuffer) {
    // If there are no pending writes flush the current buffer down the sink.
    if (!mPendingWrites.hasRemaining()) {
        mDataSink.write(bb);
    } else {
        // Start flushing the entire pending writes buffer. Pull data from the current buffer
        // after each write until there is none remaining.
        while (mPendingWrites.hasRemaining()) {
            writePending();
            if (bb.remaining() > 0) {
                int toRead = Math.min(bb.remaining(), mMaxBuffer);
                if (ignoreBuffer)
                    toRead = bb.remaining();
                if (toRead > 0) {
                    bb.get(mPendingWrites, toRead);
                }
            }
        }
    }
}

Any comments will be greatly appreciated.

koush commented 9 years ago

It seems that the underlying AsyncSSLSocketWrapper is not triggering a writable event, so it gets stuck buffering. Let me see what is causing that.

Are you linking to Google Play Services or no?

What version of Android?

vladislavdonchev commented 9 years ago

I am linking to a Java WebSocket server implemented with Grizzly. So far, I have only tested this on KitKat 4.4.4 (Nexus 5 AOSP and Note 3 CM 11), but I can try it on other devices and OS versions as well. My target platform is 4.4+, but I have a bunch of phones and tablets laying around, running pretty much every release since Gingerbread.

I am more worried about the other issue described in my second post - Wireshark seems to report that TLSv1.0 is used, where I have specifically requested v1.2 on both the server and client. I am not sure how to approach this one, but it looks like there might be a problem with the protocol negotiation in the handshake. If I dig out more details I will post another issue about that.

Thanks and please let me know if I you need any additional information.

koush commented 9 years ago

How are you requesting TLS1.2 on the client?

vladislavdonchev commented 9 years ago

Here's my client code:

SSLContext sslContext = null;
TrustManagerFactory tmf = null;

try {
    KeyManagerFactory kmf = KeyManagerFactory.getInstance("X509");
    KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());

    ks.load(getResources().openRawResource(R.raw.keystore), "pass".toCharArray());
    kmf.init(ks, "pass".toCharArray());

    tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    KeyStore ts = KeyStore.getInstance(KeyStore.getDefaultType());
    ts.load(getResources().openRawResource(R.raw.keystore), "pass".toCharArray());
    tmf.init(ts);

    sslContext = SSLContext.getInstance("TLSv1.2");
    sslContext.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
} catch (Exception e){
    Log.d("SSLCONFIG", e.toString(), e);
}

String address = preferences.getString("server", "192.168.0.104:8021/tp");

AsyncHttpClient.getDefaultInstance().getSSLSocketMiddleware().setSSLContext(sslContext);
AsyncHttpClient.getDefaultInstance().getSSLSocketMiddleware().setTrustManagers(tmf.getTrustManagers());
AsyncHttpClient.getDefaultInstance().getSSLSocketMiddleware().setConnectAllAddresses(true);
AsyncHttpClient.getDefaultInstance().websocket("wss://" + address, null, webSocketCallback);
chemary commented 9 years ago

Hello vladislavdonchev, after days investigating I arrived to the same conclusion and similar fix some months ago, I found disturbing anyone facing this same problem problem. In my case the connection is a WSS connection to a Tornado webserver, I tried direct connection, behind an Amazon ELB, behind nginx proxy and multiple configurations without any improvement until I applied the fix. The problem affects all Android versions and devices I have tested.

Even after applying the fix sometimes the connection becomes 1 packet out of sync (when the server sends a the packet to the mobile the mobile caches the response and will be sent when has next packet to output), I have not got time to investigate this issue, maybe the problem is not on the library.

koush commented 9 years ago

I'll tackle this tonight. It's probably something obvious.

vladislavdonchev commented 9 years ago

That's great news! I can also confirm the issue metioned by chemary, where the ACK message is sometimes delayed by the client. In my tests this causes the Grizzly backend to drop the connection and close the WS. However I am not sure whether the two problems are connected, so I will investigate further.

koush commented 9 years ago

Took a look, and didn't find anything wrong with the code. I'll try getting it to buffer a bunch of data in a test case and see what happens.

gcoutinho commented 9 years ago

Hello, I found this same issue in my chat app. In my case, my app must be connected full-time, and when I sent a large number of messages in same Thread of my Service class, the websocket not wrote them all, and in debug mode, I could see the "mPendingWrites" with a lot of messages. My solution was implement the Producer/Consumer pattern using BlockingQueue in a separated Thread, removing the responsability of the library. My code is look like this:

public class MessageService extends Service {

    private final BlockingQueue<String> sharedQueue = new LinkedBlockingQueue<>();

    private void sendMessage(String json) {

        sharedQueue.put(json); //PRODUCER
    }

    @Override
    public void onCreate() {
        super.onCreate();
        new Thread(new Consumer(sharedQueue)).start();
    }

    class Consumer implements Runnable {

        private final BlockingQueue<String> sharedQueue;

        public Consumer (BlockingQueue<String> sharedQueue) {
            this.sharedQueue = sharedQueue;
        }

        @Override
        public void run() {
            while(true){
                try {
                    websocket.send(sharedQueue.take());
                } catch (InterruptedException ex) {
                    ex.printStackTrace();
                }
            }
        }
    }
}

And works fine, in my case, the problem was not on library.

vladislavdonchev commented 9 years ago

@gcoutinho That's an interesting approach, but in my case the clients can be sending up to 30-40 messages per second (I am streaming live video through the WebSocket) and I am not sure how well will the producer / consumer pattern translate in this case. I am quite sure that the problem is somewhere in the SSL socket wrapper, as without the TLS configuration everything is working smoothly.

I will continue investigating this in a couple of days time, as I have been busy with other things. In the meanwhile, I am hoping that this turns out to be an easily reproducible problem, so koush can have a take on it as well.

TomWangTW commented 7 years ago

I found this same issue but it only happened on x86 device e.g : asus zenfone2, it looks every data stuck it mPendingWrites...