koush / AndroidAsync

Asynchronous socket, http(s) (client+server) and websocket library for android. Based on nio, not threads.
Other
7.52k stars 1.56k forks source link

Occasional response timeout due to stuck in select #674

Open acheung-harmonicinc opened 4 years ago

acheung-harmonicinc commented 4 years ago

We use AndroidAsync to serve DASH stream. Occasionally, HTTP request to the AndroidAsync server stuck and end with timeout error.

We found that before the stuck, our application calls "write" method of AsyncHttpServerResponse.

val response: AsyncHttpServerResponse
response.write(ByteBufferList(data))  // data is ByteBuffer

With futher tracing, we found that the problem is caused by race condition in the select and wakeup logic in AsyncServer.java. In selector thread, "lockAndRunQueue" is called to complete scheduled tasks, then "selectNow" is called to check if there any ready selection key from the selector. Then if the there is ready key, the blocking "select" method is called. https://github.com/koush/AndroidAsync/blob/master/AndroidAsync/src/com/koushikdutta/async/AsyncServer.java#L821

However, if there is any enqueue and wakeup operation in-between "lockAndRunQueue" and the "synchronized" block of "selectNow", the effect of wakeup will be cleared by "selectNow" call. Then when heading to the "select" call in https://github.com/koush/AndroidAsync/blob/master/AndroidAsync/src/com/koushikdutta/async/AsyncServer.java#L838, the selector thread will block indefinitely. Consequently the write call of "AsyncHttpServerResponse" hangs and the request ends with timeout.

acheung-harmonicinc commented 4 years ago

In addition, I wonder if we really need the semaphore and blocking wakeup logic in SelectorWrapper.

To my understanding, the code prevent selector wakeup to be called before select is called.

However, from the Java documentation, it seems it is not necessary to wait for the select.

If no selection operation is currently in progress then the next invocation of one of these methods will return immediately unless the selectNow() method is invoked in the meantime.

https://docs.oracle.com/javase/7/docs/api/java/nio/channels/Selector.html#wakeup()

I tried to replace SelectorWrapper with NIO selector, and replace the selectNow+select logic with a simple select. Then the server seems to work fine as well, and I haven't observed any request stuck so far.

koush commented 4 years ago

This may have been an artifact of buggy behavior on older versions of Android.

acheung-harmonicinc commented 4 years ago

No. I am already using emulator image of Android 10.0

koush commented 4 years ago

Sorry, there's a misunderstanding. I believe the selector wrapper code is a holdover from very old android versions where it was necessary because selector did not behave as it should. Like Android v2, 3, and 4. I can likely be safely removed now.

acheung-harmonicinc commented 4 years ago

I see. I would like to see the usage of SelecterWrapper be replaced with plain Selector, this will simpliy the select and wakeup code (e.g. the blocking wait in wakeup). Thanks.