koush / AndroidAsync

Asynchronous socket, http(s) (client+server) and websocket library for android. Based on nio, not threads.

Unknown WebSocket opcode sent by client causes connection close (maybe caused by malformed message). #341

Open vladislavdonchev opened 9 years ago

vladislavdonchev commented 9 years ago

This one is a bit tricky - very often when my application has been running for a while (say a couple of hours), sending binary (byte array) frames over the WebSocket will cause the server to close the connection because of a protocol error (1002).

I did some packet monitoring with Wireshark and saw that immediately before the protocol error and server-initiated socket close there is a WebSocket message sent from the client that contains unrecognized (at least by Wireshark) opcodes.

The value can be 4, 5, 11, 12 or 13 - very often there are multiple opcodes in the same WebSocket message (indicating multiple frames).

I looked in the RFC 6455 specification and saw that there are several standard opcodes defined:

| Opcode | Meaning                | Reference |
|--------|------------------------|-----------|
| 0      | Continuation Frame     | RFC 6455  |
| 1      | Text Frame             | RFC 6455  |
| 2      | Binary Frame           | RFC 6455  |
| 8      | Connection Close Frame | RFC 6455  |
| 9      | Ping Frame             | RFC 6455  |
| 10     | Pong Frame             | RFC 6455  |

Could you please explain what the additional opcodes I am seeing mean?

I suspect that for some reason the client is sending a malformed message, and other data (maybe payload) is landing in the opcode bits of the frames, causing these seemingly random opcodes to appear.
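For reference, RFC 6455 section 5.2 puts the opcode in the low four bits of the first header byte, and it marks opcodes 3-7 and 11-15 as reserved. A receiver that sees one of them must fail the connection. The sketch below is only an illustration of the hypothesis above, not AndroidAsync code: the class `FrameOpcodeSketch` and its methods are hypothetical names. It walks a byte stream frame by frame and shows how losing a single payload byte makes the reader interpret later bytes as a frame header with a reserved opcode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of AndroidAsync.
public class FrameOpcodeSketch {
    /** The opcode is the low four bits of the first header byte (RFC 6455, section 5.2). */
    public static int opcode(byte firstByte) {
        return firstByte & 0x0F;
    }

    /** Opcodes 0, 1, 2, 8, 9 and 10 are defined; 3-7 and 11-15 are reserved. */
    public static boolean isDefinedOpcode(int opcode) {
        return (opcode >= 0 && opcode <= 2) || (opcode >= 8 && opcode <= 10);
    }

    /** Walks the stream one frame at a time and collects each opcode; stops at a truncated frame. */
    public static List<Integer> opcodes(byte[] stream) {
        List<Integer> result = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= stream.length) {
            result.add(opcode(stream[pos]));
            boolean masked = (stream[pos + 1] & 0x80) != 0;
            long len = stream[pos + 1] & 0x7F;
            int header = 2;
            if (len == 126) {            // 16-bit extended payload length
                if (pos + 4 > stream.length) break;
                len = ((stream[pos + 2] & 0xFF) << 8) | (stream[pos + 3] & 0xFF);
                header = 4;
            } else if (len == 127) {     // 64-bit extended payload length
                if (pos + 10 > stream.length) break;
                len = 0;
                for (int i = 2; i < 10; i++) {
                    len = (len << 8) | (stream[pos + i] & 0xFF);
                }
                header = 10;
            }
            if (masked) header += 4;     // client-to-server frames carry a 4-byte masking key
            if (len < 0 || len > stream.length - pos - header) break;
            pos += header + (int) len;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two masked binary frames (FIN + opcode 2), each with a 3-byte payload.
        byte[] wellFormed = {
            (byte) 0x82, (byte) 0x83, 0x11, 0x22, 0x33, 0x44, (byte) 0x9C, (byte) 0x8B, (byte) 0x8D,
            (byte) 0x82, (byte) 0x83, 0x11, 0x22, 0x33, 0x44, (byte) 0x9C, (byte) 0x8B, (byte) 0x8D
        };
        // Same stream with the last payload byte of the first frame missing.
        byte[] desynced = {
            (byte) 0x82, (byte) 0x83, 0x11, 0x22, 0x33, 0x44, (byte) 0x9C, (byte) 0x8B,
            (byte) 0x82, (byte) 0x83, 0x11, 0x22, 0x33, 0x44, (byte) 0x9C, (byte) 0x8B, (byte) 0x8D
        };
        System.out.println(opcodes(wellFormed)); // prints [2, 2]
        System.out.println(opcodes(desynced));   // prints [2, 3]
    }
}
```

In the second stream the reader still trusts the declared length of 3, so it consumes the first byte of the next frame as payload and then treats `0x83` as a header byte, which yields the reserved opcode 3. Any bug that writes fewer or more bytes than the header declares would produce similar "random" opcodes, whichever layer causes it.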

If the device is restarted the problem goes away - in fact this is the only way I can resolve it for now. I have tested this on Nexus 5 and Note 3, both running AOSP 4.4.4 and it only seems to reproduce on the Nexus...

I will perform tests with other devices when I get the chance. In the meantime I would be very grateful if someone could assist with debugging this in the library code.

Regards, Vlad

EntrepotJulienDurand commented 8 years ago

Hello, any news about this?

lazerwalker commented 8 years ago

I've been running into similar issues — I'm running a WebSocket server on an Android device, and my (node.js-based) WebSocket client is very unhappy that the Android server is sending it messages with non-standard opcodes.

Would be curious if anyone has run into this and managed to figure out whether this is due to malformed messages/frames, a bug in this repo's server code, or something else entirely.

vladislavdonchev commented 8 years ago

From what I've been able to dig out (although I haven't worked on that project for over a year now), some of the message frames are indeed malformed. Also, the issue was more likely to reproduce on specific devices (like the above-mentioned Nexus 5). Since I stopped working on the project I've encountered numerous other Android networking issues in different versions of the OS, ranging from routing bugs to socket implementation faults (especially with the relatively new multi-network-interface mechanism)... So my (best) guess is that the behavior is somehow kernel-related.

Over the years Android has proven to be the absolute worst when it comes to supporting networking features and debugging them is what I imagine hell would be like...

Good luck!

lazerwalker commented 8 years ago

Ugh, thanks for sharing your experience / the warning. It's rare that posting in a years-old GH issue actually yields any response :)

Will update on the off-chance I have any luck on my end.

lazerwalker commented 8 years ago

As a brief update: I tried switching to https://github.com/TooTallNate/Java-WebSocket. While it's waaaay less full-featured than this library, it's working with my client test suite without any failures.

While the underlying cause of this bug might very well be something kernel-related, it does seem to be specific to this library rather than a universal truth of dealing with WebSockets on Android.

koush commented 8 years ago

Do you have a WebSocket unit test I can run the failure against, @lazerwalker?

lazerwalker commented 8 years ago

@Koush I had a hard time coming up with an isolated test suite of my own — I have a test suite it consistently fails against, but I couldn't concretely find evidence the failure was in AndroidAsync instead of in the npm ws module that was being used as a client.

Instead, I tried to run AndroidAsync's WebSocket server implementation against the Autobahn testsuite, a pretty standard automated test suite for conformance with the WebSocket protocol. There were a whole bunch of failures; some were probably due to my unfamiliarity with the exact flavor of WebSocket echo server the tests expect, but many were clearly not.

There are results, and the example project used to generate them, here: https://github.com/lazerwalker/AndroidAsync-Autobahn-Test (my results are in wstest/results/server/index.html)

It'd be great to hear if there are plans to update the library to conform to the full letter of the WebSocket RFC. If it's useful, I'm happy to chip in to the effort!

(Even if it turns out I'm wrong and the failures are all due to implementation issues with my example project, I'd be happy to help fix them and help advertise that the library passes the spec!)

lazerwalker commented 8 years ago

(Also let me know if this should perhaps be its own GH Issue...)