googollee / go-socket.io

socket.io library for golang, a realtime application framework.

Fix(ish) for unicode encoding in packet #608

Closed grahamjenson closed 10 months ago

grahamjenson commented 10 months ago

So this is a horrible problem, and I am pretty sure this is not a 100% fix, but it does fix some of the problems.

The packet encoding, which includes the length of the packet data, is incorrect because JavaScript's `String.length` counts UTF-16 code units, which is neither the byte length nor the rune count. There is no built-in equivalent in Golang. The JavaScript length is never greater than the UTF-8 byte length, but it can exceed the rune count for characters outside the Basic Multilingual Plane. On top of that, having to do this encoding/decoding on a stream makes it even harder.
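To make the mismatch concrete, here is a minimal sketch comparing the three lengths for the same string. The helper name `jsLength` is hypothetical and not part of go-socket.io; it uses the standard `unicode/utf16` package to count the code units that JavaScript would report.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// jsLength returns the number of UTF-16 code units in s, which is what
// JavaScript's String.length reports for the same text.
// Hypothetical helper for illustration only; not part of go-socket.io.
func jsLength(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	for _, s := range []string{"hello", "héllo", "€", "😀"} {
		fmt.Printf("%q bytes=%d runes=%d js=%d\n",
			s, len(s), utf8.RuneCountInString(s), jsLength(s))
	}
}
```

For "😀" the three values are 4 bytes, 1 rune, and 2 UTF-16 code units, so neither `len` nor the rune count matches what a JS peer expects in the length prefix.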

I fixed this in two places:

  1. Decode: increasing the limit of the limit reader while reading the packet, based on the Unicode header byte we just saw, so that the packet reader keeps reading far enough to cover the whole packet.
  2. Encode: increasing the calculated length by scanning for Unicode header bytes and adding to the length based on their byte value.
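The idea of deriving a length from header bytes alone can be sketched as follows. This is not the code from the PR; it is an illustration that assumes valid UTF-8 input, and the function name `utf16UnitsInUTF8` is hypothetical. It looks at one byte at a time, so the same logic could run over a stream without buffering whole runes.

```go
package main

import "fmt"

// utf16UnitsInUTF8 counts how many UTF-16 code units the UTF-8 bytes in b
// would occupy, by classifying each byte on its own.
// Assumption: b is valid UTF-8. Hypothetical helper, not from go-socket.io.
func utf16UnitsInUTF8(b []byte) int {
	n := 0
	for _, c := range b {
		switch {
		case c&0xC0 == 0x80:
			// Continuation byte (10xxxxxx): already counted via its lead byte.
		case c >= 0xF0:
			// Lead byte of a 4-byte sequence: a code point above U+FFFF,
			// which becomes a surrogate pair (2 code units) in UTF-16.
			n += 2
		default:
			// ASCII, or the lead byte of a 2- or 3-byte sequence:
			// exactly one UTF-16 code unit.
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(utf16UnitsInUTF8([]byte("héllo😀"))) // 5 units for "héllo" plus 2 for the emoji
}
```

Because the count depends only on the current byte, a writer can accumulate it while copying data, and a reader can use the same classification to decide how many extra bytes to allow beyond the declared length.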

Neither of these solutions is 100% correct, because the UTF-8 header bytes do not map 1-to-1 onto the UCS-2 code units that JS uses. The proper solution would be to upgrade to socket.io version 4 (which fixes this by using separator bytes), but I need version 3 to at least kind-of work.
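For comparison, the two payload framings for string packets can be sketched like this. It is a simplified illustration based on the Engine.IO protocol notes linked in this PR (length-prefixed `<length>:<packet>` in v3, packets joined by the `\x1e` record separator in v4); the function names are hypothetical, and binary packets are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
)

// encodeV3 frames each packet as "<length>:<packet>", where length is the
// number of UTF-16 code units (what a JS peer counts), not bytes.
// Hypothetical helper for illustration; text packets only.
func encodeV3(packets []string) string {
	var sb strings.Builder
	for _, p := range packets {
		sb.WriteString(strconv.Itoa(len(utf16.Encode([]rune(p)))))
		sb.WriteByte(':')
		sb.WriteString(p)
	}
	return sb.String()
}

// encodeV4 joins packets with the record separator, so no length needs
// to be computed at all. Hypothetical helper for illustration.
func encodeV4(packets []string) string {
	return strings.Join(packets, "\x1e")
}

func main() {
	packets := []string{"4hello", "4€"}
	fmt.Printf("%q\n", encodeV3(packets))
	fmt.Printf("%q\n", encodeV4(packets))
}
```

With a separator there is nothing to miscount, which is why the length mismatch cannot occur in the v4 framing.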

I have not performance-tested either decoding or encoding, but I am pretty sure there won't be a massive overhead.

I added some tests and fixed the old tests, which were incorrect and errored when talking to a JS server.

Without this fix you get some gnarly errors where readers end halfway through a message, or where the wrong number of bytes is sent to JavaScript, which can cause server errors.

More reading here:

https://mathiasbynens.be/notes/javascript-encoding
https://socket.io/blog/engine-io-4-release/#packet-encoding
https://socket.io/docs/v4/engine-io-protocol/#from-v3-to-v4

grahamjenson commented 10 months ago

Not sure why these benchmarks are breaking, but this should be ready for merging and testing.

erkie commented 10 months ago

This is a great fix. Nice that you were able to hunt it down and get to the bottom of it. Code looks great, and it doesn't look like a breaking change. So merging this 👍