No way to efficiently send binary data (Base64 may be unnecessary)

Blaizer commented 5 years ago

Currently the data field of a MatchDataSend message is processed by btoa(JSON.stringify(data)) (and the inverse is done for receiving match data). If I want to send data efficiently, I'd like that data field to be as few bytes as possible. The problem is that both JSON.stringify and btoa increase the size of the message, for example:

As you can see, when I'm sending string data, JSON.stringify adds an extra pair of "" around the string and escapes non-printable characters. Then btoa increases the size of the string by roughly another third (Base64 emits 4 characters for every 3 input bytes). But before btoa comes along, JSON.stringify has already "cleaned" the data of any non-printable characters.
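A minimal sketch of that growth, assuming a runtime with a global btoa (browsers, or Node.js 16 and later). The helper names encodeLikeClient and base64Length are hypothetical; the first simply mirrors the btoa(JSON.stringify(data)) expression quoted above.

```javascript
// Mirrors the client-side encoding quoted above: btoa(JSON.stringify(data)).
// encodeLikeClient is a hypothetical name used only for this illustration.
function encodeLikeClient(data) {
  return btoa(JSON.stringify(data));
}

// Base64 turns every 3 input bytes into 4 output characters (padding rounds up).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const data = "hi\n";               // 3 characters, one of them a control character
const json = JSON.stringify(data); // quotes added and the newline escaped: 6 characters
const encoded = encodeLikeClient(data);

console.log(data.length, json.length, encoded.length);     // 3 6 8
console.log(base64Length(json.length) === encoded.length); // true
```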

So I'm unsure why btoa is even necessary; it seems to add only extra overhead. Furthermore, it actually breaks if you try to send data containing characters outside the Latin1 range (code points above 255):
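A small reproduction of the failure, under the same assumption of a global btoa. The UTF-8 workaround at the end (utf8ToBase64 and base64ToUtf8 are hypothetical helper names) is one common pattern, not something the SDK is known to do; it keeps Base64 but makes any Unicode text encodable.

```javascript
// btoa accepts only code points 0-255, but JSON.stringify leaves printable
// non-Latin1 characters unescaped, so this combination throws for them.
function tryClientEncoding(data) {
  try {
    return btoa(JSON.stringify(data));
  } catch (err) {
    return err.name; // typically "InvalidCharacterError" (a DOMException)
  }
}

console.log(tryClientEncoding("caf\u00e9"));    // é is U+00E9, inside Latin1: succeeds
console.log(tryClientEncoding("\u65e5\u672c")); // "日本": InvalidCharacterError

// One common workaround (not what the SDK does): encode to UTF-8 bytes first,
// map each byte to a code point 0-255, and only then call btoa.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Inverse: Base64 -> byte values -> UTF-8 decode.
function base64ToUtf8(b64) {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const roundTripped = base64ToUtf8(utf8ToBase64(JSON.stringify("\u65e5\u672c")));
console.log(JSON.parse(roundTripped)); // 日本
```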

So I think at the very least atob/btoa should be considered for removal (although this would be a backward-compatibility-breaking change), and we should think about some way of sending binary data efficiently (although I'm not much of a JavaScript guru, so I don't have the answers here).

Blaizer commented 5 years ago

I forgot to mention that another JSON.stringify is performed on the whole message, so the result is closer to JSON.stringify(btoa(JSON.stringify(data))). The conclusions are still the same, although in this case the btoa form isn't always a flat ~33% increase:

And in some contrived examples the btoa form can actually make the message shorter:

So there's no best answer here, but I'd expect the non-btoa form to be better the majority of the time.
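To compare the two forms concretely, the sketch below models only the data field; the rest of the envelope adds the same overhead to both. withBase64 and withoutBase64 are hypothetical names, and the figures in the comments are character counts, which equal UTF-8 byte counts here because all of the output is ASCII.

```javascript
// The two candidate wire forms of the data field, including the outer
// JSON.stringify applied to the whole message.
function withBase64(data) {
  return JSON.stringify(btoa(JSON.stringify(data)));
}

function withoutBase64(data) {
  return JSON.stringify(JSON.stringify(data));
}

// Typical text: the double-escaped form is shorter.
console.log(withoutBase64("hello world").length, withBase64("hello world").length); // 17 22

// Contrived: every backslash is escaped twice (1 -> 2 -> 4 characters),
// while Base64 stays at roughly 4/3, so the Base64 form wins.
const slashes = "\\".repeat(30);
console.log(withoutBase64(slashes).length, withBase64(slashes).length); // 126 86
```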

novabyte commented 5 years ago

Hi @Blaizer. Thanks for starting a discussion on this area of the realtime multiplayer API.

The design of the API is not accidental; it follows the conversion between bytes and a compatible representation that can be sent from JavaScript with protocol buffers. Internally, the game server takes all messages off the socket, and if they arrive in "text mode", it assumes JSON and converts the JSON to a protobuf message so that it can be handled in the same way as the "binary mode" of the game server socket.
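A rough sketch of what that conversion implies for the data field, under stated assumptions: in the proto3 JSON mapping, a bytes field is carried as a Base64 string, so a text-mode frame has to Base64-encode binary payloads. The envelope field names below (match_data_send, match_id, op_code, data) are assumptions modeled on the SDK's message naming, and textFrameToMatchData is a hypothetical stand-in, not Nakama's actual server code.

```javascript
// Hypothetical model of the "text mode" path: a JSON frame is parsed and its
// Base64 data field is decoded back into raw bytes, so the result looks like
// what a binary-mode (protobuf) frame would have carried.
function textFrameToMatchData(frame) {
  const envelope = JSON.parse(frame);
  const msg = envelope.match_data_send; // field names are assumptions
  const binary = atob(msg.data);        // proto3 JSON carries bytes as Base64
  return {
    matchId: msg.match_id,
    opCode: Number(msg.op_code),        // int64 may arrive as a JSON string
    data: Uint8Array.from(binary, (c) => c.charCodeAt(0)),
  };
}

const frame = JSON.stringify({
  match_data_send: { match_id: "m1", op_code: "1", data: btoa(JSON.stringify({ x: 1 })) },
});
const decoded = textFrameToMatchData(frame);
console.log(decoded.opCode, new TextDecoder().decode(decoded.data)); // 1 {"x":1}
```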

We do this to provide maximum compatibility with the game server across languages and game engines. I'm open to ideas on how we could make this better or more efficient. Do you have any suggestions on how we could improve this approach?

Blaizer commented 5 years ago

I understand what you're saying, but I still disagree. JSON.stringify takes a value and returns a UTF-16 encoded string. At that point the payload is already text, not bytes as you describe it.

btoa expects a string that only has code points in the 0-255 range, but JSON.stringify returns a string that can contain printable characters outside that range, which is why you get an error when trying to send data that contains code points outside the Latin1 range. The return value of btoa is also a UTF-16 string (all strings in JavaScript are), but its code points are restricted even further, to the Base64 alphabet.

The WebSocket takes the UTF-16 string and converts it once more, to UTF-8, before sending it, based on my reading of how JavaScript WebSockets work. I think that to send raw bytes instead of UTF-8 text, you have to give the socket an ArrayBuffer or typed array rather than a string.
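A quick way to see the UTF-16 versus UTF-8 difference, assuming a runtime with TextEncoder (browsers, Node.js 11 and later). utf8ByteLength is a hypothetical helper; the send and binaryType usage in the closing comment refers to the standard WebSocket API.

```javascript
// A string passed to WebSocket.send() goes out as a UTF-8 text frame, so the
// wire size is the UTF-8 byte count, not the string's UTF-16 length.
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

const payload = JSON.stringify({ msg: "\u65e5\u672c" }); // {"msg":"日本"}
console.log(payload.length, utf8ByteLength(payload));    // 12 16

// Raw bytes need a binary frame instead: pass an ArrayBuffer or typed array,
// e.g. socket.send(new Uint8Array([1, 2, 3])), and set
// socket.binaryType = "arraybuffer" to receive binary frames as ArrayBuffer.
```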

The fact that the content of Channel Messages is processed only by JSON.stringify, and not by btoa at all, further leads me to conclude that btoa is unnecessary. A good course of action would be to write some tests that attempt to send messages with code points outside the Latin1 range.
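A minimal version of such a test, assuming a global btoa/atob and running locally without a server, so it only exercises the client-side encodings: JSON alone (as the comment above says channel messages use) versus JSON plus Base64 (as used for match data). The sample strings are arbitrary choices.

```javascript
// Round-trip checks for the two client-side encodings.
const samples = ["plain ascii", "caf\u00e9", "\u65e5\u672c", "\u{1F600}", "ctrl\u0000\n"];

const jsonRoundTrip = (v) => JSON.parse(JSON.stringify(v));
const base64RoundTrip = (v) => JSON.parse(atob(btoa(JSON.stringify(v))));

// true if the value survives the round trip unchanged, false if it differs or throws
function survives(roundTrip, value) {
  try {
    return roundTrip(value) === value;
  } catch {
    return false;
  }
}

for (const v of samples) {
  console.log(JSON.stringify(v), survives(jsonRoundTrip, v), survives(base64RoundTrip, v));
}
```

With this sample set, the JSON-only path round-trips every value, while the Base64 path fails for the CJK text and the emoji, whose UTF-16 code units exceed 255.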

novabyte commented 4 years ago

@Blaizer I wanted to track this issue to investigate further. Any reason you closed the issue? Is it resolved?

Dimon4eg commented 4 years ago

As I understand it, he wants to send binary data over the socket, but the server supports only text. WebSockets support a binary mode. @novabyte, can the server also support binary payloads?

novabyte commented 4 years ago

@Dimon4eg Yes. The server supports both text and binary mode for messages, but in binary mode the realtime protocol must be used. This requires that the client SDK use protocol buffers.

I'm not sure if it's possible to use protobuf in the browser at the moment.

Dimon4eg commented 4 years ago

I see.

I think we could do it the following way:

Serialize the MatchData message in binary mode here: https://github.com/heroiclabs/nakama-js/blob/master/dist/nakama-js.umd.js#L1594-L1596, for example as opcode ; data
Check whether the first byte is not {, and if so, process the message as MatchData in binary mode: https://github.com/heroiclabs/nakama-js/blob/master/dist/nakama-js.umd.js#L1438
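A sketch of that proposal, with the wire layout taken literally: the opcode as ASCII decimal digits, a ';' separator, then the raw payload bytes. This layout and the names encodeMatchDataBinary and decodeFrame are hypothetical; nothing here reflects what the Nakama server actually accepts.

```javascript
const enc = new TextEncoder();
const dec = new TextDecoder();
const SEMICOLON = 0x3b;  // ';'
const OPEN_BRACE = 0x7b; // '{', the first byte of every JSON envelope

// Hypothetical binary layout: "<opcode digits>;" followed by the raw payload.
function encodeMatchDataBinary(opCode, data) {
  const header = enc.encode(`${opCode};`);
  const frame = new Uint8Array(header.length + data.length);
  frame.set(header, 0);
  frame.set(data, header.length);
  return frame;
}

// A leading '{' means the existing JSON protocol; anything else is binary match data.
function decodeFrame(frame) {
  if (frame[0] === OPEN_BRACE) {
    return { kind: "json", message: JSON.parse(dec.decode(frame)) };
  }
  // Split on the first ';' only: the payload itself may contain ';' bytes.
  const sep = frame.indexOf(SEMICOLON);
  if (sep < 0) throw new Error("malformed binary frame: no ';' after opcode");
  const opCode = Number(dec.decode(frame.subarray(0, sep)));
  return { kind: "match_data", opCode, data: frame.subarray(sep + 1) };
}
```

In a browser the frame would go out with socket.send(frame); on the receiving side, setting socket.binaryType = "arraybuffer" lets you wrap event.data in a Uint8Array before calling decodeFrame. Because the opcode is written as digits (or a leading '-'), a binary frame can never start with '{', which keeps the first-byte check unambiguous. The WebSocket itself also reports whether a frame was text or binary (event.data is a string for text frames), which could serve the same purpose.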

novabyte commented 4 years ago

@Dimon4eg I think we can move the discussion about what to do with different message flows that could contain binary data off this closed issue.

Dimon4eg commented 4 years ago

👍

heroiclabs / nakama-js

No way to efficiently send binary data (Base64 may be unnecessary) #46