rawpython / remi

Python REMote Interface library. Platform independent. In about 100 Kbytes, perfect for your diet.
Apache License 2.0
3.48k stars 401 forks source link

REMI does not properly handle websocket messages split into multiple frames #480

Closed rf closed 2 years ago

rf commented 2 years ago

It appears that when the browser chunks a large callback message into multiple frames, REMI does not properly reassemble the frames, resulting in a truncated message.

In particular I have an app with a button and a large TextInput. It's used for a tool where someone can import a large dataset into my application. When they paste a large csv into the box and click the button, we use the value of the textbox to do some processing. Sometimes the value has been truncated, which makes the input unparseable.

So far I've traced this to the read_next_message function. If I print out the length of the received message, sometimes I get a value that's around the size of the csv that was pasted in the box like 72202. Then, if I make a few edits causing a few onchange events to fire, I'll see two messages come through: one with length 44855 and one with length 27350, the sum of which is 72205. So the message has been split into two chunks.

If we follow this into the parsing logic, we get to the on_message function. On line 267, we check to see if there are more than 3 chunks after splitting by /. If the data has some / in it, that will be true. The code then checks to see if the first chunk is 'callback'. This will be false, as that chunk is actually just part of the data. So, the function will simply exit with no error.

The result is that the value of the input textbox ends up randomly truncated when it's changed to a large amount of data. This seems to happen once you get to around 65kb of data in the box. I'm almost certain this is because the browser is chunking websocket message sent by REMI. I'm working on confirming this assumption before writing a fix, as it would be somewhat cumbersome.

// edit: attached a PR, feedback would be appreciated!

dddomodossola commented 2 years ago

Hello @rf ,

Thank you a lot for collaborating on this project. I will try to reproduce the problem this late evening and check your pull request. The problem is of course to be fixed. However for your specific project, I suggest you to use FileUploader widget instead of copy and paste in a TextInput.

rf commented 2 years ago

We actually have a wrapped text input that allows either, so we have a workaround. Thanks for the quick response!

dddomodossola commented 2 years ago

Hello @rf , You made me not to sleep tonight :-D . Please look at the last commit. I fixed the problem by using the websocket fin bit. That bit indicates when the message is completely received and can be parsed. Please give it a try and let me know if works for you. Kind Regads

rf commented 2 years ago

Haha, thank you so much for taking a look at this!

Websocket fin bit definitely sounds like the right solution. I'm not super familiar with WS protocol.

I'm still able to get an error when editing my input with a bunch of data, though. Now I'm seeing this:

Traceback (most recent call last):
  File "/Users/rf/.pyenv/versions/3.10.0/envs/magic-3.10.0/lib/python3.10/site-packages/remi/server.py", line 283, in on_message
    param_dict = parse_parametrs(params)
  File "/Users/rf/.pyenv/versions/3.10.0/envs/magic-3.10.0/lib/python3.10/site-packages/remi/server.py", line 312, in parse_parametrs
    l = int(s[0])  # length of param field
ValueError: invalid literal for int() with base 10: 'Ã!Ú\'\x96#Â?\x95s\x91"ÚwÆ"\x95+\x93+ÄtÀq\x947ÅQß]³W¹7Å"¦SÞ7Å"»w\x9a}\x99#ÚjÚ!Ú ¢BÒ ´!Ò ´+Ãv\x94%Î Ï?ÇpÅ!Ú&\x91t\x92?Ïp\x95sÚ#\x93&ÅqÁvÏ"\x91\'\x927ÅQß]³W¹7Å"¦SÞ7Å"¤f\x96u\x9e'

So maybe data is not getting properly unmasked?

dddomodossola commented 2 years ago

Hello @rf,

can you please provide me a file containing the data that causes the problem?

rf commented 2 years ago

Sure, use this:

import remi.gui as gui
from remi import start, App

class MyApp(App):
    def __init__(self, *args):
        super(MyApp, self).__init__(*args)

    def main(self):
        container = gui.VBox(width=120, height=100)
        self.input = gui.TextInput(single_line=False)
        container.append(self.input)
        return container

start(MyApp)

And paste > 65k of data into the text box (e.g. use https://www.lipsum.com/ to generate 80kb of lorem ipsum). Then, make a few edits to the text (just add spaces a few times) and check console. After about 4 edits I see the bug. Browser is Version 98.0.4758.102 (Official Build) (x86_64). Browser version probably matters as it's up to the browser whether it would like to chunk the WS message.

remi.server.ws   ERROR    error parsing websocket
Traceback (most recent call last):
  File "/Users/rf/oden/remi/showbug/../remi/server.py", line 288, in on_message
    param_dict = parse_parametrs(params)
  File "/Users/rf/oden/remi/showbug/../remi/server.py", line 317, in parse_parametrs
    l = int(s[0])  # length of param field
ValueError: invalid literal for int() with base 10: 'zJoLcMg\x1d8\x08nMc\x16/\x08K\x1d:yZJk]y]dL/\n:[eUgWnW/\n:NoK~QhMfMg\x1d8\x08kV~]/\n:Y/\n:]\x7fQyUe\\$\x1d8\x08\\Q'
rf commented 2 years ago

I figured it out, it was indeed because of an issue with the unmasking.