stefanw / channels-yroom

Django Channels WebSocket consumer and worker for synchronizing Yjs clients
https://channels-yroom.readthedocs.io/en/latest/
MIT License
15 stars 5 forks source link

Convert memoryview to bytes in create_room_from_snapshot #13

Closed zswaff closed 1 year ago

zswaff commented 1 year ago

Background

When the create_room_from_snapshot function is triggered and the snapshot is retrieved, it comes back from the DB as a memoryview rather than a bytes object. These objects are mostly compatible but not always and something that happens in room_manager.connect_with_data requires a bytes specifically. We were getting Caught exception: argument 'data': 'bytes' object cannot be interpreted as an integer every time from the connect_with_data call. The wrapping bytes() call duplicates the memory, which is slightly inefficient, but worth it to solve the bug.

Separately, I removed a return statement to match the annotated types.

Repro

For us, these repro steps worked:

  1. Configure everything with a DB storage using a PosgreSQL DB
  2. Run a server and the yroom worker
  3. Start a websocket client and ensure it is connected
  4. Restart the yroom worker but maintain the client <> server websocket connection
  5. Do something on the client

This should cause the worker to attempt to restore from snapshot and then trigger the 'bytes' object cannot be interpreted as an integer exception.

zswaff commented 1 year ago

@stefanw per our messages on the YJS forum, here's a PR for some of the issues we were seeing. Let me know if anything can be improved.

stefanw commented 1 year ago

Ah interesting, I didn't run into this! Django's BinaryField says it accepts bytes, bytearray, or memoryview but it doesn't state what it returns from the database! The code looks like it could be database dependent what type is returned? What database were you using?

zswaff commented 1 year ago

I added a few clarifying comments in the main PR description. We are using a Posgres DB, and it definitely returns a memoryview. What are you using, and what is the type of snapshot for you (e.g. if you don't use the bytes() call I added and just print(type(snapshot))? If this is bytes for you I guess that will explain the difference in what we are seeing. Perhaps Django only guarantees a 'bytes-like' object.

If this is the case I guess we have a few options that I see:

  1. Use this change
  2. Move the call to bytes() to be inside the DB implementation of get_snapshot()
  3. Add more conditions (e.g. only do the conversion if type(snapshot) != 'bytes')
  4. Debug deeper into connect_with_data() to see why it doesn't like the memoryview (I already tried this a little, more could be done but I probably don't have the bandwidth for it myself)

What do you think?

stefanw commented 1 year ago

The tests run on sqlite and there the db returns bytes on BinaryField but I checked and Postgres returns memoryview.

I implemented a check and conversion in the manager's get_snapshot in e6af1e08bd8051a4258d68f637b90fa2cf99b403.

The connect_with_data goes into Rust where it is expecting Vec<u8> and memoryview is just not compatible.