jupyterlab / jupyter-collaboration

A Jupyter Server Extension Providing Support for Y Documents
https://jupyterlab-realtime-collaboration.readthedocs.io/en/latest/

Time-to-live support not working anymore? #315

Closed asteppke closed 2 months ago

asteppke commented 4 months ago

For notebooks that produce a lot of output, the .jupyter_ystore.db file can grow rather large and easily exceeds our users' quota.

In this example here

import time
for i in range(100_000_000):
    print(f"{i}, ", end="")
    time.sleep(0.05)

the notebook file grows to 1.5 MB after a runtime of a few minutes, whereas the corresponding .jupyter_ystore.db grows to 580 MB.

I have read parts of the discussions around the database and have a rough understanding of the complications that make solving this quite challenging. For now, the time-to-live option seems like a suitable workaround to limit the growth to some extent. At the moment, though, it does not seem to work (anymore?).

When starting a new session with

jupyter lab --SQLiteYStore.document_ttl=600

I only receive the following error messages:

[I 2024-05-17 16:08:01.659 ServerApp] Creating new notebook in
[I 2024-05-17 16:08:01.722 ServerApp] Request for Y document 'Untitled10.ipynb' with room ID: 780564de-e0da-492a-9d14-af545441c896
[I 2024-05-17 16:08:01.913 YDocExtension] Creating FileLoader for: Untitled10.ipynb
[I 2024-05-17 16:08:01.914 YDocExtension] Watching file: Untitled10.ipynb
[I 2024-05-17 16:08:01.915 ServerApp] Initializing room json:notebook:780564de-e0da-492a-9d14-af545441c896
[I 2024-05-17 16:08:01.935 ServerApp] Content in room json:notebook:780564de-e0da-492a-9d14-af545441c896 loaded from file Untitled10.ipynb
[E 2024-05-17 16:08:01.937 ServerApp] Error initializing: Untitled10.ipynb
    TypeError("'>' not supported between instances of 'int' and 'DeferredConfigString'")
    Traceback (most recent call last):
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py", line 233, in open
        await self.room.initialize()
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\rooms.py", line 151, in initialize
        await self.ystore.encode_state_as_update(self.ydoc)
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\pycrdt_websocket\ystore.py", line 145, in encode_state_as_update
        await self.write(update)
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\pycrdt_websocket\ystore.py", line 473, in write
        if self.document_ttl is not None and diff > self.document_ttl:
                                             ^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: '>' not supported between instances of 'int' and 'DeferredConfigString'
[I 2024-05-17 16:08:01.940 ServerApp] Deleting Y document from memory: json:notebook:780564de-e0da-492a-9d14-af545441c896
[I 2024-05-17 16:08:01.940 ServerApp] Room json:notebook:780564de-e0da-492a-9d14-af545441c896 deleted
[I 2024-05-17 16:08:01.941 ServerApp] Deleting file Untitled10.ipynb
[E 2024-05-17 16:08:01.943 ServerApp] Exception in callback functools.partial(<function WebSocketProtocol._run_callback.<locals>.<lambda> at 0x0000023308FE4A40>, <Task finished name='Task-734' coro=<YDocWebSocketHandler.on_message() done, defined at C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py:277> exception=AttributeError("'YDocWebSocketHandler' object has no attribute 'room'")>)
    Traceback (most recent call last):
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\ioloop.py", line 750, in _run_callback
        ret = callback()
              ^^^^^^^^^^
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 640, in <lambda>
        self.stream.io_loop.add_future(result, lambda f: f.result())
                                                         ^^^^^^^^^^
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py", line 286, in on_message
        changes = self.room.awareness.get_changes(message[1:])
                  ^^^^^^^^^
    AttributeError: 'YDocWebSocketHandler' object has no attribute 'room'
[E 2024-05-17 16:08:01.945 ServerApp] Uncaught exception GET /api/collaboration/room/json:notebook:780564de-e0da-492a-9d14-af545441c896?sessionId=19a409eb-52ee-46c9-9d32-d39d007e0a9a (::1)
    HTTPServerRequest(protocol='http', host='localhost:8888', method='GET', uri='/api/collaboration/room/json:notebook:780564de-e0da-492a-9d14-af545441c896?sessionId=19a409eb-52ee-46c9-9d32-d39d007e0a9a', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\web.py", line 1790, in _execute
        result = await result
                 ^^^^^^^^^^^^
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\jupyter_collaboration\handlers.py", line 209, in get
        return await super().get(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 273, in get
        await self.ws_connection.accept_connection(self)
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 863, in accept_connection
        await self._accept_connection(handler)
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 946, in _accept_connection
        await self._receive_frame_loop()
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 1102, in _receive_frame_loop
        await self._receive_frame()
      File "C:\tools\miniconda3\envs\data\Lib\site-packages\tornado\websocket.py", line 1193, in _receive_frame
        await handled_future
    AttributeError: 'YDocWebSocketHandler' object has no attribute 'room'
Traceback (most recent call last):
  File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
    result = tuple_new(cls, iterable)
             ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::map::Map is unsendable, but is being dropped on another thread
Traceback (most recent call last):
  File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
    result = tuple_new(cls, iterable)
             ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::map::Map is unsendable, but is being dropped on another thread
Traceback (most recent call last):
  File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
    result = tuple_new(cls, iterable)
             ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::doc::Doc is unsendable, but is being dropped on another thread
Traceback (most recent call last):
  File "C:\tools\miniconda3\envs\data\Lib\collections\__init__.py", line 449, in _make
    result = tuple_new(cls, iterable)
             ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: pycrdt::array::Array is unsendable, but is being dropped on another thread

Is the TTL option still supported, or is there another or better way to limit the size of the database?

asteppke commented 3 months ago

After looking around a bit, it turns out that this is connected to the special treatment of traitlets within jupyter-collaboration. Traitlets processes command-line arguments in two steps, and the second step is not executed here, so instead of an integer value we obtain only a DeferredConfigString.

As a workaround, one can simply cast the value with

self.document_ttl = int(self.document_ttl) if self.document_ttl is not None else None

without negative consequences. I do not know what the plan is regarding the traitlets integration, but for now that restores the document time-to-live functionality.
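
A minimal sketch of why the comparison fails and why the cast helps, using only the standard library. DeferredConfigString in traitlets is a subclass of str; the names DeferredString and coerce_ttl below are hypothetical stand-ins for illustration and are not part of traitlets or pycrdt-websocket.

```python
from typing import Optional


class DeferredString(str):
    """Hypothetical stand-in for traitlets' DeferredConfigString:
    a str subclass holding a raw value that was never converted."""


def coerce_ttl(value: object) -> Optional[int]:
    """Return the time-to-live as an int, or None when no TTL is set."""
    if value is None:
        return None
    return int(value)


# Roughly what --SQLiteYStore.document_ttl=600 leaves behind when the
# second traitlets step (conversion to the declared type) is skipped.
raw_ttl = DeferredString("600")
diff = 900  # illustrative: seconds since the last stored update

try:
    diff > raw_ttl  # int compared with a str subclass
    comparison_failed = False
except TypeError:
    # Same kind of error as the one raised in ystore.py in the traceback.
    comparison_failed = True

ttl = coerce_ttl(raw_ttl)
expired = ttl is not None and diff > ttl
```

The cast only patches the symptom at the point of use; the value stored in the configuration is still an unconverted string.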

davidbrochart commented 3 months ago

Thanks @asteppke for following up on this. Would you like to send a PR?

asteppke commented 3 months ago

@davidbrochart: I put together a small PR that addresses this issue.

krassowski commented 3 months ago

> this is connected to the special treatment of traitlets within jupyter-collaboration

Do you mean the lines below?

https://github.com/jupyterlab/jupyter-collaboration/blob/86ef807c45658b91d865fc821de54839fd2522ba/projects/jupyter-server-ydoc/jupyter_server_ydoc/app.py#L98-L100

At first glance, this looks like incorrect usage of traitlets. Passing a partial with config= instead could help.
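
The idea of binding the configuration with a partial can be sketched without traitlets as follows. Store, app_config, and store_factory are hypothetical names for illustration. With traitlets, the class's own traits would be expected to perform the type conversion once the instance receives config=, which is the step that appears to be skipped in this issue.

```python
from functools import partial
from typing import Any, Mapping, Optional


class Store:
    """Hypothetical store that resolves its own options when instantiated."""

    def __init__(self, path: str, config: Optional[Mapping[str, Any]] = None) -> None:
        self.path = path
        raw = (config or {}).get("document_ttl")
        # Conversion happens inside the class, so callers may pass raw strings.
        self.document_ttl: Optional[int] = None if raw is None else int(raw)


# Raw value, as it might arrive from the command line (illustrative).
app_config = {"document_ttl": "600"}

# Bind the configuration once; each room later supplies only the path.
store_factory = partial(Store, config=app_config)
store = store_factory("Untitled10.ipynb")
```

Compared with copying raw values onto the class, this keeps the conversion in one place, so every instance sees a properly typed option.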

asteppke commented 3 months ago

@krassowski Yes, these are the lines that I mean. As far as I can see, traitlets is indeed not meant to be used like that. I did not want to change more than absolutely necessary in this pull request, though.

krassowski commented 2 months ago

I opened https://github.com/jupyterlab/jupyter-collaboration/pull/322 with a clean fix.