marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0

marimo edit stops working when left open for multiple days (fd limit too low) #1610

Closed akshayka closed 3 weeks ago

akshayka commented 3 weeks ago

Describe the bug

A user left marimo edit (homepage) running for multiple days. There were 8 notebooks open in that edit session, each of which had also been running for multiple days.

The server eventually became degraded, in the following ways:

  1. some notebooks wouldn't open, claiming that another client was connected (it wasn't)
  2. creating a new notebook failed with "Too many open files".

This suggests a possible file descriptor leak somewhere in the server, but I'm not sure where. The server had to be force-quit.
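One way to test the leak hypothesis is to sample the number of open file descriptors over time and see whether it keeps growing. The following is a minimal sketch, not marimo code: `open_fd_count` is a hypothetical helper, and it assumes a macOS or Linux system where `/dev/fd` lists the current process's descriptors.

```python
import os


def open_fd_count() -> int:
    """Count file descriptors currently open in this process.

    Assumes /dev/fd exists (macOS, Linux). The listing itself may briefly
    hold one extra descriptor, so compare samples rather than trusting the
    absolute number.
    """
    return len(os.listdir("/dev/fd"))


before = open_fd_count()
r, w = os.pipe()            # deliberately hold two descriptors
during = open_fd_count()
os.close(r)
os.close(w)
after = open_fd_count()

# A leak would show up as a count that keeps rising between samples
# even though the corresponding resources are no longer in use.
print(f"before={before} during={during} after={after}")
```

For a server that is already running, `lsof -p <pid>` gives a similar view from outside the process.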

Snippet of logs attached. (The server was continually emitting error logs).

errors.txt

Logs when trying to create a new notebook:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 240, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/middleware/errors.py", line 151, in __call__
    await self.app(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/api/auth.py", line 201, in __call__
    return await super().__call__(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/middleware/sessions.py", line 85, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/middleware/authentication.py", line 49, in __call__
    await self.app(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/middleware/cors.py", line 77, in __call__
    await self.app(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/api/middleware.py", line 64, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 485, in handle
    await self.app(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 373, in handle
    await self.app(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 96, in app
    await wrap_app_handling_exceptions(app, session)(scope, receive, send)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/starlette/routing.py", line 94, in app
    await func(session)
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/api/endpoints/ws.py", line 71, in websocket_endpoint
    await WebsocketHandler(
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/api/endpoints/ws.py", line 307, in start
    await get_session()
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/api/endpoints/ws.py", line 294, in get_session
    new_session = mgr.create_session(
                  ^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/sessions.py", line 524, in create_session
    self.sessions[session_id] = Session.create(
                                ^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/sessions.py", line 295, in create
    queue_manager = QueueManager(use_multiprocessing)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/notebook/lib/python3.11/site-packages/marimo/_server/sessions.py", line 92, in __init__
    ] = context.Queue() if context is not None else queue.Queue()
        ^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/context.py", line 103, in Queue
    return Queue(maxsize, ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/queues.py", line 43, in __init__
    self._rlock = ctx.Lock()
                  ^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/context.py", line 68, in Lock
    return Lock(ctx=self.get_context())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/delenn/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
  File "/Users/delenn/.pyenv/versions/3.11.4/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
OSError: [Errno 24] Too many open files
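
The traceback above fails while `QueueManager` is constructing a `multiprocessing` queue, so it may help to measure how many descriptors one such queue holds. The sketch below is an illustration under assumptions, not marimo's code: `fds_per_queue` and `open_fd_count` are hypothetical helpers, the `spawn` context is a guess (it is the macOS default), and `/dev/fd` is assumed to exist.

```python
import multiprocessing
import os


def open_fd_count() -> int:
    # Assumes /dev/fd lists this process's descriptors (macOS, Linux).
    return len(os.listdir("/dev/fd"))


def fds_per_queue(n: int = 5) -> float:
    """Estimate how many descriptors each multiprocessing queue keeps open."""
    # Assumption: the "spawn" start method; the context marimo uses is not
    # shown in the traceback.
    ctx = multiprocessing.get_context("spawn")

    # Warm-up queue: the first one may also start multiprocessing's
    # resource tracker, which would skew the measurement.
    warmup = ctx.Queue()

    before = open_fd_count()
    queues = [ctx.Queue() for _ in range(n)]
    after = open_fd_count()

    for q in queues + [warmup]:
        q.close()
    return (after - before) / n


if __name__ == "__main__":
    print(f"descriptors per queue: {fds_per_queue():.1f}")
```

Each queue holds at least the two ends of a pipe. Whether its locks also consume descriptors depends on the platform, so the measured number can differ between macOS and Linux.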

Environment

marimo 0.6.13, macOS.

Code to reproduce

No response

akshayka commented 3 weeks ago

Some more digging.

I opened 8 notebooks on macOS, without closing any. This led to more than 256 open file descriptors, exceeding macOS's default limit of 256 open files, which caused marimo new to fail. In this case, the failure was not due to a leak; the server simply opens a lot of files, for reasons not yet identified.
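
Given the 256-descriptor default described above, one possible mitigation is to raise the process's soft limit at startup. This is a hedged sketch using the standard-library `resource` module (POSIX only); `raise_fd_soft_limit` and the target of 4096 are illustrative assumptions, not what marimo does.

```python
import resource


def raise_fd_soft_limit(target: int = 4096) -> tuple[int, int]:
    """Try to raise the soft RLIMIT_NOFILE to `target`; return the limits in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Never ask for more than the hard limit allows.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Only raise; leave an already-higher (or unlimited) soft limit alone.
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # The OS may reject the request (for example, a kernel-level cap);
            # in that case keep the existing limit.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("limits (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))
```

The same limit can be inspected or raised for a shell session with `ulimit -n` before launching the server. Raising it only postpones the failure if descriptors are in fact leaking.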