marimo-team / marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.
https://marimo.io
Apache License 2.0

BlockingIO Error #1568

Closed: andrewhill157 closed this issue 1 month ago.

andrewhill157 commented 5 months ago

Describe the bug

I apologize in advance that this might be a bit annoying to reproduce.

When loading an app like the example below, it sometimes fails to run and produces errors like the following (after which the user has to refresh the app to get it working again):

Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7fbf4d755c60>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/ceph/home/andrew/temp/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7fbf4d755c60>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/ceph/home/andrew/temp/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 256, in recv
    return _ForkingPickler.loads(buf.getbuffer())
_pickle.UnpicklingError: invalid load key, '\x07'.
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7fbf4d755c60>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/ceph/home/andrew/temp/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/ceph/home/andrew/miniconda3/envs/test_env/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 11] Resource temporarily unavailable

but other times it loads totally fine.

I initially noticed the error above on a Linux server I'm using to deploy the app, but I've also observed something similar locally on my Mac, though much less frequently (I had to sit there refreshing for a good while, whereas it is much easier to catch on the server):

Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x15d9f67a0>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/Users/andrewhill/Desktop/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 35] Resource temporarily unavailable
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x15d9f67a0>>
handle: <Handle Distributor._on_change>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/Users/andrewhill/Desktop/absel/env/lib/python3.10/site-packages/marimo/_utils/distributor.py", line 54, in _on_change
    response = self.input_connection.recv()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 426, in _recv_bytes
    return self._recv(size)
  File "/Users/andrewhill/.pyenv/versions/3.10.4/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
BlockingIOError: [Errno 35] Resource temporarily unavailable
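
The Linux and macOS traces report different numbers ([Errno 11] vs. [Errno 35]), but both are that platform's value of EAGAIN: a read on a non-blocking file descriptor found no data ready. A minimal sketch, assuming a POSIX system, that raises the same BlockingIOError by reading from an empty non-blocking socket pair (demo_eagain is a hypothetical helper, not marimo code):

```python
import errno
import socket


def demo_eagain() -> int:
    """Read from an empty non-blocking socket and return the errno it raises."""
    sender, receiver = socket.socketpair()
    try:
        receiver.setblocking(False)  # reads now fail immediately instead of waiting
        try:
            receiver.recv(1024)  # nothing has been sent, so no data is ready
        except BlockingIOError as exc:
            return exc.errno
        raise AssertionError("expected BlockingIOError")
    finally:
        sender.close()
        receiver.close()


code = demo_eagain()
# EAGAIN and EWOULDBLOCK share one number: 11 on Linux, 35 on macOS.
print(code, code in (errno.EAGAIN, errno.EWOULDBLOCK))
```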

Environment

Linux

{
  "marimo": "0.6.11",
  "OS": "Linux",
  "OS Version": "6.1.80-1.el9.elrepo.x86_64",
  "Processor": "x86_64",
  "Python Version": "3.10.4",
  "Binaries": {
    "Browser": "--",
    "Node": "--"
  },
  "Requirements": {
    "click": "8.1.7",
    "importlib-resources": "missing",
    "jedi": "0.19.1",
    "markdown": "3.6",
    "pymdown-extensions": "10.8.1",
    "pygments": "2.18.0",
    "tomlkit": "0.12.5",
    "uvicorn": "0.30.1",
    "starlette": "0.37.2",
    "websocket": "missing",
    "typing-extensions": "4.12.1",
    "black": "24.4.2"
  }
}

Mac

{
  "marimo": "0.6.11",
  "OS": "Darwin",
  "OS Version": "23.3.0",
  "Processor": "arm",
  "Python Version": "3.10.4",
  "Binaries": {
    "Browser": "--",
    "Node": "v21.7.1"
  },
  "Requirements": {
    "click": "8.1.3",
    "importlib-resources": "missing",
    "jedi": "0.19.1",
    "markdown": "3.6",
    "pymdown-extensions": "10.8.1",
    "pygments": "2.17.2",
    "tomlkit": "0.12.4",
    "uvicorn": "0.29.0",
    "starlette": "0.37.2",
    "websocket": "missing",
    "typing-extensions": "4.11.0",
    "black": "24.4.2"
  }
}

Code to reproduce

import marimo

__generated_with = "0.6.11"
app = marimo.App(width="full")

@app.cell
def __():
    import pandas as pd
    import numpy as np
    import marimo as mo
    import polars as pl
    import altair as alt
    alt.data_transformers.enable("marimo_csv")

    def create_random_dataset(N):
        x = np.arange(1, N+1)
        y = np.random.rand(N)
        labels = [str(v) for v in y]  # materialize labels as a list of strings

        return pl.DataFrame({'x': x, 'y': y, 'label': labels})

    # Define the number of points
    N = 200000  # You can change this value to any desired number of points

    def make_plot(n):
        diagonal = (
            alt.Chart(pd.DataFrame({"x": [1, 10000], "y": [1, 10000]}))
            .mark_line(color="red", strokeDash=[10, 10])
            .encode(x="x", y="y")
        )

        dataset = create_random_dataset(n)
        return diagonal + (
            alt.Chart(dataset.to_pandas())
            .mark_point()
            .encode(
                x=alt.X(
                    f"x:Q",
                    title=f"log10(UMI + 1) for",
                    scale=alt.Scale(type="log", base=10),
                ),
                y=alt.Y(
                    f"y:Q",
                    title=f"log10(UMI + 1) for",
                    scale=alt.Scale(type="log", base=10),
                ),
                tooltip=["x", "y", "label"],
            )
        )

    # Display the plots
    interface_elements = [mo.md("# My App"), mo.accordion({"Help": "Help text"})]
    interface_elements.append(mo.hstack([make_plot(N) for _ in range(0, 3)]))
    interface_elements.append(mo.hstack([make_plot(N) for _ in range(0, 3)]))
    mo.vstack(interface_elements)

    return (
        N,
        alt,
        create_random_dataset,
        interface_elements,
        make_plot,
        mo,
        np,
        pd,
        pl,
    )

if __name__ == "__main__":
    app.run()

andrewhill157 commented 5 months ago

Note that increasing the number of points also seems to increase the chances that this will happen, which is why I've set it to a high value here.

andrewhill157 commented 5 months ago

Also, all the times I've caught this have been while running the app with marimo run (refreshing the page, etc.).

ross-at-finix commented 4 months ago

I have seen this as well when loading data into tables in a Kubernetes app deployment. I don't have any good insights to offer (yet), but I suspect it's related to full buffers and EAGAIN/EWOULDBLOCK errors when trying to write data to a socket for the "frontend". As a workaround/debugging step, I "customized" our table view to forcibly paginate, which I presume limits how much data is sent, and the issue disappeared completely.

If this hints at the actual cause, the problem isn't really in marimo itself, but perhaps there are socket settings that could avoid it?
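
To illustrate the hypothesis above (it is not a confirmed diagnosis): a non-blocking writer whose peer is not reading eventually fills the kernel socket buffers, and the next send fails with the same EAGAIN-style BlockingIOError. The helper below is hypothetical and uses a local socket pair, assuming a POSIX system:

```python
import socket


def fill_send_buffer(chunk_size: int = 65536) -> int:
    """Hypothetical helper: write to a peer that never reads until the kernel refuses."""
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    payload = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += sender.send(payload)  # may accept only part of the chunk
            except BlockingIOError:
                # Buffers are full: the write would block, so EAGAIN is raised.
                return total
    finally:
        sender.close()
        receiver.close()


print(f"accepted {fill_send_buffer()} bytes before BlockingIOError")
```

The byte count depends on the OS's buffer settings, which would be consistent with larger payloads (more points, unpaginated tables) making the error more likely.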

CedrusZhao commented 3 months ago

I meet the same problem

akshayka commented 3 months ago

Thanks everyone for documenting this issue.

PR #1822 is an attempt to mitigate this issue by just retrying the socket recv() after a short wait.
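
A minimal sketch of the retry idea, assuming a POSIX system and a pipe made non-blocking with os.set_blocking; recv_with_retry, its parameters, and the message contents are hypothetical and not the code from PR #1822:

```python
import os
import time
from multiprocessing import Pipe
from multiprocessing.connection import Connection


def recv_with_retry(conn: Connection, retries: int = 5, delay: float = 0.01) -> object:
    """Hypothetical helper: retry Connection.recv() while the fd has no data yet."""
    for attempt in range(retries):
        try:
            return conn.recv()
        except BlockingIOError:
            if attempt == retries - 1:
                raise  # give up and surface the original error
            time.sleep(delay)  # give the writer a moment to finish sending
    raise ValueError("retries must be at least 1")


reader, writer = Pipe(duplex=False)  # on POSIX this is backed by os.pipe()
os.set_blocking(reader.fileno(), False)  # mimic a non-blocking read end
writer.send({"op": "cell-output", "data": [1, 2, 3]})
print(recv_with_retry(reader))  # {'op': 'cell-output', 'data': [1, 2, 3]}
```

One caveat of retrying at this level: Connection.recv() reads a length header and then the body, so a retry is only safe if the EAGAIN occurs before any bytes of a message have been consumed. If a read fails partway through a message, the reader could lose its place in the stream, which is one plausible way to end up with errors like the UnpicklingError reported later in this thread.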

Re @ross-at-finix's hypothesis: we do use a TCP socket to facilitate communication between the kernel process and the server. Right now that socket is created and managed by a multiprocessing.connection.Listener object, which as far as I can tell doesn't allow for increasing the socket buffer size. Perhaps we should just deal with a socket directly.
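
If the socket were managed directly, its kernel buffer sizes could be raised with setsockopt. A sketch assuming a plain TCP listener; make_listener and the 4 MiB figure are hypothetical. The kernel may clamp the value (on Linux, to net.core.rmem_max / net.core.wmem_max, and getsockopt reports roughly double the requested size), and for TCP the sizes generally need to be set before listen() or connect() to apply to the connection:

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0,
                  bufsize: int = 4 * 1024 * 1024) -> socket.socket:
    """Hypothetical helper: a TCP listener with enlarged kernel buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set buffer sizes before listen(); on Linux, accepted sockets inherit them.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.listen()
    return sock


listener = make_listener()
# The effective size may differ from the request (clamped or doubled by the kernel).
print(listener.getsockname(), listener.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
listener.close()
```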

More context: In marimo run mode, the kernel(s) and server are actually in the same process, so in theory we could just use a simpler communication method in run mode to get around this problem. The socket would still be needed for edit mode, during which the kernel is run in a separate process (so that its execution can be easily interrupted).

ross-at-finix commented 3 months ago

Thanks @akshayka; I've done a very surface-level check by building that PR and using it in our app's QA deployment, and the standard table elements are working great.

akshayka commented 3 months ago

That's great, thank you for checking! Version 0.7.8 includes the fix, and should be available on PyPI soon.

Sonali-bapte commented 3 months ago

After upgrading to version 0.7.8, I am facing the same issue but with a different error:

Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7f7728327090>>
handle:
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/root/.cache/pypoetry/virtualenvs/justask-4h5SbXZT-py3.11/lib/python3.11/site-packages/marimo/_utils/distributor.py", line 56, in _on_change
    response = self.input_connection.recv()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/multiprocessing/connection.py", line 250, in recv
    return _ForkingPickler.loads(buf.getbuffer())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '\x07'.
[W 240719 11:19:59 distributor:59] BlockingIOError in distributor receive: [Errno 11] Resource temporarily unavailable
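
For context on the message itself: multiprocessing.connection prefixes each message with a length header and then unpickles the body, and "invalid load key" means the first byte handed to pickle.loads is not a valid pickle opcode. A sketch under that assumption (frame, unframe, and load_error_message are hypothetical helpers); one plausible reading, not confirmed in this thread, is that the reader lost its place in the byte stream after an interrupted read:

```python
import pickle
import struct


def frame(obj: object) -> bytes:
    # 4-byte big-endian signed length, then the pickled body (roughly how
    # multiprocessing.connection frames messages smaller than 2 GiB).
    body = pickle.dumps(obj)
    return struct.pack("!i", len(body)) + body


def unframe(buf: bytes) -> object:
    (size,) = struct.unpack("!i", buf[:4])
    return pickle.loads(buf[4:4 + size])


def load_error_message(data: bytes) -> str:
    """Return the UnpicklingError message for bytes that are not a valid pickle."""
    try:
        pickle.loads(data)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return "loaded without error"


assert unframe(frame({"x": 1})) == {"x": 1}
# A first byte that is not a pickle opcode yields the message seen in the traceback.
print(load_error_message(b"\x07 not a pickle"))  # invalid load key, '\x07'.
```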

TedSinger commented 1 month ago

I have the same error as @Sonali-bapte with marimo version 0.8.12 and uvicorn version 0.30.6.

andrewhill157 commented 1 month ago

Just to update: I also pretty regularly see the error @Sonali-bapte mentioned.

akshayka commented 1 month ago

@andrewhill157 @TedSinger @Sonali-bapte can you share more context of when you see this error?

@andrewhill157 does your original reproduction still surface this issue?

andrewhill157 commented 1 month ago

The example I provided above doesn't seem to reproduce the issue reliably for me at this point (current marimo release) in either my Linux or Mac environment. I have a different, more intensive app that is tricky to share but regularly produces the same error message @Sonali-bapte mentioned above (this was the app that led me to try to make the simpler example in the first place). It looks like you have a PR in progress, but let me know if boiling it down to something simpler that you can use on your end would still be helpful.

akshayka commented 1 month ago

Thanks for writing back @andrewhill157.

I've merged a fix and can release it later today. It solves the issue by using an in-memory queue instead of a socket in run mode, where the socket isn't needed. So there's no pickling or TCP connection involved.
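
A minimal sketch of the in-process approach, assuming a kernel running on a thread in the same process as the server. It uses queue.Queue from the standard library for illustration; run_kernel, consume, and the message shape are hypothetical, and marimo's actual implementation may differ:

```python
import queue
import threading


def run_kernel(out: "queue.Queue[object]", n_messages: int) -> None:
    """Hypothetical kernel: push cell outputs onto an in-memory queue."""
    for i in range(n_messages):
        # Objects are passed by reference: no pickling, no socket buffer to fill.
        out.put({"cell": i, "payload": "x" * 1_000_000})
    out.put(None)  # sentinel: the kernel is done


def consume(out: "queue.Queue[object]") -> int:
    """Hypothetical server side: count messages until the sentinel arrives."""
    received = 0
    while True:
        msg = out.get()  # blocks until a message is available; never raises EAGAIN
        if msg is None:
            return received
        received += 1


channel: "queue.Queue[object]" = queue.Queue()
kernel = threading.Thread(target=run_kernel, args=(channel, 5))
kernel.start()
print(consume(channel))  # 5
kernel.join()
```

Because messages are passed as Python object references, there is no serialization step and no fixed-size OS buffer to overflow; get() simply blocks until a message arrives.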

akshayka commented 1 month ago

Version 0.8.19 is available on PyPI and contains the fix.

andrewhill157 commented 1 month ago

Thank you, much appreciated! So far it seems much more reliable for the app I mentioned.

akshayka commented 1 month ago

That's great! Thanks for letting me know.