andrewhill157 closed this issue 1 month ago.
Note that increasing the number of points also seems to increase the chance of this happening, which is why I've set it to a high value here. Also, every time I've caught this has been while running the app with marimo run and refreshing, etc.
I have seen this as well when loading data into tables in a Kubernetes app deployment. No good insights to really offer (yet), but I suspect it's related to full buffers and EAGAIN/EWOULDBLOCK errors when trying to write data to the socket for the "frontend". As a workaround/debugging step, I "customized" our table view to forcibly paginate, which I presume limits how much data is sent at once, and the issue disappeared completely.
If this hints at the actual cause, then it's not really in marimo, but perhaps there are socket settings that can avoid it?
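To make the EAGAIN/EWOULDBLOCK hypothesis concrete, here is a minimal Python sketch (not marimo code) that fills a non-blocking socket's send buffer while nothing reads from the other end. For self-containment it uses socket.socketpair() rather than the TCP socket the kernel and server actually share, and fill_until_would_block is a hypothetical helper name. Once the buffer is full, send() raises BlockingIOError, which is how Python surfaces EAGAIN/EWOULDBLOCK, and the writer has to wait and retry. A large table payload sent to a slow reader would plausibly hit the same condition.

```python
import socket


def fill_until_would_block(chunk_size: int = 65536) -> int:
    """Write into a non-blocking socket that nobody reads from until the
    kernel refuses more data; return the number of bytes accepted first."""
    sender, receiver = socket.socketpair()
    try:
        sender.setblocking(False)  # send() now raises instead of waiting
        payload = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += sender.send(payload)  # may be a partial write
            except BlockingIOError:
                # EAGAIN / EWOULDBLOCK: the buffer is full; a real writer
                # would have to wait (e.g. via select) and try again.
                return total
    finally:
        sender.close()
        receiver.close()


print(f"buffer filled after {fill_until_would_block()} bytes")
```

The byte count reflects the kernel's buffer size, so it differs between Linux and macOS and shouldn't be relied on.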
I'm running into the same problem.
Thanks everyone for documenting this issue.
PR #1822 is an attempt to mitigate this issue by simply retrying the socket recv() after a short wait.
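The PR's actual diff isn't reproduced here; the following is only a minimal Python sketch of the retry-after-a-short-wait idea, assuming the transient failure surfaces as BlockingIOError (Python's exception for EAGAIN/EWOULDBLOCK). recv_with_retry, retries, and delay are hypothetical names, not marimo's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def recv_with_retry(recv: Callable[[], T], retries: int = 5, delay: float = 0.01) -> T:
    """Call recv(); on a would-block error, sleep briefly and try again
    (hypothetical helper)."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return recv()
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: no data yet. Re-raise on the last attempt,
            # otherwise back off for `delay` seconds and retry.
            if attempt == retries - 1:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Retrying only papers over the symptom: if the reader is consistently slower than the writer, the retries simply run out later.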
Re @ross-at-finix's hypothesis: we do use a TCP socket for communication between the kernel process and the server. Right now that socket is created and managed by a multiprocessing.connection.Listener object, which, as far as I can tell, doesn't allow increasing the socket buffer size. Perhaps we should just manage a socket directly.
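If the buffer-size angle were worth pursuing, managing the socket directly would allow something like the following Python sketch. make_listener and the 1 MiB default are hypothetical choices, not marimo code. The kernel treats SO_RCVBUF/SO_SNDBUF as requests (Linux doubles the value and caps it at net.core.rmem_max / net.core.wmem_max), so the effective size should be read back with getsockopt. Larger buffers only raise the threshold at which a writer sees EAGAIN; they don't remove it.

```python
import socket


def make_listener(host: str = "127.0.0.1", bufsize: int = 1 << 20) -> socket.socket:
    """Create a TCP listening socket with enlarged kernel buffers
    (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Set buffer sizes before listen(): per tcp(7), that is required for
    # them to take effect on the connections this socket accepts.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sock.bind((host, 0))  # port 0: let the OS pick a free port
    sock.listen()
    return sock


listener = make_listener()
print("listening on", listener.getsockname())
print("effective SO_RCVBUF:", listener.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
listener.close()
```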
More context: in marimo run mode, the kernel(s) and server are actually in the same process, so in theory we could just use a simpler communication method in run mode to get around this problem. The socket would still be needed for edit mode, during which the kernel runs in a separate process (so that its execution can be easily interrupted).
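A Python sketch of that split, assuming both sides only need send()/recv(): in run mode, an in-process pair of queue.Queue objects; in edit mode, multiprocessing.Pipe() so the kernel can live in a child process. make_channel and QueueConnection are hypothetical names for illustration, not marimo's API.

```python
import multiprocessing
import queue
from typing import Any


class QueueConnection:
    """In-process endpoint exposing the same send()/recv() calls as a
    multiprocessing Connection (hypothetical helper)."""

    def __init__(self, inbox: queue.Queue, outbox: queue.Queue) -> None:
        self._inbox = inbox
        self._outbox = outbox

    def send(self, obj: Any) -> None:
        self._outbox.put(obj)  # hands over the object reference, no serialization

    def recv(self) -> Any:
        return self._inbox.get()  # blocks until the peer sends something


def make_channel(mode: str):
    """Return two connected endpoints suited to the given mode."""
    if mode == "run":
        # Kernel and server share a process: two in-memory queues suffice.
        a_to_b: queue.Queue = queue.Queue()
        b_to_a: queue.Queue = queue.Queue()
        return QueueConnection(b_to_a, a_to_b), QueueConnection(a_to_b, b_to_a)
    if mode == "edit":
        # Kernel runs in a child process: use an OS-level pipe/socket pair.
        return multiprocessing.Pipe(duplex=True)
    raise ValueError(f"unknown mode: {mode!r}")


server_end, kernel_end = make_channel("run")
kernel_end.send({"cell": 1, "status": "idle"})
print(server_end.recv())
```

Keeping the same send()/recv() shape in both modes means the calling code doesn't need to know which transport it got.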
Thanks @akshayka; I've done a very surface-level check by building that PR and using it with our app in our QA deployment, and the standard table elements are working great.
That's great, thank you for checking! Version 0.7.8 includes the fix, and should be available on PyPI soon.
After upgrading to version 0.7.8, I am facing the same issue, but with a different error:
Exception in callback <bound method Distributor._on_change of <marimo._utils.distributor.Distributor object at 0x7f7728327090>>
handle:
I have the same error as @Sonali-bapte with version 0.8.12 and uvicorn version 0.30.6.
Just to update: I also pretty regularly see the error @Sonali-bapte mentioned.
@andrewhill157 @TedSinger @Sonali-bapte can you share more context on when you see this error?
@andrewhill157 does your original reproduction still surface this issue?
The example I provided above doesn't seem to reproduce the issue reliably for me anymore (current marimo release) in either my Linux or Mac environment. I have a different app that is more intensive, but tricky to share, that regularly produces the same error message @Sonali-bapte mentioned above (this was the app that led me to try to make the simpler example in the first place). It looks like you have a PR in progress, but let me know if boiling it down to something simpler that you can use would still be helpful.
Thanks for writing back @andrewhill157.
I've merged a fix, and can release it later today. It solves the issue by using an in-memory queue instead of a socket, which is not needed for run mode. So there's no pickling or TCP connection involved.
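The "no pickling" point can be seen with a small Python comparison (a sketch, not marimo's code; roundtrip_queue and roundtrip_pipe are hypothetical names): a queue.Queue hands the receiver the very same object, while a multiprocessing Pipe pickles it, pushes the bytes through an OS buffer, and unpickles a copy on the other side.

```python
import multiprocessing
import queue
from typing import Any


def roundtrip_queue(obj: Any) -> Any:
    q: queue.Queue = queue.Queue()  # maxsize=0: unbounded, put() never blocks
    q.put(obj)
    return q.get()


def roundtrip_pipe(obj: Any) -> Any:
    a, b = multiprocessing.Pipe()
    try:
        # send() pickles obj and writes the bytes into an OS buffer. Keep obj
        # small here: with no concurrent reader, a message larger than the
        # buffer would block send() forever.
        a.send(obj)
        return b.recv()  # reads the bytes back and unpickles a new copy
    finally:
        a.close()
        b.close()


message = {"rows": list(range(10))}
print(roundtrip_queue(message) is message)  # True: same object
print(roundtrip_pipe(message) is message)   # False: an equal copy
```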
Version 0.8.19 is available on PyPI and contains the fix.
Thank you, much appreciated! So far it seems much more reliable for the app I mentioned.
That's great! Thanks for letting me know.
Describe the bug
I apologize in advance that this might be a bit annoying to reproduce.
When loading an app like the example given below, some fraction of the time it fails to run and yields the following type of error, after which the user has to refresh the app to get things going again:
Other times it loads totally fine.
I initially noticed the error above on a Linux server I'm using to deploy the app, but I've also observed something similar locally on my Mac, though seemingly much less frequently (I had to sit there refreshing for a good while, whereas it is much easier to catch on the server):
Environment
Linux
Mac
Code to reproduce