ipython / ipykernel

IPython Kernel for Jupyter
https://ipykernel.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
633 stars 362 forks source link

Async widget methods deadlock due to on_msg not being called #646

Open ilyabo opened 3 years ago

ilyabo commented 3 years ago

We are developing a Jupyter widget which makes it possible to embed interactive Unfolded maps into notebooks:

image

Our widget is embedded via an iframe and so the communications with the JavaScript side are asynchronous. Some of the functions we want to add to the widget need to return data asynchronously. To communicate between Jupyter and JS we are using widget's send and on_msg methods. Our methods return futures. It works fine - the messages are being sent and received, but only if we do it in two steps:

image

When we are trying to use await so that we can have the results and do something with them within one cell, the execution is blocked forever:

image

We tried to investigate the cause. It turns out that the callback we register with widget.on_msg only gets called once the execution of the cell is finished. Hence, if we wait until we obtain the result of the calculation within the same cell where the method was called, we end up in a deadlock.

So far we've had no success in finding a workaround, despite trying many things. For instance, here we run the communication in a separate thread and wait until the query response is added to a blocking queue: image

Calling the above function fails (after the 2 seconds timeout) with an error saying that the queue is empty: image And we don't see the _receive function being called.

However, if we check the queue immediately after getting this error, we can see that the queue is not empty - the response was added to it, apparently, just after the error caused the cell execution to be cancelled: image

It might be the same issue as this one: https://github.com/ipython/ipython/issues/12786 but I am not completely sure. I tried installing and running the kernel with the fix from this PR, but I was getting errors related to traitlet messaging when trying to instantiate our widget.

MSeal commented 3 years ago

I believe this issue is root caused by the mentioned issue / PR and that threading against messages won't solve the issue because the Comms message buffer is deadlocked with the cell execution await.

I also was digging through some issues one of our clients had around widgets and they eventually accidentally solved a similar in-line result wait problem by separating the widget value fetch and the request into two cell like above in a more complicated setting. However it's easy to fall back into the deadlock if their notebook widget use is ever refactored or shared in a new notebook.

@SylvainCorlay @jasongrout @Carreau Since this overlaps with widget interactions a bit. Can we determine if the two issues are the same / what in the proposed PR linked needs additional attention to potentially address the root problem? If there's testing or communication patterns that need to be evaluated / documented around particular widgets that might be impacted I could put a little time to contribute there or in ipykernel PRs if it helps.

SylvainCorlay commented 3 years ago

This possibility of a deadlock is somewhat intrinsic to the protocol. You won't process a comm message until you are finished processing an execution request...

In #589, @sonthonaxrk proposed that comm messages could be handled right away, even before other shell messages are finished processing. However, this is probably too disruptive of the protocol.

We had a couple of discussions about this lately, and another idea that was envisioned was to use the same mechanism as the input request to make really blocking calls to the front-end.

ilyabo commented 3 years ago

Thanks @SylvainCorlay! Could you elaborate on how really blocking calls would work? Would it solve our issue? For our API we would actually prefer if the calls were blocking until the results are there so that the users don't need to use await.

SylvainCorlay commented 3 years ago

They would use the same principle as when you type input() in a cell currently.

ilyabo commented 3 years ago

Would this be applied to any function returning a future?

sonthonaxrk commented 3 years ago

@ilyabo what were the traitlet errors?

I’m a bit busy with real work right now, but I’ve been thinking of a way this could be solved with greenlet, and just doing a hard switch of the stack when the comm message returns.

ilyabo commented 3 years ago

Thanks @sonthonaxrk!

Here's the error:

You can try for yourself, our widget is public:

pip install unfolded.map-sdk
from unfolded.map_sdk import UnfoldedMap
unfolded_map = UnfoldedMap(
  mapUUID='ae4f5345-8507-49ec-85f6-a1f8c7bc2219'
)
unfolded_map
sonthonaxrk commented 3 years ago

Cheers, I'll probably have the chance to take a look this week if it's related to the async kernel changes.

ilyabo commented 3 years ago

@sonthonaxrk Did you have a chance to look into it?

saraedum commented 2 years ago

@sonthonaxrk could you elaborate a bit on how issues like this might be worked around with greenlet?

ilyabo commented 2 years ago

This library offers a solution which can be used until the issue is solved in ipykernel: https://github.com/Kirill888/jupyter-ui-poll