jupyter / jupyter_client

Jupyter protocol client APIs
https://jupyter-client.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Fix InvalidStateError in manager.py#in_pending_state caused by race condition #967

Open a3626a opened 1 year ago

a3626a commented 1 year ago

Background

In my Jupyter system, kernel starts, restarts, and shutdowns are relatively frequent compared to other environments, because of additional features such as an online judge.

In this environment, Sentry reported InvalidStateError. It is a very common error, occurring hundreds of times a day.

Introduction

in_pending_state is a decorator for async functions. When a decorated function is invoked, .ready is instantiated, and it is marked done once the function invocation finishes.

I think we can isolate in_pending_state from kernel management and treat it as a general coroutine management service.

The Problem and The Solution

The original implementation does not account for multiple decorated functions executing at the same time. As a result, set_result can be called twice on a single .ready future, which raises InvalidStateError. I have written a simple test case for this.
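To illustrate the race outside of jupyter_client, here is a minimal sketch. The decorator below is a simplified stand-in, not the code in manager.py: it assumes `_ready` is refreshed only when it is missing or already done, and `FakeManager` and `restart` are hypothetical names used only for this example.

```python
import asyncio
import functools


def in_pending_state(method):
    """Simplified stand-in: create a fresh `_ready` future only when needed."""

    @functools.wraps(method)
    async def wrapper(self, *args, **kwargs):
        if self._ready is None or self._ready.done():
            self._ready = asyncio.get_running_loop().create_future()
        out = await method(self, *args, **kwargs)
        # Every call resolves the shared future, so when two calls overlap,
        # whichever finishes second raises InvalidStateError here.
        self._ready.set_result(None)
        return out

    return wrapper


class FakeManager:
    """Hypothetical object carrying only the state the decorator touches."""

    def __init__(self):
        self._ready = None

    @in_pending_state
    async def restart(self):
        await asyncio.sleep(0.01)  # stands in for real kernel work


async def reproduce():
    manager = FakeManager()
    # Run two decorated calls concurrently and collect their outcomes.
    return await asyncio.gather(
        manager.restart(), manager.restart(), return_exceptions=True
    )


results = asyncio.run(reproduce())
print([type(r).__name__ for r in results])
```

Both calls share one future because the second call starts while the first is still pending, so one of the two `set_result` calls fails.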

To fix this, a more general mechanism is needed. Additionally, this mechanism does not have to be tied to the kernel lifecycle. The previous ._attempted_start adds unnecessary coupling to the code. I therefore isolated the mechanism from the details of the decorated function by introducing ._ready_count and removing ._attempted_start.
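One possible reading of the ._ready_count idea is a counter of in-flight decorated calls, where only the last call to finish resolves the future. The sketch below is an assumption about how such a counter could work, not the code in this PR. `FakeManager` and `work` are hypothetical, and propagating failures to `_ready` (for example with set_exception) is left out for brevity.

```python
import asyncio
import functools


def in_pending_state(method):
    """Sketch: count in-flight calls and resolve `_ready` once, at the end."""

    @functools.wraps(method)
    async def wrapper(self, *args, **kwargs):
        if self._ready_count == 0:
            # The first call in a batch creates the shared future.
            self._ready = asyncio.get_running_loop().create_future()
        self._ready_count += 1
        try:
            return await method(self, *args, **kwargs)
        finally:
            self._ready_count -= 1
            # Only the last call to finish resolves the future.
            if self._ready_count == 0 and not self._ready.done():
                self._ready.set_result(None)

    return wrapper


class FakeManager:
    """Hypothetical object carrying only the state the decorator touches."""

    def __init__(self):
        self._ready = None
        self._ready_count = 0

    @in_pending_state
    async def work(self, delay):
        await asyncio.sleep(delay)  # stands in for real kernel work


async def demo():
    manager = FakeManager()
    short = asyncio.create_task(manager.work(0.01))
    long = asyncio.create_task(manager.work(0.05))
    await short
    done_after_first = manager._ready.done()  # the long call is still running
    await long
    done_after_both = manager._ready.done()
    return done_after_first, done_after_both


states = asyncio.run(demo())
print(states)
```

Because the counter depends only on how many decorated calls are running, the decorator needs no knowledge of whether a start was attempted, which is the decoupling described above.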

Help Needed

However, I'm not sure how to tie this new mechanism to owns_kernel.

Might be related

https://github.com/jupyter-server/jupyter_server/issues/1247