ipython / ipykernel

IPython Kernel for Jupyter
https://ipykernel.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
654 stars 369 forks source link

Infinite loop when using threadpool in notebook #1269

Open jacksonriley opened 1 month ago

jacksonriley commented 1 month ago

Hi there, I've come across an infinite loop inside ipykernel which I can reliably reproduce when using multiple concurrent.futures.ThreadPoolExecutors inside a notebook with ipykernel >= 6.29.0. This results in the execution of the relevant cell hanging forever.

This looks to have been introduced in https://github.com/ipython/ipykernel/pull/1186, which added the handling around self._thread_to_parent. I believe this will still be present on main.

The infinite loop occurs here: https://github.com/ipython/ipykernel/blob/0fa543936d3f9ee69f6b9f8bd5a316692fde9aaa/ipykernel/iostream.py#L538-L539

when there's a cycle in self._thread_to_parent.

Origin of cycle

ipykernel uses threading.get_ident() to grab a (temporarily!) unique identifier for a threading.Thread thread when it runs, and stash it in a self._thread_to_parent dict on stdout/stderr: https://github.com/ipython/ipykernel/blob/0fa543936d3f9ee69f6b9f8bd5a316692fde9aaa/ipykernel/ipkernel.py#L747

The threading.get_ident() function states

Thread identifiers may be recycled when a thread exits and another thread is created.

Imagine a thread A with ident 1 having a child thread B with ident 2. Once those threads exit, there's nothing preventing a thread C with ident 2 having a child thread D with ident 1.

As the stdout and stderr streams are long-lived, you can therefore end up with a cycle in self._thread_to_parent, which causes this infinite loop. I've proved that this is the case with some hacked logging in the relevant places:

                while identity in self._thread_to_parent:
+                   import datetime
+                   with open("debug.log", "a") as f:
+                       f.write(f"{datetime.datetime.now()}: {identity} is in self._thread_to_parent. Value is {self._thread_to_parent[identity]}\n")
                    identity = self._thread_to_parent[identity]

and

                    else:
+                       import datetime
+                       is_stdout = stream is stdout
+                       is_stderr = stream is stderr
+                       with open("debug.log", "a") as f:
+                           f.write(f"{datetime.datetime.now()}: Writing {self.ident}: {parent} into stream._thread_to_parent. Stream is {stream}. {is_stdout=} {is_stderr=} stream._thread_to_parent is {stream._thread_to_parent}\n")
                        stream._thread_to_parent[self.ident] = parent

This gives something like the below, where we end up with an infinite loop due to the cycle of [140278168012352, 140277658416704]:

The first link (stderr) ----------> 2024-10-02 13:34:26.043559: Writing 140277658416704: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63be0>. is_stdout=True is_stderr=False stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.043658: Writing 140278142834240: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63a30>. is_stdout=False is_stderr=True stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.043746: Writing 140278151226944: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63be0>. is_stdout=True is_stderr=False stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.043984: 140278159619648 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.044021: Writing 140277650024000: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63be0>. is_stdout=True is_stderr=False stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.044236: Writing 140278151226944: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63a30>. is_stdout=False is_stderr=True stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
The first link (stderr) ----------> 2024-10-02 13:34:26.044457: Writing 140277658416704: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63a30>. is_stdout=False is_stderr=True stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.044587: Writing 140277650024000: 140278168012352 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63a30>. is_stdout=False is_stderr=True stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.044749: 140278142834240 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045033: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045135: 140278142834240 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045221: 140278151226944 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045318: 140278159619648 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045531: 140277650024000 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045678: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045846: 140277650024000 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.045929: 140278151226944 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.147930: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.148022: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.148076: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.148108: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.155890: 140278151226944 is in self._thread_to_parent. Value is 140278168012352
Completing the cycle (stdout) ----> 2024-10-02 13:34:26.155942: Writing 140278168012352: 140277658416704 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63be0>. is_stdout=True is_stderr=False stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
Completing the cycle (stderr) ----> 2024-10-02 13:34:26.156027: Writing 140278168012352: 140277658416704 into stream._thread_to_parent. Stream is <ipykernel.iostream.OutStream object at 0x7f9527f63a30>. is_stdout=False is_stderr=True stream._thread_to_parent is {140278159619648: 140278168012352, 140278151226944: 140278168012352, 140278142834240: 140278168012352, 140277658416704: 140278168012352, 140277650024000: 140278168012352}
                                    2024-10-02 13:34:26.156131: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156171: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156212: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.156244: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156281: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.156308: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156334: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.156358: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156392: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.156459: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.156501: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156525: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
                                    2024-10-02 13:34:26.156547: 140278168012352 is in self._thread_to_parent. Value is 140277658416704
                                    2024-10-02 13:34:26.156570: 140277658416704 is in self._thread_to_parent. Value is 140278168012352
...infinite loop

Reproduction

Apologies, it's probably not the most minimal reproduction, but this cell will reliably reproduce this infinite loop. It might take a couple of minutes to do so, but you'll see the output stop when it happens (and if you use the above added debugging, will see it clearly start infinite-looping)

import time
import threading
from concurrent import futures
import datetime

def func(*args, **kwargs):
    # Need this first layer of indirection to get into the `else` of https://github.com/ipython/ipykernel/blob/0fa543936d3f9ee69f6b9f8bd5a316692fde9aaa/ipykernel/ipkernel.py#L742
    print(f"Launching with {threading.get_ident()}")
    with futures.ThreadPoolExecutor(5) as executor:
        print(list(executor.map(innermost_func, [1, 2, 3, 4, 5])))

def innermost_func(*args, **kwargs):
    print(f"Hello from thread {threading.get_ident()}")
    time.sleep(0.1)

count = 0
while True:
    count += 1
    print(f"{datetime.datetime.now()} Iteration number {count}")
    threads = 1
    print(f"Launching with {threading.get_ident()}")
    with futures.ThreadPoolExecutor(threads) as executor:
        print(list(executor.map(func, [1])))

My environment

Python 3.10.12
ipykernel==6.29.4