Need public opt-out API for output routing from threads

We need an easier opt-out of the behavior introduced in #1186, because most of the output I'm seeing from threads is confusing or even lost, in cases where the previous behavior of sending all thread output to the current cell would be more appropriate. Either a kernel-level opt-out, a per-thread opt-out, or both should be available, and we should perhaps reconsider the default behavior.

Two examples that I don't think are that unusual, demonstrating where the current behavior is incorrect and currently impossible to avoid with public APIs:

ipyparallel

For example, IPython Parallel runs its IO in a background thread, and sometimes this produces output (e.g. when streaming output from engines). It produces output in response to direct, blocking action in the main thread, but the output is produced in a long-running background thread. Sending this output to the cell that created the Client is not desirable. Producing the output in the main thread is not particularly feasible either, because it's used in e.g. streaming output in user code:

with async_result.stream_output():  # instructs a background thread to start streaming to sys.stdout, etc.
    do_blocking_things()  # blocks the main thread, not controlled by ipyparallel

In all situations, the right thing to do for output produced by this thread is to go to the current cell.

The same is true for some log output, e.g. when stopping clusters, which again produces output in a background thread as a direct result of a synchronous action taken on the main thread. Placing the output near the action that triggered it is less surprising than placing it at the far-removed initial launch of the thread.

ThreadPoolExecutor

Another, fully standard-library situation is ThreadPoolExecutor, where outputs from all tasks go to the cell that first spawned the thread. That is not necessarily the cell that creates the pool, since spawning a thread may be deferred until the first task submission that requires it, which produces a very surprising output order.

This may be related to the joblib issue in #402, which is closed but appears to be unresolved.
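The deferred thread creation can be checked with the standard library alone. A minimal sketch (the demo prefix and the worker_name helper are just illustrative names, and nothing here touches ipykernel): creating the pool starts no thread, and the worker only appears on the first submit, which in a notebook could be several cells later.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def worker_name():
    # Report which thread actually ran the task.
    return threading.current_thread().name


before = threading.active_count()

# In a notebook, this could be "cell 1": the pool exists, but no thread yet.
pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="demo")
after_create = threading.active_count()

# In a notebook, this could be "cell 5": the first submit spawns the worker.
name = pool.submit(worker_name).result()
after_submit = threading.active_count()

print("threads added by creating the pool:", after_create - before)
print("threads added by the first submit:", after_submit - before)
print("task ran in:", name)

pool.shutdown()
```

With routing keyed on where the thread was spawned, every later task that reuses this worker would be attributed to the cell containing that first submit.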

I think there is an assumption in #1186 that threads, once spawned, are long running and do not interact with the main thread. This holds for some examples of "fire and forget" type threads, but is definitely not true in general, and I'm not even sure it's true more often than not.

We at least need a way for packages/libraries to indicate that a thread producing output shouldn't be routed to the originating cell.

If we want to be really fiddly and try to guess the right thing to do (as we are doing now with threads, where the guess is often incorrect, if clear and predictable), we could assume that threaded output should go to the current cell if the current cell is blocking while the output is produced. This would definitely do the wrong thing sometimes in the cases where a background thread should route to the originating cell. That seems to be the rare exception, however.
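To make that guess concrete, here is a rough sketch of the decision as a pure function. The name choose_parent and its parameters are hypothetical and only model the rule described above; this is not an existing ipykernel hook.

```python
def choose_parent(origin_parent, current_parent, main_thread_busy):
    """Pick the cell that should receive output produced by a background thread.

    origin_parent:    cell that was executing when the thread was spawned
    current_parent:   cell that is executing right now, or None
    main_thread_busy: True while the main thread is still running a cell
    """
    if main_thread_busy and current_parent is not None:
        # The main thread is blocked in a cell, so the output is most
        # likely a direct consequence of that cell.
        return current_parent
    # Nothing is executing: fall back to the cell that launched the thread.
    return origin_parent


# A thread launched in cell-1 prints while cell-3 is blocking.
print(choose_parent("cell-1", "cell-3", main_thread_busy=True))
# The same thread prints while the kernel is idle.
print(choose_parent("cell-1", None, main_thread_busy=False))
```

The first call models the ipyparallel and ThreadPoolExecutor cases above; the second models the "fire and forget" thread, which is also the case where this guess can go wrong if the thread keeps printing during an unrelated later cell.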

While the async routing is nice and I suspect more robust, I think the guess for threads is more often incorrect than correct, so I think perhaps it should be made opt-in instead of opt-out. At the very least, I think we should make sure that the current thread output routing does not apply to threads created by ThreadPoolExecutor or similar.

Looking at https://github.com/jupyter-widgets/ipywidgets/issues/2358, which uses OutputWidget: allowing OutputWidget to set the thread-local parent header in a sticky way would still solve the motivating issue, even if default print statements were routed to the latest cell, which I think is probably the better default behavior. I think #1186 gave us what we need to allow with OutputWidget() to work in a way that persists for the current context (via stream._parent_header.set).

I don't actually understand how with output_widget captures output anymore, since apparently setting self.msg_id is all it does, but presumably with output_widget should set the parent_header in a way that's persistent for the context within the current thread and not overridden by concurrent executions in the main thread.

What do you think is the right path forward, @krassowski?

I think perhaps a public API to set the global parent as one or more ContextVars (ideally one):

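from contextlib import contextmanager
from contextvars import ContextVar

# placeholder for wherever the kernel would store the parent header
parent_context_var = ContextVar("parent_header")

@contextmanager
def parent_header_setter(parent):
    token = parent_context_var.set(parent)
    try:
        yield
    finally:
        # restore whatever was set before entering
        parent_context_var.reset(token)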
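A usage sketch of that idea, showing that the token-based reset restores the previous value even when the context manager is nested. The ContextVar name, its None default, and the msg_id values are assumptions for illustration only.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical stand-in for the kernel's parent header storage.
parent_context_var = ContextVar("parent_header", default=None)


@contextmanager
def parent_header_setter(parent):
    token = parent_context_var.set(parent)
    try:
        yield
    finally:
        # reset(token) restores the value that was active before set()
        parent_context_var.reset(token)


seen = []
with parent_header_setter({"msg_id": "outer"}):
    seen.append(parent_context_var.get()["msg_id"])
    with parent_header_setter({"msg_id": "inner"}):
        seen.append(parent_context_var.get()["msg_id"])
    seen.append(parent_context_var.get()["msg_id"])
seen.append(parent_context_var.get())

print(seen)  # ['outer', 'inner', 'outer', None]
```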

OutStream.set_parent is almost this, but it also sets a persistent _parent_header_global. As a current workaround, one way to ensure the current parent is persisted as a ContextVar is:

sys.stdout.parent_header = sys.stdout.parent_header

This looks weird, but it stores the current resolved value (which may come from up to 3 different places) in a ContextVar, so it overrides both the global and the thread-local lookup. The effect is a one-time set of the global default but a persistent set of the thread-local value. It would be nice to be able to do this without overriding the global as well, but currently that can only be done with the private:

sys.stdout._parent_header.set(sys.stdout.parent_header)
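The difference between the two workarounds is easier to see in a toy model. The class below is a simplified stand-in that only mimics the lookup order described above (ContextVar first, then a per-thread entry, then the global); it is not ipykernel's implementation, and the names ToyStream and _thread_parents are made up for the sketch.

```python
import threading
from contextvars import ContextVar, copy_context


class ToyStream:
    """Toy model of the three-level lookup (not ipykernel's code)."""

    def __init__(self):
        self._parent_header = ContextVar("parent_header")  # context-local value
        self._thread_parents = {}  # thread ident -> header of the launching cell
        self._parent_header_global = None  # fallback shared by everything

    @property
    def parent_header(self):
        try:
            return self._parent_header.get()
        except LookupError:
            ident = threading.get_ident()
            return self._thread_parents.get(ident, self._parent_header_global)

    @parent_header.setter
    def parent_header(self, value):
        # Public setter: writes the global AND the ContextVar.
        self._parent_header_global = value
        self._parent_header.set(value)


stream = ToyStream()
stream._parent_header_global = "cell-2"  # latest executed cell
stream._thread_parents[threading.get_ident()] = "cell-1"  # this thread's launching cell


def pin_private():
    # Private route: store the resolved value in the ContextVar only.
    stream._parent_header.set(stream.parent_header)
    return stream.parent_header


def pin_public():
    # Public route: same pin, but the setter also overwrites the global.
    stream.parent_header = stream.parent_header
    return stream.parent_header


# Run each in a copied context so the ContextVar change stays contained.
pinned_private = copy_context().run(pin_private)
global_after_private = stream._parent_header_global
pinned_public = copy_context().run(pin_public)
global_after_public = stream._parent_header_global

print(pinned_private, global_after_private)  # cell-1 cell-2
print(pinned_public, global_after_public)    # cell-1 cell-1
```

In this model both routes pin cell-1 for the current context, but only the public assignment clobbers the global fallback, which is the side effect a public thread-local-only API would avoid.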

I think we should:

change the default so thread output is not associated with its launching Cell
move the thread-local parent from the OutStream to the Kernel and use a single value
add a public API that sets the thread-local header only
update OutputWidget.__enter__ and __exit__ to set and reset the parent_header contextvar for redirection

because while #1186 fixed OutputWidget capturing print statements from threads, it actually doesn't capture display output from threads because only OutStream has this behavior, not Kernel.parent_header more broadly.
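A rough sketch of how the last three points could fit together, under the assumption that a single kernel-level ContextVar is consulted by every output channel. All names here (kernel_parent, publish_stream, publish_display, ToyCapture) are hypothetical stand-ins, not ipykernel or ipywidgets APIs.

```python
from contextvars import ContextVar

# Hypothetical single kernel-level parent, shared by every output channel.
kernel_parent = ContextVar("kernel_parent", default="latest-cell")
published = []  # records (msg_type, parent, content) instead of sending messages


def publish_stream(text):
    published.append(("stream", kernel_parent.get(), text))


def publish_display(data):
    published.append(("display_data", kernel_parent.get(), data))


class ToyCapture:
    """Stand-in for an output-capturing widget (not the ipywidgets class)."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self._tokens = []

    def __enter__(self):
        # Set the parent for the current context and remember how to undo it.
        self._tokens.append(kernel_parent.set(self.msg_id))
        return self

    def __exit__(self, exc_type, exc, tb):
        kernel_parent.reset(self._tokens.pop())
        return False


with ToyCapture("widget-1"):
    publish_stream("hello")
    publish_display({"text/plain": "42"})
publish_stream("after")

for record in published:
    print(record)
```

Because both publishers read the same variable, stream and display output are redirected together inside the with block and return to the default afterwards.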

ipython / ipykernel

Need public opt-out API for output routing from threads #1289
