ipython / ipyparallel

IPython Parallel: Interactive Parallel Computing in Python
https://ipyparallel.readthedocs.io/

Packages and definitions loaded in the regular kernel are not known by engines and vice versa #897

Open skwde opened 1 month ago

skwde commented 1 month ago

Modules imported or functions defined in regular cells (without the %%px magic) are not known in cells that use the %%px magic.

The same is true for the other way around.

To reproduce this, try:

# Cell 1
import ipyparallel as ipp
from mpi4py import MPI

# Cell 2
cluster = ipp.Cluster(engines="mpi", n=4)
clients = cluster.start_and_connect_sync()

# Cell 3
%%px
import sys

def abc():
    print('abc')

# Cell 4
import os
def cde():
    print('cde')

# Cell 5
%%px
os.getpid()
cde()

# Cell 6
sys.getsizeof(0.0)
abc()

Cell 5 and Cell 6 both fail: neither of the two statements in either cell works.
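The failure can be pictured with a toy model in plain Python. This is only a sketch: dictionaries stand in for the separate namespaces of the local kernel and the engines, and run_local, run_px and fails_with_name_error are hypothetical helpers, not ipyparallel functions. No cluster is started.

```python
# Toy model only: plain dicts stand in for the namespaces of the local
# kernel and of each engine. No ipyparallel calls are involved.
local_ns = {}
engine_namespaces = [{} for _ in range(4)]  # one namespace per engine


def run_local(code):
    """Execute code in the local namespace (like a regular cell)."""
    exec(code, local_ns)


def run_px(code):
    """Execute code in every engine namespace (like a %%px cell)."""
    for ns in engine_namespaces:
        exec(code, ns)


def fails_with_name_error(runner, code):
    """Return True if running the code raises NameError."""
    try:
        runner(code)
    except NameError:
        return True
    return False


run_px("import sys\ndef abc():\n    return 'abc'")    # like Cell 3
run_local("import os\ndef cde():\n    return 'cde'")  # like Cell 4

cell5_fails = fails_with_name_error(run_px, "cde()")     # like Cell 5
cell6_fails = fails_with_name_error(run_local, "abc()")  # like Cell 6
print(cell5_fails, cell6_fails)  # True True
```

Each name exists only in the namespace where it was defined, which is the situation described above.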

A global %autopx would ensure that everything runs on the engines, which makes development kind of inconvenient (e.g. prints are done on all engines and so on). Defining different views and activating them according to how many engines are needed, e.g. one for development with a single engine, seems overkill.

What is the recommended approach to solve the outlined problem?

minrk commented 1 month ago

%%px has a --local flag to also run on the local kernel in addition to the remote engines, for exactly this situation:

%%px --local
def foo():
    ....

foo is defined everywhere, locally and remotely.
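As a rough illustration of what "everywhere" means here, the following toy model uses plain dictionaries as stand-in namespaces. The run_px helper and its local argument are hypothetical and only mimic the effect of the --local flag; this is not how ipyparallel implements it.

```python
# Toy model only: dicts stand in for the local and engine namespaces.
local_ns = {}
engine_namespaces = [{} for _ in range(4)]


def run_px(code, local=False):
    """Toy stand-in for a %%px cell; local=True mimics the --local flag."""
    for ns in engine_namespaces:
        exec(code, ns)
    if local:
        # With the flag set, the same code also runs in the local namespace.
        exec(code, local_ns)


run_px("def foo():\n    return 'bar'", local=True)

defined_everywhere = "foo" in local_ns and all(
    "foo" in ns for ns in engine_namespaces
)
print(defined_everywhere)  # True
```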

skwde commented 1 month ago

@minrk ahh ok thanks!

I haven't seen that in the docs and still can't. Would you mind pointing me to the docs as well?

skwde commented 1 month ago

@minrk

I checked again and tried to use

%pxconfig --local --verbose
%autopx

as a global config. However, I then have the problem that I cannot execute cells only locally.

Here are my two tests:


The idea is to use something like

%autopx # disable global
# do local stuff for testing
%autopx # enable again

in a single cell.

Problems:


Another way would be to

%pxconfig --targets 0

Problems:

minrk commented 1 month ago

%autopx is a relic of long ago when there was only terminal IPython and most execution was line by line, so there was a high value in reducing the number of characters to type. If I wrote it today in a world with Jupyter, I wouldn't have included it. %autopx doesn't provide much benefit and has substantial downsides in notebooks where you can put %%px on parallel cells and you aren't re-typing executions all the time like you do in a terminal. If you are using a notebook, my recommendation is to not use %autopx in a notebook at all, and put %%px on the cells you want to execute remotely.

To answer specific questions: %autopx won't really work unless it's the only thing (or last thing) in a cell. IPython executes whole cells at a time. %autopx changes how run_cell behaves, so it has no way of modifying subsequent statements within the same cell that's already in the middle of executing. Having %autopx twice in a cell (anywhere) should have the same effect as not having it in the cell at all.
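The behaviour described above can be sketched with a small toy model. ToyShell is a hypothetical class written for illustration, not ipyparallel's or IPython's implementation. It assumes only what the explanation states: the destination of a cell is fixed when the cell starts executing, and %autopx flips a switch that affects later cells.

```python
class ToyShell:
    """Hypothetical miniature shell: a whole cell is dispatched at once."""

    def __init__(self):
        self.autopx = False
        self.log = []  # (destination, statement) pairs, for inspection

    def run_cell(self, statements):
        # The destination is decided once, when the cell starts executing.
        destination = "engines" if self.autopx else "local"
        for stmt in statements:
            if stmt.strip() == "%autopx":
                # Flipping the switch only affects later run_cell calls.
                self.autopx = not self.autopx
            else:
                self.log.append((destination, stmt))


shell = ToyShell()
shell.run_cell(["%autopx"])  # enable, alone in its own cell
shell.run_cell(["%autopx", "local_test()", "%autopx"])  # attempted local section

print(shell.log)     # [('engines', 'local_test()')]
print(shell.autopx)  # True
```

In this model the statement between the two toggles still goes to the engines, and the switch ends up where it started, matching the point that two %autopx lines in one cell cancel out.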

%pxconfig seems unknown by the engines.

That's right. %pxconfig configures the %px magics associated with an IPython parallel client in the current IPython session. Each engine is its own IPython session and the engines do not have clients (unless you decide to create them), so they do not have these magics.

How to undo configuration, to use again all targets

You can use:

%pxconfig --targets all

to use all targets again.

You can run %pxconfig in between cells with %%px (not %autopx), but the notebook will probably be clearer and more consistent if you pass these arguments to %%px instead, because then it is clear what each cell is doing. You can see the arguments it takes by inspecting it with %%px? (same as any magic or other Python object). I would skip %autopx entirely, skip %pxconfig as well (except possibly for setting one most-common default at the very beginning), and use %%px with arguments almost exclusively. I would write a notebook like this, where each choice to run in a specific context is clearly defined on a cell:

%%px --local
def use_this_everywhere_even_client():
    ....

%%px
all_engines_setup # (not local)

%%px --targets 0
# this cell is only on 0

I haven't seen that in the docs and still can't. Would you mind pointing me to the docs as well?

This is not well documented, sorry! I'll try to write up a new tutorial notebook with this kind of thing in mind. But the short-term answer is: interactive documentation always works: run %%px?, and I would generally recommend against using stateful magics like %autopx or %pxconfig more than once in a notebook (if at all).

skwde commented 1 month ago

Thanks a lot for this detailed reply.

I do agree that an explicit %%px on top of the cells is the nicest solution. In particular, it is quite nice to simply switch the number of targets for a particular cell!

I however experience problems with linter / formatter no longer working when the magic is included at the top of the cell. That's really unfortunate, and until I find a fix for that I have to resort to

%pxconfig --local --verbose
%autopx

globally, turning %autopx off when doing things locally or re-configuring.

No problem for the documentation! Waiting to see your tutorial on that!

minrk commented 4 weeks ago

I however experience problems with linter / formatter no longer working when the magic is included at the top of the cell.

What linter are you using?

I know ruff doesn't understand most cell magics, but nbqa can and so can black.

skwde commented 4 weeks ago

I actually use ruff now. In the past I used black. I already read about nbqa but didn't look into it so far.

The ruff issue you linked is thus highly relevant to me! Thanks for creating it!

skwde commented 4 weeks ago

I know that the now-recommended approach is to not use %autopx but to use %%px on every cell. For linter / formatter reasons I have to use it, though.

I just noticed that in

import ipyparallel as ipp

cluster = ipp.Cluster(engines="MPI", n=4, cluster_id="ppb", profile="MPI")
clients = cluster.start_and_connect_sync()

%pxconfig --local --verbose
# enable
%autopx

the --local seems to be ignored.

def foo():
    print("bar")

foo()

works just fine. But as soon as I want to use it locally via

# disable
%autopx
foo()

I get

NameError: name 'foo' is not defined

Any idea what I am missing here?

minrk commented 3 weeks ago

My guess is there's something missing in autopx, perhaps not handling --local. I'm tempted to put a "please don't use this" warning on every invocation of autopx, because I really feel very strongly that it should ~never be used. Certainly never in a notebook.

minrk commented 3 weeks ago

Looking at the code, --local is a feature of %%px specifically, not %px or %autopx. The bug is really that %pxconfig accepts it despite not supporting it. I'll fix that, but I'm not currently inclined to add new features to %autopx since I really do want to discourage its use.

skwde commented 3 weeks ago

Thanks! I see your point though. I am happy to stop using %autopx once ruff works with the magics.

minrk commented 3 weeks ago

You can see the updated parallel magics doc here.