jupyter-incubator / contentmanagement

Jupyter Content Management Extensions

Failure for ipyparallel loading #32

Closed · abjer closed this issue 8 years ago

abjer commented 8 years ago

Are there any plans to make this module work with ipyparallel? I have tried to import notebooks through the 'load_notebook' method, but it fails.

I have created a notebook called 'simple_test.ipynb' with a single cell:

# <api>
def squared(x):
    return x**2

The way I import the notebook is as follows:

import ipyparallel as ipp
rc = ipp.Client()
with rc[:].sync_imports():
    from jupyter_cms.loader import load_notebook
    simple = load_notebook('simple_test.ipynb')

When I try to use the 'squared' function, I get an error:

rc[:].map_async(simple.sq,range(5)).get()

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 rc[:].map_async(simple.sq,range(5)).get()

AttributeError: module 'da39a3ee5e6b4b0d3255bfef95601890afd80709' has no attribute 'sq'

Am I doing something wrong or is this a general issue?

parente commented 8 years ago

Are you using ipyparallel with remote workers or all on the same machine? If they're remote, I doubt the import will work. If they're local ... I'm not sure what happens. Suffice to say you're probably the first person to test the feature with ipyparallel. 👍

abjer commented 8 years ago

The workers I was using were local.

minrk commented 8 years ago

How does load_notebook compute the module name?

parente commented 8 years ago

It takes the path to the notebook and passes it directly to the NotebookLoader constructor here: https://github.com/jupyter-incubator/contentmanagement/blob/master/jupyter_cms/loader.py#L57

It looks like the notebook might be loading fine, since a module reference comes back. But the code that evaluates the API cells in the notebook depends on an interactive shell instance, and I'm wondering whether that behaves the same on the workers as it does in the main notebook kernel.
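
For context, the general notebook-as-module pattern looks roughly like the sketch below. This is only an illustration, not the actual jupyter_cms code; the load_notebook_sketch helper and the hash-based module name are hypothetical.

# Rough sketch only (hypothetical helper; not the jupyter_cms implementation):
# read the .ipynb, exec the cells marked with '# <api>' into a fresh module,
# and register that module in sys.modules under a name derived from the path.
import hashlib
import sys
import types

import nbformat
from IPython import get_ipython


def load_notebook_sketch(path):
    # Hypothetical naming scheme: a hash of the path, similar in spirit to the
    # hex module name that shows up in the traceback above.
    name = hashlib.sha1(path.encode('utf-8')).hexdigest()
    nb = nbformat.read(path, as_version=4)

    mod = types.ModuleType(name)
    mod.__file__ = path
    sys.modules[name] = mod

    shell = get_ipython()  # None outside an IPython kernel or engine
    for cell in nb.cells:
        if cell.cell_type == 'code' and cell.source.startswith('# <api>'):
            source = cell.source
            if shell is not None:
                # Turn IPython-specific syntax (magics, ! commands) into plain Python.
                source = shell.input_transformer_manager.transform_cell(source)
            exec(source, mod.__dict__)
    return mod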

Also, there are a few other loading-related bugs open here that we haven't had time to focus on (#8, #25), and they might also be coming into play.

parente commented 8 years ago

Using the import hook approach works, most likely because the module winds up in sys.modules. Using load_notebook, it probably does not.

In test.ipynb:

# <api>
def squared(x):
    return x**2

In another notebook:

# cell 0
import os
import sys
import ipyparallel as ipp

rc = ipp.Client()

# cell 1
%%px
%load_ext jupyter_cms
import mywb.test as test
test.squared(5)
parente commented 8 years ago

This also works:

# cell 0
import os
import ipyparallel as ipp

rc = ipp.Client()

from jupyter_cms.loader import load_notebook
test = load_notebook('./test.ipynb')

# cell 1
%%px
from jupyter_cms.loader import load_notebook
test = load_notebook('./test.ipynb')

# cell 2
rc[:].map_async(test.squared, range(5)).get()

The output from the sync_imports() call, both in the original bug and in my attempts, suggests that the import of load_notebook happens on all engines plus the local shell, but the indirect import that load_notebook() itself triggers only runs in the local shell, not on the engines. Skipping the shortcut and calling load_notebook explicitly both in the local shell and on the engines works fine, as shown above (a variant without the %%px magic is sketched below).

I don't think we can expect sync_imports to work considering the "module" here is synthetic and comes from a notebook file parsed, evaluated, and jammed into sys.modules.
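
For reference, a sketch of the same explicit-load pattern using DirectView.execute instead of the %%px magic (this assumes a running local cluster and ./test.ipynb in the working directory; dview is just a local variable name):

import ipyparallel as ipp
from jupyter_cms.loader import load_notebook

rc = ipp.Client()
dview = rc[:]

# Build the synthetic module locally so test.squared exists in this process...
test = load_notebook('./test.ipynb')

# ...and on every engine, so the engines can resolve the same module when the
# mapped function arrives there.
dview.execute(
    "from jupyter_cms.loader import load_notebook\n"
    "test = load_notebook('./test.ipynb')",
    block=True,
)

print(dview.map_sync(test.squared, range(5)))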

minrk commented 8 years ago

sync_imports has extremely limited scope: it's an import hook, so anything that "creates" modules rather than simply importing them will probably not work.
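
For illustration, a minimal sketch of that distinction (assuming a local cluster): plain import statements inside the block propagate to the engines, but a function call that builds a module does not.

import ipyparallel as ipp

rc = ipp.Client()

with rc[:].sync_imports():
    import math                                    # propagates: a real import statement
    from jupyter_cms.loader import load_notebook   # propagates: also an import statement
    # simple = load_notebook('simple_test.ipynb')  # would NOT propagate: a function call,
    #                                              # so the synthetic module it creates
    #                                              # never passes through the import hook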

parente commented 8 years ago

I'll write up the long-hand approach somewhere in the README and close this out.

abjer commented 8 years ago

Thanks!!