jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Cell dependency graph #1175

Open nvdv opened 8 years ago

nvdv commented 8 years ago

At present all Notebook cells are executed linearly:

Cell 1
   |
Cell 2
   |
Cell 3

but sometimes there's no need to calculate Cell 2 in order to get the result from Cell 3, and calculating Cell 2 might be time-consuming. Being able to define a cell dependency graph would resolve this issue.
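
As a rough illustration of the request (not an existing Notebook feature), the sketch below assumes each cell declares the cells it needs and then computes the smallest set of cells to run for a given target. The `DEPENDS_ON` map and the `cells_needed` helper are hypothetical names invented for this example; `graphlib` is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each cell lists the cells it needs.
# Cell 3 depends only on Cell 1, so Cell 2 can be skipped.
DEPENDS_ON = {
    "cell1": set(),
    "cell2": {"cell1"},
    "cell3": {"cell1"},
}


def cells_needed(target, depends_on):
    """Return the target and its transitive prerequisites, in run order."""
    needed = set()
    stack = [target]
    while stack:
        cell = stack.pop()
        if cell not in needed:
            needed.add(cell)
            stack.extend(depends_on[cell])
    # Sort only the needed subgraph so that prerequisites come first.
    subgraph = {cell: depends_on[cell] for cell in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(cells_needed("cell3", DEPENDS_ON))  # ['cell1', 'cell3']
```

With a linear model, reaching Cell 3 means running all three cells; with the declared graph, only Cell 1 and Cell 3 are selected.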

takluyver commented 8 years ago

Have a look at ipycache if you have long-running cells that you don't always want to re-run. I don't think we want to get into defining a DAG of cells.
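
For readers unfamiliar with the caching approach, here is a minimal sketch of the general idea in plain Python. It does not use ipycache or reproduce its interface; the `cached` helper, `slow_step`, and the file name are assumptions made up for illustration.

```python
import os
import pickle
import tempfile


def cached(path, compute):
    """Load a pickled result from `path`, or compute it and store it there."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_step():
    """Stand-in for a long-running cell; records each real execution."""
    calls.append(1)
    return sum(range(1000))


cache_path = os.path.join(tempfile.mkdtemp(), "cell2.pkl")
first = cached(cache_path, slow_step)   # runs slow_step and writes the file
second = cached(cache_path, slow_step)  # loads the stored result instead
```

Caching avoids repeating the expensive work, but unlike a dependency graph it does not know when the cached value has become stale.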

Carreau commented 8 years ago

There is a long thread we had a few years[*] ago about that on the mailing list.

[*] OMG I'm old now.

JamiesHQ commented 7 years ago

@nvdv : We're doing a little housekeeping on our issue log and noticed this thread from 2016. Has this issue been resolved to your satisfaction and can it be closed? thanks!

nvdv commented 7 years ago

It is a feature request. I am not sure it was implemented, but it's up to you to close it if you think it's out of scope.

adam-m-jcbs commented 6 years ago

The long thread discussing this, linked above by @Carreau , is unreachable for me. So apologies if I'm rehashing things discussed there.

I certainly agree that managing a DAG of cells is not desirable. But it would be cool if there were a built-in cell magic for declaring which cells should be run automatically before the current cell. Naively, this doesn't seem to be too burdensome a feature to implement, but I'm mostly a Jupyter notebook user, not a developer, so I could be wrong. Does any such cell magic exist, or is there a cell magic that could be used for this purpose?

mxxun commented 6 years ago

For future reference: the long thread was moved.

nickurak commented 5 years ago

Conversely, while a dependency graph might tell you you don't need to evaluate/re-evaluate cell B just because A changed, it might also tell you that you're going to have a bad time trying to evaluate C if C depends on A.

In accordance with https://jupyter-notebook.readthedocs.io/en/stable/security.html , if someone tried to execute a cell that depended on another, I wonder if it would make sense to do so automatically?

At a minimum, it might be helpful to have some visual feedback to indicate that the cell isn't runnable until some particular cell above satisfies its dependencies.
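
Both kinds of feedback described here can be derived from a declared dependency map. The sketch below is only an illustration under that assumption: `DEPENDS_ON`, `downstream`, and `missing_prerequisites` are hypothetical names, not part of any Jupyter API.

```python
def downstream(changed, depends_on):
    """Return every cell that directly or indirectly depends on `changed`."""
    stale = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for cell, deps in depends_on.items():
            if current in deps and cell not in stale:
                stale.add(cell)
                frontier.append(cell)
    return stale


def missing_prerequisites(cell, depends_on, executed):
    """Return the direct prerequisites of `cell` that have not run yet."""
    return depends_on[cell] - executed


# Hypothetical graph: C needs A, D needs C, B is independent.
DEPENDS_ON = {"A": set(), "B": set(), "C": {"A"}, "D": {"C"}}

print(downstream("A", DEPENDS_ON))                    # C and D are stale; B is not
print(missing_prerequisites("C", DEPENDS_ON, {"B"}))  # {'A'}
```

A frontend could use the first result to mark cells as out of date and the second to show why a cell is not yet runnable.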

pedrovgp commented 4 years ago

@takluyver, is there any reason for a DAG of cells to be out of the question? Visualising cells in a graph would certainly make cell dependencies clearer and improve storytelling capabilities, since non-linear (branching) stories are hard to tell within today's notebooks.

For a simple concrete example: imagine a notebook to evaluate three real estate expansion plans for a given city. The first node of cells loads the current real estate data and describes the current state of affairs. From there, you get three branches, each of them following similar logic but different scenario premises and arriving at comparable (but different) end results.

Today, this analysis could be done using a chapter for each scenario, but that still requires scrolling up and down to compare, possibly unclear instructions about which cells to run before scenario A, possibly (accidentally) re-running scenario A before B (run all is sooo easy to click on), etc.

jasongrout commented 4 years ago

I think using a magic (or cell metadata) to explicitly define dependencies for a DAG of cells is a very interesting idea. I think automatically coming up with the DAG on the front end is probably prohibitively hard, given that we have a number of kernels of different languages. There was some work from a CalPoly group of students on a kernel that would keep track of a DAG, IIRC, somewhat like ObservableHQ.
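
To make the metadata variant concrete, the sketch below reads explicit dependencies out of notebook JSON. The optional `name` key is described in the notebook format documentation; the `depends_on` key and the `dependency_map` helper are assumptions invented for this example, not an existing convention.

```python
import json

# A tiny notebook document; "depends_on" is a hypothetical metadata key.
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {},
  "cells": [
    {"cell_type": "code", "source": "data = [1, 2, 3]",
     "metadata": {"name": "load"},
     "outputs": [], "execution_count": null},
    {"cell_type": "code", "source": "total = sum(data)",
     "metadata": {"name": "summarise", "depends_on": ["load"]},
     "outputs": [], "execution_count": null},
    {"cell_type": "markdown", "source": "Some notes.", "metadata": {}}
  ]
}
"""


def dependency_map(notebook):
    """Map each named cell to the set of cell names it declares it needs."""
    deps = {}
    for cell in notebook["cells"]:
        meta = cell.get("metadata", {})
        name = meta.get("name")
        if name is not None:
            deps[name] = set(meta.get("depends_on", []))
    return deps


print(dependency_map(json.loads(NOTEBOOK_JSON)))
```

Because the dependencies are declared rather than inferred from code, this works the same way for any kernel language.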

nickurak commented 4 years ago

Because it's been a year, and this idea has been bouncing around my head a little -- here's a sketch of a thought in this area:

I'd be really interested in a world where the cells run in actual scopes, and cells were more explicit about what they were pulling in from each other. This might be reasonably easy in Python, but maybe tricky in different languages.

label_cell("utility")
def func_that_makes_a_df():
   <code>
<Some markdown explaining that function>
label_cell("get_pf")
from cell("utility") import func_that_makes_a_df
df = func_that_makes_a_df()
<Some markdown that talks about a dataframe>
from cell("get_pf") import df as plottable_df
import plotly

plotly.plot_something(plottable_df)

Making the only things that are shared between cells super-explicit might help:

I haven't really thought at all about what this might look like outside of the Python world.

nickurak commented 4 years ago

In that world, attempting to refer to func_that_makes_a_df in a cell that isn't explicitly importing it from another cell would, for example, fail, with a NameError: name 'func_that_makes_a_df' is not defined exception.
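
The scoping behaviour described above can be approximated in plain Python by running each cell's source in its own namespace. This is only a sketch of the idea, not a proposed implementation: the `ScopedCells` class and its `run` method are hypothetical, and explicit imports are passed as a dictionary rather than through new syntax.

```python
class ScopedCells:
    """Run each cell in its own namespace with explicit cross-cell imports."""

    def __init__(self):
        self.namespaces = {}

    def run(self, label, source, imports=None):
        """Execute `source` in a fresh namespace seeded only with `imports`.

        `imports` maps a local name to a (cell_label, name) pair.
        """
        namespace = {}
        for local_name, (cell_label, name) in (imports or {}).items():
            namespace[local_name] = self.namespaces[cell_label][name]
        exec(source, namespace)
        self.namespaces[label] = namespace
        return namespace


cells = ScopedCells()
cells.run("utility", "def make_values():\n    return [1, 2, 3]\n")
cells.run(
    "get_values",
    "values = make_values()",
    imports={"make_values": ("utility", "make_values")},
)

# Without the explicit import, the name is not visible in the new scope.
try:
    cells.run("broken", "values = make_values()")
    error_message = None
except NameError as error:
    error_message = str(error)

print(error_message)
```

The imports passed to `run` are also exactly the edges a dependency graph would need, so explicit scoping and a cell DAG could share one declaration.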

pedrovgp commented 4 years ago

@nickurak , I can see other use cases for that, but the use case you've described could be solved by establishing cell dependencies and splitting code into different cells accordingly. That would be a more generic approach as well, since it could apply to other languages.

Your example would be something like:

If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.

pedrovgp commented 4 years ago

I have worked on a (quick and dirty) visual proposal of how to use cell dependencies to facilitate storytelling and organize notebook flows. It probably makes more sense in the JupyterLab project, but anyway, this is what I envision: https://docs.google.com/presentation/d/1nWAjvuCZb4MEu9SiTy-QWfMWBThpDpZFnuKNp1S_fHs/edit?usp=sharing

Any comments are appreciated.

toobaz commented 4 years ago

> If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.

A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?

pedrovgp commented 4 years ago

Seems like it is going to be a part of JupyterLab Core [https://github.com/jupyterlab/jupyterlab-celltags]

jasongrout commented 4 years ago

> A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?

Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata

toobaz commented 4 years ago

> Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata

Cool! And is this already exposed somewhere?

jasongrout commented 4 years ago

> Cool! And is this already exposed somewhere?

It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key. Jupyter notebook and JupyterLab, for example, expose an interface for writing to the cell metadata.
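
As a small illustration of writing this key from a library, the sketch below edits the notebook document as plain JSON data. The `name_cell` helper and its uniqueness check are assumptions added for this example; only the optional `name` metadata key itself comes from the notebook format description.

```python
import json


def name_cell(notebook, index, name):
    """Set the optional `name` metadata key on one cell, keeping names unique."""
    for i, cell in enumerate(notebook["cells"]):
        if i != index and cell.get("metadata", {}).get("name") == name:
            raise ValueError(f"cell name {name!r} is already in use")
    notebook["cells"][index].setdefault("metadata", {})["name"] = name


# A minimal in-memory notebook document with two unnamed code cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "source": "x = 1", "metadata": {},
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "source": "y = x + 1", "metadata": {},
         "outputs": [], "execution_count": None},
    ],
}

name_cell(notebook, 0, "define_x")
serialized = json.dumps(notebook, indent=1)  # what would be written to the .ipynb file
```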

jasongrout commented 4 years ago

(To be clear, as with any metadata, it is optional and up to the writer to set this value. It is not set by default in JupyterLab, though it may be set in the notebook by default to some sort of UUID).

toobaz commented 4 years ago

> It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key.

Yes, sorry, my question was misleading. I should have asked: is there already some UI for allowing the user to see/change this?

jasongrout commented 4 years ago

Yes (though it's just a JSON editor). In JupyterLab, it's the wrench icon in the left sidebar. In the classic notebook, it's View > Cell Toolbar > Edit Metadata.

Carreau commented 4 years ago

In case that has not been posted already, please see also https://github.com/dataflownb and https://github.com/stitchfix/nodebook

Carreau commented 4 years ago

Both of those got talks at JupyterCon in 2018, so they should be somewhere on YouTube.

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/dag-based-notebooks/11173/2

stefaneidelloth commented 3 years ago

https://observablehq.com/ uses a DAG and I would love to see a JupyterLab extension providing similar features:

https://observablehq.com/@observablehq/how-observable-runs

Edit

Moved overview of projects to jupyterlab: https://discourse.jupyter.org/t/dag-based-notebooks/11173/4

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-to-get-output-model-for-a-given-cell-in-a-jupyterlab-extension/11342/1

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/dag-based-notebooks/11173/4

jondo commented 3 weeks ago

Also see https://marimo.io/ .

krassowski commented 3 weeks ago

It's surprising that no one mentioned https://github.com/ipyflow/ipyflow.