Open nvdv opened 8 years ago
Have a look at ipycache if you have long-running cells that you don't always want to re-run. I don't think we want to get into defining a DAG of cells.
There is a long thread we had a few years[*] ago about that on the mailing list.
[*] OMG I'm old now.
@nvdv : We're doing a little housekeeping on our issue log and noticed this thread from 2016. Has this issue been resolved to your satisfaction and can it be closed? thanks!
It is a feature request. I am not sure it was implemented, but it's up to you to close it if you think it's out of scope.
On Apr 27, 2017 04:05, "JamieW" notifications@github.com wrote:
@nvdv https://github.com/nvdv : We're doing a little housekeeping on our issue log and noticed this thread from 2016. Has this issue been resolved to your satisfaction and can it be closed? thanks!
The long thread discussing this, linked above by @Carreau , is unreachable for me. So apologies if I'm rehashing things discussed there.
I certainly agree managing a DAG of cells is not desirable. But it would be cool if there was a built-in cell magic for stating cells to be automatically run first before running the current cell. Naively, this doesn't seem to be too burdensome a feature to implement, but I'm mostly a Jupyter notebook user, not developer, so I could be wrong. Does there exist any such cell magic, or a cell magic that could be used for this purpose?
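To make the "run these cells first" idea concrete, here is a minimal sketch in plain Python. It is not an existing magic: `define_cell`, `run_cell`, `CELLS` and `EXECUTED` are hypothetical names invented for illustration, and cell sources are passed as strings. In IPython, something along these lines could presumably be wrapped with `register_cell_magic`, but that wiring is left out here.

```python
# Hypothetical sketch: none of these names are part of IPython or Jupyter.
CELLS = {}        # cell name -> (source code, list of prerequisite cell names)
EXECUTED = set()  # names of cells that have already run in this session


def define_cell(name, source, requires=()):
    """Register a named cell and the cells that must run before it."""
    CELLS[name] = (source, list(requires))


def run_cell(name, namespace, _in_progress=None):
    """Run the not-yet-executed prerequisites of `name`, then `name` itself."""
    in_progress = _in_progress if _in_progress is not None else set()
    if name in in_progress:
        raise ValueError(f"circular dependency involving cell {name!r}")
    in_progress.add(name)
    source, requires = CELLS[name]
    for dep in requires:
        if dep not in EXECUTED:
            run_cell(dep, namespace, in_progress)
    exec(source, namespace)  # all cells share one namespace, like a kernel
    EXECUTED.add(name)
    in_progress.discard(name)


define_cell("load", "data = [1, 2, 3]")
define_cell("total", "result = sum(data)", requires=["load"])

ns = {}
run_cell("total", ns)   # runs "load" first, then "total"
print(ns["result"])     # 6
```

Note that this only tracks whether a prerequisite has run at all, not whether it has changed since, so it is much weaker than a real dependency graph.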
Conversely, while a dependency graph might tell you you don't need to evaluate/re-evaluate cell B just because A changed, it might also tell you that you're going to have a bad time trying to evaluate C if C depends on A.
In accordance with https://jupyter-notebook.readthedocs.io/en/stable/security.html , if someone tried to execute a cell that depended on another, I wonder if it would make sense to do so automatically?
At a minimum, it might be helpful to have some visual feedback to indicate that the cell isn't runnable until some particular cell above satisfies its dependencies.
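As a rough illustration of that kind of feedback, the sketch below computes which cells become stale when one cell changes. The dependency map `DEPENDS_ON` and the function `stale_after_change` are hypothetical; how a frontend would obtain the map or render the result is not addressed.

```python
from collections import deque

# Hypothetical dependency map: cell -> cells it reads from.
DEPENDS_ON = {"A": [], "B": [], "C": ["A"], "D": ["C"]}


def stale_after_change(changed, depends_on):
    """Return every cell downstream of `changed` (excluding `changed` itself)."""
    # Invert the map so we can walk from a cell to the cells that use it.
    dependents = {name: [] for name in depends_on}
    for cell, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(cell)

    stale, queue = set(), deque([changed])
    while queue:
        for child in dependents[queue.popleft()]:
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


print(sorted(stale_after_change("A", DEPENDS_ON)))  # ['C', 'D']
```

Here B is untouched by a change to A, while C (and D, through C) would be flagged as needing a re-run.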
@takluyver, is there any reason for a DAG of cells to be out of the question? Visualising cells in a graph would certainly make cell dependencies clearer as well as improve storytelling capabilities, since non-linear (branching) stories are hard to tell within today's notebooks.
For a simple concrete example: imagine a notebook to evaluate three real estate expansion plans for a given city. The first node of cells loads the current real estate data and describes the current state of affairs. From there, you get three branches, each of them following similar logic but starting from different scenario premises and arriving at comparable (but different) end results.
Today, this analysis could be done using a chapter for each scenario, but that still requires scrolling up and down to compare, maybe unclear instructions about which cell to run before scenario A, maybe (accidentally) re-running scenario A before B (run all is sooo easy to click on), etc.
I think using a magic (or cell metadata) to explicitly define dependencies for a DAG of cells is a very interesting idea. I think automatically coming up with the DAG on the front end is probably prohibitively hard, given that we have a number of kernels of different languages. There was some work from a CalPoly group of students on a kernel that would keep track of a DAG, IIRC, somewhat like ObservableHQ.
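A minimal sketch of the metadata route, assuming each cell declares its prerequisites explicitly. The `name` key is an optional cell metadata field in the notebook format; `depends_on` is a made-up key for this example, and `load`, `crunch` and `plot` in the cell sources are placeholders that are never executed. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# A pared-down notebook structure. "depends_on" is a hypothetical metadata key.
notebook = {
    "cells": [
        {"cell_type": "code", "source": "raw = load()",
         "metadata": {"name": "load"}},
        {"cell_type": "code", "source": "slow = crunch(raw)",
         "metadata": {"name": "slow", "depends_on": ["load"]}},
        {"cell_type": "code", "source": "plot(raw)",
         "metadata": {"name": "plot", "depends_on": ["load"]}},
    ]
}


def dependency_graph(nb):
    """Map each named cell to the set of cell names it depends on."""
    return {
        cell["metadata"]["name"]: set(cell["metadata"].get("depends_on", []))
        for cell in nb["cells"]
        if "name" in cell["metadata"]
    }


def cells_needed_for(target, graph):
    """Return the target and its transitive prerequisites in a runnable order."""
    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(graph[name])
    subgraph = {name: graph[name] for name in needed}
    return list(TopologicalSorter(subgraph).static_order())


graph = dependency_graph(notebook)
print(cells_needed_for("plot", graph))  # ['load', 'plot']
```

Because the graph is declared rather than inferred, nothing here depends on the kernel language; the expensive `slow` cell is simply never scheduled when only `plot` is requested.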
Because it's been a year, and this idea has been bouncing around my head a little -- here's a sketch of a thought in this area:
I'd be really interested in a world where the cells run in actual scopes, and cells were more explicit about what they were pulling in from each other. This might be reasonably easy in Python, but maybe tricky in different languages.
label_cell("utility")
def func_that_makes_a_df():
    <code>
<Some markdown explaining that function>
label_cell("get_pf")
from cell("utility") import func_that_makes_a_df
df = func_that_makes_a_df()
<Some markdown that talks about a dataframe>
from cell("get_pf") import df as plottable_df
import plotly
plotly.plot_something(plottable_df)
Making the only things that are shared between cells super-explicit might help:
I haven't really thought at all about what this might look like outside of the Python world.
In that world, attempting to refer to func_that_makes_a_df in a cell that isn't explicitly importing it from another cell would, for example, fail with a NameError: name 'func_that_makes_a_df' is not defined exception.
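The scoping behaviour described above can be approximated in plain Python by executing each cell in its own namespace. This is only a sketch under assumptions: `run_scoped_cell` and `CELL_SCOPES` are hypothetical names, the `imports` argument stands in for the proposed `from cell(...) import ...` syntax, and a plain list stands in for the dataframe.

```python
# Hypothetical sketch: each "cell" gets its own namespace, and names cross
# cell boundaries only when listed explicitly in `imports`.
CELL_SCOPES = {}  # cell label -> namespace left behind by that cell


def run_scoped_cell(label, source, imports=None):
    """Execute `source` in a fresh namespace.

    `imports` maps another cell's label to the names to copy from it,
    e.g. {"utility": ["func_that_makes_a_df"]}.
    """
    scope = {}
    for other_label, names in (imports or {}).items():
        for name in names:
            scope[name] = CELL_SCOPES[other_label][name]
    exec(source, scope)
    CELL_SCOPES[label] = scope
    return scope


run_scoped_cell("utility", "def func_that_makes_a_df():\n    return [1, 2, 3]")
run_scoped_cell(
    "get_pf",
    "df = func_that_makes_a_df()",
    imports={"utility": ["func_that_makes_a_df"]},
)

try:
    # No explicit import, so the function is not visible in this cell.
    run_scoped_cell("broken", "df = func_that_makes_a_df()")
except NameError as err:
    print(err)  # name 'func_that_makes_a_df' is not defined
```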
@nickurak , I can see other use cases for that, but the use case you've described could be solved establishing cell dependency and splitting code in different cells accordingly. That would be a more generic approach as well, since it could apply to other languages.
Your example would be something like:
If you need a function (but not another) that is defined in a given cell, simply split it into two cells and add the dependency only to the one you need.
I have worked on a (quick and dirty) visual proposal of how to use cell dependencies to facilitate storytelling and organize notebook flows. It probably makes more sense in the JupyterLab project, but anyway, this is what I envision: https://docs.google.com/presentation/d/1nWAjvuCZb4MEu9SiTy-QWfMWBThpDpZFnuKNp1S_fHs/edit?usp=sharing
Any comments are appreciated.
A question, which I see as a prerequisite for this discussion: is there already in any Jupyter plugin a standard, or at least popular, way to uniquely identify cells?
Seems like it is going to be a part of JupyterLab Core [https://github.com/jupyterlab/jupyterlab-celltags]
Yes. In the Jupyter official notebook format, a cell can have an optional unique name in its metadata: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-metadata
Cool! And is this already exposed somewhere?
It's exposed everywhere, in the sense that any library or frontend that can write to cell metadata can write this key. Jupyter notebook and JupyterLab, for example, expose an interface for writing to the cell metadata.
(To be clear, as with any metadata, it is optional and up to the writer to set this value. It is not set by default in JupyterLab, though it may be set in the notebook by default to some sort of UUID).
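Since a notebook file is JSON, any tool can set that key. The following is a small sketch using only the standard library: the inline notebook is a trimmed-down example, and `name_cells` with its `cell-N` naming scheme is a hypothetical helper, not something Jupyter provides.

```python
import json

# A trimmed-down notebook document; real files carry more metadata.
notebook_json = """
{
  "nbformat": 4, "nbformat_minor": 4, "metadata": {},
  "cells": [
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": "x = 1"},
    {"cell_type": "code", "execution_count": null, "metadata": {"name": "plot"},
     "outputs": [], "source": "print(x)"}
  ]
}
"""


def name_cells(nb, prefix="cell"):
    """Give every unnamed cell a metadata name unique within the notebook."""
    used = {c["metadata"]["name"] for c in nb["cells"] if "name" in c["metadata"]}
    counter = 0
    for cell in nb["cells"]:
        if "name" not in cell["metadata"]:
            # Skip over any generated names that are already taken.
            while f"{prefix}-{counter}" in used:
                counter += 1
            cell["metadata"]["name"] = f"{prefix}-{counter}"
            used.add(cell["metadata"]["name"])
    return nb


nb = name_cells(json.loads(notebook_json))
print([c["metadata"]["name"] for c in nb["cells"]])  # ['cell-0', 'plot']
```

Existing names are left alone, so running the helper twice does not rename anything.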
Yes, sorry, my question was misleading. I should have asked: is there already some UI for allowing the user to see/change this?
Yes (though it's just a JSON editor). In JupyterLab, it's the wrench icon in the left sidebar. In the classic notebook, it's View > Cell Toolbar > Edit Metadata.
In case that has not been posted already, please see also https://github.com/dataflownb and https://github.com/stitchfix/nodebook
Both of those got talks at JupyterCon in 2018, so they should be somewhere on YouTube.
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://observablehq.com/ uses a DAG and I would love to see a JupyterLab extension providing similar features:
https://observablehq.com/@observablehq/how-observable-runs
Edit
Moved overview of projects to jupyterlab: https://discourse.jupyter.org/t/dag-based-notebooks/11173/4
Also see https://marimo.io/ .
It's surprising that no one mentioned https://github.com/ipyflow/ipyflow.
At present all Notebook cells are executed linearly:
but sometimes there's no need to calculate Cell 2 in order to get the result from Cell 3, and calculating Cell 2 might be time-consuming. Setting a cell dependency graph somehow would resolve this issue.