quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Feature request: per-cell caching for Python cells #1092

Open ejolly opened 2 years ago

ejolly commented 2 years ago

From the documentation site:

Note that for Jupyter, the cache for a document is invalidated if any of the code blocks change. For Knitr, invalidation occurs on a per-cell basis. (emphasis added)

It would be great if adding or modifying one Python cell invalidated the cache only for that specific cell, rather than re-executing all code cells. This would be really useful if, say, you wanted to add a new plotting cell without rerunning expensive computations in previous cells.

I'm not sure if this is a limitation of jupyter-cache or if there are plans to support this in the future. Another option might be to allow using or incorporating a different caching library, e.g. ipycache.
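
To make the request concrete, here is a minimal sketch of how per-cell invalidation could be detected. It is not how Quarto or jupyter-cache work today; the function names `cell_keys` and `stale_cells` are hypothetical. Each cell's key is a hash of its source chained with the previous cell's key, which is one possible design choice: appending a cell leaves earlier cells valid, while editing a cell also marks the cells after it as stale, since they may depend on its state.

```python
import hashlib


def cell_keys(cell_sources):
    """Return one cache key per cell, chaining each key with the previous one."""
    keys = []
    previous = ""
    for source in cell_sources:
        # The key depends on this cell's source and on everything before it.
        digest = hashlib.sha256((previous + "\n" + source).encode("utf-8")).hexdigest()
        keys.append(digest)
        previous = digest
    return keys


def stale_cells(old_keys, new_sources):
    """Return the indices of cells whose cached result can no longer be reused."""
    new_keys = cell_keys(new_sources)
    return [
        index
        for index, key in enumerate(new_keys)
        if index >= len(old_keys) or old_keys[index] != key
    ]


# Cell sources are treated as plain strings here; nothing is executed.
cached = cell_keys(["import time", "x = expensive()", "print(x)"])

# Adding a new plotting cell at the end: only the new cell needs to run.
print(stale_cells(cached, ["import time", "x = expensive()", "print(x)", "plot(x)"]))  # [3]
```

A real implementation would also have to store and restore each cell's outputs and kernel state, which this sketch does not attempt.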

jjallaire commented 2 years ago

Yes we would love it to work this way. We are finishing up our v1.0 release and will take a look at this post v1.0.

JanPalasek commented 2 years ago

I would like to highlight the importance of this issue. The current implementation makes working with more computationally intensive notebooks impossible.

jjallaire commented 2 years ago

Agree this is important! We currently use jupyter-cache (https://github.com/executablebooks/jupyter-cache) for notebook caching. While they don't currently have a per-cell cache option they certainly may develop one.

Another approach we've seen for users with extremely expensive computations is to author within the Jupyter Notebook UI (where there is effectively a per-cell cache). Note that when rendering an ipynb Quarto does not re-execute it by default.

JanPalasek commented 2 years ago

Thanks. Yes, I know about that option. But while I really like working in qmd, I don't like working in ipynb with Quarto that much. I'm not sure about jupyter-cache: its last release was 7 months ago and there has been no significant activity since then. I suggested a cell-level cache in an issue there, but we'll see.

JanPalasek commented 11 months ago

@jjallaire How would you like the cache to be implemented? For example, I might use NotebookClient from the nbclient package (https://github.com/jupyter/nbclient/blob/main/nbclient/client.py#L60). I could use its hooks to implement the caching. This class is used by nbconvert when executing the notebook. Would Quarto be able to use this implementation?

jjallaire commented 11 months ago

Yes, we currently use NotebookClient for interacting with notebooks.

That said, I think that it would be of substantial benefit to try to collaborate with the https://github.com/executablebooks/jupyter-cache project on this. I think it would be a desirable feature there and a lot of expertise could be brought to bear if worked on collaboratively.

My biggest overall concern about per-cell caching is that it requires the entire Python environment to be serializable (e.g. with pickle). Many Python objects cannot be easily serialized, though (anything with a pointer into an external library, for example), so there would be a lot of qualification around how and when the cache could be used and expected to work properly.
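
As a small illustration of that concern, the sketch below checks which variables in a namespace survive `pickle.dumps`. The helper `split_picklable` and the sample `state` dictionary are hypothetical, standing in for whatever a cell leaves behind; a lock and a generator are used as examples of objects that pickle rejects.

```python
import pickle
import threading


def split_picklable(namespace):
    """Split a dict of variables into picklable values and unpicklable names."""
    picklable = {}
    unpicklable = []
    for name, value in namespace.items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle raises several error types (PicklingError, TypeError, ...)
            unpicklable.append(name)
        else:
            picklable[name] = value
    return picklable, unpicklable


# Stand-in for the state a cell leaves behind (hypothetical contents).
state = {
    "threshold": 0.5,
    "rows": [1, 2, 3],
    "lock": threading.Lock(),             # wraps an OS-level resource
    "stream": (n * n for n in range(3)),  # generators cannot be pickled
}

saved, skipped = split_picklable(state)
print(sorted(saved))    # ['rows', 'threshold']
print(sorted(skipped))  # ['lock', 'stream']
```

Any per-cell cache built on pickle would need a policy for the skipped names, for example refusing to cache the cell or re-executing it whenever a later cell needs them.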