jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Suggestion: Separate file for notebook executed cell outputs. #5677

Open jbursey opened 3 years ago

jbursey commented 3 years ago

Unless this is already a feature, I think it would be nice to have a separate file (something like .ipynb.output) that links outputs to their cells in the .ipynb JSON file. This would make it significantly easier to exclude notebook outputs in source control systems like git.

If this is already possible somehow I would be interested to know.
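
To make the proposal concrete, here is a minimal sketch of how outputs could be split into a sidecar structure and merged back. This is not an existing Jupyter feature: the function names `split_outputs` and `merge_outputs` and the sidecar layout are hypothetical. It assumes the standard notebook JSON layout (`cells`, `cell_type`, `outputs`, `execution_count`) and uses the cell `id` field where the notebook provides one, falling back to the cell's position otherwise.

```python
import copy


def split_outputs(notebook: dict) -> tuple[dict, dict]:
    """Split a notebook dict into (inputs-only notebook, outputs sidecar).

    The sidecar layout is a made-up illustration, not a Jupyter format.
    """
    stripped = copy.deepcopy(notebook)
    sidecar = {}
    for index, cell in enumerate(stripped.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # Prefer the cell "id" when present; fall back to the cell position.
        key = cell.get("id", f"index-{index}")
        sidecar[key] = {
            "outputs": cell.get("outputs", []),
            "execution_count": cell.get("execution_count"),
        }
        cell["outputs"] = []
        cell["execution_count"] = None
    return stripped, sidecar


def merge_outputs(stripped: dict, sidecar: dict) -> dict:
    """Re-attach sidecar outputs to an inputs-only notebook dict."""
    merged = copy.deepcopy(stripped)
    for index, cell in enumerate(merged.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        key = cell.get("id", f"index-{index}")
        saved = sidecar.get(key)
        if saved is not None:
            cell["outputs"] = copy.deepcopy(saved["outputs"])
            cell["execution_count"] = saved["execution_count"]
    return merged
```

With something like this, the inputs-only dict would be written to the .ipynb file and the sidecar dict to the .ipynb.output file, so only the former needs to be tracked in git. Position-based keys break when cells are reordered, which is why cell ids are preferred when available.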

gitjeff05 commented 3 years ago

It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:

  1. Use a commit hook as outlined in Jupyter docs.

  2. Use Jupyter's shortcut to "clear all cell output"

  3. Use nbconvert to clear the notebook outputs before committing.

  4. You could also just write your own shell script to clear outputs. I wrote one using jq to do that and it is fairly easy.
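
As a rough illustration of option 4, the same idea can be written in Python with only the standard library instead of jq. This is a sketch, not the script mentioned above: the function names `clear_outputs` and `clear_file` are made up for the example, and it assumes the usual notebook JSON layout where code cells carry `outputs` and `execution_count` fields.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # cheap deep copy of JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clear_file(path: str) -> None:
    """Rewrite the notebook at `path` in place, without outputs."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    cleaned = clear_outputs(notebook)
    nb_path.write_text(
        json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

Calling `clear_file("analysis.ipynb")` (a placeholder filename) before `git add` would strip the outputs and reset the execution counts, so re-running a notebook without changing its code produces no diff.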

Some folks also choose to convert the notebook to Python using nbconvert and commit only that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.

cipri-tom commented 3 years ago

I think that JupyterLab already has the capability of displaying the output in a different view from the notebook.

IvoMerchiers commented 3 years ago

Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.

Their paired notebooks avoid the need for automatically saving and converting the notebooks.
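
To give a feel for what "saving notebooks as code" means, here is a simplified sketch that renders a notebook dict as a script using `# %%` cell markers. It is only an illustration of the idea and not Jupytext's implementation: the function name `notebook_to_percent_script` is hypothetical, raw cells and cell metadata are ignored, and the real tool handles many more cases.

```python
def notebook_to_percent_script(notebook: dict) -> str:
    """Render code and markdown cells as a '# %%'-delimited Python script.

    Outputs are never written, so the result contains inputs only.
    """
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # notebook JSON may store lines as a list
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source)
        elif cell.get("cell_type") == "markdown":
            # Markdown text becomes comments so the file stays valid Python.
            commented = "\n".join(
                "# " + line if line else "#" for line in source.split("\n")
            )
            chunks.append("# %% [markdown]\n" + commented)
        # Other cell types (e.g. raw) are skipped in this sketch.
    return "\n\n".join(chunks) + "\n"
```

Because the resulting text file holds no outputs or execution counts, its diffs show only changes to code and prose.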

starball5 commented 1 year ago

Related: jupyterlab/jupyterlab#9444 and jupyterlab/jupyterlab-git#392

Related question on Stack Overflow: How can I configure my tools to ignore or prevent updates to the execution_count field in a Jupyter Notebook from being tracked in git?

th0ger commented 11 months ago

Good idea. The alternatives discussed above are all about excluding cell output from source control.

But sometimes we need to include the executed cells in source control. (My current case is with Quarto.) Including the cell output in the .ipynb file makes it extremely difficult to review or diff as plain text. This experience would be improved a lot if the input and output could be separated. A reviewer would then be able to decide whether the changes were caused by a code change or purely by external changes and re-execution of the notebook.

alexbjorling commented 8 months ago

This feature would be very helpful for cases where execution is time-consuming, or relies on the availability of input data or tricky code dependencies. With separate output, the .ipynb.output file could be managed with (e.g.) git LFS, making the .ipynb diffs easy to review and still allowing retention and versioning of the output.

th0ger commented 8 months ago

@alexbjorling LFS is a good point. Notebook output is very suitable for LFS, but input cells are not.

Tyrrx commented 8 months ago

I think cleaning the notebook can only be seen as a workaround.

zmbc commented 5 months ago

Yes, this would be a huge improvement. I believe this is why Quarto embeds Python in Markdown as a "plain text representation of notebooks."

If the .ipynb itself could be in a readable plain-text format, and the outputs stored in a separate file, that would:

carschandler commented 3 months ago

Hugely in support of this! Even if it isn't a default behavior, it would be amazing to have the option.

zmbc commented 3 months ago

Surprised not to see anyone mention this yet: this Jupyter extension does almost exactly what this thread describes: https://jupytext.readthedocs.io/en/latest/paired-notebooks.html