Open jbursey opened 3 years ago
It's not a bad idea. But if keeping cell output out of source control is your primary concern, the easiest solution is to just clear the outputs before committing. There are a few ways to do that:
Use Jupyter's shortcut to "clear all cell output"
Use nbconvert to clear the notebook outputs before committing.
You could also just write your own shell script to clear outputs. I wrote one using jq to do that and it is fairly easy.
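For anyone who would rather not depend on jq, here is a minimal sketch of the same idea in plain Python. It assumes an nbformat 4 notebook (cells stored in a top-level "cells" list); the function names `clear_outputs` and `clear_file` are made up for illustration and are not part of Jupyter or nbconvert.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed.

    Assumes the nbformat 4 layout (a top-level "cells" list).
    """
    # Round-tripping through JSON is a cheap deep copy for JSON-only data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clear_file(path: str) -> None:
    """Rewrite the .ipynb file at `path` in place, without outputs."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    cleaned = clear_outputs(notebook)
    nb_path.write_text(
        json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

Calling `clear_file("analysis.ipynb")` (a hypothetical filename) before committing would leave only the inputs in the file. Resetting `execution_count` as well keeps re-runs from showing up as diffs.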
Some folks also choose to convert the notebook to Python using nbconvert and then commit just that. If you search for "How to version control jupyter notebooks" you will see a bunch of posts on the topic.
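To give a rough idea of what "convert the notebook to Python" amounts to, here is a small standard-library sketch that pulls the code cells out of a notebook dict. It is not what nbconvert actually emits; the `# %%` cell markers and the name `notebook_to_script` are assumptions chosen for illustration, and markdown cells are simply dropped.

```python
def notebook_to_script(notebook: dict) -> str:
    """Render the code cells of a notebook dict as a plain Python script.

    Each code cell is preceded by a '# %%' marker (an assumed convention).
    """
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are skipped in this sketch
        source = cell.get("source", "")
        # nbformat allows "source" to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append("# %%\n" + source.rstrip("\n") + "\n")
    return "\n".join(chunks)
```

The resulting text diffs like any other source file, at the cost of losing the outputs entirely.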
I think that jupyterlab already has the capability of displaying the output in a different view from the notebook.
Alternatively, Jupytext could be helpful for your case. It allows you to save notebooks as code. Then you only need to commit the code to git, whilst you can ignore the notebooks for version control.
Their paired notebooks avoid the need for automatically saving and converting the notebooks.
Related: jupyterlab/jupyterlab#9444 and jupyterlab/jupyterlab-git#392
Related question on Stack Overflow: How can I configure my tools to ignore or prevent updates to the execution_count field in a Jupyter Notebook from being tracked in git?
Good idea. The alternatives discussed above are about excluding cell output from source control.
But sometimes we need to include the executed cells in source control. (My current case is with Quarto.) Including the cell output in the .ipynb file makes it extremely difficult to review or diff the file as plain text. This experience would be improved a lot if the input and output could be separated. A reviewer would then be able to decide whether the changes were caused by a code change, or purely by external changes and re-execution of the notebook.
This feature would be very helpful for cases where execution is time-consuming, or relies on the availability of input data or tricky code dependencies. With separate output, the .ipynb.output file could be managed with (e.g.) git LFS, making the .ipynb diffs easy to review while still allowing retention and versioning of the output.
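To make the proposal a bit more concrete, here is a rough sketch of how inputs and outputs could be split into two files and joined again. Nothing like this exists in Jupyter as far as this thread establishes; the sidecar layout, the `index-N` fallback key, and the names `split_outputs` and `merge_outputs` are all hypothetical. It assumes nbformat 4, and uses cell ids where present (they were added in nbformat 4.5).

```python
import copy


def _cell_key(cell: dict, index: int) -> str:
    # Prefer the cell id; fall back to position for older notebooks.
    return cell.get("id", f"index-{index}")


def split_outputs(notebook: dict) -> tuple[dict, dict]:
    """Split a notebook dict into (inputs-only notebook, outputs sidecar)."""
    inputs = copy.deepcopy(notebook)
    sidecar = {}
    for index, cell in enumerate(inputs.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        sidecar[_cell_key(cell, index)] = {
            "outputs": cell.get("outputs", []),
            "execution_count": cell.get("execution_count"),
        }
        cell["outputs"] = []
        cell["execution_count"] = None
    return inputs, sidecar


def merge_outputs(inputs: dict, sidecar: dict) -> dict:
    """Re-attach outputs from the sidecar to the matching code cells."""
    merged = copy.deepcopy(inputs)
    for index, cell in enumerate(merged.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        saved = sidecar.get(_cell_key(cell, index))
        if saved is not None:
            cell["outputs"] = saved["outputs"]
            cell["execution_count"] = saved["execution_count"]
    return merged
```

With something along these lines, the inputs-only dict would be written to the .ipynb and the sidecar to the .ipynb.output file, and only the latter would need to go through LFS. Cells whose key is missing from the sidecar are simply left without output.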
@alexbjorling LFS is a good point. Notebook output is very suitable for LFS, but input cells are not.
I think cleaning the notebook can only be seen as a workaround.
Yes, this would be a huge improvement. I believe this is why Quarto embeds Python in Markdown as a "plain text representation of notebooks."
If the .ipynb itself could be in a readable plain-text format, and the outputs stored in a separate file, that would:
Hugely in support of this! Even if it isn't a default behavior, it would be amazing to have the option.
Surprised not to see anyone mention this yet, but this Jupyter extension does almost exactly what this thread describes: https://jupytext.readthedocs.io/en/latest/paired-notebooks.html
Unless this is a feature already, I think it would be nice to have a separate file (something like .ipynb.output) that links outputs to their cells in the .ipynb JSON file. This would make it significantly easier to exclude notebook outputs in source control systems like git.
If this is already possible somehow I would be interested to know.