srstevenson / nb-clean

Clean Jupyter notebooks for version control. Remove metadata, outputs, and execution counts with Git and pre-commit support.
https://pypi.org/project/nb-clean
ISC License

Only preserve `cells`. #157

Closed yasirroni closed 1 year ago

yasirroni commented 1 year ago

What do you think about preserving only `cells`? That is, clean everything except `cells`.

In the example notebook, that would remove:

 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:Python3] *",
   "language": "python",
   "name": "conda-env-Python3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2

yasirroni commented 1 year ago

Maybe an option such as --remove-non-cells?

If you are interested, I might be able to help. Thanks.

yasirroni commented 1 year ago

This is the target notebook:

https://github.com/yasirroni/nb-clean/blob/remove_non_cells/tests/notebooks/clean_only_cells.ipynb


The bare minimum I found for a notebook to still render is:

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Hello, world\")"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}

srstevenson commented 1 year ago

metadata, nbformat and nbformat_minor must remain as they're required fields as per the schema.

I'd prefer we expressed this as --remove-{thing to remove} rather than --remove-non-{thing to keep} as it composes better with existing options, and is more robust to changes to the notebook schema in future which add additional fields.
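
As a rough illustration of the required-fields point above, here is a minimal sketch using only the standard library. The helper name `missing_required_fields` is hypothetical and not part of nb-clean, and the set of required keys assumes the nbformat v4 schema.

```python
import json

# Top-level keys assumed to be required by the nbformat v4 schema.
REQUIRED_TOP_LEVEL = {"cells", "metadata", "nbformat", "nbformat_minor"}


def missing_required_fields(notebook: dict) -> set:
    """Return the required top-level keys that are absent (hypothetical helper)."""
    return REQUIRED_TOP_LEVEL - set(notebook)


# The minimal notebook from this thread, with an empty cell list for brevity.
minimal = json.loads("""
{
 "cells": [],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}
""")

# A notebook stripped down to `cells` alone lacks the other required keys.
cells_only = {"cells": minimal["cells"]}

print(missing_required_fields(minimal))
print(sorted(missing_required_fields(cells_only)))
```

A notebook reduced to `cells` alone would fail this check, which is why an option that empties `metadata` is safer than one that drops every non-cell key.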

yasirroni commented 1 year ago

Then we can keep it simple by clearing the contents of metadata. As for nbformat and nbformat_minor, I don't know what the best default values would be in general, so maybe we shouldn't touch them for now.

In conclusion: support --remove-notebook-metadata, which sets metadata to {}.

srstevenson commented 1 year ago

nbformat and nbformat_minor shouldn't be mutated. When we read the notebook with nbformat.read, we pass as_version=nbformat.NO_CONVERT to prevent version conversion.

yasirroni commented 1 year ago

> nbformat and nbformat_minor shouldn't be mutated. When we read the notebook with nbformat.read, we pass as_version=nbformat.NO_CONVERT to prevent version conversion.

Agreed.

So, would you support me adding an option to clean the metadata? It doesn't seem to be supported yet.

--remove-notebook-metadata would set the metadata value to {}.
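
A minimal sketch of what such an option might do, assuming the notebook has already been loaded as a plain dict. The function name `remove_notebook_metadata` is hypothetical and this is not nb-clean's actual implementation.

```python
import copy


def remove_notebook_metadata(notebook: dict) -> dict:
    """Return a copy whose top-level metadata is {} (hypothetical helper).

    nbformat and nbformat_minor are deliberately left untouched.
    """
    cleaned = copy.deepcopy(notebook)
    cleaned["metadata"] = {}
    return cleaned


# Example notebook dict with the kind of metadata discussed in this thread.
example = {
    "cells": [],
    "metadata": {
        "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
        "language_info": {"name": "python"},
    },
    "nbformat": 4,
    "nbformat_minor": 2,
}

cleaned = remove_notebook_metadata(example)
print(cleaned["metadata"])
print(cleaned["nbformat"], cleaned["nbformat_minor"])
```

Working on a deep copy keeps the input unchanged, which makes it easy to compare the notebook before and after cleaning.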

haplav commented 1 year ago

This is also an issue for me. It seems that none of the contents of metadata are mandatory per the schema.

E.g. /metadata/kernelspec/display_name can easily change even on the same system, so it's very disruptive to have such data versioned.

I use VS Code, which also dumps:

 "metadata": {
  ...
  "vscode": {
   "interpreter": {
    "hash": "<some_hash_string>"
   }
  }
 }

Also a good candidate for filtering.

Alternatively (or additionally), nb-clean could perhaps be even more flexible and support e.g. YAML paths, so that one could specify e.g. nb-clean --remove-path=/metadata/vscode or something like that.
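
To make the path idea concrete, here is a minimal sketch that deletes a key addressed by a slash-separated path from a notebook dict. Both the `--remove-path` flag and the `remove_path` function are hypothetical; neither exists in nb-clean, and the path syntax is an assumption that only handles nested dicts.

```python
def remove_path(document: dict, path: str) -> bool:
    """Delete the key addressed by a slash-separated path, in place.

    Hypothetical helper: returns True if something was removed,
    False if the path is empty or does not exist.
    """
    keys = [key for key in path.split("/") if key]
    if not keys:
        return False

    # Walk down to the parent of the key to delete.
    node = document
    for key in keys[:-1]:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]

    if isinstance(node, dict) and keys[-1] in node:
        del node[keys[-1]]
        return True
    return False


notebook = {
    "cells": [],
    "metadata": {
        "kernelspec": {"name": "python3"},
        "vscode": {"interpreter": {"hash": "<some_hash_string>"}},
    },
    "nbformat": 4,
    "nbformat_minor": 2,
}

removed = remove_path(notebook, "/metadata/vscode")
missing = remove_path(notebook, "/metadata/does_not_exist")
print(removed, missing, sorted(notebook["metadata"]))
```

Returning a boolean instead of raising on a missing path lets the same path list be applied to notebooks written by different editors.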

yasirroni commented 1 year ago

Hi @haplav, I implemented --remove-notebook-metadata in https://github.com/srstevenson/nb-clean/pull/169.

But after using nbQA, I think the best approach is to add:

 "metadata": {
  "language_info": {
   "name": "python"
  }
 }

Do you have any suggestions to improve that PR? Thank you.
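
One way the reduced metadata block above could be produced is sketched below. The function name `keep_language_name` is hypothetical, and this is an illustration of the idea rather than what the linked PR does.

```python
import copy


def keep_language_name(notebook: dict) -> dict:
    """Return a copy whose metadata keeps only language_info.name.

    Hypothetical helper: if no language name is recorded,
    the metadata is simply emptied.
    """
    cleaned = copy.deepcopy(notebook)
    name = notebook.get("metadata", {}).get("language_info", {}).get("name")
    cleaned["metadata"] = {"language_info": {"name": name}} if name else {}
    return cleaned


source = {
    "cells": [],
    "metadata": {
        "kernelspec": {"display_name": "Python 3", "name": "python3"},
        "language_info": {"name": "python", "file_extension": ".py"},
    },
    "nbformat": 4,
    "nbformat_minor": 2,
}

reduced = keep_language_name(source)
no_language = keep_language_name({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2})
print(reduced["metadata"])
print(no_language["metadata"])
```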

github-actions[bot] commented 1 year ago

This issue was closed due to inactivity. Please reopen if still relevant.