Closed Zoynels closed 3 years ago
There's no way to do this right now, but I'll add this feature in the upcoming 0.4.0 release.
The quick and dirty solution would be to just read the notebook as JSON and drop cells with cell["source"] == [].
This is my little script I use for the job, feel free to reuse it:
"""A little tool to remove empty cells from notebooks.

Since ``nbstripout`` doesn't have this feature yet, we do it ourselves.
See: https://github.com/kynan/nbstripout/issues/131
"""
import json
from pathlib import Path
from typing import List
from typing import Optional

SCRIPT_ROOT_PATH = Path(__file__).parent
NOTEBOOK_BASE_PATH = SCRIPT_ROOT_PATH / "source" / "notebooks"


def strip_empty_cells_from_notebooks(args: Optional[List[str]] = None) -> int:
    """Strips empty cells from the given notebooks, or from all notebooks in NOTEBOOK_BASE_PATH."""
    # `sys.argv[1:]` is an empty list (not None) when no paths are passed,
    # so treat both cases as "process every notebook"
    if not args:
        notebook_paths = NOTEBOOK_BASE_PATH.rglob("*.ipynb")
    else:
        notebook_paths = [Path(arg) for arg in args]
    for notebook_path in notebook_paths:
        notebook = json.loads(notebook_path.read_text(encoding="utf8"))
        original_nr_of_cells = len(notebook["cells"])
        # keep only cells whose source is not an empty list
        notebook["cells"] = [cell for cell in notebook["cells"] if cell.get("source", []) != []]
        if original_nr_of_cells != len(notebook["cells"]):
            print(f"Fixing: {notebook_path}")
            # to ensure an `lf` newline on windows we need to use `.open` instead of `write_text`
            with notebook_path.open(mode="w", encoding="utf8", newline="\n") as f:
                f.write(json.dumps(notebook, indent=1) + "\n")
    return 0


if __name__ == "__main__":
    import sys

    sys.exit(strip_empty_cells_from_notebooks(sys.argv[1:]))
Used as pre-commit hook:
- repo: local
  hooks:
    - id: strip-empty-notebook-cells
      name: Strip empty notebook cells
      language: system
      entry: python docs/strip_empty_notebook_cells.py
      types: [jupyter]
To run it on all notebooks you can use python docs/strip_empty_notebook_cells.py or pre-commit run -a strip-empty-notebook-cells.
If you want to manually run it for the staged files use pre-commit run strip-empty-notebook-cells, but if the pre-commit hooks are installed this should happen on commit anyway.
I might make it a standalone hook since I don't want to copy-paste files across projects, but this is my hotfix for now.
This is now available in nbstripout 0.4.0
This is great. How do I make it so this option is applied as part of the git filter?
@devmcp If you use pre-commit you can simply add --strip-empty-cells to the args:
- repo: https://github.com/kynan/nbstripout
  rev: 0.4.0
  hooks:
    - id: nbstripout
      args: [--strip-empty-cells]
Thanks @s-weigand. I much prefer to use it in a git filter rather than pre-commit to avoid modifying the working copy of the notebook. That said, I think if I use pre-commit to only strip empty cells (by also adding --keep-count and --keep-output) and do the rest with the git filter, that will do the trick. Thanks!
@devmcp To use this option with the git filter, just edit your .git/config (or ~/.gitconfig if you installed globally) and add the flag to filter.nbstripout.clean and diff.ipynb.textconv.
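If you would rather script that edit than change the config file by hand, here is a rough sketch that builds the matching git config calls. The helper names are hypothetical, and it assumes the filter command is plain nbstripout on your PATH. An existing install may have recorded a full interpreter path instead, in which case you should keep that path and only append the flag.

```python
import shlex
import subprocess
from typing import List


def build_git_config_commands(
    flag: str = "--strip-empty-cells", global_config: bool = False
) -> List[List[str]]:
    """Build `git config` calls that set the nbstripout filter and diff driver with `flag`."""
    # Assumption: `nbstripout` is on PATH; adapt the command if your install uses a full path.
    scope = ["--global"] if global_config else []
    return [
        ["git", "config", *scope, "filter.nbstripout.clean", f"nbstripout {flag}"],
        ["git", "config", *scope, "diff.ipynb.textconv", f"nbstripout -t {flag}"],
    ]


def apply_git_config(commands: List[List[str]]) -> None:
    """Run the commands; call this from inside the repository you want to configure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands so they can be reviewed before anything is changed.
    for command in build_git_config_commands():
        print(shlex.join(command))
```

Printing first and calling apply_git_config only after checking the output keeps this from silently overwriting a filter command that was set up differently.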
Hello, is there any possibility of adding an option to remove cells that have no data (or only spaces/tabs/newlines), no tags, and no unfiltered metadata from ipynb files? I often create empty cells that will be deleted after some time. Of course, some people separate code this way, so the option could be opt-in for people who want clean ipynb files in git. Or is there a way to do this with the current functionality?
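For the "only spaces/tabs/newlines" part of the request, a minimal sketch of such a check could look like the following. The function names are hypothetical and this is not taken from nbstripout. It relies on the notebook format storing a cell's source either as a single string or as a list of line strings, and it looks only at the source, not at tags or metadata.

```python
from typing import Any, Dict, List


def is_blank_cell(cell: Dict[str, Any]) -> bool:
    """Return True if the cell source is missing, empty, or whitespace only."""
    source = cell.get("source", [])
    # The source may be one string or a list of line strings; normalise to one string.
    text = source if isinstance(source, str) else "".join(source)
    return text.strip() == ""


def drop_blank_cells(cells: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the cells that contain something other than whitespace, in order."""
    return [cell for cell in cells if not is_blank_cell(cell)]
```

Swapping a check like this in for a plain comparison against an empty list would also catch cells that contain a stray newline or indentation.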