kynan / nbstripout

strip output from Jupyter and IPython notebooks

flag to remove empty cell (with no data) #131

Closed Zoynels closed 3 years ago

Zoynels commented 4 years ago

Hello, is there any possibility of adding an option to remove cells that have no data (or only spaces/tabs/newlines), no tags, and no unfiltered metadata from ipynb files? I often create empty cells that I intend to delete later. Of course, some people use empty cells to separate code, so this could be opt-in for people who want clean ipynb files in git. Or is there a way to do this with the current functionality?

kynan commented 4 years ago

There's no way to do this right now, but I'll add this feature in the upcoming 0.4.0 release.

s-weigand commented 3 years ago

The quick and dirty solution would be to just read the notebook as JSON and drop cells with cell["source"] == []. This is the little script I use for the job; feel free to reuse it:

"""A little tool to remove empty cells from notebooks.
Since ``nbstripout`` doesn't have this feature yet, we do it ourselves.
See: https://github.com/kynan/nbstripout/issues/131
"""
import json
from pathlib import Path
from typing import List
from typing import Optional

SCRIPT_ROOT_PATH = Path(__file__).parent
NOTEBOOK_BASE_PATH = SCRIPT_ROOT_PATH / "source" / "notebooks"

def strip_empty_cells_from_notebooks(args: Optional[List[str]] = None) -> int:
    """Strips empty cells from notebooks in NOTEBOOK_BASE_PATH."""

    # sys.argv[1:] is an empty list (not None) when no paths are given
    if not args:
        notebook_paths = NOTEBOOK_BASE_PATH.rglob("*.ipynb")
    else:
        notebook_paths = [Path(arg) for arg in args]

    for notebook_path in notebook_paths:
        notebook = json.loads(notebook_path.read_text(encoding="utf8"))
        original_nr_of_cells = len(notebook["cells"])
        notebook["cells"] = [cell for cell in notebook["cells"] if cell.get("source", []) != []]
        if original_nr_of_cells != len(notebook["cells"]):
            print(f"Fixing: {notebook_path}")
            # to ensure an `lf` newline on windows we need to use `.open` instead of `write_text`
            with notebook_path.open(mode="w", encoding="utf8", newline="\n") as f:
                # ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes
                f.write(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n")

    return 0

if __name__ == "__main__":
    import sys

    sys.exit(strip_empty_cells_from_notebooks(sys.argv[1:]))

Used as pre-commit hook:

  - repo: local
    hooks:
      - id: strip-empty-notebook-cells
        name: Strip empty notebook cells
        language: system
        entry: python docs/strip_empty_notebook_cells.py
        types: [jupyter]

To run it on all notebooks, use python docs/strip_empty_notebook_cells.py or pre-commit run -a strip-empty-notebook-cells. To run it manually on the staged files, use pre-commit run strip-empty-notebook-cells; if the pre-commit hooks are installed, this happens on commit anyway.

I might make it a standalone hook since I don't want to copy-paste files across projects, but this is my hotfix for now.
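
The script above only drops cells whose source is exactly []. The original request also mentioned cells containing only spaces, tabs, or newlines. A minimal sketch of a stricter check follows, assuming that "source" may be either a string or a list of strings; the helper names is_empty_cell and strip_empty_cells are hypothetical and not part of nbstripout, and tags and metadata are deliberately ignored here:

```python
from typing import Any, Dict


def is_empty_cell(cell: Dict[str, Any]) -> bool:
    """Return True if the cell's source is missing or contains only whitespace."""
    source = cell.get("source", "")
    # "source" may be a single string or a list of line strings.
    if isinstance(source, list):
        source = "".join(source)
    return source.strip() == ""


def strip_empty_cells(notebook: Dict[str, Any]) -> int:
    """Remove empty cells in place and return how many were removed."""
    cells = notebook.get("cells", [])
    kept = [cell for cell in cells if not is_empty_cell(cell)]
    notebook["cells"] = kept
    return len(cells) - len(kept)
```

In the script, the list comprehension that filters notebook["cells"] could call is_empty_cell instead of comparing against [].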

kynan commented 3 years ago

This is now available in nbstripout 0.4.0

devmcp commented 3 years ago

This is great. How do I make it so this option is applied as part of the git filter?

s-weigand commented 3 years ago

@devmcp If you use pre-commit, you can simply add --strip-empty-cells to the args:

  - repo: https://github.com/kynan/nbstripout
    rev: 0.4.0
    hooks:
      - id: nbstripout
        args: [--strip-empty-cells]

devmcp commented 3 years ago

Thanks @s-weigand. I much prefer to use it in a git filter rather than pre-commit to avoid modifying the working copy of the notebook. That said, I think if I use pre-commit to only strip empty cells (by also adding --keep-count and --keep-output) and do the rest with the git filter, that will do the trick. Thanks!

kynan commented 3 years ago

@devmcp To use this option with the git filter, just edit your .git/config (or ~/.gitconfig if you installed globally) and add the --strip-empty-cells flag to the filter.nbstripout.clean and diff.ipynb.textconv commands.
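
For anyone who would rather not edit the config file by hand, here is a rough sketch that appends the flag to whatever commands are already stored under those two keys. It assumes the filter is already installed (so both keys exist) and that git is on the PATH; the helper names with_flag and add_flag_to_git_config are hypothetical, and the exact commands stored in your config may differ between installations:

```python
import subprocess

# The two keys named in the comment above; the flag was added in nbstripout 0.4.0.
KEYS = ("filter.nbstripout.clean", "diff.ipynb.textconv")
FLAG = "--strip-empty-cells"


def with_flag(command: str, flag: str = FLAG) -> str:
    """Return `command` with `flag` appended, unless it is already present."""
    if flag in command.split():
        return command
    return f"{command} {flag}"


def add_flag_to_git_config(scope: str = "--local") -> None:
    """Append the flag to both keys; pass scope="--global" for ~/.gitconfig."""
    for key in KEYS:
        # `git config --get` exits non-zero if the key is missing,
        # in which case check=True raises CalledProcessError.
        current = subprocess.run(
            ["git", "config", scope, "--get", key],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
        subprocess.run(["git", "config", scope, key, with_flag(current)], check=True)
```

Calling add_flag_to_git_config() from inside the repository updates .git/config; calling it a second time changes nothing, because with_flag leaves a command that already carries the flag untouched.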