srstevenson / nb-clean

Clean Jupyter notebooks for version control. Remove metadata, outputs, and execution counts with Git and pre-commit support.
https://pypi.org/project/nb-clean
ISC License

support_dir_for_clean #150

Closed yasirroni closed 1 year ago

yasirroni commented 1 year ago

Support something like:

nb-clean clean notebooks --preserve-cell-outputs

Only implemented for clean.

Tests are not written yet. If you want to merge this, I can help write them.

Anyway, it seems that you use a formatter. Can you share the Git pre-commit hook?

yasirroni commented 1 year ago

Wow, after checking the other PRs, it seems this one has almost the same purpose as https://github.com/srstevenson/nb-clean/pull/148

However, my implementation does not require a new argument. Whichever you prefer.

yasirroni commented 1 year ago

For the test failure, I don't understand the test:

    mock_read.assert_called_once_with(
        sys.stdin, as_version=nbformat.NO_CONVERT
    )
FAILED tests/test_cli.py::test_clean_stdin - AssertionError: Expected 'read' to be called once. Called 0 times.
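
For readers puzzling over the same message: the assertion checks that the patched `read` function was called exactly once with `sys.stdin`, and "Called 0 times" means the code under test never reached that call. Below is a minimal sketch of how that can happen. The functions `clean` and `clean_paths_only`, and the `NO_CONVERT` sentinel, are hypothetical stand-ins, not nb-clean's actual code.

```python
import io
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for nbformat.NO_CONVERT, so the sketch needs no dependencies.
NO_CONVERT = object()


def clean(input_, read):
    """Hypothetical command that reads whatever input it is given."""
    return read(input_, as_version=NO_CONVERT)


def clean_paths_only(input_, read):
    """Hypothetical variant that handles paths and silently skips streams."""
    if isinstance(input_, Path):
        return read(input_, as_version=NO_CONVERT)
    return None


stream = io.StringIO("{}")  # stands in for sys.stdin

# The assertion passes when read() was called once with the stream.
mock_read = mock.Mock(name="read")
clean(stream, mock_read)
mock_read.assert_called_once_with(stream, as_version=NO_CONVERT)

# The assertion fails when the stream code path never calls read().
mock_read = mock.Mock(name="read")
clean_paths_only(stream, mock_read)
try:
    mock_read.assert_called_once_with(stream, as_version=NO_CONVERT)
except AssertionError as error:
    failure_message = str(error)
```

In this sketch the second function drops the stream case entirely, so the mock records no calls. Whether the same thing happened in this PR is not confirmed by the output above.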

yasirroni commented 1 year ago

Using nb-clean directly works perfectly. Sadly, add-filter seems to be only partly working.

  File "C:\Data\Git\nb-clean\src\nb_clean\cli.py", line 116, in clean
    if input_.is_dir():
AttributeError: '_io.TextIOWrapper' object has no attribute 'is_dir'

So, what is the actual data type of input_? In clean, it is a Path.

srstevenson commented 1 year ago

Hi @yasirroni, thanks for working on this! Unfortunately, you were working on this at the same time as #148, so this is probably redundant. I'll still answer your questions in case it's useful though!

> Anyway, it seems that you use a formatter. Can you share the Git pre-commit hook?

We use Black for code formatting. There isn't a pre-commit hook set up. However, the configuration is present in pyproject.toml, so it's sufficient to run poetry run black. You can then check that everything is formatted and lint-clean with poetry run nox, which is the command run in CI.

> So, what is the actual data type of input_? In clean, it is a Path.

The type is Union[pathlib.Path, _io.TextIOWrapper]: pathlib.Path for when filenames are passed as arguments and we read from the file, and _io.TextIOWrapper when no filenames are passed and we read from the stdin stream.
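
A minimal sketch of handling both cases, assuming the type described above: check for `pathlib.Path` before calling any path-only method such as `is_dir()`, and treat everything else as an open text stream. The helper name `resolve_inputs` and the recursive directory search are illustrative assumptions, not the implementation in this repository or in #148.

```python
import io
from pathlib import Path
from typing import List, TextIO, Union

# Either a filesystem path or an already-open text stream such as sys.stdin.
NotebookInput = Union[Path, TextIO]


def resolve_inputs(input_: NotebookInput) -> List[NotebookInput]:
    """Expand one input into the list of things to read (hypothetical helper)."""
    if isinstance(input_, Path):
        if input_.is_dir():
            # Assumption: a directory is searched recursively for notebooks.
            return sorted(input_.rglob("*.ipynb"))
        return [input_]
    # A text stream has no is_dir() method, so it is passed through untouched.
    return [input_]


# A stream goes through unchanged instead of raising AttributeError.
stream = io.StringIO("{}")  # stands in for sys.stdin
resolved = resolve_inputs(stream)
```

Guarding with `isinstance` keeps the stdin path working, since `_io.TextIOWrapper` is never asked for `is_dir()`.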

yasirroni commented 1 year ago

Thanks for answering. I can't get the test to pass either, anyway. I'm still unfamiliar with sys.stdin and _io.TextIOWrapper.

I will close this PR if it is confirmed that providing only the directory will clean all notebooks there (or even in subfolders?).

srstevenson commented 1 year ago

Implemented in #148.