srstevenson / nb-clean

Clean Jupyter notebooks of outputs, metadata, and empty cells, with Git integration
https://pypi.org/project/nb-clean/
ISC License
128 stars 18 forks source link

[BUG] Using preserve cell metadata flag before filename not working as expected #255

Closed DanielTsiang closed 5 months ago

DanielTsiang commented 5 months ago

Problem

Using the current latest versions of nb-clean i.e. 3.2.0, checking the notebook with preserving the cell metadata flag before the filename causes the command to hang indefinitely.

Examples

Working as expected

$ nb-clean check notebook.ipynb --preserve-cell-metadata
notebook.ipynb metadata: language_info.version

Not working as expected

These just hang and it never finishes the commands:

$ nb-clean check --preserve-cell-metadata notebook.ipynb 
$ nb-clean check --preserve-notebook-metadata --preserve-cell-metadata notebook.ipynb 

Other working examples

These commands also work as expected:

$ nb-clean check --preserve-notebook-metadata notebook.ipynb
notebook cell 12: metadata
$ nb-clean check notebook.ipynb --preserve-notebook-metadata
notebook cell 12: metadata
$ nb-clean check notebook.ipynb --preserve-cell-metadata --preserve-notebook-metadata
$ echo $?
0
srstevenson commented 5 months ago

This is an interesting case!

--preserve-cell-metadata isn't a boolean flag like --preserve-notebook-metadata: whereas --preserve-notebook-metadata takes no arguments, --preserve-cell-metadata can take a list of metadata fields to be preserved such as:

$ nb-clean clean notebook.ipynb --preserve-cell-metadata tags special

In this case, the tags and special fields will be preserved, but any others will be removed. If you don't pass any arguments to --preserve-cell-metadata, all cell metadata fields are removed:

$ nb-clean clean notebook.ipynb --preserve-cell-metadata

However, because --preserve-cell-metadata can take zero or more arguments, the argument parser assigns any positional arguments following this flag as fields to preserve rather than notebook paths to process--it can't tell the difference. It then hangs as it's waiting to receive the notebook contents on standard input. You can try this out by piping in the notebook contents, and seeing that it now executes immediately:

$ nb-clean check --preserve-cell-metadata notebook.ipynb < notebook.ipynb
stdin metadata: language_info.version

In practice, this means if you're not passing any custom fields to preserve, the --preserve-cell-metadata flag needs to appear after the paths to be processed, as you found here, or the notebook path should be preceded with -- to tell the argument parser it's not an argument to --preserve-cell-metadata:

$ nb-clean check --preserve-cell-metadata -- notebook.ipynb
notebook.ipynb metadata: language_info.version
srstevenson commented 5 months ago

I'll close this now, as we added documentation about it in #254.

DanielTsiang commented 5 months ago

That makes sense, thanks for explaining and amending the documentation!

For future reference, this is also a possible alternative if the user wishes to keep --preserve-cell-metadata at the end of the piped xargs command in the CI where nb-clean is run on each filename line individually:

echo  "file 1.ipynb\nfile 2.ipynb\nfile 3.ipynb" | xargs -r -I {} nb-clean check "{}" --preserve-notebook-metadata --preserve-cell-metadata