kynan / nbstripout

strip output from Jupyter and IPython notebooks

Metadata keys containing periods are unstrippable #143

Closed: baldwint closed this issue 3 years ago.

baldwint commented 3 years ago

I work with Databricks notebooks, which, when exported to the ipynb format, add metadata fields with the keys

- `application/vnd.databricks.v1+cell`
- `application/vnd.databricks.v1+notebook`

at the cell and notebook levels, respectively.

Stripping these does not work with this tool, because nbstripout interprets the period (`.`) as a dictionary-nesting delimiter when I pass these names to the `--extra-keys` argument.

For example, `nbstripout --extra-keys='cell.metadata.application/vnd.databricks.v1+cell metadata.application/vnd.databricks.v1+notebook' my_notebook.ipynb` leaves both keys in place.
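To illustrate the failure mode, here is a minimal Python sketch that assumes every `.` in an extra key is treated as a nesting level. `pop_dotted_naive` is a hypothetical helper, not nbstripout's actual code, and it works on a single cell dict, so the `cell.` prefix is left off the key.

```python
def pop_dotted_naive(d, key):
    """Pop a nested entry, treating every '.' in `key` as a nesting level.

    Hypothetical helper illustrating the behaviour described in the issue;
    it is not nbstripout's actual implementation.
    """
    *parents, last = key.split(".")
    for part in parents:
        if not isinstance(d, dict) or part not in d:
            return None  # path does not exist, so nothing is stripped
        d = d[part]
    if not isinstance(d, dict):
        return None
    return d.pop(last, None)


cell = {
    "cell_type": "code",
    "metadata": {"application/vnd.databricks.v1+cell": {"title": ""}},
}

# The key is split into ['metadata', 'application/vnd', 'databricks', 'v1+cell'],
# so the lookup of 'application/vnd' fails and the Databricks entry survives.
removed = pop_dotted_naive(cell, "metadata.application/vnd.databricks.v1+cell")
print(removed)
print("application/vnd.databricks.v1+cell" in cell["metadata"])
```

Ordinary dotted paths such as `metadata.tags` still work with this approach; only keys whose own names contain a period are unreachable.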

I have a modified version of nbstripout that fixes the issue and I will open a pull request for it.
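One way such a fix could work, sketched here without claiming it is what the pull request implemented: before splitting, check whether the remaining key exists verbatim, and only otherwise split on the first period and recurse. The name `pop_recursive` is used for illustration.

```python
def pop_recursive(d, key, default=None):
    """Pop a nested entry from `d`, where `key` is a '.'-delimited path.

    Sketch of a possible fix: try the remaining key verbatim before
    splitting, so keys that themselves contain periods can still match.
    """
    if not isinstance(d, dict):
        return default
    if key in d:
        return d.pop(key)  # exact match, even if the key contains periods
    if "." not in key:
        return default
    head, tail = key.split(".", 1)  # split on the first period only
    if head not in d:
        return default
    return pop_recursive(d[head], tail, default)


nb = {
    "metadata": {
        "application/vnd.databricks.v1+notebook": {"language": "python"},
        "kernelspec": {},
    }
}
removed = pop_recursive(nb, "metadata.application/vnd.databricks.v1+notebook")
print(removed)         # {'language': 'python'}
print(nb["metadata"])  # {'kernelspec': {}}
```

Because the verbatim lookup is tried first, a key that could be read either as a literal name or as a nested path resolves to the literal name, while ordinary nested keys such as `metadata.kernelspec` still go through the recursive branch.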

robertofierimonte commented 1 week ago

Hey @kynan, @baldwint, how do I enable these extra-keys filters for keys containing periods when using the tool as a pre-commit hook?

This is my pre-commit config, but it does not seem to strip out the Databricks metadata:

- repo: https://github.com/kynan/nbstripout
  rev: 0.7.1
  hooks:
    - id: nbstripout
      args: ["--extra-keys='metadata.application/vnd.databricks.v1+notebook cell.metadata.application/vnd.databricks.v1+cell'"]

I can strip out the notebooks fine if I run nbstripout with the extra keys from the CLI.
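A likely explanation, sketched under assumptions: pre-commit passes each `args` entry to the hook as a separate argument without going through a shell, so the single quotes inside the YAML string are never removed and become part of the key names. The sketch also assumes nbstripout splits the `--extra-keys` value on whitespace; `extra_keys` is a hypothetical stand-in for that parsing, not nbstripout's API.

```python
import shlex

# Space-separated key list from the config above.
KEYS = (
    "metadata.application/vnd.databricks.v1+notebook "
    "cell.metadata.application/vnd.databricks.v1+cell"
)


def extra_keys(arg):
    """Hypothetical parser: take the text after '=' and split it on whitespace."""
    return arg.split("=", 1)[1].split()


# From a shell, quote removal strips the single quotes before nbstripout starts.
cli_arg = shlex.split(f"nbstripout --extra-keys='{KEYS}' my_notebook.ipynb")[1]

# From pre-commit, the YAML string is passed as-is, quotes included.
hook_arg = f"--extra-keys='{KEYS}'"

# The same string without the inner quotes, as a candidate `args` entry.
fixed_arg = f"--extra-keys={KEYS}"

for arg in (cli_arg, hook_arg, fixed_arg):
    print(extra_keys(arg))
```

The second printed list has a stray `'` at the start of the first key and at the end of the last one, so neither matches the notebook's metadata. If this is the cause, dropping the inner quotes, i.e. `args: ["--extra-keys=metadata.application/vnd.databricks.v1+notebook cell.metadata.application/vnd.databricks.v1+cell"]`, should hand the hook the same keys as the CLI call.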