kynan / nbstripout

strip output from Jupyter and IPython notebooks
Other
1.19k stars 95 forks source link

Cell `id`s not stripped #146

Closed zmoon closed 3 years ago

zmoon commented 3 years ago

I am using nbstripout as a pre-commit hook and noticed that cell ids are not stripped. I tried doing it manually and didn't have any issues when loading the nb again (it regenerated them). I don't see anything in the readme about configuring nbstripout to do this. Seems like it would be a good thing to strip (or at least filter) since it creates unhelpful diffs.

devmcp commented 3 years ago

I'm also seeing this. It only started happening in the last week or so and I've never had the issue before. Did you find a solution/workaround?

I'm using JupyterLab 3.0.9 and nbstripout 0.3.9.

IsabellLehmann commented 3 years ago

I also didn't notice that before but today I realized it the first time.

devmcp commented 3 years ago

Ah. I really should have clicked the link in @zmoon's post. This is a new field in nbformat v4.5. Reverting the nbformat_minor attribute in the notebook metadata from 5 to 4 stops JupyterLab adding this in every time and so works around this problem for now.

IsabellLehmann commented 3 years ago

If somebody reads @devmcp 's post and wonders where exactly to change this: You can open the notebook e.g. with Notepad++ or another text editor. Then scroll down until the end, there you'll find it. If you already have cell IDs, you need to delete these lines manually when you change nbformat_minor to 4, otherwise you will get a saving problem when you open the file again with Jupyter notebook.

kynan commented 3 years ago

148 is an open PR for stripping the id field.

Is there ever any value in keeping the id field? Would people like to have an option to configure this?

zmoon commented 3 years ago

@kynan It seems like for some of the cases mentioned here there might be a situations (maybe in the future where there is more tooling that uses id) where one might want to preserve them while also using nbstripout. But they're not helpful if they change every time, which is the issue I was having.

kynan commented 3 years ago

What do you mean by "they change every time"? Under which circumstances? I just experimented a bit and it looks like existing ids are never changed.

I think we do want an option to preserve them, especially if they will actually be used at some point in future.

Currently, --extra-keys only works for metadata, but there's actually no reason why it couldn't work for any cell data. I prefer that generic way over always stripping ids as #148 does.

zmoon commented 3 years ago

What do you mean by "they change every time"? Under which circumstances? I just experimented a bit and it looks like existing ids are never changed.

@kynan I am not sure what the circumstances were; I'll see if I can recreate it. But that's when I was inspired to raise this issue. Example.

Edit: Looks like it was a JupyterLab issue (https://github.com/jupyterlab/jupyterlab/issues/9645) and was fixed recently.

I think we do want an option to preserve them, especially if they will actually be used at some point in future.

:+1:

Currently, --extra-keys only works for metadata, but there's actually no reason why it couldn't work for any cell data. I prefer that generic way over always stripping ids as #148 does.

Yeah this seems better than adding another flag to en/disable stripping ids.

kynan commented 3 years ago

FYI, I've decided to not strip ids by default for now. I've added documentation to the README on how to strip ids if you want to.

kynan commented 3 years ago

Sorry, I'll have to take that back, it won't work: nbformat generates the ids in its validation if not present, so we're out of luck. Stripping the id will in fact guarantee a diff as a new id will be generated.

I had assumed it was the notebook generating ids, but it's actually baked in to the file format.

Your only option is to explicitly specify version 4.4 of the file format if you really don't want ids.