kynan / nbstripout

strip output from Jupyter and IPython notebooks

Any examples for --keep-id option? #196

Closed zy-cai closed 4 weeks ago

zy-cai commented 3 months ago

Hi, I just started using nbstripout and it has worked well for me, except for the "--keep-id" option.

I would like to keep all the cell ids in my notebook. However, I could not get the option to take effect, and the ids were reassigned every time I pushed the .ipynb files.

I've tried running "nbstripout --keep-id" alone but there was no response from the terminal until I interrupted it. I also tried "nbstripout --keep-id notebook.ipynb" but it did not work either.

Is there a quick tutorial on how to set up such options? The README for this part is not clear to newbies like me.

Thank you in advance!

kynan commented 3 months ago

@zy-cai Given you closed this, does that mean you found the answer in the README? Or how did you resolve this? :)

zy-cai commented 3 months ago

> @zy-cai Given you closed this, does that mean you found the answer in the README? Or how did you resolve this? :)

No, I thought that would be an easy and quick fix since it has been released.

Indeed, I also ran the test case for this option and it passed. Then I tried "nbstripout --keep-id NOTEBOOK.ipynb" again, which gave me an invalid notebook. In the end I got tired of waiting for a solution and switched to nbconvert, which worked best in my case.

kynan commented 3 months ago

@zy-cai Can you confirm what exactly was invalid when you ran nbstripout --keep-id NOTEBOOK.ipynb ? Is the notebook you used publicly available?

zy-cai commented 3 months ago

@kynan Unfortunately I cannot reproduce the problem that made the notebook invalid. I will report a new issue if I run into it again.

But there is indeed some bug for the "--keep-id" option, right? Running nbstripout --keep-id NOTEBOOK.ipynb and then doing git add + commit + push (to remote) will still upload a nb with reassigned cell IDs.

kynan commented 2 months ago

@zy-cai Using --keep-id works as expected for me. Can you provide more detailed reproduction steps please?

zy-cai commented 2 months ago

@kynan Sorry for the late response. Below is what I did in the mac terminal to reproduce the problem:

  1. Create a conda environment and install the related packages:

         conda create --name nbstrip
         conda activate nbstrip
         conda install pip
         conda install -c conda-forge nbstripout

  2. Go to the local directory that I cloned from GitHub (https://github.com/zy-cai/test-nbstripout.git) and install the git filter:

         nbstripout --install --attributes .gitattributes

  3. Create a new ipynb file (nbformat version: 4.5), then push it to the remote:

         git add .
         git commit -m "upload test nb"
         git push

  4. Obviously this did not work, and the cells in the file were assigned new ids. So I created another ipynb and tried:

         nbstripout --keep-id nbstripout_test_2.ipynb
         git add .
         git commit -m "upload test nb 2"
         git push

  5. Same as before: the "nbstripout_test_2.ipynb" file in the remote has different cell ids from the local one.

In my case, what I need is to keep the ids of the remote notebooks the same as my local ones. I hope the steps above help. I think it's probably because I did not use the command properly, but that's why I was requesting an example that could guide me step by step. Thanks.

kynan commented 1 month ago

@zy-cai OK, I understand what's going on now: what is happening is "working as intended" though maybe surprising.

In your step 4 you first strip the notebook, keeping cell ids. However, when you then actually stage and commit the file, the --keep-id flag is not applied: Git runs the clean filter on git add, and what that does is defined by filter.nbstripout.clean, which is set to python -m nbstripout by default. You can check this with git config filter.nbstripout.clean.

To fix this, you need to edit .git/config in your local repository and add the --keep-id flag to both the clean command in the [filter "nbstripout"] section and the textconv command in the [diff "ipynb"] section. The latter is not strictly needed, but it ensures that git diff --cached gives the correct output after you git add a notebook file. These two sections should then look like the following (python is actually the full path to your Python executable):

[filter "nbstripout"]
    clean = python -m nbstripout --keep-id
    smudge = cat
[diff "ipynb"]
    textconv = python -m nbstripout -t --keep-id
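
If you would rather not edit .git/config by hand, the same three values can be written with git config. Below is a minimal sketch of that, not part of nbstripout: the helper names keep_id_config and apply_config are hypothetical, and "python" again stands in for the full path to your Python executable.

```python
import subprocess


def keep_id_config(python="python"):
    """Return the git config keys and values shown above, with --keep-id added.

    `python` is an assumption: replace it with the full path to the Python
    executable that has nbstripout installed.
    """
    return {
        "filter.nbstripout.clean": f"{python} -m nbstripout --keep-id",
        "filter.nbstripout.smudge": "cat",
        "diff.ipynb.textconv": f"{python} -m nbstripout -t --keep-id",
    }


def apply_config(settings, repo_dir="."):
    """Write each setting to the repository-local config via `git config`."""
    for key, value in settings.items():
        subprocess.run(
            ["git", "config", "--local", key, value],
            cwd=repo_dir,
            check=True,  # raise if git reports an error (e.g. not a repository)
        )


# Show what would be written, without touching any repository.
for key, value in keep_id_config().items():
    print(f"{key} = {value}")
```

Calling apply_config(keep_id_config()) from the root of the repository should then leave .git/config with the two sections above. Running git config filter.nbstripout.clean afterwards is a quick way to confirm it.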

Side note: pushing to a remote repository is irrelevant. What I think you mean is you want to preserve cell ids between your Git repository and working copy.
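
To check whether the ids really are preserved, you can compare the notebook in your working copy with the version Git has stored. The sketch below is only an illustration using the standard library: the function names are hypothetical and the two sample notebooks are made up. It relies on the nbformat 4.5 layout, where each entry in "cells" carries an "id" field.

```python
import json


def cell_ids(notebook):
    """Return the cell ids of a parsed notebook in order (None if a cell has no id)."""
    return [cell.get("id") for cell in notebook.get("cells", [])]


def ids_preserved(working_copy_json, committed_json):
    """True if both notebook JSON strings have the same cell ids in the same order."""
    return cell_ids(json.loads(working_copy_json)) == cell_ids(json.loads(committed_json))


def sample_notebook(cell_id, outputs):
    """Build a tiny one-cell notebook as a JSON string (made-up data for illustration)."""
    return json.dumps({
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [{
            "id": cell_id,
            "cell_type": "code",
            "source": "1 + 1",
            "metadata": {},
            "execution_count": None,
            "outputs": outputs,
        }],
    })


local = sample_notebook("a1b2c3d4", [{"output_type": "stream", "name": "stdout", "text": "2\n"}])
stripped_same_id = sample_notebook("a1b2c3d4", [])  # output removed, id kept
stripped_new_id = sample_notebook("0", [])          # output removed, id changed

print(ids_preserved(local, stripped_same_id))  # True
print(ids_preserved(local, stripped_new_id))   # False
```

In a real repository, the committed version of a notebook can be obtained with git show HEAD:nbstripout_test_2.ipynb and the staged version with git show :nbstripout_test_2.ipynb. Either can be passed as the second argument.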

zy-cai commented 4 weeks ago

@kynan Thank you for the walkthrough. Now it works perfectly!