kynan / nbstripout

strip output from Jupyter and IPython notebooks
Other
1.19k stars 94 forks source link

nbstripout installation causes incorrect 'git status' results #65

Open jykim opened 7 years ago

jykim commented 7 years ago

I seem to see changes for notebooks I haven't touched since I installed nbstripout. Interestingly, when I uninstall nbstripout and install it again, the changes are gone. Besides general performance issue, this prevents us from adopting nbstripout as a team.

 $ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
... list of notebooks I haven't touched ...

----(after uninstall nbstripout and install it again)---

 $ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
kynan commented 7 years ago

What do you get as git diff when you hit the case of allegedly modified notebook files?

jykim commented 7 years ago

I generally see nothing when I run get diff on those files, but I haven't tested them all. (can't share them since it's my work notebooks anyways) As I mentioned, It's especially interesting that these files are gone when I uninstall nbstripout and install it again, which I weird.

jykim commented 7 years ago

Today I found that these changes show up after 'git pull', and I looked at a few other diffs and found that they are diffs between before) notebooks with output after) notebook with output stripped. So maybe the git maybe telling me to strip the output out from some notebooks which contains outputs from previous commits.

Does this make sense? But I still don't understand why these notebooks show up after 'git pull' and why they're gone after uninstall/reinstall.

kynan commented 7 years ago

So what exactly does git status report in this situation? Are those notebooks modified? Or is it a mode change?

Also could you run a nbstripout --status before and after the reinstall and report if there's a difference?

kynan commented 6 years ago

@jykim Any update?

stas00 commented 6 years ago

This might be related to an issue I posted about here. I still haven't figured it out, so any help is welcome.

melaanya commented 5 years ago

I have the same problem: when nbstripout is enabled and I checkout several files, the omitting of changes in these files is not reflected by git status (git status shows their presence, the changes are actually omitted). If I do 'nbstripout --uninstall' for the repo and then do 'nbstripout --install' again then the changes are reflected by git status.

stas00 commented 5 years ago

@melaanya, try to change the configuration of the content filters as explained here and it'll help you debugging the issue. You will be able to tell when nbstripout is activated.

kynan commented 5 years ago

@stas00 I've looked at your issue and am about as puzzled as you are. Thanks for documenting it so thoroughly. Guess we'll have to find a better git plumber than I am to explain this behavior.

stas00 commented 5 years ago

Currently, pretty much any time one of our developers forgets to enable the filter and commits w/o stripping the notebooks out, it breaks git pull for anybody who already had a checked out version. And other than doing a new clone the only solution seems to be is to disable the filter, git pull and reenable the filter.

What we really need is a way to enforce content filters repo-wide, except git doesn't support that due to security reasons (overriding user's config is unsafe). I'm not sure what to do about it. It's a constant problem for us. Perhaps there is a need to re-design content filters so that they aren't configured via a private .gitconfig and can be automatically installed for everybody (note that our repo carries its own version of the nbstripout filter, so the culprit is just not being able to enforce the configuration). Unless perhaps I'm missing something and you have some insights of another way of configuring it, so that the user doesn't need to do anything.

And of course, there are server-side hooks which could do the policing, but it doesn't sound like the best solution. In particular since last I looked github doesn't support those.

kynan commented 5 years ago

Yeah, to me this sounds like a Git issue rather than an nbstripout issue and I'm not sure what (if anything) we could fix in nbstripout to help with this. Have you considered raising the issue on the Git mailing list?

stas00 commented 5 years ago

I already did so to no avail.

If it weren't for github and we were running our own git backend then server-side hooks could do the policing, but since github doesn't support that, there is nothing one can do to enforce that.

The best "enforcing" solution I found is to run a CI job that flags this problem as it appears, and hopefully the developers watch for [x]'s in their commit status on github.

echo "Check we are starting with clean git checkout"
if [ -n "$(git status -uno -s)" ]; then echo "git status is not clean"; false; fi
echo "Trying to strip out notebooks"
tools/fastai-nbstripout -d dev_nb/*ipynb dev_nb/*/*ipynb docs_src/*ipynb docs_src/*/*ipynb
echo "Check that strip out was unnecessary"
git status -s # display the status to see which nbs need cleaning up
if [ -n "$(git status -uno -s)" ]; then echo -e "!!! Detected unstripped out notebooks\n!!!Remember to run tools/run-after-git-clone"; false; fi

and the best resolution I found is:

disable the filters
git pull
fix the problems
enable the filters
git commit -a
git push

which in the fastai project is done with a script (-d is for disable):

tools/trust-origin-git-config -d
git pull
tools/trust-origin-git-config
kynan commented 4 years ago

FYI, we're discussing adding a dedicated nbstripout command to help with this flow in #108. Could be worth taking inspiration from the fastai scripts.

DanielRudnicki commented 3 years ago

I see the same problem, stopped using the tool after 2 days.

kynan commented 3 years ago

@DanielRudnicki Reproduction steps for the failures you're seeing would help.

rhstanton commented 2 months ago

I think I'm seeing the same issue and may have found a fix. I've been getting (erroneous) complaints for ages about modified ipynb files in (just cloned or pulled) git repositories, and finally decided to track it down. I found that the error occurs when file ~/.config/git/attributes contains the lines

*.ipynb filter=nbstripout
*.zpln filter=nbstripout
*.ipynb diff=ipynb

If I comment out all these lines, the error goes away.

Note that ~/.gitconfig contains the lines

[filter "nbstripout"]
        clean = \"/opt/anaconda3/bin/python3\" -m nbstripout
        smudge = cat
[diff "ipynb"]
      textconv = \"/opt/anaconda3/bin/python3\" -m nbstripout -t

So nbstripout is still installed and runs fine.

Also note that I've encountered the same issue (and the same fix works) on both MacOS and Ubuntu machines.

kynan commented 2 months ago

@rhstanton Are you sure nbstripout is still active when you comment out the attributes? I'd find that very surprising. How would git know to apply the nbstripout filter to *.ipynb files?

rhstanton commented 2 months ago

No, I'm no longer sure! I just tried a small test where I reran some code in a notebook file, and without the attributes, git diff reports that the file has changed, while with the attributes, git diff reports no changes. I guess my "fix" was just to turn off nbstripout - not as interesting as I'd hoped...

kynan commented 2 months ago

OK, thanks for confirming! If you have any further insights on this issue please send them along - this one is still a mystery!