jupyterhub / nbgitpuller

Jupyter server extension to sync a git repository one-way to a local path
https://nbgitpuller.readthedocs.io
BSD 3-Clause "New" or "Revised" License
205 stars 84 forks source link

Fix sync problems after deleting files #257

Closed jdmansour closed 2 years ago

jdmansour commented 2 years ago

Hi, we have encountered problems when sometimes syncing would fail after deleting files. Issue #254 happens after you change a file locally, sync, then nbgitpuller keeps your changes, and then you delete the file. Issue #234 seems related, and happens when you delete a file locally, and from the repository.

This pull request tries to fix both bugs. First, in reset_deleted_files(), I don't checkout from the remote branch, because the file might not exist there anymore. Instead, I checkout from the local branch, and trust that the merge will give me the updated version. Second, I issue a git reset in order to unstage files. Staged files that might be lying around (for example from a failed merge, without this patch) prevent a proper merge.

I think this is the minimal change that fixes the sync problems. I also added some test cases for the issues. Please have a look if everything makes sense, I am not sure that I didn't miss any corner case!

(I should mention that I first made another version that does something like git stash; git pull --ff-only; git stash pop -Xours. It had the advantage of not introducing extra commits and keeping history clean, and also passes tests. But then I came up with this solution, since I wanted to touch the merging code as little as possible...).

welcome[bot] commented 2 years ago

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. welcome You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

yuvipanda commented 2 years ago

Thanks a lot for working on this, @jdmansour! I really appreciate the tests too!

Can you investigate why the tests are failing?

I have to re-investigate how git reset (rather than git reset --hard) works! Does it touch any unstaged files at all? I know that git reset --hard basically wipes out anything that isn't committed, and I don't want that to happen here. A comment on what exactly git reset will do around the code that calls it would be great!

choldgraf commented 2 years ago

Looks like it's the "merge after commit" test:

https://github.com/jupyterhub/nbgitpuller/runs/5593008892?check_suite_focus=true#step:8:49

jdmansour commented 2 years ago

I had to look up what it does, too. git reset without arguments is the same as git reset --mixed. It resets the index, i.e. the "staging area", but keeps all changes to local files. Basically, it undoes git add. I added a comment and replaced it with the explicit version.

I don't know exactly how files got staged on my instance. I think that happens when a merge fails (without this patch). Starting from scratch it should not be necessary, but it helps recover when the problem has already occurred, or somebody messed around with the command line.

The reason the tests failed is that user.email and user.name are not set on the CI. I added that to the tests and now it looks green!

welcome[bot] commented 2 years ago

Congrats on your first merged pull request in this project! :tada: congrats Thank you for contributing, we are very proud of you! :heart:

yuvipanda commented 2 years ago

Thanks a lot, @jdmansour!

albertmichaelj commented 1 year ago

I'm pretty sure that this pull request broke the ability to do the following. Click the JuptyerHub link, make changes to a file (likely mess it up), click the jupyterhub link again, then delete the file, and click the jupyterhub link again and get a fresh version. Instead you end up with the messed up version.

The line that was changed that I think is responsible for this new behavior is https://github.com/jdmansour/nbgitpuller/blob/a2942a664d4c4c0f90c40e40e398b60eaac2f0f2/nbgitpuller/pull.py#L175. The git checkout now checks out from the local repository where the changes of the messed up file were committed due to the auto commit of nbgitpuller when the jupyterhub link was clicked for the second time. This is instead of the intended (and original behavior) where the file was checked out from the original branch.

However, I don't know why that line was changed in this commit, so I am loath to revert the change without understanding why that was necessary. @jdmansour can you explain why that line was changed in your pull request?

This is a big regression for my use case, and it has caused me to revert to 1.0.2.

albertmichaelj commented 1 year ago

I knew this code looked familiar! You reverted my change (commit 10dee1f51f871ed11f889caffba9125c61090ccc)! You were trying to fix #234, and I see why that is an issue with my code. However, the reversion of my change means that reset_deleted_files does not actually do what it says it's going to do (due to the autocommit of nbgitpuller).

It seems like there needs to be some additional check where before running my original file checkout command, git checks to see if the file is still in the remote repository. I think some modification of something like git cat-file -e origin/master:FAILME (see this link for reference). Should work.

I will try to create a pull request when I can. It should be as simple as checking to see if the file exists before running:

yield from execute_cmd(['git', 'checkout', 'origin/{}'.format(self.branch_name), '--', filename], cwd=self.repo_dir)

Unless I'm missing something...

jdmansour commented 1 year ago

Sorry for the breakage! I was trying to fix the case where somebody deleted the file in the remote repository, and the sync failed.

However I couldn't reproduce your problem. Could it be that the fix you are talking about is already in main?

https://github.com/jupyterhub/nbgitpuller/blob/c23461d4f7ce47ec4e0d2bcfc3b2fde4939b80f8/nbgitpuller/pull.py#L169-L172

I think I introduced the check you mentioned here: https://github.com/jupyterhub/nbgitpuller/commit/427d92c989cc74c2696b41074e6d7a3268b2f557

albertmichaelj commented 1 year ago

No worries! It was an easy fix to downgrade the version. I just started teaching a new class this week, and this was the first time I was using the most recent version of nbgitpuller, and this regression caught me and my students off guard. I just thought it was funny when I was trying to figure out what happened that I realized I had fixed this problem once before (and, apparently introduced the new bug reported in #234).

You seemed to have re-fixed the problem (correctly this time) in main. It just hasn't made it to a release yet, I guess. I'll open an issue to request a new release.

Thanks for responding!