kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0

Add Git submodules remote tracking #265

Open neutronth opened 4 years ago

neutronth commented 4 years ago

Add Git submodules remote tracking

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

thockin commented 3 years ago

/lifecycle frozen
/remove-lifecycle stale
/remove-lifecycle rotten

thockin commented 1 year ago

@nirutgupta

I finally got around to looking at this, and I think it is worth discussing the design, since I do not use this feature of git myself.

Should it be fundamentally async to the main sync loop? The current PR seems to be, but this is a problem because such updates are not atomic, meaning a consumer could see the repo in an inconsistent state. I think, for sanity, we need to do it all in-phase with the main sync loop, and do it in a new worktree.

A new worktree raises its own issue, because there's a sort of contract that the basename of the thing the link points to is the hash that is checked out. If only the remote tracking needs to update, the new worktree would have the same "main" hash as the existing one. We could maybe do some indiana-jones-swapping-the-idol trickery to retain this property, creating a new worktree and playing with links, like:

thockin-glaptop4 root c017f e2e-branch /$ git worktree add --detach /tmp/git-sync-e2e.2603115845/root/c017f5ae9aab079a210fec80004b16369b9a364d.old c017f5ae9aab079a210fec80004b16369b9a364d
Preparing worktree (detached HEAD c017f5a)
HEAD is now at c017f5a add submodule3

thockin-glaptop4 root c017f e2e-branch /$ ls -l
total 4
drwxr-xr-x 5 thockin primarygroup 4096 Nov 26 11:00 c017f5ae9aab079a210fec80004b16369b9a364d
drwxr-xr-x 5 thockin primarygroup 4096 Nov 26 11:28 c017f5ae9aab079a210fec80004b16369b9a364d.old
lrwxrwxrwx 1 thockin primarygroup   41 Nov 26 11:20 link -> c017f5ae9aab079a210fec80004b16369b9a364d/

thockin-glaptop4 root c017f e2e-branch /$ ls link
file  sub1  sub2  sub3

thockin-glaptop4 root c017f e2e-branch /$ ln -s c017f5ae9aab079a210fec80004b16369b9a364d.old link.old

thockin-glaptop4 root c017f e2e-branch /$ mkdir _old

thockin-glaptop4 root c017f e2e-branch /$ ln -s ../link.old _old/c017f5ae9aab079a210fec80004b16369b9a364d

thockin-glaptop4 root c017f e2e-branch /$ ln -s _old/c017f5ae9aab079a210fec80004b16369b9a364d link.new

thockin-glaptop4 root c017f e2e-branch /$ mv -T link.new link

thockin-glaptop4 root c017f e2e-branch /$ ls -l link
lrwxrwxrwx 1 thockin primarygroup 45 Nov 26 11:22 link -> _old/c017f5ae9aab079a210fec80004b16369b9a364d

thockin-glaptop4 root c017f e2e-branch /$ ls link
file  sub1  sub2  sub3
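
The step that makes this safe for consumers is the final `mv -T`: the new symlink is created under a temporary name and then renamed over `link`, so a reader sees either the old target or the new one and never a missing link. A minimal sketch of just that step follows. The function name `swap_link` and the `.new` suffix are illustrative, not anything git-sync defines, and `mv -T` assumes GNU coreutils as in the session above.

```shell
# swap_link LINK TARGET
# Repoint the symlink LINK at TARGET without a window in which LINK is absent.
# Hypothetical helper for illustration; assumes GNU coreutils (mv -T).
swap_link() {
    link="$1"      # path of the published symlink
    target="$2"    # new target, interpreted relative to LINK's directory
    tmp="${link}.new"

    # Clear any leftover temp link first. If it still pointed at a directory,
    # ln -s would create the new link inside that directory instead.
    rm -f "$tmp"

    # mv -T treats the destination as a plain file, so this is a rename over
    # the old symlink rather than a move into the directory it points to.
    ln -s "$target" "$tmp" && mv -T "$tmp" "$link"
}
```

With this, the last two commands of the session would collapse to `swap_link link _old/c017f5ae9aab079a210fec80004b16369b9a364d`. The extra `_old/` indirection is still needed separately to keep the basename equal to the hash.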

We still need to figure out for each remote tracking branch whether it needs to be updated before we do any of this work. So roughly:

1. check main repo for updates
2. check each tracking sub for updates (recursively?)
3. if any of them need update, do a full sync
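
As a rough illustration of the "check each tracking sub" step only, the sketch below lists submodules whose checked-out commit differs from the tip of the branch named in `.gitmodules`. It is a sketch under several assumptions, not how git-sync does or would do it: the function name `stale_submodules` is made up, submodule URLs are assumed to be absolute (relative `../foo` URLs are not resolved), the special branch value `.` is not handled, and it does not recurse into nested submodules.

```shell
# stale_submodules ROOT
# Print the name of every submodule in ROOT that declares a tracking branch
# in .gitmodules and whose checked-out HEAD is not the remote tip of that
# branch. Hypothetical helper for illustration only.
stale_submodules() {
    root="$1"
    gm="$root/.gitmodules"

    # Each output line looks like: submodule.<name>.branch <branch>
    git config -f "$gm" --get-regexp '^submodule\..*\.branch$' |
    while read -r key branch; do
        # Strip the fixed prefix and suffix to recover <name>.
        name="${key#submodule.}"
        name="${name%.branch}"

        path=$(git config -f "$gm" --get "submodule.$name.path")
        url=$(git config -f "$gm" --get "submodule.$name.url")

        have=$(git -C "$root/$path" rev-parse HEAD)
        want=$(git ls-remote "$url" "refs/heads/$branch" | cut -f1)

        [ "$have" = "$want" ] || echo "$name"
    done
}
```

A caller could treat any non-empty output as "needs a full sync", alongside the existing check of the main repo.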

Almost certainly needs some internal refactoring to accommodate.

Then we need to consider impact on hooks - do we re-run webhook and exechook notifiers? I think we have to, but they only get the hash of the main repo as an argument. We may need to change that, too.

There may be more to consider, but this is a start :)