kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0

Submodules with relative path fail to clone #763

Closed ventrebd closed 1 year ago

ventrebd commented 1 year ago

Running v4.0.0-rc2 as a k8s sidecar container.

git-sync worked great until I added a submodule to my repo. It then started failing & repeatedly restarting; looks like the root cause is that the git submodule commands are interpreting the relative path in .gitmodules as relative to the filesystem rather than relative to the original clone address (HTTPS).

Logs:

INFO: detected pid 1, running init handler
{"logger":"","ts":"2023-07-05 17:51:28.211056","caller":{"file":"main.go","line":720},"level":0,"msg":"starting up","pid":11,"uid":472,"gid":472,"home":"/tmp","flags":["--add-user=false","--change-permissions=0","--cookie-file=false","--depth=1","--exechook-backoff=3s","--exechook-timeout=30s","--git=git","--git-gc=always","--group-write=false","--help=false","--http-metrics=false","--http-pprof=false","--link=REDACTED_ROOT_REPO.git","--man=false","--max-failures=0","--max-sync-failures=0","--one-time=false","--password=REDACTED","--period=30s","--ref=HEAD","--repo=https://REDACTED_SERVER/REDACTED_ROOT_PATH/REDACTED_ROOT_REPO.git","--root=/git","--ssh=false","--ssh-key-file=/etc/git-secret/ssh","--ssh-known-hosts=true","--ssh-known-hosts-file=/etc/git-secret/known_hosts","--stale-worktree-timeout=0s","--submodules=recursive","--sync-timeout=2m0s","--timeout=0","--username=REDACTED_DEPLOY_TOKEN_USER","--v=-1","--verbose=0","--version=false","--wait=0","--webhook-backoff=3s","--webhook-method=POST","--webhook-success-status=200","--webhook-timeout=1s"]}
{"logger":"","ts":"2023-07-05 17:51:28.607087","caller":{"file":"main.go","line":1760},"level":0,"msg":"update required","ref":"HEAD","local":"REDACTED_LOCAL_SHA","remote":"REDACTED_REMOTE_SHA","syncCount":0}
{"logger":"","ts":"2023-07-05 17:51:29.015793","caller":{"file":"main.go","line":969},"msg":"too many failures, aborting","error":"Run(git submodule update --init --recursive --depth 1): exit status 1: { stdout: \"\", stderr: \"fatal: repository '/git/REDACTED_SUBMODULE_PATH/REDACTED_SUBMODULE_REPO.git' does not exist\\nfatal: clone of '/git/REDACTED_SUBMODULE_PATH/REDACTED_SUBMODULE_REPO.git' into submodule path '/git/.worktrees/REDACTED_REMOTE_SHA/REDACTED_SUBMODULE_REPO' failed\\nFailed to clone 'REDACTED_SUBMODULE_REPO'. Retry scheduled\\nfatal: repository '/git/REDACTED_SUBMODULE_PATH/REDACTED_SUBMODULE_REPO.git' does not exist\\nfatal: clone of '/git/REDACTED_SUBMODULE_PATH/REDACTED_SUBMODULE_REPO.git' into submodule path '/git/.worktrees/REDACTED_REMOTE_REPO/REDACTED_SUBMODULE_REPO' failed\\nFailed to clone 'REDACTED_SUBMODULE_REPO' a second time, aborting\" }","failCount":1}

.gitmodules file:

[submodule "REDACTED_SUBMODULE_REPO"]
    path = REDACTED_SUBMODULE_REPO
    url = ../../REDACTED_SUBMODULE_PATH/REDACTED_SUBMODULE_REPO.git

Luckily the submodule is not critical to the repo's contents, so my temporary work-around is to use --submodules=off on the sidecar container.

thockin commented 1 year ago

Thanks for trying v4! Is this the same in v3, or is it new in v4?

ventrebd commented 1 year ago

TLDR: v3.6.8 works but v4.0.0-rc2 does not.

v4.0.0-rc2 was actually my first-ever use of git-sync; I did try it today with v3.6.8 as well. Turns out my original deploy token was only for the primary repo rather than both primary & submodule (did get a more helpful error of "HTTP Basic: Access denied."). After fixing the token (& having v3.6.8 work), I tried switching to v4.0.0-rc2 (with new token) & it fails with the same error as above.

thockin commented 1 year ago

Can you paste a successful v3 log?

ventrebd commented 1 year ago

INFO: detected pid 1, running init handler
I0706 19:31:03.766962      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","--repo=https://REDACTED_SERVER/REDACTED_ROOT_PATH/REDACTED_ROOT_REPO.git","--max-sync-failures=-1","--branch=main"]
I0706 19:31:03.790017      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="https://REDACTED_SERVER/REDACTED_ROOT_PATH/REDACTED_ROOT_REPO.git" "path"="/tmp/git"
I0706 19:31:04.253757      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="REDACTED_REMOTE_SHA"
I0706 19:31:04.275038      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/tmp/git/REDACTED_REMOTE_SHA" "branch"="origin/main"
I0706 19:31:04.292381      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/tmp/git/REDACTED_REMOTE_SHA" "hash"="REDACTED_REMOTE_SHA"
I0706 19:31:04.292409      12 main.go:865] "level"=0 "msg"="updating submodules"

No further logs (it has been running for a couple of minutes), but the submodule folder is properly populated (/tmp/git/REDACTED_ROOT_REPO.git/REDACTED_SUBMODULE_REPO/).

thockin commented 1 year ago

Sorry to be a pain, I was on mobile before. Can you run the passing and failing cases with -v 6 so I can see EVERYTHING?

thockin commented 1 year ago

Never mind, I am able to reproduce it.

thockin commented 1 year ago

It looks like the relative path is being interpreted relative to the worktree, which moved one directory deeper in v4. This means the relative path is legitimately wrong. Can you replace it with a file:///<absolute> path and try v4?

thockin commented 1 year ago

It's unclear to me how this is supposed to work. Is the relative path relative to the original repo (upstream) or the clone?

ventrebd commented 1 year ago

The path in .gitmodules is relative to the upstream. The project is at https://server/a/b/project.git, while the submodule is at https://server/a/d/e/submodule.git. The relative path is necessary to support multiple clone methods (HTTPS or SSH; the server is GitLab: https://docs.gitlab.com/ee/ci/git_submodules.html#using-relative-urls).
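
As a rough illustration of that resolution rule, here is a minimal sketch that strips one trailing path component from a base for each leading `../` in the relative URL. `resolve_relative` is a hypothetical helper, and the relative URL `../../d/e/submodule.git` is an assumption inferred from the two example addresses above; git's own logic covers more cases (scp-style SSH addresses, for instance), so treat this as an approximation only.

```shell
#!/bin/sh
# resolve_relative BASE REL  (hypothetical helper, not part of git or git-sync)
# For each leading "../" in REL, drop the last path component of BASE,
# then append whatever is left of REL.
resolve_relative() {
    base=${1%/}   # ignore a single trailing slash on the base
    rel=$2
    while :; do
        case $rel in
            ../*) base=${base%/*}; rel=${rel#../} ;;
            ./*)  rel=${rel#./} ;;
            *)    break ;;
        esac
    done
    printf '%s/%s\n' "$base" "$rel"
}

# Resolved against the upstream URL (what the .gitmodules author intended):
resolve_relative "https://server/a/b/project.git" "../../d/e/submodule.git"
# prints: https://server/a/d/e/submodule.git

# Resolved against a worktree directory under --root=/git (SHA is a placeholder):
resolve_relative "/git/.worktrees/SHA" "../../d/e/submodule.git"
# prints: /git/d/e/submodule.git
```

The second result has the same shape as the path in the v4 error log: a filesystem location under /git rather than an address on the server.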

thockin commented 1 year ago

TL;DR: I didn't get it, now I do, I think. Jump to the end of this.

$ pwd
/tmp/gsr

$ # This is an empty dir
$ find .
.

$ # Let's make a repo
$ mkdir -p upstream/repo

$ cd upstream/repo/

$ git init
Initialized empty Git repository in /tmp/gsr/upstream/repo/.git/

$ date > file-in-repo; git add .; git commit -am "add file"
[main (root-commit) 3cc4a76] add file
 1 file changed, 1 insertion(+)
 create mode 100644 file-in-repo

$ cd -
/tmp/gsr

$ # Let's make another for use as a sub-module
$ mkdir -p other/upstream/sub-repo

$ cd other/upstream/sub-repo

$ git init
Initialized empty Git repository in /tmp/gsr/other/upstream/sub-repo/.git/

$ date > file-in-sub-repo; git add .; git commit -am "add file"
[main (root-commit) 9668f47] add file
 1 file changed, 1 insertion(+)
 create mode 100644 file-in-sub-repo

$ cd -
/tmp/gsr

$ # Add it as a sub
$ cd upstream/repo/

$ # The relative path resolves, relative to the upstream repo
$ ls ../../other/upstream/sub-repo/
file-in-sub-repo

$ git -c protocol.file.allow=always submodule add ../../other/upstream/sub-repo/
Cloning into '/tmp/gsr/upstream/repo/sub-repo'...
done.

$ git commit -am "add submodule"
[main f2f4fd7] add submodule
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 sub-repo

$ # The sub is saved as a relative path
$ cat .gitmodules 
[submodule "sub-repo"]
    path = sub-repo
    url = ../../other/upstream/sub-repo/

$ git rev-parse HEAD
f2f4fd7d023fc1b157e15016b9bc5a86e263194d

$ cd -
/tmp/gsr

$ # Let's make a client like git-sync v4 does
$ mkdir -p client/deep/dir/fetch

$ cd client/deep/dir/fetch

$ # That relative path DOES NOT resolve from the client
$ ls ../../other/upstream/sub-repo/
ls: cannot access '../../other/upstream/sub-repo/': No such file or directory

$ git init
Initialized empty Git repository in /tmp/gsr/client/deep/dir/fetch/.git/

$ git fetch file:///tmp/gsr/upstream/repo/ f2f4fd7d023fc1b157e15016b9bc5a86e263194d
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), 850 bytes | 850.00 KiB/s, done.
From file:///tmp/gsr/upstream/repo
 * branch            f2f4fd7d023fc1b157e15016b9bc5a86e263194d -> FETCH_HEAD

$ git reset --soft FETCH_HEAD

$ git worktree add wt f2f4fd7d023fc1b157e15016b9bc5a86e263194d
Preparing worktree (detached HEAD f2f4fd7)
HEAD is now at f2f4fd7 add submodule

$ # Note the dash - that means unhappy!
$ git -C wt submodule status
-9668f479028e678ce446e0e0654e67aa7d74836d sub-repo

$ # Still a relative path, but relative to the upstream!
$ cat wt/.gitmodules 
[submodule "sub-repo"]
    path = sub-repo
    url = ../../other/upstream/sub-repo/

$ # That relative path DOES NOT resolve from the client
$ ls ../../other/upstream/sub-repo/
ls: cannot access '../../other/upstream/sub-repo/': No such file or directory

$ cd wt

$ # That relative path DOES NOT resolve from the worktree
$ ls ../../other/upstream/sub-repo/
ls: cannot access '../../other/upstream/sub-repo/': No such file or directory

$ # Let's try a simpler clone
$ cd /tmp/gsr

$ mkdir -p client/deep/dir/clone

$ cd client/deep/dir/clone

$ # Emulate git-sync v3
$ git clone --no-checkout file:///tmp/gsr/upstream/repo/ .
Cloning into '.'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (6/6), 870 bytes | 870.00 KiB/s, done.
Resolving deltas: 100% (1/1), done.

$ git worktree add wt f2f4fd7d023fc1b157e15016b9bc5a86e263194d
Preparing worktree (detached HEAD f2f4fd7)
HEAD is now at f2f4fd7 add submodule

$ # Still unhappy
$ git -C wt submodule status
-9668f479028e678ce446e0e0654e67aa7d74836d sub-repo

$ # Still relative to upstream
$ cat wt/.gitmodules
[submodule "sub-repo"]
    path = sub-repo
    url = ../../other/upstream/sub-repo/

I wrote about how this just doesn't work across clones, and how the fix was complicated and niche, and then I realized git already has the fix (at least, I thought so). Specifically, git is looking at the origin remote and making the relative path relative to that, which is super clever. In theory, I should just need to set the origin and it should work:

$ pwd
/tmp/gsr/client/deep/dir

$ mkdir fetch2

$ cd fetch2

$ git init
Initialized empty Git repository in /tmp/gsr/client/deep/dir/fetch2/.git/

$ git remote add origin file:///tmp/gsr/upstream/repo/

$ git fetch file:///tmp/gsr/upstream/repo/ f2f4fd7d023fc1b157e15016b9bc5a86e263194d
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), 850 bytes | 850.00 KiB/s, done.
From file:///tmp/gsr/upstream/repo
 * branch            f2f4fd7d023fc1b157e15016b9bc5a86e263194d -> FETCH_HEAD

$ git reset --soft FETCH_HEAD

$ git worktree add wt f2f4fd7d023fc1b157e15016b9bc5a86e263194d
Preparing worktree (detached HEAD f2f4fd7)
HEAD is now at f2f4fd7 add submodule

$ git -C wt submodule status
-9668f479028e678ce446e0e0654e67aa7d74836d sub-repo

$ cat wt/.gitmodules
[submodule "sub-repo"]
    path = sub-repo
    url = ../../other/upstream/sub-repo/

$ git -C wt -c protocol.file.allow=always submodule update --init
Cloning into '/tmp/gsr/client/deep/dir/fetch2/wt/sub-repo'...
Submodule path 'sub-repo': checked out '9668f479028e678ce446e0e0654e67aa7d74836d'

Hot damn. Let me see about cleaning this up.
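
The key step in the transcript above is making sure an `origin` remote exists before `git submodule update --init` runs. A minimal sketch of that step, assuming a plain shell wrapper: `ensure_origin` is a hypothetical helper name, and this is not necessarily how git-sync itself ended up implementing the fix.

```shell
#!/bin/sh
# ensure_origin REPO_DIR UPSTREAM_URL  (hypothetical helper)
# Point the "origin" remote of the repo at REPO_DIR to UPSTREAM_URL,
# creating the remote if it does not exist yet. With origin set, git can
# resolve relative submodule URLs against the upstream address.
ensure_origin() {
    repo_dir=$1
    upstream=$2
    if git -C "$repo_dir" remote get-url origin >/dev/null 2>&1; then
        # The remote already exists: just update its URL.
        git -C "$repo_dir" remote set-url origin "$upstream"
    else
        git -C "$repo_dir" remote add origin "$upstream"
    fi
}
```

It would be called once after `git init` and before the worktree's submodules are updated, for example `ensure_origin /tmp/gsr/client/deep/dir/fetch2 file:///tmp/gsr/upstream/repo/`.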

thockin commented 1 year ago

Thanks for a fun bug - I learned something new about git. After using it for 10 years, it still surprises me.