kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0

Container hangs in combination with one-time, PVC, and exechook-command #761

Closed: sboardwell closed this issue 1 year ago

sboardwell commented 1 year ago

What?

A container which uses a persistent volume and the options --one-time and --exechook-command hangs on the second run.

For example, a second run can occur when git-sync is used as a Kubernetes init container and the exechook command fails. In that case, Kubernetes restarts the init container against the same volume.

To reproduce using docker:

# create a tmp directory
cd $(mktemp -d)

# run first-time with a persistent volume
❯ docker run -v $(pwd):$(pwd) -w $(pwd) -u$(id -u):$(id -g) \
    registry.k8s.io/git-sync/git-sync:v3.6.7 \
        --one-time=true \
        --root $(pwd) \
        --repo=https://github.com/kubernetes/git-sync \
        --branch=master \
        --exechook-command=/bin/true
INFO: detected pid 1, running init handler
I0627 09:28:33.083598      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","--one-time=true","--root","/tmp/tmp.n75Xjwjr2G","--repo=https://github.com/kubernetes/git-sync","--branch=master","--exechook-command=/bin/true"]
I0627 09:28:33.088387      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="https://github.com/kubernetes/git-sync" "path"="/tmp/tmp.n75Xjwjr2G"
I0627 09:28:34.975414      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="0753bd511ffbb00fff5b4f20f3f0de0896f553c0"
I0627 09:28:34.986726      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/tmp/tmp.n75Xjwjr2G/0753bd511ffbb00fff5b4f20f3f0de0896f553c0" "branch"="origin/master"
I0627 09:28:35.067422      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/tmp/tmp.n75Xjwjr2G/0753bd511ffbb00fff5b4f20f3f0de0896f553c0" "hash"="0753bd511ffbb00fff5b4f20f3f0de0896f553c0"
I0627 09:28:35.067440      12 main.go:865] "level"=0 "msg"="updating submodules"
I0627 09:28:35.092118      12 exechook.go:73] "level"=0 "msg"="running exechook" "command"="/bin/true" "timeout"=30000000000

# second time hangs...
❯ timeout 10 docker run -v $(pwd):$(pwd) -w $(pwd) -u$(id -u):$(id -g) \
    registry.k8s.io/git-sync/git-sync:v3.6.7 \
        --one-time=true \
        --root $(pwd) \
        --repo=https://github.com/kubernetes/git-sync \
        --branch=master \
        --exechook-command=/bin/true
INFO: detected pid 1, running init handler
I0627 09:29:58.366198      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","--one-time=true","--root","/tmp/tmp.n75Xjwjr2G","--repo=https://github.com/kubernetes/git-sync","--branch=master","--exechook-command=/bin/true"]

❯ echo $?
124
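
The check above relies on `timeout` reporting exit status 124 when it has to stop the command. As a rough sketch (not part of git-sync), that check could be wrapped in two small helpers so the first and second runs can be compared automatically. The names `classify_exit` and `run_with_limit` are hypothetical, and the 124 convention assumes GNU coreutils `timeout`.

```shell
#!/bin/sh
# Hypothetical helpers for scripting the reproduction; not part of git-sync.

# Map an exit status to a short label.
# Assumption: GNU coreutils `timeout` exits with 124 when the limit is reached.
classify_exit() {
    case "$1" in
        0)   echo "ok" ;;
        124) echo "hung" ;;
        *)   echo "failed" ;;
    esac
}

# Run a command with a time limit (in seconds) and print the label.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    limit="$1"
    shift
    status=0
    # Discard the command's own output so only the label is printed.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    classify_exit "$status"
}

# Example against the reproduction in this issue (requires docker):
#   run_with_limit 10 docker run -v "$(pwd):$(pwd)" -w "$(pwd)" \
#       -u"$(id -u):$(id -g)" \
#       registry.k8s.io/git-sync/git-sync:v3.6.7 \
#       --one-time=true --root "$(pwd)" \
#       --repo=https://github.com/kubernetes/git-sync \
#       --branch=master --exechook-command=/bin/true
```

Running the example twice in the same directory would be expected to print `ok` the first time and `hung` the second time on an affected version, matching the transcript above.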

Expected Behaviour

The process should not hang and the exechook command should be called again as in the first run.

thockin commented 1 year ago

I see the issue.


thockin commented 1 year ago

The good news is that this bug does not exist in v4.

sboardwell commented 1 year ago

Nice! Is there a release date planned for v4?

thockin commented 1 year ago

v4.0.0-rc2 is tagged and ready for testing (registry.k8s.io/git-sync/git-sync:v4.0.0-rc2)! Please read the release notes - there are some flag changes (hence the major-version increment).

I'd love to get feedback - it's a big release.

I've got a fix for v3, too, but it is a code change.

thockin commented 1 year ago

https://github.com/kubernetes/git-sync/pull/762

sboardwell commented 1 year ago

Excellent. Thanks for the quick response. Will look at using v4-rcX in our testing environments and get back to you.

thockin commented 1 year ago

Fixed in the v3 branch; it will ship if/when we cut another v3 release.