kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0
2.24k stars 413 forks source link

Exec hooks - logging script output to containers stdout #739

Closed sboardwell closed 1 year ago

sboardwell commented 1 year ago

Hi, kudos to the maintainers of git-sync. An excellent tool.

I have a exec-hook configured to run upon sync. However, there doesn't seem to be a way to collect the output of the script. Have I missed something? The stdout is printed if the script fails, but sometimes you just want a little more info.

thockin commented 1 year ago

Thanks for the kudos!

If the exec fails, we log the stdout and stderr.

If you set -v 6 we log stdout and stderr even on success.

I'm not sure "more info" you would like and where?

sboardwell commented 1 year ago

Hi Tim,

Ah, I only went until log level 5 :see_no_evil:. I couldn't find a list of known levels and guessed (incorrectly) that 5 would be sufficient. Good to know it is possible. Just for my knowledge, are 6 levels common, or should I have known this somehow? Happy to add a PR to improve the docs if needed.

Regarding the actual issue, I'm wondering if it would be possible to add an option to return the results of the exec hook command without having to log everything.

In my use case I do some post processing using the --exechook-command which I would would like to see the result of. Usually log level -v 0 is great since it only logs something on changes. Whilst -v 6 would give me the results of the exec hook command, it also introduces a lot of additional noise. See the example below, bearing in mind that this is only 30 seconds and the noise produced is constant.

Running with v0 (30 seconds with 1 commit added)

❯ timeout 30 docker run --rm registry.k8s.io/git-sync/git-sync:v3.6.6 --repo=https://github.com/sboardwell/scratch --exechook-command uname --branch=main -v 0 2>&1 | tee git-sync-log-v0.log
INFO: detected pid 1, running init handler
I0512 06:48:44.448376      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","--repo=https://github.com/sboardwell/scratch","--exechook-command","uname","--branch=main","-v","0"]
I0512 06:48:44.453184      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="https://github.com/sboardwell/scratch" "path"="/tmp/git"
I0512 06:48:45.267314      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="b460540ec2f46ce5e87336a5e5b40ce5949a8f72"
I0512 06:48:45.278434      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/tmp/git/b460540ec2f46ce5e87336a5e5b40ce5949a8f72" "branch"="origin/main"
I0512 06:48:45.280635      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/tmp/git/b460540ec2f46ce5e87336a5e5b40ce5949a8f72" "hash"="b460540ec2f46ce5e87336a5e5b40ce5949a8f72"
I0512 06:48:45.280669      12 main.go:865] "level"=0 "msg"="updating submodules"
I0512 06:48:45.307805      12 exechook.go:73] "level"=0 "msg"="running exechook" "command"="uname" "timeout"=30000000000
I0512 06:49:11.224310      12 main.go:1112] "level"=0 "msg"="update required" "rev"="HEAD" "local"="b460540ec2f46ce5e87336a5e5b40ce5949a8f72" "remote"="164d0afa2bdce8553d789257631f72d99ab18687"
I0512 06:49:11.224340      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="164d0afa2bdce8553d789257631f72d99ab18687"
I0512 06:49:12.190215      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/tmp/git/164d0afa2bdce8553d789257631f72d99ab18687" "branch"="origin/main"
I0512 06:49:12.192642      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/tmp/git/164d0afa2bdce8553d789257631f72d99ab18687" "hash"="164d0afa2bdce8553d789257631f72d99ab18687"
I0512 06:49:12.192666      12 main.go:865] "level"=0 "msg"="updating submodules"
I0512 06:49:12.222469      12 exechook.go:73] "level"=0 "msg"="running exechook" "command"="uname" "timeout"=30000000000
[1]    1326443 exit 124    timeout 30 docker run --rm registry.k8s.io/git-sync/git-sync:v3.6.6   uname   | 
       1326444 terminated  tee git-sync-log-v0.log

Running with v6 (30 seconds with 1 commit added)

❯ timeout 30 docker run --rm registry.k8s.io/git-sync/git-sync:v3.6.6 --repo=https://github.com/sboardwell/scratch --exechook-command uname --branch=main -v 6 2>&1 | tee git-sync-log-v6.log
INFO: detected pid 1, running init handler
I0512 06:49:36.082965      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync","--repo=https://github.com/sboardwell/scratch","--exechook-command","uname","--branch=main","-v","6"]
I0512 06:49:36.083067      12 main.go:1306] "level"=5 "msg"="running command" "cwd"="" "cmd"="git config --global gc.autoDetach false"
I0512 06:49:36.084489      12 main.go:1306] "level"=6 "msg"="command result" "stdout"="" "stderr"=""
...
...
...
I0512 06:49:57.369366      12 main.go:917] "level"=5 "msg"="running command" "cwd"="/tmp/git" "cmd"="git gc --auto"
I0512 06:49:57.370687      12 main.go:917] "level"=6 "msg"="command result" "stdout"="" "stderr"=""
I0512 06:49:57.370726      12 main.go:608] "level"=1 "msg"="next sync" "wait_time"=1000000000
I0512 06:49:57.370763      12 exechook.go:73] "level"=0 "msg"="running exechook" "command"="uname" "timeout"=30000000000
I0512 06:49:57.370775      12 exechook.go:74] "level"=5 "msg"="running command" "cwd"="/tmp/git/53d125abc687dbaef537f6279b44ac8264588951" "cmd"="uname "
I0512 06:49:57.371476      12 exechook.go:74] "level"=6 "msg"="command result" "stdout"="Linux" "stderr"=""
...
...
[1]    1327536 exit 124    timeout 30 docker run --rm registry.k8s.io/git-sync/git-sync:v3.6.6   uname   | 
       1327537 terminated  tee git-sync-log-v6.log

Generated Logs

❯ wc -l git-sync-log-v0.log
14 git-sync-log-v0.log

❯ wc -l git-sync-log-v6.log
228 git-sync-log-v6.log
thockin commented 1 year ago

Just for my knowledge, are 6 levels common, or should I have known this somehow?

There's no real rules. 5 is the usual "show me the details" level. Logging the output of every command even when successful seemed like it went beyond even that. It's not super consequential - I don't really consider that part of the API. Sometimes I have put tracing code at level 9 or 10, even.

I'm wondering if it would be possible to add an option to return the results of the exec hook command without having to log everything

By "return" you mean "log" ? Even when it succeeds? stdout only or stderr, too? How about exit status code? The assumption was that a successful hook (code 0) doesn't need further examination - can you explain what's going on here?

I'm not, per se, against logging it, I guess. It's just a little surprising :)

sboardwell commented 1 year ago

By "return" you mean "log" ? Even when it succeeds? stdout only or stderr, too? How about exit status code? The assumption was that a successful hook (code 0) doesn't need further examination - can you explain what's going on here?

I think the current logging mechanism is perfect for normal running of things and I would normally only need -v 0.

I think the ask would be to have the option of increasing the logging for actions when an update is triggered only, thus sparing us all the noise in between updates. Currently it's all or nothing so either you get lines and lines of logs which also include the updates when the are detected, or clean logs showing when an update is detected but no other information.

Full -v 6+ would still be needed when debugging other more internal issues, but usually we don't care about the logs telling us nothing has happened. And I am thinking about the climate and all the resources used to process the superfluous parts :innocent:

thockin commented 1 year ago

I'm still unsure what you mean. Therer are things we do on every loop, which most of the time nobody cares about, and things we do when an update is needed. "increasing the logging for actions when an update is triggered" says to me "the current V-levels are wrong", which I can 100% believe, but I want to be careful not to be too spammy. Most of the logs there are related how how git-sync invokes git, so not very interesting to most people.

WRT exechook, we could always log the stdout (today we throw it away unless -v 6 or error). Logging stderr or more would require some plumbing. And it still ends up in the log, which most people don't look at unless there's an error. So I am trying to make sure I understand your ask - You want stdout when a hook succeeds? Do you want stderr? Do you want exit code? I'd like to understand what you are doing with that info from within a log.

Also if we do this, we should do the equivalent for webhooks.

Can you run with --v6, and then trim it down to the logs you think are interesting at V 0, 1, 2 ?

Also, the sync loop in the v4 branch (nearing completion!!) is materially different :)

sboardwell commented 1 year ago

Yes, perhaps it could be solved with the V-Levels. Something like -v 1 increasing verbosity for actions performed as part of a sync triggering (e.g. adding the stdout for exechooks and webhooks, etc), with the "things we do on every loop, which most of the time nobody cares about" coming at -v 2 or higher. Would that make sense?

Let me think about it over the weekend.

sboardwell commented 1 year ago

Also, the sync loop in the v4 branch (nearing completion!!) is materially different :)

Ah, yes v4. The GITSYNC_* vs GIT_SYNC_* have caught me out a couple of times :smile:

thockin commented 1 year ago

Something like -v 1 increasing verbosity for actions performed as part of a sync triggering (e.g. adding the stdout for exechooks and webhooks, etc), with the "things we do on every loop, which most of the time nobody cares about" coming at -v 2 or higher.

Exactly. I rejiggered V levels in the v4 branch, but I'm open to input on what users think is "useful" (vs my own opinions). Hooks have no particular logging, today, but could.

thockin commented 1 year ago

I've got a patch cooking to do the plumbing here and fix some other hook logging, too.

sboardwell commented 1 year ago

Sorry, haven't had time to articulate my thoughts completely, but I think we are on the same page with log levels increasing in terms of "what is important to the user when debugging".

Something possibly along the lines of...?

thockin commented 1 year ago

Good idea. https://github.com/kubernetes/git-sync/commit/0a0834af243787af116bc99b642da7b20a5db465

On Tue, May 16, 2023 at 3:17 AM Steve Boardwell @.***> wrote:

Sorry, haven't had time to articulate my thoughts completely, but I think we are on the same page with log levels increasing in terms of "what is important to the user when debugging".

Something possibly along the lines of...?

  • v0: change detection minimal (as is)
  • v1: change detection debug
  • v2: change detection trace
  • v3: sync loop minimal
  • v4: sync loop debug
  • v5: sync loop trace
  • etc...

— Reply to this email directly, view it on GitHub https://github.com/kubernetes/git-sync/issues/739#issuecomment-1549389503, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVEOOY6MEW2KJR5J4R3XGNH3DANCNFSM6AAAAAAX6D56IA . You are receiving this because you commented.Message ID: @.***>

sboardwell commented 1 year ago

Oh, I thought I had already replied here. Apologies.

That looks great. I didn't see where the stdout/stderr of the exec hook was moved, but I may have missed it.

            - 0: Minimal, just log updates
            - 1: More details about updates  # <----- is the stdout/stderr from the exec hook printed at this level?
            - 2: Log the sync loop
            - 3: More details about the sync loop
            - 4: More details
            - 5: Log all executed commands
            - 6: Log stdout/stderr of all executed commands
            - 9: Tracing and debug messages

The main advantage of placing it before the sync loop stuff is that you only get the extended info at update time and you don't have to scroll through countless sync loops until you find the sync in which the update happened.

thockin commented 1 year ago

I landed the "running exechook" at V(0) and "exechook succeeded" at V(1). The latter still feels verbose to me (I know many people WANT the sync loop logged, so V(2)), but it's not really a big deal.