Open bboreham opened 6 years ago
What version are you running? From https://github.com/weaveworks/docker-ansible/blob/master/vendor/manifest I see it's 1e15c08f93a0286d8945b784ccad2dbdf0000def.
I'd try latest; I re-wrote this a while back to fix a bunch of issues.
I did look at the latest code, and it doesn't seem any different in this respect - it still calls `Wait4()` to wait indefinitely.
But it should never call it concurrently.
Look at the stack traces - `exec` is calling wait and `reap` is calling wait.
Unless I'm mistaken, master never calls exec and wait concurrently...
However, reflecting on the `ps` output I saw, I wonder if the sub-process has not exited at all.
From the docs I expect `cancel()` to kill it.
The code on master calls `defer cancel()` - how would that work on a timeout?
That's just idiomatic use of context; it's a no-op in this case.
This is where it ends up being used: https://github.com/golang/go/blob/go1.11/src/os/exec/exec.go#L408, and the `waitDone` channel will be closed first.
Return from the http endpoint:
Stack traces via `kill -SIGQUIT`:
I believe the source is https://github.com/tomwilkie/prom-run/blob/1e15c08f93a0286d8945b784ccad2dbdf0000def/main.go#L35
So both goroutines calling `wait` are stuck. I wonder if there is a race?