purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.68k stars 315 forks source link

ctrl-c/interrupt/errors during init won't exit mgmt #340

Open aequitas opened 6 years ago

aequitas commented 6 years ago

lib.Main:Exit sends to a channel: https://github.com/purpleidea/mgmt/blob/9ba893c06c1058f39ac1e1ed7791d8386fd70065/lib/main.go#L199-L201

that will not be listened to until Main is 'running': https://github.com/purpleidea/mgmt/blob/9ba893c06c1058f39ac1e1ed7791d8386fd70065/lib/main.go#L807

At several points that causes mgmt to get 'stuck'. For example while connecting to to an unreachable seed[0] and interrupting with ctrl-c: https://github.com/purpleidea/mgmt/blob/9ba893c06c1058f39ac1e1ed7791d8386fd70065/lib/cli.go#L155

Or when the embedded etcd server fails to start: https://github.com/purpleidea/mgmt/blob/9ba893c06c1058f39ac1e1ed7791d8386fd70065/lib/main.go#L352

My golang knowledge about channels is not that deep yet, so my assumptions might be wrong.

[0]

$ ./mgmt run --no-server --tmp-prefix
16:44:00 hello.go:46: This is: mgmt, version: 0.0.14-32-g9ba893c
16:44:00 hello.go:47: Main: Start: 1518795840442349830
16:44:00 main.go:230: Main: Warning: Working prefix directory is temporary!
16:44:00 main.go:236: Main: Working prefix is: /var/folders/2c/1qlhcpv95t19mrvynljtfhph0000gq/T/mgmt-Firefly.local-726444075
16:44:01 pgp.go:88: PGP: Created key: D4E2B930
16:44:01 main.go:329: Main: Seeds: No seeds specified!
16:44:01 etcd.go:324: Etcd: Connect: Endpoints: []
16:44:01 etcd.go:338: Etcd: Connect: CtxError...
16:44:01 etcd.go:618: Etcd: CtxError: Reason: CtxDelayErr(1s): No endpoints available yet!
16:44:02 etcd.go:324: Etcd: Connect: Endpoints: []
16:44:02 etcd.go:338: Etcd: Connect: CtxError...
16:44:02 etcd.go:618: Etcd: CtxError: Reason: CtxDelayErr(2s): No endpoints available yet!
16:44:04 etcd.go:324: Etcd: Connect: Endpoints: []
16:44:04 etcd.go:338: Etcd: Connect: CtxError...
16:44:04 etcd.go:618: Etcd: CtxError: Reason: CtxDelayErr(3s): No endpoints available yet!
^C16:44:05 cli.go:154: Interrupted by ^C
16:44:07 etcd.go:324: Etcd: Connect: Endpoints: []
16:44:07 etcd.go:338: Etcd: Connect: CtxError...
16:44:07 etcd.go:618: Etcd: CtxError: Reason: CtxDelayErr(4s): No endpoints available yet!
^C^\SIGQUIT: quit
PC=0x406147b m=0 sigcode=0

goroutine 0 [idle]:
runtime.mach_semaphore_wait(0x1103, 0xe, 0x0, 0x0, 0x0, 0x5f57ca0, 0x7fff5fbff4f0, 0x405b9e3, 0xffffffffffffffff, 0x0, ...)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/sys_darwin_amd64.s:445 +0xb
runtime.semasleep1(0xffffffffffffffff, 0x0)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/os_darwin.go:413 +0x52
runtime.semasleep.func1()
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/os_darwin.go:432 +0x33
runtime.systemstack(0x7fff5fbff518)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/asm_amd64.s:360 +0xab
runtime.semasleep(0xffffffffffffffff, 0x0)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/os_darwin.go:431 +0x44
runtime.notesleep(0x5f58518)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/lock_sema.go:167 +0xe9
runtime.stopm()
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/proc.go:1680 +0xe5
runtime.findrunnable(0xc420021300, 0x0)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/proc.go:2135 +0x4d2
runtime.schedule()
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/proc.go:2255 +0x12c
runtime.park_m(0xc42011d080)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/proc.go:2318 +0xb6
runtime.mcall(0x7fff5fbff700)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/asm_amd64.s:286 +0x5b

goroutine 1 [chan send]:
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Set(0xc4201fa700, 0xc420178600, 0x17, 0x54874c7, 0x1, 0x0, 0x0, 0x0, 0x0, 0x40d7f54)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:979 +0x11a
github.com/purpleidea/mgmt/etcd.SetClusterSize(0xc4201fa700, 0x5490005, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/methods.go:263 +0x148
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup(0xc4201fa700, 0xd, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:434 +0x5ec
github.com/purpleidea/mgmt/lib.(*Main).Run(0xc4202e0240, 0x5492760, 0xc4202e0240)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/main.go:353 +0x9c6
github.com/purpleidea/mgmt/lib.run(0xc4201f4c60, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/cli.go:166 +0x1329
github.com/urfave/cli.HandleAction(0x50f7fe0, 0x5492768, 0xc4201f4c60, 0x0, 0xc42015d680)
    /Users/johan/.gopath/src/github.com/urfave/cli/app.go:501 +0xd2
github.com/urfave/cli.Command.Run(0x5431641, 0x3, 0x0, 0x0, 0xc4202dcc00, 0x1, 0x1, 0x5431641, 0x3, 0x0, ...)
    /Users/johan/.gopath/src/github.com/urfave/cli/command.go:165 +0x4bb
github.com/urfave/cli.(*App).Run(0xc4201fa380, 0xc420010140, 0x4, 0x4, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/urfave/cli/app.go:259 +0x740
github.com/purpleidea/mgmt/lib.CLI(0x5585324, 0x4, 0x558de30, 0x12, 0x4000000, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/cli.go:460 +0x2957
main.main()
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/main.go:46 +0x61

goroutine 5 [syscall]:
os/signal.signal_recv(0x5ed9640)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/sigqueue.go:131 +0xa7
os/signal.loop()
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/os/signal/signal_unix.go:28 +0x41

goroutine 28 [chan receive]:
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil.(*MergeLogger).outputLoop(0xc42024fe60)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:174 +0x47a
created by github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil.NewMergeLogger
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:92 +0xb8

goroutine 66 [chan receive]:
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil.(*MergeLogger).outputLoop(0xc4202c6100)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:174 +0x47a
created by github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil.NewMergeLogger
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:92 +0xb8

goroutine 67 [chan receive]:
github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil.(*MergeLogger).outputLoop(0xc4202c6160)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:174 +0x47a
created by github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil.NewMergeLogger
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/vendor/github.com/coreos/etcd/pkg/logutil/merge_logger.go:92 +0xb8

goroutine 60 [select]:
github.com/purpleidea/mgmt/converger.(*converger).Loop(0xc4201620f0, 0x1)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/converger/converger.go:199 +0x10d
created by github.com/purpleidea/mgmt/lib.(*Main).Run
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/main.go:323 +0x780

goroutine 33 [chan send]:
github.com/purpleidea/mgmt/lib.(*Main).Exit(...)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/main.go:200
github.com/purpleidea/mgmt/lib.run.func1(0xc4202e0240, 0xc42026c240)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/cli.go:155 +0x3c9
created by github.com/purpleidea/mgmt/lib.run
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/lib/cli.go:145 +0x1318

goroutine 73 [select, locked to thread]:
runtime.gopark(0x5495848, 0x0, 0x543427b, 0x6, 0x18, 0x1)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/proc.go:287 +0x12c
runtime.selectgo(0xc420308750, 0xc420074240)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/select.go:395 +0x1149
runtime.ensureSigM.func1()
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/signal_unix.go:511 +0x1fe
runtime.goexit()
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/asm_amd64.s:2337 +0x1

goroutine 61 [sleep]:
time.Sleep(0xee6b2800)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/time.go:65 +0x130
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).CtxError(0xc4201fa700, 0x5ee0a80, 0xc420016158, 0x5ecd140, 0xc42012e120, 0x0, 0x0, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:619 +0x16ef
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Connect(0xc4201fa700, 0x485f800, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:339 +0x3e3
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup.func1(0xc4201fa700)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:363 +0x34
created by github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:362 +0x72

goroutine 62 [semacquire]:
sync.runtime_SemacquireMutex(0xc4201fa704, 0x0)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc4201fa700)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/sync/mutex.go:134 +0xee
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Connect(0xc4201fa700, 0x5ee9d00, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:300 +0x6a
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).CbLoop(0xc4201fa700)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:732 +0x11e
created by github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:369 +0x97

goroutine 63 [semacquire]:
sync.runtime_SemacquireMutex(0xc4201fa704, 0xc42031cb00)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc4201fa700)
    /Users/johan/.brew/Cellar/go/1.9.4/libexec/src/sync/mutex.go:134 +0xee
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Connect(0xc4201fa700, 0x5ee9d00, 0x0, 0x0)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:300 +0x6a
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Loop(0xc4201fa700)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:801 +0x11e
created by github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:370 +0xbc

goroutine 64 [chan send]:
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).AddWatcher(0xc4201fa700, 0xc420178340, 0x11, 0xc420040400, 0x1, 0xc420150018, 0x1, 0x1, 0xc42026c720, 0xc42026c6c0, ...)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:1101 +0x13b
created by github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:374 +0x236

goroutine 65 [chan send]:
github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).AddWatcher(0xc4201fa700, 0xc4201785e0, 0x17, 0xc420040410, 0x1, 0x0, 0x0, 0x0, 0xc42026c720, 0xc42026c6c0, ...)
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:1101 +0x13b
created by github.com/purpleidea/mgmt/etcd.(*EmbdEtcd).Startup
    /Users/johan/.gopath/src/github.com/purpleidea/mgmt/etcd/etcd.go:378 +0x35b

rax    0xe
rbx    0x5f583e0
rcx    0x7fff5fbff490
rdx    0x7fff5fbff518
rdi    0x1103
rsi    0x1
rbp    0x7fff5fbff4c8
rsp    0x7fff5fbff490
r8     0x40
r9     0x7fff5fbfed78
r10    0x7fff5fbfed90
r11    0x246
r12    0x1286ffe3d0eca
r13    0xf9
r14    0x11
r15    0x138
rip    0x406147b
rflags 0x246
cs     0x7
fs     0x0
gs     0x0
purpleidea commented 6 years ago

@aequitas I'll be honest: our lib/main.go is a bit terrible, and needs some rewriting...

aequitas commented 6 years ago

@purpleidea yea, I started a short debug session but figured it would take a bit more time :D But is doesn't look that bad from a distance :)

So this is just the recording of my investigation for when I (or anyone else) wants to pick this up.

purpleidea commented 6 years ago

@aequitas A have a fairly decent dependency graph in my head of what needs to happen in what order in terms of most efficiently solving all of the problems, but it's sadly not documented anywhere. This is most definitely an issue. It will get much easier to fix once https://github.com/purpleidea/mgmt/issues/277 is fixed, and also once my embedded etcd rewrite lands.

Thank you for the report!

aequitas commented 6 years ago

@purpleidea Cool, when I have my initial setup running I'll resync on this with you and see what I can do :)