Closed burdiyan closed 1 month ago
It's not easy for me to provide a clean reproduction for this, but you could clone this repo: https://github.com/seed-hypermedia/seed and run go run ./backend/cmd/seed-daemon. After leaving it running for a while (until the periodic auto relay logs appear), press ctrl+c: you can see that the shutdown starts, but then it gets stuck.
Doing some very tedious and manual debugging I figured out that it gets stuck in the place I shared previously.
Can you check whether setting the environment variable GODEBUG="asynctimerchan=1" fixes the issue? It's probably caused by https://github.com/golang/go/issues/69312
Alternatively, you can change your go version in your go.mod to go1.22.
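To confirm which timer implementation a given binary actually ends up with, a small check like the one below may help. It is only a sketch: it relies on the Go 1.23 release notes, which say the new synchronous timer channels report a capacity of 0 while the older asynchronous ones are buffered with capacity 1. The helper name timerChanMode is made up for this example and is not part of any library.

```go
package main

import (
	"fmt"
	"time"
)

// timerChanMode is a hypothetical helper. It reports "async" when timer
// channels are buffered (Go <= 1.22 behaviour, or GODEBUG=asynctimerchan=1)
// and "sync" when they are unbuffered (Go 1.23 default behaviour).
func timerChanMode() string {
	t := time.NewTimer(time.Hour)
	defer t.Stop()
	if cap(t.C) == 1 {
		return "async"
	}
	return "sync"
}

func main() {
	fmt.Println("timer channels:", timerChanMode())
}
```

Running it once normally and once with GODEBUG="asynctimerchan=1" set in the environment should show whether the setting took effect for that build.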
I think I found a solution for the timer problem (I'll make a PR for pubsub as well):
    if !timer.Stop() {
        select {
        case <-timer.C:
        default:
        }
    }
@sukunrt Oooh, I see. Unfortunately I can't use Go 1.22 at this point, because I'm already using iterators in some places :)
I think the solution @vyzo proposes could work. I remember doing something similar in my own code at some point.
@vyzo I'd advise against making any changes to production code. This was a Go bug and is going to get fixed in Go 1.23.2. Just use the GODEBUG setting @sukunrt mentioned for now.
@sukunrt can you point me to the exact timer that could be causing the shutdown issues?
Confirming that running with GODEBUG="asynctimerchan=1" fixes the problem for me.
@vyzo that solution is racy for versions <= go1.22.
    if !timer.Stop() {
        select {
        case <-timer.C:
        default:
        }
    }
When timer.Stop returns false, it doesn't mean the value has already been pushed to the channel. It only means that Stop didn't prevent the timer from firing: the value may already be available in the channel, or it may be pushed soon.
OK, fair enough; let's wait for the upstream fix then.
@sukunrt can you point me to the exact timer that could be causing the shutdown issues?
One is in quic-go, see https://github.com/quic-go/quic-go/pull/4659
One is in autonat: https://github.com/libp2p/go-libp2p/blob/master/p2p/host/autonat/autonat.go#L221
I'm sure there are some others in go-libp2p and its dependencies.
I'm keeping this issue open. I'll add some text in the next patch release regarding this, and close the issue.
There is one in pubsub too.
We recently started facing issues with graceful shutdown in our app. After receiving a termination signal, the app still hangs and never exits until it is forcefully shut down.
After spending some time debugging, I've found out that this place in libp2p never returns:
https://github.com/libp2p/go-libp2p/blob/v0.36.3/config/host.go#L28
To clarify, we are using libp2p with AutoRelay, HolePunching, DHT, and other things. The node needs to run for a while before this problem occurs. I suspect that it could be AutoRelay that's causing this, because the problem starts occurring after AutoRelay starts doing periodic relay finding.
So, closableRoutedHost.Close() gets called, but the underlying fx.App's Stop method never returns.