lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.63k stars 2.07k forks source link

[bug]: `ChannelRouter` cannot be shutdown while the `syncGraphWithChain` function is running. #8721

Closed ViktorTigerstrom closed 1 month ago

ViktorTigerstrom commented 4 months ago

Background

The syncGraphWithChain function in routing/router.go cannot be stopped by a shutdown signal while it's running. The function is ment to be able to exit early by closing the quit channel here: https://github.com/lightningnetwork/lnd/blob/fb632bb94565918dfa4a0aaa3b8544243e74e9f0/routing/router.go#L854

But as the syncGraphWithChain function is synchronously called in the Start function, the Stop function which closes the quit channel, can never be triggered until the syncing is complete. The only call-sites for the Stop function can only ever happen after the ChannelRouter Start function has fully executed: https://github.com/lightningnetwork/lnd/blob/fb632bb94565918dfa4a0aaa3b8544243e74e9f0/server.go#L2020 & https://github.com/lightningnetwork/lnd/blob/fb632bb94565918dfa4a0aaa3b8544243e74e9f0/lnd.go#L681

Steps to reproduce

To reproduce the issue, you can add some simple code to the beginning of the syncGraphWithChain function like the following:

for i := 0; i <= 10000; i++ {
    time.Sleep(3 * time.Millisecond)

    log.Infof("Syncing test: %v", i)

    select {
    case <-r.quit:
        return ErrRouterShuttingDown
    default:
    }
}

Then try to shutdown lnd with a shutdown signal while the code above is executing.

Expected behaviour

The syncGraphWithChain function should exit early by a shutdown signal.

Actual behaviour

lnd will only shutdown after the syncGraphWithChain and the Start function has finished their execution.

yyforyongyu commented 4 months ago

Think this will be fixed by #8497 cc @ziggie1984

ziggie1984 commented 4 months ago

Yes as soon as we always do the cleanup we are fine and close the stop channel when triggering the shutdown.

ViktorTigerstrom commented 4 months ago

Awesome, thanks! I just want to clarify though, that the ChannelRouter Start function will not return while the syncGraphWithChain function is running. So only adding the Stop function for the ChannelRouter to the cleanUp func in the server, before the Start function is executed won't really change the behaviour detailed in this issue unless I'm missing something?

ziggie1984 commented 4 months ago

You are right, thank you for the input will come up with a different solution.