lnd unresponsive after closing a channel with many updates

C-Otto commented 2 years ago

Background

I closed a channel with num_updates of around 600,000. Once the transaction got confirmed, lnd stopped responding to gRPC requests (the requests hang) and only recovered after a minute or so. I saw lots of disk activity (100% reading, not writing at all to channel.db) for around 3 minutes.

This isn't a big deal, but I think that latency/availability are important. If possible, the maintenance caused by closing the channel should be performed in the background / in parallel instead of blocking other parts of lnd.
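To illustrate what "in the background" could look like, here is a minimal sketch. It assumes the cleanup can be split into small batches that each hold the lock only briefly. The `store` type, `deleteBatch`, and `purgeInBackground` are hypothetical names for this example and are not lnd's actual channeldb code.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a database that serves one
// operation at a time. It is NOT lnd's channeldb.
type store struct {
	mu   sync.Mutex
	keys map[uint64]struct{}
}

// newStore creates a store holding n keys, one per simulated channel update.
func newStore(n uint64) *store {
	s := &store{keys: make(map[uint64]struct{}, n)}
	for i := uint64(0); i < n; i++ {
		s.keys[i] = struct{}{}
	}
	return s
}

// deleteBatch removes up to limit keys while holding the lock and
// returns how many keys it removed.
func (s *store) deleteBatch(limit int) int {
	s.mu.Lock()
	defer s.mu.Unlock()

	removed := 0
	for k := range s.keys {
		if removed == limit {
			break
		}
		delete(s.keys, k)
		removed++
	}
	return removed
}

// count stands in for an unrelated request that needs the same lock.
func (s *store) count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.keys)
}

// purgeInBackground deletes all keys in batches on its own goroutine.
// The lock is released between batches, so other requests can be served
// while the purge is still running. The returned channel is closed once
// no keys remain.
func purgeInBackground(s *store, batchSize int) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for s.deleteBatch(batchSize) > 0 {
		}
	}()
	return done
}

func main() {
	s := newStore(600000)
	done := purgeInBackground(s, 1000)

	// This call only has to wait for the current batch, not the whole purge.
	_ = s.count()

	<-done
	fmt.Println("remaining keys:", s.count())
}
```

The batch size trades total cleanup time against how long any single request may have to wait. Whether lnd's database layer allows this kind of split is an open question here.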

Your environment

version of lnd: v0.14.1-beta
which operating system (uname -a on *Nix): 5.10.0-10-amd64
version of btcd, bitcoind, or other backend: bitcoind v22

Steps to reproduce

1. Have a channel with 600,000 updates.
2. Coop-close the channel.
3. Wait for the close-tx to confirm.
4. Issue gRPC requests.

Expected behaviour

lnd continues running as expected, serving the requests.

Actual behaviour

gRPC API calls hang, in my case for about a minute. It's possible that other requests (forwarding requests, connection attempts) are also delayed.

@zerofeerouting maybe you've also seen this.

Roasbeef commented 2 years ago

I'm guessing this is the result of needing to purge all the keys related to the updates on disk. Thanks for the report, shouldn't be too difficult to reproduce.

If it happens again, and you can snag a CPU profile, then that'll be super helpful as well.

C-Otto commented 2 years ago

pprof.lnd.samples.cpu.001.pb.gz

Hope this helps?
