lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License

Race condition-free backups #6311

Open Kixunil opened 2 years ago

Kixunil commented 2 years ago

I'm looking for ways to maximize the reliability of an LN node, and I don't see any way to do backups reliably without race conditions. It'd be useful to have an API that calls into an external service/command/... and blocks opening the channel until the backup is completed. If the call fails for whatever reason, the channel is rejected. This way, LND or the power crashing at any moment couldn't cause loss of sats. As far as I can see, currently one may at best watch for FS changes and back up immediately, hoping nothing crashes between opening a channel and performing the backup.

Note that such a situation isn't crazy-improbable, because reasonable backup solutions rely on a remote machine and the connection may be unreliable.

There are two ways to do this: gRPC, or having something like a backupcommand option in LND that runs the specified command and waits until it exits. Pros and cons:

I think I myself would prefer running a command, but I can adapt and respect the choice of using gRPC.

guggero commented 2 years ago

I think the main problem is the order of events here. lnd will only craft the channel backup once it has sent the FundingSigned message. And after you've sent that message to your peer, there is no way to "reject" the channel anymore (only force-close).

But with some refactoring I think it would be possible to create a new BackupInterceptor RPC that is invoked before sending the FundingSigned message. If a client is registered to that interceptor then it needs to acknowledge the channel as being backed up before the final signature is sent.

gRPC is definitely the preferred way to implement something like this!

Kixunil commented 2 years ago

Interesting, I didn't expect the actions would be in this order, although it doesn't sound difficult to change. Anyway, shouldn't the backup be created even before signing, to account for external signers?

One more detail: it should be possible to enforce in the config that the interceptor MUST be used. Otherwise, if the backup service fails, it would risk losing backups. (Similar problem to the custom macaroon handling we discussed not long ago.)

guggero commented 2 years ago

> didn't expect the actions would be in this order

I think that's just a consequence of the channel update notification mechanism being re-used for triggering the backup creation. And that notification only triggers after the channel is fully signed. But a new backup interceptor mechanism could hook in earlier and ensure atomicity. And yes, I'd expect the mandatory interceptor mechanism of the RPC middleware interceptor to be re-used for a new interceptor like this.