lightningdevkit / ldk-node

A ready-to-go node implementation built using LDK.
Other
140 stars 72 forks source link

Lower/override minimum syncing intervals / improve initialization time #339

Closed dpc closed 1 month ago

dpc commented 1 month ago

In Fedimint we use a lot of e2e tests, where we set up a full local environment including bitcoind, electrs, bunch of lightning nodes with channels between them, and Fedimint federation. The whole thing can initialize in less than 20 seconds.

Recently we've added LDK as one of the nodes, and now we have to wait around 60s for everything to initialize. The culprit seems to be LDK sync intervals - waiting up to 10s after every step is quite a lot. Though possibly there are other things that could be improved to enable fast startup.

If we could somehow improve these times by setting sync intervals lower, etc. it would be a big quality of life improvement.

dpc commented 1 month ago

Ping @tvolk131

moneyball commented 1 month ago

@dpc do you mean you've recently added LDK (not LND)? ie you suspect LDK is the culprit

dpc commented 1 month ago

@moneyball Yes, sorry. LDK . I fixed the post.

tnull commented 1 month ago

Recently we've added LDK as one of the nodes, and now we have to wait around 60s for everything to initialize. The culprit seems to be LDK sync intervals - waiting up to 10s after every step is quite a lot. Though possibly there are other things that could be improved to enable fast startup.

If we could somehow improve these times by setting sync intervals lower, etc. it would be a big quality of life improvement.

I'm not quite sure I understand the full extent of the issue. We never 'wait' on anything besides the initial fee estimation on startup? Everything else should happen in the background. Could you expand what you're trying to do and which 'steps' are waiting on what exactly?

Secondly, if you really find yourself needing to adjust the defaults (though note they have been picked conservative for good reason as syncing via Esplora can often hit rate-limiting, especially when using public servers), you can already do so via adjusting the onchain_wallet_sync_interval_secs, wallet_sync_interval_secs, fee_rate_cache_update_interval_secs config options (see https://docs.rs/ldk-node/latest/ldk_node/struct.Config.html).

tvolk131 commented 1 month ago

It would be helpful for us if we could reduce (or eliminate) this constant:

https://github.com/lightningdevkit/ldk-node/blob/2bbb76476a85c0bfa079398eab049ffc364acc4e/src/config.rs#L50

It blocks us from setting the wallet sync interval to anything faster than 10 seconds:

https://github.com/lightningdevkit/ldk-node/blob/2bbb76476a85c0bfa079398eab049ffc364acc4e/src/lib.rs#L279-L282

https://github.com/lightningdevkit/ldk-node/blob/2bbb76476a85c0bfa079398eab049ffc364acc4e/src/lib.rs#L386-L387

I don't think this accounts for the full extent of the added delay for us, but it seems like the easiest place to start

I realize that Node::sync_wallets() exists. That would be non-trivial for us to use though, since our Node instance lives behind a rust interface, so we'd need to expose additional behavior in that interface. Is there anything preventing us from lowering WALLET_SYNC_INTERVAL_MINIMUM_SECS to, say, 1 second?

tnull commented 1 month ago

It would be helpful for us if we could reduce (or eliminate) this constant:

https://github.com/lightningdevkit/ldk-node/blob/2bbb76476a85c0bfa079398eab049ffc364acc4e/src/config.rs#L50

It blocks us from setting the wallet sync interval to anything faster than 10 seconds:

Right, as the name indicates, that is its intended purpose :)

https://github.com/lightningdevkit/ldk-node/blob/2bbb76476a85c0bfa079398eab049ffc364acc4e/src/lib.rs#L279-L282

https://github.com/lightningdevkit/ldk-node/blob/2bbb76476a85c0bfa079398eab049ffc364acc4e/src/lib.rs#L386-L387

I don't think this accounts for the full extent of the added delay for us, but it seems like the easiest place to start

These operations are happening in the background, so could you clarify where you seeing any delay exactly and what you're trying to do?

I realize that Node::sync_wallets() exists. That would be non-trivial for us to use though, since our Node instance lives behind a rust interface, so we'd need to expose additional behavior in that interface.

Right, it's also mostly there for testing purposes and we more or less discourage using it to avoid spamming sync commands.

Is there anything preventing us from lowering WALLET_SYNC_INTERVAL_MINIMUM_SECS to, say, 1 second?

Yes, most Esplora servers (in particular public ones, but also private ones) are heavily rate-limited and will return an HTTP429 ("too many requests") whenever it's hit. Unfortunately hitting even one such error results in the sync starting over (as especially for Lightning we have to be very cautious to avoid any inconsistencies encountered during syncing), which leads to yet more requests being generated, increasing the risk of further hitting the rate-limits. So this lower-bound is essentially a safety to disallow spamming Esplora servers which results in an unusable wallet and potentially leading to security issues when no sync would ever finish.

Please also note: a) This is really about chain syncing, i.e., syncing to a state that only changes every 10 minutes. Having an added worst-case delay of 10s on top doesn't really make much of a difference under most circumstances. b) A lot of the syncing stuff is about to get much more efficient with the BDK upgrade, and we'll add support for sourcing full blocks (soon) and BIP157/158 filtered blocks (when kyoto is ready), which will not be encumbered by a sync interval and are probably the way to go for your usecase anyways.

tvolk131 commented 1 month ago

Thanks for that explanation, that makes sense.

The issue we're running into is after calling Node::connect_open_channel() and mining 10 regtest blocks, waiting for the channel to appear when calling Node::list_channels() with ChannelDetails.is_channel_ready == true takes around 30-40 seconds.

Could it make sense to ignore WALLET_SYNC_INTERVAL_MINIMUM_SECS if Config.network is set to regtest? This way we could make sure our local esplora has rate-limiting turned off, then safely set the interval to 1 second.

tnull commented 1 month ago

Thanks for that explanation, that makes sense.

The issue we're running into is after calling Node::connect_open_channel() and mining 10 regtest blocks, waiting for the channel to appear when calling Node::list_channels() with ChannelDetails.is_channel_ready == true takes around 30-40 seconds.

Ah, this is the exact use-case why we added sync_wallets to allow triggering the sync instantly. Would using it your (CI?) script really not be an option?

Could it make sense to ignore WALLET_SYNC_INTERVAL_MINIMUM_SECS if Config.network is set to regtest? This way we could make sure our local esplora has rate-limiting turned off, then safely set the interval to 1 second.

I guess we could consider that if you really prefer it, although I think sync_wallets is what you want in testing: you know a new block is available, and you sync to the tip immediately.

tvolk131 commented 1 month ago

I think sync_wallets is what you want in testing

Ok I can wire that up 👍 the documentation for it scared me off, but if this is the case it's meant for then I'll give it a try

I think we can close this issue for now - I'll re-open it if I run into any issues with sync_wallets

tnull commented 1 month ago

Cool, let me know if you'd prefer the sketched carve-out for regtest after all. Closing until then.

tvolk131 commented 1 day ago

@tnull from my testing within fedimint, it seems Node::sync_wallets() improves the startup speed of our dev environment a bit, but doesn't allow us to bump Config.onchain_wallet_sync_interval_secs and Config.wallet_sync_interval_secs up to their default values (80 and 30 seconds respectively) without introducing significant slowdown.

During dev startup (just mprocs, for Fedimint folks), we wait for lightning gateways to sync to regtest chain tip several times. I have a test branch where before each chain sync wait, we also call sync_wallets() on the ldk-node::Node instance. I was hoping this would allow us to change onchain_wallet_sync_interval_secs and wallet_sync_interval_secs to their default values, but this slows things down quite a bit.

When onchain_wallet_sync_interval_secs and wallet_sync_interval_secs are both set to the minimum value of 10 seconds, calling sync_wallets() or not makes almost no difference. But when they are both set to their default values, calling sync_wallets() cuts down startup time by ~45%. Here's some rough startup speeds of different configurations:

onchain_wallet_sync_interval_secs & wallet_sync_interval_secs of 10, not calling sync_wallets(): ~55s onchain_wallet_sync_interval_secs & wallet_sync_interval_secs of 10, calling sync_wallets(): ~50s onchain_wallet_sync_interval_secs & wallet_sync_interval_secs of default values (80 and 30), not calling sync_wallets(): ~180s onchain_wallet_sync_interval_secs & wallet_sync_interval_secs of default values (80 and 30), calling sync_wallets(): ~100s

It's worth noting that in my testing, in the last case (onchain_wallet_sync_interval_secs & wallet_sync_interval_secs are default values and we're calling sync_wallets()), the first sync is very slow but the subsequent ones are just a few seconds.

These results are strange to me. Clearly calling sync_wallets() is doing something, hence the 45% speedup. But I would expect the last case above to produce results similar to the first and second cases above. Is this a bug, or am I misunderstanding how sync_wallets() works?

tnull commented 1 day ago

During dev startup (just mprocs, for Fedimint folks), we wait for lightning gateways to sync to regtest chain tip several times. I have a test branch where before each chain sync wait, we also call sync_wallets() on the ldk-node::Node instance. I was hoping this would allow us to change onchain_wallet_sync_interval_secs and wallet_sync_interval_secs to their default values, but this slows things down quite a bit.

I'm still not quite sure what you mean by "slowdown"/"slows things down quite a bit": If you call sync_wallets the Node will sync the on-chain and lightning wallets and update the fee rate estimation, and the call will block until they are synced (mod syncing failures due to connection issues/ratelimiting etc., of course). So if the call returns the node will be synced to whatever the current chain tip at that specific time was, or rather what the electrs instance actually picked up until then (as there is not only a latency for broadcasting, bitcoind processing the transaction and adding it to the mempool, but also an additional latency for electrs to be synced to the latest bitcoind state). This is btw. why we use a wait_for_tx method in our integration tests that polls with some backoff until a particular transaction we're interested is seen via Esplora before we proceed to call sync_wallets in a few places.

But when they are both set to their default values, calling sync_wallets() cuts down startup time by ~45%.

I'm not sure what you mean with "startup" time? How do you measure that? Calling sync_wallets definitely doesn't cut down on any processing time, it only allows to trigger syncing right when you're positive that a transaction you're waiting on has been confirmed and has been picked up by Esplora.

It's worth noting that in my testing, in the last case (onchain_wallet_sync_interval_secs & wallet_sync_interval_secs are default values and we're calling sync_wallets()), the first sync is very slow but the subsequent ones are just a few seconds.

What do you mean by 'slow'? How do you measure that exactly? Does it refer to the time stated in the logs or do you measure it in another way? And are you referring to the sync_wallets calls, or to the background syncing?