lightningdevkit / rust-lightning

A highly modular Bitcoin Lightning library written in Rust. It's rust-lightning, not Rusty's Lightning!
Other
1.14k stars 353 forks source link

Scan Mempool for HTLC preimages #2341

Open TheBlueMatt opened 1 year ago

TheBlueMatt commented 1 year ago

With HTLC transactions getting delayed confirming due to fee spikes and HTLCs sticking around too long causing force-closes, it may be worth at least having an API to do mempool scans for HTLC preimages, mostly because it'd be a pretty simple API. Basically all we'd need it a method for users to pass in loose transactions (which match our filter) which we look at for preimages and claim HTLCs backwards if we find any. We'd want to keep it optional and wouldn't include it in our trust model, but given the API is simple it may be worth doing eventually. Certainly not a high priority, though.

ariard commented 1 year ago

For MempoolInterface, I think we can have a UserConfig::mempool_catch_htlc_preimage or a IgnoringMempoolInterface to make such scan optional for mobile nodes. This is not that they’re not exposed to delay HTLCs sticking around too long, though it’s not realistic in term of bandwidth ressources to have them supporting transaction-relay.

ariard commented 1 year ago

From my perspective, a MempoolInterface allowing a LDK node to inspect its local mempool allows the improvement of two standards flows / use-case (at least):

I think there is an additional c) advantage, seeing the evolution of mempool fluctuations over time as a source of information for liquidity management (go to open more channels when it’s empty), off-chain routing fees setting and jamming mitigations (to know the risk of accepting a HTLC). How this interferes with a smarterFeeEstimator is a good question and maybe be should combine both.

TheBlueMatt commented 1 year ago

For MempoolInterface, I think we can have a UserConfig::mempool_catch_htlc_preimage or a IgnoringMempoolInterface to make such scan optional for mobile nodes

I'm very confused what you mean by this - if we add an interface, it would be optional, and users could either submit mempool transactions, or they could ignore it. There's no need for more than one type or any config options.

a) passing backward a claim preimage from a counterparty transaction without waiting confirmation therefore better liquidity usage of an inbound edge

Mmm, that's a fair point, especially around the issues during the fee spikes earlier this year transaction delays led to resolution delays. I'm not sure this is the most important thing from that, though, given most of the high priority issues we saw were force-closes as a result of pending commitment transactions not confirming in time. We do need to carefully weigh this against the additional complexity cost for our downstream users, however, and I'm far from convinced handling this edge case a bit better is worth it.

b) detecting a dual-funding / splicing double-spend by one of the contributing counterparty therefore freeing up the locked UTXOs on our side

I don't think we want to do this at all - a counterparty double-spending is totally fine, as long as it doesn't confirm. We should ignore this and just handle things once transactions hit the chain.

I think there is an additional c) advantage, seeing the evolution of mempool fluctuations over time as a source of information for liquidity management (go to open more channels when it’s empty), off-chain routing fees setting

This is something for downstream code to do, not something for LDK to concern itself with.

and jamming mitigations (to know the risk of accepting a HTLC). How this interferes with a smarterFeeEstimator is a good question and maybe be should combine both.

Not sure how this is related to mempool scanning, you should just look at fees for this.

ariard commented 1 year ago

I'm very confused what you mean by this - if we add an interface, it would be optional, and users could either submit mempool transactions, or they could ignore it. There's no need for more than one type or any config options.

Let’s take the use-case a) which is about discovery of a mempool transaction spending a specific channel commitment output. Here it’s polling an event, not a user submitting transaction ? Yes it can be “optional” in the sense you give a dumb implementation IgnoringMempoolInterface or logic being firewalled behind a config option.

Mmm, that's a fair point, especially around the issues during the fee spikes earlier this year transaction delays led to resolution delays. I'm not sure this is the most important thing from that, though, given most of the high priority issues we saw were force-closes as a result of pending commitment transactions not confirming in time.

In today world where you have more fee spikes breaking cooperation on channel feerates and assuming outbound outputs claimed by HTLC-preimage from “flappy” mobile clients, this scenario is more likely. I don’t deny the additional complexity cost for our users though it sounds LND is going for it (#https://github.com/lightningnetwork/lnd/pull/7615) and not doing it means we’ll be less performant in term of liquidity management on the long-term IMHO, or do we want to rule out routing hop on the server-side as a LDK use-case ?

I don't think we want to do this at all - a counterparty double-spending is totally fine, as long as it doesn't confirm. We should ignore this and just handle things once transactions hit the chain.

Are you aware of this post from the ACINQ-side on liquidity griefing: https://lists.linuxfoundation.org/pipermail/lightning-dev/2023-May/003920.html and what is your thinking ?

This is something for downstream code to do, not something for LDK to concern itself with.

I’ll agree here than most of the code should be in downstream, yet we should be able to lean on it in the future to take force-close or cooperative decision, see #1056.

Not sure how this is related to mempool scanning, you should just look at fees for this.

Okay here it might be 2 steps ahead of the game, though if we have mempool clustering and historical buckets of the clusters one could have “predictive” on-chain fees and take liquidity / jamming management in function :)

What do you think on the idea of combining the proposed MempoolInterface with our current FeeEstimator one as both can assume access to some in-memory pool of unconfirmed transaction, or do we have deployed fee-estimator not relying on a mempool ? That the fee-estimation or mempool-scanning can be delegated to a one or more third-party is a subsdiary question in my mind.

TheBlueMatt commented 1 year ago

Let’s take the use-case a) which is about discovery of a mempool transaction spending a specific channel commitment output. Here it’s polling an event, not a user submitting transaction ? Yes it can be “optional” in the sense you give a dumb implementation IgnoringMempoolInterface or logic being firewalled behind a config option.

All of the LDK transaction stuff is the user doing their thing and submitting it to LDK. There's no polling for transactions anywhere in LDK and I really don't see why we should add that. I don't understand why this is controversial or something to discuss, let's discuss whether we want a mempool-tx-notification API or not.

In today world where you have more fee spikes breaking cooperation on channel feerates and assuming outbound outputs claimed by HTLC-preimage from “flappy” mobile clients, this scenario is more likely.

That seems like a pretty specific scenario - a mobile phone managed to broadcast a commitment transaction, which eventually confirmed, but then was offline for some extended duration and its HTLC-preimage transaction which was broadcast at the same time at the same feerate didn't confirm? I'm not sure that's likely.

not doing it means we’ll be less performant in term of liquidity management on the long-term IMHO, or do we want to rule out routing hop on the server-side as a LDK use-case ?

Huh?! Please provide a specific issue before you jump to sweeping conclusions. This isn't a helpful comment.

Are you aware of this post from the ACINQ-side on liquidity griefing

I don't think mempool scanning really helps here - an attacker will just replace a few things and we won't get anywhere. Also, scanning for general purpose conflicts isn't something that's trivial without a full mempool,

What do you think on the idea of combining the proposed MempoolInterface with our current FeeEstimator one as both can assume access to some in-memory pool of unconfirmed transaction, or do we have deployed fee-estimator not relying on a mempool ? That the fee-estimation or mempool-scanning can be delegated to a one or more third-party is a subsdiary question in my mind.

Fetching the whole mempool isn't practical. Most LDK clients don't have a full mempool and never should. Fee estimators should figure themselves out, not LDK do it for them (or, better yet, lightning shouldn't care so much about fees, which we're getting close to).

ariard commented 1 year ago

I don't understand why this is controversial or something to discuss, let's discuss whether we want a mempool-tx-notification API or not.

Calm down and scroll back, it was your previous statement "if we add an interface, it would be optional, and users could either submit mempool transactions, or they could ignore it. “ that talk about submitting transactions and my answer was saying a “polling” interface could be considered rather than a “push” mempool-tx-notification API as you seemed to propose.

Though please let’s raise the confusion. Here is a simple reason to propose a “polling” interface where the user has to poll the interface every X in a background thread, namely replacement. Replacement could happen in an unbounded fashion from downstream and provoke processing on our side, rather than fetching the latest in-mempool transaction.

I don’t think it’s a serious concern as soon as we have matching preimage, we can forget the “scanned” output, do we have more concerns to think about like cross-layer deanonymization oracle by the backward link counterparty ?

That seems like a pretty specific scenario - a mobile phone managed to broadcast a commitment transaction, which eventually confirmed, but then was offline for some extended duration and its HTLC-preimage transaction which was broadcast at the same time at the same feerate didn't confirm?

I think we agree the scenario is technically plausible. Post-anchor there is 1 CSV between transactions so you might have one block of confirmation changing the “top mempool” feerates and note here the fee-bumping is on the burden of the mobile user, which might be rely on third-party of diverse quality.

On the probabilities of such happening just looking on today blocks sat/vB you have few jumps in the order of ~5 units, sufficient for that type of scenario to happen. I can come with further math though here we would need to agree on a fee-estimator model first, like the Bitcoin Core one.

Huh?! Please provide a specific issue before you jump to sweeping conclusions. This isn't a helpful comment.

During the first years of LDK development, iirc we wondered a lot if we wished to focus on embedded, hardware, point of sales, mobiles niches as priorities among other things.

I don't think mempool scanning really helps here - an attacker will just replace a few things and we won't get anywhere. Also, scanning for general purpose conflicts isn't something that's trivial without a full mempool,

The attack model is the following:

Instead if you can scan the mempool for the contributed input, as soon as you see the double-spend, the local UTXO contributed could be unlocked and re-allocated in another collaborative txn, without on-chain fee on our side.

This is true a full-mempool helps here (even distributed full-mempool if you assume monitor/backend replica like documented in our glossary) though note dual-funding by LSP is going to be a real-thing, and this is expected those LSP to rely on a full-mempool to pay lower on-chain fees cost for all their rebalancing.

Fetching the whole mempool isn't practical. Most LDK clients don't have a full mempool and never should. Fee estimators should figure themselves out, not LDK do it for them (or, better yet, lightning shouldn't care so much about fees, which we're getting close to).

I think saying most LDK clients don’t have a full mempool and never should is again making an assumption on the type of use-case (mobile / watchtower / routing hops / LSP) which is build. Beyond saying "lightning shouldn’t care so much about fees”, I think you don’t understand LN and Bitcoin to say this and don’t take it as an insult, I’m happy to come with the formal game-theory demonstration on the mailing list.


Being more constructive, do we agree the use-case "a) passing backward a claim preimage from a counterparty transaction without waiting confirmation therefore better liquidity usage of an inbound edge” is worthy of a simple mempool scanning implementation ? And then we have to agree on polling or mempool-tx-notification API trade-offs ?

ariard commented 1 year ago

I think I’ll move forward with an implementation of a mempool-scanning interface out-of-LDK tree as you can easily extend the FeeEstimator integrated with all components with new methods needed. I’ll keep it simple in a way it can be brought back in the tree easily.

I think we have a real issue with LDK project development culture due to Matt personality and his poor communication skills to ensure good social scalability of the project. Staying open to talk with him out-of-band to ease the introspection work on his side.

TheBlueMatt commented 1 year ago

Calm down and scroll back, it was your previous statement "if we add an interface, it would be optional, and users could either submit mempool transactions, or they could ignore it. “ that talk about submitting transactions and my answer was saying a “polling” interface could be considered rather than a “push” mempool-tx-notification API as you seemed to propose.

I'm still thoroughly confused what this would look like. Fom the PoV of LDK, polling would mean regularly calling user code to ask it to look at the mempool, but LDK has no runtime, its only regular interval operations are user code calling into LDK. Users generally implement block connection by polling some block source and telling LDK when they get new info (from LDK's PoV this is a push interface, no matter how the user implements it). I'm not at all sure how you envision some kind of polling interface for a library with no runtime.

I think we agree the scenario is technically plausible. Post-anchor there is 1 CSV between transactions so you might have one block of confirmation changing the “top mempool” feerates and note here the fee-bumping is on the burden of the mobile user, which might be rely on third-party of diverse quality.

Mmm, that's fair, but I think you've made a second jump - the phone didn't simply go offline and fail to broadcast the HTLC transaction, but rather was online and its fee estimator was good enough to get a commitment transaction confirmed quickly, but then the mempool spiked hard and the HTLC transaction didn't confirm for quite some time, with the device going offline and not coming back to RBF. While mempool spikes certainly happen, them jumping up and not coming back down for weeks isn't all that common, and the device has to have a very specific pattern of online -> offline at the wrong time and never come back online. This seems like a pretty rare occurrence, and building a whole new API to optimize this case, but not save some funds loss, seems like a stretch to me.

Instead if you can scan the mempool for the contributed input, as soon as you see the double-spend, the local UTXO contributed could be unlocked and re-allocated in another collaborative txn, without on-chain fee on our side.

This is a much more narrow mempool API, and I'm not sure that we'll want to be tracking double spends and UTXO locking in LDK for dual-funding to begin with. While dual-funding users may want this, I'm not sure if/how much of it belongs in LDK, vs the on-chain wallet handling the locking.

I'm gonna go ahead and ignore the repeated insults both against me and the broader LDK project that aren't constructive here. But if you continue lacing responses with them, at some point the discussion isn't worth having anymore.

ariard commented 1 year ago

I'm still thoroughly confused what this would look like. Fom the PoV of LDK, polling would mean regularly calling user code > to ask it to look at the mempool, but LDK has no runtime, its only regular interval operations are user code calling into LDK. > Users generally implement block connection by polling some block source and telling LDK when they get new info (from LDK's PoV this is a push interface, no matter how the user implements it). I'm not at all sure how you envision some kind of > polling interface for a library with no runtime.

This is correct from the PoV of a library with no runtime and the polling responsibility could be deferred to the users (e.g here ldk-node). I wonder if the mempool-notification interface shouldn’t be landed in Bitcoin Core or equivalent, and we should have on the Lightning-side only a consumer trait like we have for block source consumption. All I know I’m worried to introduce new cross-layer linking vectors based on timing issues or DoS vectors due to interaction with the mempool.

Mmm, that's fair, but I think you've made a second jump - the phone didn't simply go offline and fail to broadcast the HTLC > transaction, but rather was online and its fee estimator was good enough to get a commitment transaction confirmed quickly, but then the mempool spiked hard and the HTLC transaction didn't confirm for quite some time, with the device going offline and not coming back to RBF. While mempool spikes certainly happen, them jumping up and not coming back > down for weeks isn't all that common, and the device has to have a very specific pattern of online -> offline at the wrong time and never come back online.

Well I would said we have to build more and more for a world of acyclic mempool spikes, where we suddenly see sats/vB jumps and pre-signed feerates becoming stale and no visibility on the liveliness of mobile clients. The user might have to go to live his life and here a RBF notification would be welcome to invite him to correct the shot. Note such RBF notification from this new API could be consumed by its local always-online LSP post-anchor thanks to the anchor output. I think this is interesting and worthy though the conversation dwelve more into mobile-clients LSP flow.

This is a much more narrow mempool API, and I'm not sure that we'll want to be tracking double spends and UTXO locking in LDK for dual-funding to begin with.

Yeah that’s a fair point, though I’m not sure either it would belong more in BDK or Bitcoin Core scope, as from a Lightning viewpoint you might be interested to be able to subscribe on witness/transaction pattern (to fetch the preimage), this is fair feedback it might belong more to rust-bitcoin which is downstream of both BDK / LDK.

Sorry for the insult, I keep deep appreciation for your technical skills and knowledge of Lightning, though I took it as an insult to close code PR when a) they’re open in all good faith with the aim to collect feedback on what a cross-layer interface could looks like (and code is always speaking more than words) and b) we have already a number of old 2022 PRs open if management of open PRs is a real concern. I think I’m as part LDK project than you’re.

Still I would appreciate if an effort can be done in the janitorial role and communication around it, especially to coordinate around complex and context-rich PRs, where it should be welcome to propose a poc, collect feedback on implementation / interface from multiple contributors, come back with a more production-ready PR. From my experience of Bitcoin Core, janitorial role is to be a calming force in the project, not a “move-fast-and-try-to-end-quickly-complex-technical-conversation”.

FWIW, re-opened the mempool scanning interface I wish in a “wizard" fork of LDK here: https://github.com/civkit/rust-lightning-wizards/pull/2. Doing so is something I’ve in mind since experimenting with DLC back in 2020, somehow it’s good to have a nursery for advanced mechanism where in fact you’re designing new end-to-end protocol. I’ll do the best to keep backport simple if code or ideas are judged strong enough to land back in main tree (or in fact if it’s better in rust-bitcoin).

Changed nothing on my side to keep contributing to more in-BOLT spec and security / safety maintenance to LDK. Generally, yes I think we have “social scalability” issues to move forward with complex Lightning features where we’re crossing boundaries with base-layer though here it’s more in fine a lack of qualified reviewers and developers :(