Scorer: Only update max liquidity bounds for `temporary_channel_failure(0x1007)`

Currently, it seems we would update the max liquidity bound even if forwarding would fail due to, e.g., fee_insufficient(0x100d):

2023-10-19 12:08:04 INFO (LDK): Got route: path 0:
 node_id: 0296b2db342fcf87ea94d981757fdf4d3e545bd5cef4919f58b5d38dfdd73bf5c9, short_channel_id: 893285027868704769, fee_msat: 4000, cltv_expiry_delta: 80
 node_id: 035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226, short_channel_id: 888839702390636544, fee_msat: 0, cltv_expiry_delta: 40
 node_id: 033fc997c1622281b143fce0cce0711f18fdc8129684840b835d6a4c5ef9d49ddf, short_channel_id: 804725963415420929, fee_msat: 0, cltv_expiry_delta: 40
 node_id: 03cde60a6323f7122d5178255766e38114b4722ede08f7c9e0c5df9b912cc201d6, short_channel_id: 713570951859798016, fee_msat: 40000000, cltv_expiry_delta: 80
 blinded_tail: None
...
2023-10-19 12:08:07 INFO (LDK): Onion Error[from 035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226: fee_insufficient(0x100d) htlc_msat(000002625a000088)] Node indicated the fee amount does not meet the required level (lightning::ln::onion_utils 696)
2023-10-19 12:08:07 DEBUG (LDK): Not able to penalize channel with SCID 893285027868704769 as we do not have graph info for it (likely a route-hint last-hop). (lightning::routing::scoring 1461)
2023-10-19 12:08:07 DEBUG (LDK): Setting min liquidity of 0 from SCID 888839702390636544, towards NodeId(035e4ff418fc8b5554c5d9eea66396c227bd429a3251c8cbc711002ba215bfc226) to 40000000 (lightning::routing::scoring 1322)
2023-10-19 12:08:07 DEBUG (LDK): Setting max liquidity of SCID 804725963415420929, towards NodeId(033fc997c1622281b143fce0cce0711f18fdc8129684840b835d6a4c5ef9d49ddf) from 9900000000 to 40000000 (lightning::routing::scoring 1309)

However, this doesn't make a lot of sense: a node telling us the fee wasn't high enough doesn't tell us anything about their available liquidity. We therefore shouldn't update the (max) liquidity bounds at all in such cases.

Hmm, maybe? Its still a failed payment and we still need to "punish" the node, however. For RGS users this can get hit sometimes, but best to punish that channel and avoid it if its updating its fee too often (and we'll decay that info eventually, especially if we're fetching a scorer from a probing server). For nodes not using RGS, the node is probably kinda buggy, or at least failing to get its updates out, in which case we definitely want to avoid it in the future.

At a high level, whenever a payment fails, we need to do something to avoid routing through that channel in the future, or if not need certainly need to very carefully consider it if we don't. If we start skipping scoring based on error code, an intermediary node is incentivized to always just return a specific error code so that we always try them again in the future because they "haven't failed". While we could add some new logic to handle specific cases like incorrect fee, I'm somewhat hesitant to do that, both because we no longer apply the updates, and because it might cause some other issues where nodes failing to get their update out in time and constantly updating their fees always end up seeing our payments and failing.

Hmm, maybe? Its still a failed payment and we still need to "punish" the node, however. For RGS users this can get hit sometimes, but best to punish that channel and avoid it if its updating its fee too often (and we'll decay that info eventually, especially if we're fetching a scorer from a probing server). For nodes not using RGS, the node is probably kinda buggy, or at least failing to get its updates out, in which case we definitely want to avoid it in the future.

Hm, but munching all the metrics together could really mess with the assumptions of our (historical) liquidity estimation, no?

At a high level, whenever a payment fails, we need to do something to avoid routing through that channel in the future, or if not need certainly need to very carefully consider it if we don't.

But we do? We'd avoid reusing any failed channels via PaymentParameters::previously_failed_channels at least for the same payment?

If we start skipping scoring based on error code, an intermediary node is incentivized to always just return a specific error code so that we always try them again in the future because they "haven't failed". While we could add some new logic to handle specific cases like incorrect fee, I'm somewhat hesitant to do that, both because we no longer apply the updates, and because it might cause some other issues where nodes failing to get their update out in time and constantly updating their fees always end up seeing our payments and failing.

In any case, wouldn't using our per-channel scoring be a bad fit because for things like "insufficient fee" we'd really like to avoid the whole node for a while, not just a single channel?

Hm, but munching all the metrics together could really mess with the assumptions of our (historical) liquidity estimation, no?

Kinda? I mean, yes, we're learning that we can't send amount X over a channel, but in fact we've learned here that in this instance we can't send even 1 msat over the channel. Arguably based on this error code we should set the channel's available liquidity to zero instead, which I think wouldn't be a different metric. But even ignoring that specifically, I don't think we're mixing metrics tooo much here - we learned that we cant send a payment for a given amount, who cares why, we couldn't.

But we do? We'd avoid reusing any failed channels via PaymentParameters::previously_failed_channels at least for the same payment?

Right, I meant for future payments.

In any case, wouldn't using our per-channel scoring be a bad fit because for things like "insufficient fee" we'd really like to avoid the whole node for a while, not just a single channel?

Yea, having a node-level score rather than only channel-level scores is something we should consider, and something I've wanted to have in general, but we do have to be careful how we do it. In this case its easy - punish them! In cases where the node just often fails to forward, I do want to think a bit more about whether we want to be pushing at the node level (implying if you don't regularly rebalance your node we will avoid you, similar to what LND does) and whether that's a good result for the network.

lightningdevkit / rust-lightning

Scorer: Only update max liquidity bounds for `temporary_channel_failure(0x1007)` #2673