Payment channel congestion via spam-attack

lightning / bolts

BOLT: Basis of Lightning Technology (Lightning Network Specifications)


Payment channel congestion via spam-attack #182

Open EmelyanenkoK opened 7 years ago

EmelyanenkoK commented 7 years ago

Since the maximum number of HTLCs in flight is limited (by tx size) to 432, it is possible and costless to congest a payment channel. Let's consider the topology A-B-C-A. Node A can easily DDoS the B-C channel by sending 432 (or max_htlc_in_fly) HTLC payments and not resolving them (or resolving them with considerable delay). As I understand it, there is no way to stop this attack with the current specification, since a node can't defend itself: blocking by IP, or blocking a specific node, is ineffective due to onion routing (the attacking node may not interact directly with B or C). Thus this issue is different from #122.

Since this attack is critical to the proper functioning of the Lightning Network, and since I can't imagine mitigating it without adding new messages or making significant changes to existing ones, I think this issue can't wait until v1.1.

I figured out one possible way to avoid this issue: non-refundable prepaid fee for htlc.
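The slot exhaustion described above can be illustrated with a minimal sketch. It assumes a toy model of one channel direction: the `ChannelDirection` class, the `jam` helper, the 1,000 msat minimum, and the channel capacity are all hypothetical, and the 432-slot limit is simply the figure quoted in this issue.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelDirection:
    """Toy model of one direction of a channel; tracks pending HTLCs only."""
    capacity_msat: int
    max_htlcs: int = 432            # figure quoted in this issue (assumption)
    htlc_minimum_msat: int = 1_000  # hypothetical minimum HTLC size
    pending: list = field(default_factory=list)

    def offer_htlc(self, amount_msat: int) -> bool:
        """Return True if the HTLC is added, False if it must be rejected."""
        if amount_msat < self.htlc_minimum_msat:
            return False
        if len(self.pending) >= self.max_htlcs:
            return False  # every slot is already occupied
        if sum(self.pending) + amount_msat > self.capacity_msat:
            return False  # not enough liquidity left
        self.pending.append(amount_msat)
        return True


def jam(channel: ChannelDirection) -> int:
    """Fill every free slot with minimum-size HTLCs that are never resolved."""
    added = 0
    while channel.offer_htlc(channel.htlc_minimum_msat):
        added += 1
    return added


b_to_c = ChannelDirection(capacity_msat=1_000_000_000)
slots_taken = jam(b_to_c)                       # attacker occupies all slots
locked_msat = sum(b_to_c.pending)               # capital the attacker ties up
honest_payment_accepted = b_to_c.offer_htlc(50_000_000)
```

In this model the attacker locks only a tiny fraction of the channel capacity, yet an honest payment that would easily fit by value is rejected because no slot is free.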

Roasbeef commented 7 years ago

Hi @EmelyanenkoK, excellent observation!

One quick correction:

Node A can easily DDOS B-C channel by sending 432 (or max_htlc_in_fly)

The max_htlcs_in_flight parameter is local to the channel participants. Those routing through the channel don't know of such channel flow control parameters.

It's worth noting that attacks of this nature have been brought up in the past. See this mailing list thread titled "Loop Attacks in Onion Routing....". In that thread, the solution which was tossed around was the concept of "peeling the onion". Essentially, this means that in the case of misbehavior, nodes can poll each other and incrementally decrypt the onion (in addition to providing a valid close transaction for the final hop). Such a feature would allow nodes to ascribe blame to malicious actors within the network. However, IMO it's a bit of a non-solution, as it's feasible that, if not designed correctly, it would allow nodes to actively deanonymize payment routes at a large scale, defeating the whole point of onion routing.

During the Lightning Summit in Milan (the event where we all met up and brain-dumped specification ideas, Oct 2016), this attack was revisited and received a good bit of discussion. IIRC, like you, we also agreed that an easy way to mitigate an attack of this nature was to enforce "pre-paying" of fees. Pre-paying means that a sender pays fees on two occasions: when an HTLC is initially cleared, and also when an HTLC is settled by the receiver.

We never fully fleshed out the design around prepaying, but allow me to share some of my thoughts on the matter. The function used to calculate the amount of fees prepaid for clearing an HTLC must factor in the absolute CLTV timelock of the HTLC itself. With this, a node's valuation of the time value of their BTC can easily be expressed and serve to mitigate attacks like this. With this addition, the attacker must suspend their initial capital within the network and also bleed away those funds with each attack HTLC attempted. Going a bit further, I think the normal HTLC settle fees (post-pay) should also take into account the absolute time locks of each HTLC. In my opinion, the lack of such a factor within the current fee calculation function is a massive oversight. Without also factoring in the pessimistic fund suspension period, costs of clearing HTLC's aren't fully realized in the current fee schedules.
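To make the idea of a timelock-aware prepaid fee concrete, here is a rough sketch. Nothing like this exists in the spec: the function name, the linear formula, and the `rate_ppm_per_block` and `base_msat` parameters are all assumptions chosen for illustration.

```python
def prepaid_fee_msat(amount_msat: int,
                     cltv_expiry: int,
                     current_height: int,
                     rate_ppm_per_block: int = 10,
                     base_msat: int = 0) -> int:
    """Hypothetical non-refundable fee charged when an HTLC is cleared.

    The fee grows linearly with both the amount and the number of blocks
    the funds could stay locked (the pessimistic suspension period).
    """
    blocks_locked = max(cltv_expiry - current_height, 0)
    proportional = amount_msat * blocks_locked * rate_ppm_per_block // 1_000_000
    return base_msat + proportional


# Same amount, different absolute timelocks (heights are made up).
fee_one_day = prepaid_fee_msat(100_000_000, cltv_expiry=500_144, current_height=500_000)
fee_one_week = prepaid_fee_msat(100_000_000, cltv_expiry=501_008, current_height=500_000)
```

With these made-up parameters, locking the same amount for roughly a week costs seven times as much as locking it for a day, which is the time-value effect the paragraph above argues for.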

On the other hand, one may argue that participants within the network need to internalize the pessimistic time delays possible when routing HTLC's and expect funds to be tied up for the full duration of the absolute timeout. Thankfully, the switch to 2-layer HTLC's allows us to avoid the possibly extreme delays with longer routes that the prior HTLC design suffered from.

It's also worth noting that currently within the spec, when creating/accepting channels, nodes have quite a few knobs that allow them to reduce their exposure at any given time (max_htlc_value_in_flight_msat, channel_reserve_satoshis, max_accepted_htlcs). Widespread, intelligent usage of these knobs can also serve to mitigate attacks like this. Proper usage of these knobs will also force nodes to ensure their forwarding logic is backpressure aware.
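As a sketch of how the three knobs named above bound a node's exposure, the check below models a simplified acceptance test for an incoming HTLC. The parameter names come from the spec, but the `ChannelLimits` class, the `check_incoming_htlc` function, and the simplified balance accounting are assumptions, not the spec's exact rules.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChannelLimits:
    max_htlc_value_in_flight_msat: int
    channel_reserve_satoshis: int
    max_accepted_htlcs: int


def check_incoming_htlc(limits: ChannelLimits,
                        pending_msat: List[int],
                        offerer_balance_msat: int,
                        amount_msat: int) -> Tuple[bool, str]:
    """Simplified check of a newly offered HTLC against the local limits.

    offerer_balance_msat is the offerer's balance not already committed
    to pending HTLCs (a simplification for this sketch).
    """
    if len(pending_msat) + 1 > limits.max_accepted_htlcs:
        return False, "max_accepted_htlcs exceeded"
    if sum(pending_msat) + amount_msat > limits.max_htlc_value_in_flight_msat:
        return False, "max_htlc_value_in_flight_msat exceeded"
    if offerer_balance_msat - amount_msat < limits.channel_reserve_satoshis * 1000:
        return False, "channel_reserve_satoshis violated"
    return True, "ok"


limits = ChannelLimits(max_htlc_value_in_flight_msat=100_000_000,
                       channel_reserve_satoshis=10_000,
                       max_accepted_htlcs=30)

spam_result = check_incoming_htlc(limits, [1_000] * 30, 500_000_000, 1_000)
large_result = check_incoming_htlc(limits, [], 500_000_000, 150_000_000)
normal_result = check_incoming_htlc(limits, [], 500_000_000, 50_000_000)
```

Note that the limits cap how much a node can lose to a jammed channel; they do not by themselves stop an attacker from filling whatever slots and value are allowed.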

EmelyanenkoK commented 7 years ago

The max_htlcs_in_flight parameter is local to the channel participants. Those routing through the channel don't know of such channel flow control parameters.

Yes, but anyway max_htlcs_in_flight is limited and is a relatively small number.

https://lists.linuxfoundation.org/pipermail/lightning-dev/2015-August/000135.html

Very interesting reading. However, I want to note that the attack has become much nastier than it was in 2015. Now that the number of HTLCs is limited, the attacker not only locks the payment amount multiplied by the number of hops: with 432 HTLCs of arbitrary size (each still at least htlc_minimum_msat), the attacker can block an arbitrary channel and all the funds in it.

deanonymize payment routes

As was pointed out on the ML, such a practice is inherently dangerous and can cause fragmentation of the network and complete deanonymization.

It's also worth noting that currently within the spec, when creating/accepting channels, nodes have quite a few knobs that allow them to reduce their exposure at any given time (max_htlc_value_in_flight_msat, channel_reserve_satoshis, max_accepted_htlcs). Widespread, intelligent usage of these knobs can also serve to mitigate attacks like this.

I believe that central hubs (which will profit from fees) will be the most attractive targets (because attacking them is an effective way to stop most payments). As I understand it, there is no incentive for those hubs to keep max_htlc_value_in_flight_msat below funding_satoshis*1000: those hubs never initiate payments themselves and want all their funds to work.

On the other hand, one may argue that participants within the network need to internalize the pessimistic time delays possible when routing HTLC's and expect funds to be tied up for the full duration of the absolute timeout.

We discussed this attack with @AndrewSamokhvalov, and I came to the conclusion that with the current specification the most effective way to attack is to delay update_fail_htlc (for instance, one carrying a routing error message). In this case the attack is costless no matter how large the proportional fees are.

By the way, another possible solution was pointed out on the ML, as a general rule:

Give me response (update_fulfill_htlc, update_fail_htlc or update_fail_malformed_htlc)
in 20 seconds (+probably some decrementing through route addition) or I will close
channel.

As I understand it, in the worst case the last node before the attacker can lose the HTLC amount (or even many HTLCs). However, the payment will still be resolved within 20 seconds for all other nodes, and thus we can defend central hubs. Maybe we can somehow mitigate the consequences for the last node before the attacker?
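One way to read the rule quoted above is as a per-hop response deadline that shrinks along the route, so that upstream nodes always have a little more time than the nodes they forwarded to. The sketch below assumes that reading; the function names and the 2-second step are hypothetical, and only the 20-second base comes from the quoted rule.

```python
from typing import List


def response_deadlines(n_hops: int,
                       base_seconds: float = 20.0,
                       step_seconds: float = 2.0) -> List[float]:
    """Deadline each hop grants its downstream peer, ordered sender -> recipient.

    The last hop gets base_seconds; every hop closer to the sender gets
    step_seconds more, so a failure can propagate back in time.
    """
    if n_hops < 1:
        raise ValueError("route needs at least one hop")
    return [base_seconds + step_seconds * (n_hops - 1 - i) for i in range(n_hops)]


def should_close_channel(elapsed_seconds: float, deadline_seconds: float) -> bool:
    """True once the downstream peer has exceeded its deadline without replying."""
    return elapsed_seconds > deadline_seconds


deadlines = response_deadlines(3)
last_hop_must_close = should_close_channel(21.0, deadlines[-1])
first_hop_must_close = should_close_channel(21.0, deadlines[0])
```

In this model only the hop adjacent to the stalling node hits its deadline first, which matches the observation that the cost falls on the last node before the attacker.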

P.S. Nevertheless, I still think this attack is critical and should be mitigated before release to production.

Roasbeef commented 7 years ago

Yes, but anyway max_htlcs_in_flight is limited and is a relatively small number.

What I'm saying is this: this value is private to the nodes that created the channel, and this value can be < the artificial constraint on HTLC's in a particular direction.

The current constraint on the max number of HTLC's in a particular direction was set in order to allow peers catching a channel participant broadcasting a revoked transaction to be able to sweep all the funds in a single transaction. IMO, we shouldn't really care to optimize for a scenario which may be very rare in practice. Instead, the value should simply be the maximum number of HTLC's that can fit in a commitment transaction without exceeding the weight policy observed by the majority of relaying nodes. With this constraint in mind, the number rises to ~2321 total (or 1160 for each side).
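The ~2321 figure can be reproduced with a short calculation, assuming a 400,000 weight-unit standardness limit for relayed transactions, a base commitment transaction weight of 724, and 172 weight units per untrimmed HTLC output. These three constants are assumptions not stated in this comment, and the function name is hypothetical.

```python
MAX_STANDARD_TX_WEIGHT = 400_000  # assumed relay policy limit, in weight units
COMMITMENT_BASE_WEIGHT = 724      # assumed commitment tx weight with no HTLC outputs
HTLC_OUTPUT_WEIGHT = 172          # assumed weight added per untrimmed HTLC output


def max_htlcs_by_weight(max_weight: int = MAX_STANDARD_TX_WEIGHT,
                        base_weight: int = COMMITMENT_BASE_WEIGHT,
                        per_htlc_weight: int = HTLC_OUTPUT_WEIGHT) -> int:
    """Largest number of HTLC outputs that keeps the commitment tx relayable."""
    return (max_weight - base_weight) // per_htlc_weight


total_htlcs = max_htlcs_by_weight()
per_side = total_htlcs // 2
```

Under these assumptions the calculation gives 2321 HTLCs in total and 1160 per side, in line with the numbers quoted above.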

I believe that central hubs (which will profit from fees) will be the most attractive targets (because attacking them is an effective way to stop most payments).

Another disincentive to the rise of "central hubs"! If such "hubs" exist, there're far easier ways to disrupt them than send HTLC's and never settle them. If such "hubs" exist, they're a clear near single-point-of-failure. A resilient network graph will be a diffuse one with a high degree of path diversity (which is also important for the privacy properties of onion routing).

As I understand in worst scenario the last node before attacker can lose htlc amount (or even many htlcs).

How can a node lose the values of HTLC's? You mean if the HTLC's were all dust?

Depending on how one looks at the situation, this isn't really a solution either. If the node right before the attacker doesn't follow this 20 second rule, and properly manages their HTLC queue, then they'd still be able to use their channel as normal, forwarding and making payments as they wish. Only once the HTLC timeout enters the grace period would they then broadcast their commitment transaction, sweeping the HTLC's soon after the broadcast. If they broadcast immediately, then they lose the utility of their channel entirely and for the full duration of the HTLC timeout.

In either case, all the upstream nodes would get the cancel and remove the pending HTLC off-chain.

EmelyanenkoK commented 7 years ago

The current constraint on the max number of HTLC's in a particular direction was set in order to allow peers catching a channel participant broadcasting a revoked transaction to be able to sweep all the funds in a single transaction.

Indeed, we can make a multi-stage HTLC output, where the commitment tx contains an output that can be spent by a transaction which serves the HTLCs (or another layer). Thus it is possible to handle any number of HTLCs with a large number of medium-size transactions.

How can a node lose the values of HTLC's? You mean if the HTLC's were all dust?

I mean that if the node right before the attacker closes its channel with the attacker and returns update_fail_htlc upstream (only in this case will the other nodes resolve the HTLC within 20 seconds), the attacker can spend the HTLC from the published commitment via the preimage, and thus the node right before the attacker will lose the HTLC value (or maybe many such values, if the attacker delays many HTLCs simultaneously). I'm curious whether there is a way to compel the node to which we offered an HTLC to resolve it (or to make it expensive not to resolve it).

If the node right before the attacker doesn't follow this 20 second rule, and properly manages their HTLC queue, then they'd still be able to use their channel as normal, forwarding and making payments as they wish.

I believe that if we introduce something like a "20sec rule", it should be mandatory for all participants.

rustyrussell commented 7 years ago

Olaoluwa Osuntokun notifications@github.com writes:

During the Lightning Summit in Milan (the event where we all met up and brain-dumped specification ideas, Oct 2016), this attack was revisited and received a good bit of discussion. IIRC, like you, we also agreed that an easy way to mitigate an attack of this nature was to enforce "pre-paying" of fees. Pre-paying means that a sender pays fees on two occasions: when an HTLC is initially cleared, and also when an HTLC is settled by the receiver.

No, actually, we concluded afterwards that it's a dead-end. As you can imagine, paying for failed HTLCs introduces a major incentive problem :)

It's worth noting that attacks of this nature have been brought up in the past. See this mailing list thread titled "Loop Attacks in Onion Routing....". In that thread, the solution which was tossed around was the concept of "peeling the onion". Essentially, this means that in the case of misbehavior, nodes can poll each other and incrementally decrypt the onion (in addition to providing a valid close transaction for the final hop). Such a feature would allow nodes to ascribe blame to malicious actors within the network. However, IMO it's a bit of a non-solution, as it's feasible that, if not designed correctly, it would allow nodes to actively deanonymize payment routes at a large scale, defeating the whole point of onion routing.

Perfect is the enemy of the good.

Unfortunately, nobody has come up with another disincentive for such attacks. And without disincentive, the "solution" is to proactively deanonymize the network so you can judge whether a payment is "likely" to be malicious, or restrict the network to "trusted" nodes.

The 20-second-peel-or-close requirement costs the attacker a channel each time, and sets a hard expectation of reliability when payments are in-flight, promoting better uptime overall.

Yet it is of limited use for deanonymizing, since it only peels the onion for the prior nodes; you can use it to trace the path between nodes you control, and even then you can only unmask N-1 nodes if you control N nodes on the path. It's also detectable for the payer.

We never fully fleshed out the design around prepaying, but allow me to share some of my thoughts on the matter. The function used to calculate the amount of fees prepaid for clearing an HTLC must factor in the absolute CLTV timelock of the HTLC itself. With this, a node's valuation of the time value of their BTC can easily be expressed and serve to mitigate attacks like this. With this addition, the attacker must suspend their initial capital within the network and also bleed away those funds with each attack HTLC attempted. Going a bit further, I think the normal HTLC settle fees (post-pay) should also take into account the absolute time locks of each HTLC. In my opinion, the lack of such a factor within the current fee calculation function is a massive oversight. Without also factoring in the pessimistic fund suspension period, costs of clearing HTLC's aren't fully realized in the current fee schedules.

Yeah, simplicity won here, but also per-timeout charging is even more centralizing, pushing short routes as it does.

So the decision was to see how bad it gets in practice...

Thanks, Rusty.

deuteragenie commented 6 years ago

I wonder if a reputation-based and/or scoring mechanism could not be helpful against spam / (D)DOS attacks on the lightning network. Nodes would prefer to talk to nodes with higher reputation. Need to carefully look at the impact on centralization though.

fresheneesz commented 5 years ago

I recently discussed this problem in some depth and I think I've come up with a novel solution.

The basic idea is that if a payment is started and a node fails or refuses to forward the secret, after some timeout (much shorter than the HTLC time lock times), nodes who haven't received the secret will demand proof that some node up the line was punished in some way. This punishment could be channel closure, but could also be something less severe (will expand on that later). The proof would be something that shows the channel was part of the route (e.g. by passing along the HTLC that contains the preimage hash) along with proof of the punishment itself (like a submitted channel-closing transaction).

Examples:

... -> H1 -> H2 -> A1 -> ...

If the attacker A1 is right next to H2, H2 will time out and demand proof-of-punishment from A1. A1 won't be able to provide any proof, and so H2 will close its channel with A1. H1 will also ask H2 for proof-of-punishment and since H2 closed its channel with A1, it can pass along that proof. H1 can then pass the same proof data to anyone down the line.

... -> H1 -> A1 -> A2 -> A3 -> ...

If the attacker has buffer nodes, you can see that they won't help it. The attacker can choose to close one of its own channels, or can simply refuse to cooperate, in which case H1 will close its channel with A1. Whatever happens, either the payment is completed or a channel in the route is closed.

The incentive model is that everyone wants to pass the buck if possible, keeping their own channels open. So only a channel where one partner failed or is dishonest will be forced to close.
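The two example routes above can be walked through with a small sketch. It is only a toy model of the proposal: the `channel_closed_on_stall` function, the `(name, is_honest)` node representation, and the `attacker_sacrifices_own_channel` flag are assumptions made for illustration, and no real proofs or transactions are modelled.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, bool]  # (name, is_honest), ordered from sender to recipient


def channel_closed_on_stall(route: List[Node],
                            attacker_sacrifices_own_channel: bool = False
                            ) -> Optional[Tuple[str, str]]:
    """Return the channel that ends up closed, or None if the payment completes.

    Honest nodes pass any proof-of-punishment upstream, so the only channel
    closed is the one bordering the first node that withholds the secret,
    unless the attacker chooses to close a channel between its own nodes.
    """
    first_bad = next((i for i, (_, honest) in enumerate(route) if not honest), None)
    if first_bad is None:
        return None  # nobody withholds the secret
    if first_bad == 0:
        raise ValueError("route must start with an honest node")
    nxt = first_bad + 1
    if attacker_sacrifices_own_channel and nxt < len(route) and not route[nxt][1]:
        # The attacker produces a valid proof by closing its own channel.
        return (route[first_bad][0], route[nxt][0])
    # No proof arrives in time: the honest upstream neighbour closes.
    return (route[first_bad - 1][0], route[first_bad][0])


adjacent = channel_closed_on_stall([("H1", True), ("H2", True), ("A1", False)])
buffered = [("H1", True), ("A1", False), ("A2", False), ("A3", False)]
refuses = channel_closed_on_stall(buffered)
sacrifices = channel_closed_on_stall(buffered, attacker_sacrifices_own_channel=True)
```

In every stalled case exactly one channel is closed, and it always has an attacker node on at least one end, which is the property the incentive model relies on.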

We can extend this idea using something akin to the "reputation mechanism" other people have mentioned. If a public "greylist" is kept recording channels that have failed to properly forward payment data, channels wouldn't have to be closed every time this happens. After all, network failures, power failures, machine failures, etc can all cause an honest node to fail to route a payment occasionally. In this case, instead of closing a channel, the punishment would be for one channel partner to add the channel to the greylist. However, the greylist would have a limit to the number of times a node can be recorded (in some time window). Once that limit is reached, instead of adding the channel to the greylist again, the channel must be closed. A node that doesn't follow the protocol and refuses to close the channel will themselves have their channel closed by an honest channel partner.

This reputation method has the benefit of being entirely based on channel-partners. Only a partner involved in the channel could add the channel to the greylist, so there's no possibility of abuse - an attacker could only add its own channels to the greylist. I think this probably needs to be a general property of any reputation system used for the lightning network - ratings must be accompanied by proof that the rater has a direct channel with the counterparty being rated.
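A greylist with a per-window limit and the partner-only rule could look roughly like the sketch below. The `Greylist` class, its default limit and window, and the choice to count entries per channel rather than per node are all assumptions; proving that a reporter really is a channel partner is reduced here to a simple membership check.

```python
from collections import defaultdict, deque
from typing import Tuple


class Greylist:
    """Toy greylist: a channel may be recorded `limit` times per time window."""

    def __init__(self, limit: int = 3, window_seconds: float = 86_400.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._entries = defaultdict(deque)  # channel -> timestamps of reports

    def report(self, reporter: str, channel: Tuple[str, str], now: float) -> str:
        """Record a routing failure; return "greylisted" or "close"."""
        if reporter not in channel:
            # Only a channel partner may rate the channel.
            raise PermissionError("reporter is not a partner in this channel")
        entries = self._entries[frozenset(channel)]
        # Drop reports that have fallen out of the time window.
        while entries and now - entries[0] >= self.window_seconds:
            entries.popleft()
        if len(entries) >= self.limit:
            return "close"  # limit reached: the channel must be closed instead
        entries.append(now)
        return "greylisted"


greylist = Greylist(limit=2, window_seconds=100.0)
first = greylist.report("H2", ("H2", "A1"), now=0.0)
second = greylist.report("H2", ("H2", "A1"), now=10.0)
third = greylist.report("H2", ("H2", "A1"), now=20.0)
after_window = greylist.report("H2", ("H2", "A1"), now=150.0)
```

Occasional failures only add a greylist entry, while repeated failures within the window force a channel closure, and a node with no channel to the counterparty cannot add entries at all.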