lightning / bolts

BOLT: Basis of Lightning Technology (Lightning Network Specifications)
2.11k stars 492 forks source link

Error decryption - adopt requirement also for new tlv package format. #1125

Closed ziggie1984 closed 4 months ago

ziggie1984 commented 10 months ago

Currently the spec restrict the error decryption to 20 (hops):

SHOULD continue decrypting, until the loop has been repeated 20 times.

I think this springs from the old legacy format which had a maximum of 20 hops in a path. I think the new tlv package format can have up to 27 hops, so I wonder if this was kept deliberately or whether we should adopt this to the new tlv package structure ?

How I came to 27:

// NumMaxHops is the maximum path length. There is a maximum of 1300 bytes in
// the routing info block. Legacy hop payloads are always 65 bytes, while tlv
// payloads are at least 47 bytes (tlvlen 1, amt 2, timelock 2, nextchan 10,
// hmac 32) for the intermediate hops and 37 bytes (tlvlen 1, amt 2, timelock 2,
// hmac 32) for the exit hop. The maximum path length can therefore only be
// reached by using tlv payloads only. With that, the maximum number of
// intermediate hops is: Floor((1300 - 37) / 47) = 26. Including the exit hop,
// the maximum path length is 27 hops.

https://github.com/lightningnetwork/lightning-onion/blob/06182b1d7d2f9bd15ba1ebb404d026604758e78b/path.go#L10-L20

t-bast commented 10 months ago

I believe that's just a leftover that we should remove from the spec, feel free to open a PR! We should loop until we've exhausted the shared secrets (which matches the number of hops we actually used for the payment path), not until a specific hard-coded value.

morehouse commented 10 months ago

I believe that's just a leftover that we should remove from the spec, feel free to open a PR! We should loop until we've exhausted the shared secrets (which matches the number of hops we actually used for the payment path), not until a specific hard-coded value.

Isn't the fixed number of decryptions needed to prevent any intermediate nodes from learning the source/destination via timing? If so, it would make more sense to increase the hard-coded value to 27 instead of removing it.

t-bast commented 10 months ago

Maybe we're not talking about the same steps in the onion process. I thought @ziggie1984 meant what is done by the sender of a payment when it receives an update_fail_htlc message and tries to decrypt the error contained in that message. In that case I don't think there is any timing-related attacks that applies?

ziggie1984 commented 10 months ago

Yeah the spec just refers to the origin node in this part:

The origin node:

once the return message has been decrypted:
SHOULD store a copy of the message.
SHOULD continue decrypting, until the loop has been repeated 20 times.
SHOULD use constant ammag and um keys to obfuscate the route length.

https://github.com/lightning/bolts/blob/master/04-onion-routing.md#returning-errors

So I think we can remove it.

ziggie1984 commented 10 months ago

I am also confused about the last part: SHOULD use constant ammag and um keys to obfuscate the route length.

I am not sure what is meant by obfuscate, because the origin node only cares about de-obfuscating, the ammag and um keys are locked in at this point for the routing hops (defined when building the route?). So not sure why this is important. I thought the erring nodes decides the size of the failure message and all the following nodes should keep the size constant?

And not sure about the first requirement as well:

SHOULD store a copy of the message.

Should we really require this, it's up to the sender to decide whether he wants to keep the data or not, I think relay protection is not really relevant for the Erring Messages?

t-bast commented 10 months ago

I believe @morehouse is right, those 3 requirements were added to be somewhat "constant-time" on the sender side when decrypting an update_fail_htlc error. What those requirements suggest is that after correctly decrypting the error message after N steps (where N is the position of the erring node in the route) you should keep doing dummy decryption steps to make it look like the error came from the 20th node in the route. Since the route is usually smaller than 20, you don't have ammag and um keys for the fake hops, so the requirements suggest using dummy constant ones.

I believe all of this is useless (timing leaks are dominated by network latency in retries), so we have not implemented this in eclair and we instead stop as soon as we get the decrypted message. Those requirements are only a SHOULD, they aren't a MUST, so it's perfectly ok to ignore them.

ziggie1984 commented 10 months ago

Ok now I understand, it specifically relates to the case where we would try the same payment via a different route and if we do not account for timing we would kind of reveal to the nodes in the path at which position they are probably in.

So I think 20 seems like good enough. I think it probably makes sense to add more context (Rationale) to this part of the requirement I can add a PR describing it if we want that or even increase the hop size to 27?