lightningnetwork / lightning-onion

Onion Routed Micropayments for the Lightning Network
MIT License

multi: implement WIP extra onion blob encoding scheme as prep for AMP and beyond #31

Closed · Roasbeef closed 5 years ago

Roasbeef commented 5 years ago

NOTE: The encoding scheme implemented in this PR is not final. Instead, it's a draft of the latest format for encoding EOB data that is currently being discussed on the mailing list with the various implementations and contributors.

Overview

In this PR, we implement an end-to-end scheme for encoding data inside "virtual onion" hops. We call this data EOB data, or extra onion blob data. This PR allows a caller to encode extra data in any hop. We're constrained w.r.t the amount of data we can encode in each hop, as some fields (like the HMAC) must always be in place. As a result, depending on the amount of data encoded, we may need to expand the full size of the path. If we need to expand a path, then we use the same DH key to derive the shared secret for those hops so the node can continue to unwrap packets and parse them to uncover all the EOB data.

Payment Paths

In this PR, we add a new abstraction of a PaymentPath. A payment path is a fixed-size array whose length is the max size of a payment path within the network. A path has a "true size", which is the number of hops that actually contain useful data. A payment path is then comprised of multiple OnionHops. An OnionHop represents an abstract hop (a link between two nodes) within the Lightning Network. A hop is composed of the incoming node (able to decrypt the encrypted routing information), and the routing information itself. Optionally, the crafter of a route can indicate that additional data aside from the routing information is to be delivered, which will manifest as additional hops to pack the data.
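As a rough sketch of the abstractions described above (field types are simplified for illustration; the real implementation uses btcec public keys and the sphinx hop-data struct, and the max-hops constant is set by the packet size):

```go
package main

import "fmt"

// NumMaxHops stands in for the network-wide maximum path length; the real
// value is fixed by the sphinx packet size.
const NumMaxHops = 20

// OnionHop is a sketch of a single hop: the public key of the node that can
// decrypt this hop's routing information, plus that routing information.
type OnionHop struct {
	NodePub [33]byte // compressed pubkey of the processing node
	HopData [32]byte // routing information for this hop
}

// IsEmpty reports whether this slot in the fixed-size array is unused.
func (o OnionHop) IsEmpty() bool {
	return o.NodePub == [33]byte{}
}

// PaymentPath is a fixed-size array; only the leading non-empty entries
// carry useful data.
type PaymentPath [NumMaxHops]OnionHop

// TrueRouteLength returns the "true size": the number of hops that actually
// contain useful data.
func (p *PaymentPath) TrueRouteLength() int {
	for i, hop := range p {
		if hop.IsEmpty() {
			return i
		}
	}
	return NumMaxHops
}

func main() {
	var path PaymentPath
	path[0] = OnionHop{NodePub: [33]byte{0x02}}
	path[1] = OnionHop{NodePub: [33]byte{0x03}}
	fmt.Println(path.TrueRouteLength()) // prints: 2
}
```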

From there, we extend the OnionHop field to allow a caller to specify which hops may be expanded to encode extra onion blob data. We call the process of expanding a hop to encode extra data "unrolling". As a result of the unrolling process, the true size of a path may increase. For example, if we have a 2 hop route, and we want to encode 32 bytes for the last hop, since that can't fit into a pivot hop, we need to add an additional full hop which increases the size of the path to 3 hops. We then update the NewOnionPacket method to detect when a hop contains EOB data. If so, then we'll unroll the entire path into one which encodes the EOB data within the hop data of the same, or additional, virtual hops. The unrolling process will duplicate hop public keys accordingly, such that we encrypt a hop to the same public key for several hops to allow the target node to continue to parse and unwrap each packet.
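The size arithmetic in the example above can be sketched as follows. The capacities (10 bytes for a pivot hop, 32 for a full hop) come from the EOB format described in this PR; hopsForEOB itself is a hypothetical helper, not code from the PR:

```go
package main

import "fmt"

// Capacities per the draft EOB format: a pivot hop carries up to 10 bytes of
// EOB data, a full hop up to 32.
const (
	pivotDataSize   = 10
	fullHopDataSize = 32
)

// hopsForEOB returns how many onion hops one logical hop occupies after
// unrolling n bytes of EOB data: the pivot hop itself, plus extra full hops
// for whatever doesn't fit in the pivot.
func hopsForEOB(n int) int {
	if n <= pivotDataSize {
		return 1
	}
	rem := n - pivotDataSize
	return 1 + (rem+fullHopDataSize-1)/fullHopDataSize
}

func main() {
	// The example from the text: 32 bytes for the final hop needs the pivot
	// plus one full hop, growing a 2-hop route into a 3-hop path.
	fmt.Println(hopsForEOB(32)) // prints: 2
}
```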

Along the way, we add some helper functions used to determine how large an unrolled hop will be, EOB-data-wise, and whether it has any EOB data at all.

Extra Onion Blob Encoding Format

In this PR, we add a new file eob.go which will house all of our logic for encoding and decoding any data for EOB hops. An EOB hop is a hop that contains extra data and may span over several hops. There are two types of EOB hops: pivot hops and full hops. Within a pivot hop, we can fit up to 10 bytes of data, as we use 1 byte to signal the type of the EOB data, and another byte to signal if there are more hops (high bit) and the number of bytes consumed (lower 7 bits). Within a full hop, we can fit up to 32 bytes, as we use all the regular forwarding fields but leave the HMAC intact, and then use 1 byte of the extra padding bytes to encode more+length in a similar manner as the pivot hop.
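The more+length signalling byte described above can be sketched like this (function names are illustrative, not the PR's actual API):

```go
package main

import "fmt"

// encodeMoreLen packs the signalling byte: the high bit says whether more
// EOB hops follow, the lower 7 bits give the number of data bytes consumed
// in this hop.
func encodeMoreLen(more bool, length int) byte {
	b := byte(length) & 0x7f
	if more {
		b |= 0x80
	}
	return b
}

// decodeMoreLen is the inverse of encodeMoreLen.
func decodeMoreLen(b byte) (more bool, length int) {
	return b&0x80 != 0, int(b & 0x7f)
}

func main() {
	b := encodeMoreLen(true, 10) // a full pivot hop with more data to follow
	more, n := decodeMoreLen(b)
	fmt.Printf("0x%02x more=%v len=%d\n", b, more, n) // prints: 0x8a more=true len=10
}
```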

Currently (just a draft) there are three EOB data types: empty, sphinx send, and inner TLV. Empty is the default value used today and indicates that there are no extra bytes encoded. The sphinx type is used to send non-interactive payments to a destination node. It consumes 2 hops by default and will encode a 32-byte pre-image along with 10 bytes of extra data which can be used as an identifier. The final TLV type is meant to signal that an upper layer will interpret the opaque bytes as encoded TLV values.
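A small sanity check of the claim that a sphinx send consumes 2 hops, using the capacities from the draft format (the concrete type values here are illustrative, since the encoding is not final):

```go
package main

import "fmt"

// Draft EOB data types; the numeric values are assumptions for illustration.
const (
	typeEmpty      byte = iota // no extra bytes encoded (today's default)
	typeSphinxSend             // non-interactive payment: pre-image + id
	typeInnerTLV               // opaque bytes parsed as TLV by upper layers
)

func main() {
	// A sphinx send carries a 32-byte pre-image plus 10 identifier bytes,
	// i.e. 42 bytes: one 10-byte pivot hop plus one 32-byte full hop.
	payload := 32 + 10
	pivot, full := 10, 32
	hops := 1 + (payload-pivot+full-1)/full
	fmt.Println(hops) // prints: 2
}
```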

Modified Onion Packet Processing

As prep for the modified onion packet processing, we refactor processOnionPacket ahead of the future EOB parsing. Once EOBs are implemented, the main processOnionPacket may actually unwrap multiple inner packets that are encrypted to the same node identity key. We then update the processOnionPacket method to allow it to detect if there are any additional hops encoded to the router that contain EOB data. If so, then it will unpack the pivot hop, and then continue to unwrap layers from the onion until all EOB data has been recovered. Processing is mostly the same. The major difference is that we check the HMAC at the final outer hop rather than the first, in order to properly detect the exit hop.
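The unwrap-until-done loop described above can be sketched like so. unwrapOnce is a hypothetical stand-in for a single sphinx peel (decryption and HMAC handling elided), and the framing in main is a toy, not the real packet layout:

```go
package main

import "fmt"

// processEOBHops sketches the modified processing loop: after the pivot hop
// is unpacked, keep peeling layers encrypted to our own key, accumulating
// payload bytes, until the "more" bit is clear.
func processEOBHops(pkt []byte,
	unwrapOnce func([]byte) (data []byte, more bool, next []byte)) []byte {

	var eob []byte
	for {
		data, more, next := unwrapOnce(pkt)
		eob = append(eob, data...)
		if !more {
			return eob
		}
		pkt = next
	}
}

func main() {
	// Toy framing for the demo: [moreFlag][len][data...][inner packet...]
	unwrap := func(pkt []byte) ([]byte, bool, []byte) {
		n := int(pkt[1])
		return pkt[2 : 2+n], pkt[0] == 1, pkt[2+n:]
	}
	pkt := []byte{1, 2, 'a', 'b', 0, 1, 'c'}
	fmt.Printf("%s\n", processEOBHops(pkt, unwrap)) // prints: abc
}
```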

We've also added a series of tests to ensure that we're able to uncover EOB data at each hop, even when an EOB hop expands into multiple actual hops.

cdecker commented 5 years ago

Thanks @Roasbeef for the work in getting this ball rolling. I am however a bit concerned that the proposal has a few downsides. I hope I understood the code correctly, but without a spec, I had to reverse engineer a bit :wink:

There are 3 issues I see with this proposal:

I think most of this stems from trying to maintain the processing model as rigidly as possible, and keeping the current payload format intact (for the pivot payload). This also creates the need for 3 different payload types. I'd propose, however, that we go one step further: deprecate the old payload format completely and make the direct switch to TLV.

So my proposal is as follows:

For a payload of length <= 32:

|-------+-------------------+------|
| realm | payload + padding | HMAC |
|-------+-------------------+------|

For payloads > 32 and <= 162 for example:

|-------+--------------+-----------------+-------------------------+------|
| realm | payload[:64] | payload[64:129] | payload[129:] + padding | HMAC |
|-------+--------------+-----------------+-------------------------+------|

In other words we use the realm byte from the first hop to determine the number of hops to read, the first hop still has room for 64 bytes of payload (per-hop-payload + HMAC size - realm byte). Any intermediate hop has the full 65 bytes available. The last hop has 33 bytes of payload available, since we will use its HMAC to pass it on to the next node in the path. Notice that we do not decrypt the payloads after the first since processing the first hop already decrypted all the following hops, including the ones we'll be processing. In addition we get a better use of the limited space that we have available and the entire payload is contiguous in memory and can be passed out to the parser as is, without having to stitch it together.

The implementation is also rather trivial: all we need to do is pass the payload out as a byte slice during processing and, to get the next onion, shift by 65*nhops bytes instead of 65 and pad with 0s. So the only thing we really need to do is have the rightShift, headerWithPadding, and copy calls take 65*nhops as an argument instead of 65.
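The shift-by-nhops-frames idea can be sketched as follows. This omits the decryption and HMAC steps entirely and only shows the slicing arithmetic; the function name is illustrative:

```go
package main

import "fmt"

// hopDataSize is one per-hop frame: realm byte + 32-byte payload + 32-byte HMAC.
const hopDataSize = 65

// nextRoutingInfo sketches the proposed change: instead of always shifting
// the routing info left by one 65-byte frame, shift by nhops frames (nhops
// having been read from the realm byte) and zero-pad the tail. The contiguous
// payload handed to the parser is simply the first nhops frames.
func nextRoutingInfo(routingInfo []byte, nhops int) (payload, next []byte) {
	shift := nhops * hopDataSize
	payload = routingInfo[:shift]
	next = make([]byte, len(routingInfo))
	copy(next, routingInfo[shift:]) // tail stays zero, mimicking the padding
	return payload, next
}

func main() {
	// A 20-frame routing info blob where this node consumes 3 frames.
	info := make([]byte, 20*hopDataSize)
	payload, next := nextRoutingInfo(info, 3)
	fmt.Println(len(payload), len(next)) // prints: 195 1300
}
```

Note how the payload stays contiguous in memory, which is exactly the property the proposal highlights: it can be handed to a parser as-is, with no stitching.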

What do you think @Roasbeef? Would that work?

rustyrussell commented 5 years ago

> Use the 4 MSB in the first realm byte as a counter of how many 65 byte payload-chunks are destined for the processing node. This clearly distinguishes them from the current payload and directly gives an indication on how much it has to read/allocate.

I assume you mean "how many additional". That was the intent (ie. 0 = just this one, as now).

But yes, this is what I was thinking and seems more optimal. Of course it gives @Roasbeef more room to stream movies, too...

cdecker commented 5 years ago

> I assume you mean "how many additional". That was the intent (ie. 0 = just this one, as now).

That's one option. I was hoping to signal the use of TLV and variable payloads with any of the 4 MSBs being set, but I think you are right that it should be the LSBs which indicate the payload type.
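The realm-byte split being discussed can be sketched as follows (field names and the example type value are illustrative; the format was not settled at this point in the thread):

```go
package main

import "fmt"

// packRealm sketches the proposed realm byte: the upper 4 bits count
// *additional* 65-byte frames destined for the processing node (0 = just
// this one, as today), while the lower 4 bits are free to signal the
// payload type (e.g. legacy vs TLV).
func packRealm(extraFrames, payloadType byte) byte {
	return extraFrames<<4 | payloadType&0x0f
}

// framesForNode returns the total number of frames the processing node reads.
func framesForNode(realm byte) int {
	return int(realm>>4) + 1
}

func main() {
	r := packRealm(2, 1) // two extra frames, hypothetical payload type 1
	fmt.Println(framesForNode(r)) // prints: 3
}
```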

Roasbeef commented 5 years ago

Closing in favor of #36.