Design for payment channel-based settlement engine

Current thinking:

Payment channel claims are sent in ILP packets addressed to peer.settle.<currency> (credit to @adrianhopebailie for this idea)
To support backwards compatibility with the JS plugins, the BTP service can convert claims sent in ILP packets to/from claims sent in BTP packets
Incoming claims are processed by an IncomingService written in Rust. Using an atomic Lua script, the latest claim is set and the balance updated accordingly in Redis
Opening and closing payment channels, as well as topping up and withdrawing, are handled by a settlement engine written in JavaScript (to take advantage of the ripple-lib SDK)
The settlement engine polls the account balances at a regular interval (could be as short as every 100-200ms) to check when outgoing claims should be sent out
The settlement engine signs payment channel claims and puts them in ILP packets to send to the relevant peer
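
To make the atomic claim update more concrete, here is a minimal sketch of the logic the Lua script would need to run as a single step, modeled in memory in TypeScript rather than in Redis. The `ChannelState` shape, the `applyIncomingClaim` name, and the use of plain numbers for amounts are assumptions for illustration only; signature verification is assumed to have already happened.

```typescript
// Hypothetical per-account state; in the design above this would live in Redis.
interface ChannelState {
  latestClaimAmount: number; // cumulative amount of the highest claim seen so far
  balance: number;           // amount credited to the peer, in the same units
}

// Models what the atomic script would do: accept a claim only if it is
// higher than the stored one, and credit just the difference.
function applyIncomingClaim(state: ChannelState, claimAmount: number): number {
  if (claimAmount <= state.latestClaimAmount) {
    return 0; // stale or replayed claim: nothing to credit
  }
  const credited = claimAmount - state.latestClaimAmount;
  state.latestClaimAmount = claimAmount;
  state.balance += credited;
  return credited;
}
```

Because claims are cumulative, replaying an old claim credits nothing, which is the property the atomic script has to preserve under concurrent requests.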

Open questions:

If the BTP service is converting claims put in ILP packets to/from claims in BTP packets, how does it know whether it should do that conversion for a particular peer? Should that be a configuration option or based on something in the BTP handshake?
When using ILP-Over-HTTP, should the settlement engine make outgoing HTTP calls directly to the peer or should it always forward packets through a connector instance?
If the connector is supposed to forward outgoing payment channel claims to a peer, how does it know whether a packet addressed to peer.settle.<currency> is an outgoing or incoming claim? Should the router look at the from account to figure out whether the from is "internal" or not?
Should the same settlement engine do both payment channel and on-ledger settlement? If so, would the preference be configured on the settlement engine, per-account, or based on some factors like how often the engine has to do on-ledger operations with a particular account?

ILP packets addressed to peer.settle.<currency>

peer.settle.<currency> implies that the single account can support multiple assets. How would that be supported? What asset is their balance denominated in?

Using an atomic Lua script, the latest claim is set and the balance updated accordingly in Redis

👍

Incoming claims are processed by an IncomingService written in Rust.

Opening and closing payment channels, as well as topping up and withdrawing, are handled by a settlement engine written in JavaScript (to take advantage of the ripple-lib SDK)

These two points seem at odds with one another. Separating claim validation from channel opening/funding/closing opens up a number of security issues, and resolving those issues would make the design much more complex (would require multiple distributed locks shared between JS and Rust in order to implement securely).

If settlement engines exposed an interface like this (spitballing here):

interface SettlementEngine {
  /** Sends outgoing settlement to the peer */
  sendMoney(accountId: string, amount: string): Promise<void>

  /** Triggers callback when incoming money is received */
  registerMoneyHandler(accountId: string, handler: (amount: string) => Promise<void>): void

  /**
   * Wrapper registers this on the settlement engine, which the settlement engine
   * calls when it wants to send an outgoing message (paychan claim or other info)
   * to the peer. (The data would be an ILP packet). This allows the connector to handle
   * all the bilateral communication, so the settlement engine doesn't need to.
   */
  registerDataSender(accountId: string, sender: (data: Buffer) => Promise<Buffer>): void

  /**
   * When the wrapper/connector receives an incoming ILP packet addressed to
   * `peer.settle`, they could call this to pass it to the settlement engine.
   */
  handleData(accountId: string, data: Buffer): Promise<Buffer>
}

It'd be straightforward to write a wrapper that interfaces with the Redis database, applies balance updates/triggers settlements, and forwards incoming & outgoing messages/claims.

Also, none of the claim validation/signing outgoing claims would have to be reimplemented.
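
As a rough illustration of what such a wrapper might look like, here is a sketch that keeps balances in a `Map` instead of Redis. `SettlementEngineLike`, `BalanceWrapper`, `recordOwed`, and the sign convention (a positive balance is what we owe the peer) are all hypothetical, and amounts are plain numbers only to keep the example short.

```typescript
// Trimmed, hypothetical subset of the SettlementEngine interface sketched above.
interface SettlementEngineLike {
  sendMoney(accountId: string, amount: string): Promise<void>;
  registerMoneyHandler(accountId: string, handler: (amount: string) => Promise<void>): void;
}

// Hypothetical wrapper: a Map stands in for the Redis balances.
// Assumed convention: a positive balance is the net amount we owe the peer.
class BalanceWrapper {
  private balances = new Map<string, number>();
  private engine: SettlementEngineLike;
  private settleThreshold: number;

  constructor(engine: SettlementEngineLike, settleThreshold: number) {
    this.engine = engine;
    this.settleThreshold = settleThreshold;
  }

  addAccount(accountId: string): void {
    this.balances.set(accountId, 0);
    // Incoming settlement: the peer paid us, so the balance moves in their favor.
    this.engine.registerMoneyHandler(accountId, async (amount: string) => {
      this.balances.set(accountId, this.getBalance(accountId) + Number(amount));
    });
  }

  getBalance(accountId: string): number {
    return this.balances.get(accountId) ?? 0;
  }

  // Called when we owe the peer more, e.g. after a packet we sent them is fulfilled.
  recordOwed(accountId: string, amount: number): void {
    const balance = this.getBalance(accountId) + amount;
    if (balance >= this.settleThreshold) {
      // Hand the whole amount to the engine, which is assumed to handle failures itself.
      this.balances.set(accountId, 0);
      this.engine.sendMoney(accountId, String(balance)).catch(() => {});
    } else {
      this.balances.set(accountId, balance);
    }
  }
}
```

The point of the sketch is only that the wrapper owns the balance and the threshold check, while claim validation and signing stay inside the engine.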

Should the same settlement engine do both payment channel and on-ledger settlement? If so, would the preference be configured on the settlement engine, per-account, or based on some factors like how often the engine has to do on-ledger operations with a particular account?

I'd lean towards probably not. Seems like it adds complexity when at least for the time being, I imagine most accounts will either have near zero trust (payment channels), or pretty high trust (on-ledger settlement).

Thanks for the input!

peer.settle.<currency> implies that the single account can support multiple assets. How would that be supported? What asset is their balance denominated in?

How so? Accounts are denominated in a single currency

These two points seem at odds with one another. Separating claim validation from channel opening/funding/closing opens up a number of security issues, and resolving those issues would make the design much more complex (would require multiple distributed locks shared between JS and Rust in order to implement securely).

What do you mean?

I was thinking that the Rust code would check that the signature is valid and then offload the figuring out whether the claim is for a higher amount to Redis. The settlement engine would similarly rely on Redis to make sure the claiming of claims and updating the balances is atomic.

If settlement engines exposed an interface like this (spitballing here):

The main thing I'm trying to avoid is forwarding all packets that look like incoming claims (i.e. have the destination address peer.settle.whatever) to the settlement engine. That would make it super easy to DoS the settlement engine. The alternative is having the incoming claim processing in the connector itself, with the assumption that that component is built to be horizontally scaled based on traffic. Take a look at this architecture explanation for more details on my thinking about the settlement engine / connector split.

I imagine most accounts will either have near zero trust (payment channels), or pretty high trust (on-ledger settlement).

That's fair that it may be an account-specific thing that doesn't really change. However, that suggests it should be treated as account configuration and handled by one settlement engine, because the alternative would be running two different settlement engines and needing some way other than currency to determine which account belongs to which one (right now the balances are stored under a key of the form balances:xrp so the XRP settlement engine knows that it should settle for all of the accounts listed there, whereas another settlement engine would look under balances:eth).

What do you mean?

I was thinking that the Rust code would check that the signature is valid and then offload the figuring out whether the claim is for a higher amount to Redis. The settlement engine would similarly rely on Redis to make sure the claiming of claims and updating the balances is atomic.

To name a few things:

An incoming payment channel claim cannot be credited when the channel is being closed. If the settlement engine is responsible for that, it needs to acquire a lock. Otherwise, as soon as the tx to close the channel is submitted or in the mempool, a malicious user could send very large claims and get them credited to their balance, which would be worthless since the channel would be closed moments later. The lock would be released after the channel is closed or the on-chain tx failed. Something like this would necessitate Redlock (e.g. if the settlement engine went down after the lock was acquired, the lock should be reset at some point).
Similarly, if the settlement engine operates a channel watcher, closing channels because they're expiring/disputed must also prevent incoming paychan claims from being credited.
In order to validate claims, Rust needs awareness of the on-ledger address, likely other metadata in the case of ETH/ERC-20s, and a connection to the ledger to refresh channel state. In particular, the logic to link a new channel to an account (to avoid crediting the same claim multiple times) is nontrivial to reason about.
By contrast, if the settlement engine alone is updating the state in Redis, and Rust is just pulling from that, Rust needs a mechanism to trigger the settlement engine to update the channel state from the network (e.g. if the claim is greater than the channel capacity, such as if the peer deposited to the channel, and we need to fetch the new channel capacity from the network to check if that's true).

I'd also emphasize that all this is based on unidirectional payment channels, and there may be other complications with bidirectional channels or settlement engines that talk to an external payment channel manager (e.g. LND).

@adrianhopebailie's comment here convinced me that the settlement engine shouldn't be aware of the balance because it makes everything so much simpler (I rescind my proposal there!). Connector/balance middleware just says to the settlement engine, "send some money," or settlement engine says "got some money." If a settlement fails, the settlement engine keeps track of how much is owed, and tries to settle again when it sees fit. There's clear separation of concerns between settlement and packet clearing. Straddling those two things was a big misstep with the JS implementation.
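
A minimal sketch of that split, from the settlement engine's side: it only hears "send some money," remembers what is still owed, and retries when it sees fit. The `SimpleSettlementEngine` class and the synchronous `TrySettle` callback (a stand-in for the actual ledger or paychan operation) are hypothetical.

```typescript
// Hypothetical stand-in for the ledger/paychan operation; returns false on failure.
type TrySettle = (accountId: string, amount: number) => boolean;

class SimpleSettlementEngine {
  // Amount still owed per account; the engine never sees the clearing balance.
  private owed = new Map<string, number>();
  private trySettle: TrySettle;

  constructor(trySettle: TrySettle) {
    this.trySettle = trySettle;
  }

  // "Send some money": remember the obligation, then attempt to settle it.
  sendMoney(accountId: string, amount: number): void {
    this.owed.set(accountId, this.owedTo(accountId) + amount);
    this.retry(accountId);
  }

  // Can be called again later (e.g. on a timer) for anything that failed.
  retry(accountId: string): void {
    const amount = this.owedTo(accountId);
    if (amount > 0 && this.trySettle(accountId, amount)) {
      this.owed.set(accountId, 0);
    }
  }

  owedTo(accountId: string): number {
    return this.owed.get(accountId) ?? 0;
  }
}
```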

Quoting from the architecture explanation:

If a settlement engine requires bilateral messaging, for example to exchange payment channel claims or updates, it is recommended to have a component written in Rust and a separate settlement engine

I understand this as a DoS prevention, but I don't think that's reasonable. Bilateral communication of settlement messages is so intertwined with settlement itself that at that point, I think it's simpler to reimplement the whole settlement engine in Rust. Just some examples of settlement-related messages that would need to be reimplemented: exchanging Ethereum/XRP addresses, a request that a peer closes a channel, Lightning peering information, and Lightning invoices. In the future, I imagine there'll be a lot more messages surrounding negotiating/buying incoming channel capacity, and ERC-20 integration will require coordinating token contract addresses, to name a couple. In any case, that limitation would probably require significant refactoring of the existing plugins.

The main thing I'm trying to avoid is forwarding all packets that look like incoming claims (i.e. have the destination address peer.settle.whatever) to the settlement engine.

Can't the DoS prevention here be on an account-by-account basis? For example, if tons of settlement messages are coming in from a particular account, the Rust connector could look up the balance for that account, and if it's not changing/increasing (e.g. the settlement engine wasn't crediting settlements from any of those packets), then it could stop forwarding packets from that account to the settlement engine.

(I'm not sure what the best implementation looks like, I'm just saying there's probably a better DoS solution than preventing the JS settlement engine from handling incoming messages directly).
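
Just to illustrate the idea, a per-account filter could look roughly like the sketch below. `SettlementMessageFilter`, `shouldForward`, and the `maxUnproductive` limit are made up for this example, and note that it keeps per-account counters, so it assumes the connector (or a shared store) can hold that state.

```typescript
// Hypothetical filter: stop forwarding settlement messages from an account
// once too many have been forwarded without its balance changing.
class SettlementMessageFilter {
  private lastBalance = new Map<string, number>();
  private unproductive = new Map<string, number>();
  private maxUnproductive: number;

  constructor(maxUnproductive: number) {
    this.maxUnproductive = maxUnproductive;
  }

  // currentBalance is whatever the balance store reports for this account right now.
  shouldForward(accountId: string, currentBalance: number): boolean {
    const previous = this.lastBalance.get(accountId);
    if (previous === undefined || previous !== currentBalance) {
      // The balance moved (or the account is new): earlier messages were productive.
      this.lastBalance.set(accountId, currentBalance);
      this.unproductive.set(accountId, 0);
    }
    const count = this.unproductive.get(accountId) ?? 0;
    if (count >= this.maxUnproductive) {
      return false; // too many messages with no effect on the balance
    }
    this.unproductive.set(accountId, count + 1);
    return true;
  }
}
```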

That's fair that it may be an account-specific thing that doesn't really change. However, that suggests it should be treated as account configuration and handled by one settlement engine, because the alternative would be running two different settlement engines and needing some way other than currency to determine which account belongs to which one (right now the balances are stored under a key of the form balances:xrp so the XRP settlement engine knows that it should settle for all of the accounts listed there, whereas another settlement engine would look under balances:eth).

I think the settlement engines would need to be named differently, and not solely based on the asset (and now I understand the rationale for peer.settle.<currency>, or maybe peer.settle.<settlement engine identifier>). In JS we already have at least 4 different XRP plugins, only 2 of which are compatible with one another (not to mention an XRP on-ledger plugin, if that exists), so my guess is it's inevitable we'll need to have different identifiers for them!
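
For example, routing on the identifier could be as simple as the sketch below. The function name `settlementEngineId` and the example identifier `xrp-paychan` are hypothetical; only the `peer.settle.` prefix comes from the proposal.

```typescript
// Extracts the segment after "peer.settle." from an ILP address, or returns
// undefined if the address is not a settlement address.
function settlementEngineId(destination: string): string | undefined {
  const prefix = "peer.settle.";
  if (!destination.startsWith(prefix)) {
    return undefined;
  }
  const rest = destination.slice(prefix.length);
  const end = rest.indexOf(".");
  const id = end === -1 ? rest : rest.slice(0, end);
  return id.length > 0 ? id : undefined;
}
```

Whether that segment names a currency or a specific engine/protocol version is exactly the open question; the lookup itself is the same either way.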

An incoming payment channel claim cannot be credited when the channel is being closed

This seems like it would only be an issue for Bitcoin-style payment channels where you need to close it to deposit or withdraw, no? With XRP and ETH, wouldn't you just keep the channel open? The channel would only be closed when you no longer want to have a relationship with that party and then you'd want to remove their account entirely so no more packets are forwarded on their behalf.

Similarly, if the settlement engine operates a channel watcher, closing channels because they're expiring/disputed must also prevent incoming paychan claims from being credited.

Seems easy enough to have a flag indicating whether the channel is active that could be checked before crediting the peer for the settlement.
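
Something like the following sketch, where the status field is written by the settlement engine or channel watcher and only read by the claim-processing code. The `ChannelStatus` values, the `IncomingChannel` shape, and `creditClaim` are hypothetical, and in practice the check and the update would have to happen in one atomic step.

```typescript
type ChannelStatus = "open" | "closing" | "closed";

// Hypothetical channel record as the claim-processing code would see it.
interface IncomingChannel {
  status: ChannelStatus; // set by the settlement engine / channel watcher
  latestClaim: number;   // highest cumulative claim credited so far
}

// Returns the amount to credit to the peer, or 0 if the claim must be ignored.
function creditClaim(channel: IncomingChannel, claimAmount: number): number {
  if (channel.status !== "open") {
    return 0; // never credit claims on a channel that is closing or closed
  }
  if (claimAmount <= channel.latestClaim) {
    return 0; // not an improvement over the claim we already hold
  }
  const credited = claimAmount - channel.latestClaim;
  channel.latestClaim = claimAmount;
  return credited;
}
```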

In order to validate claims, Rust needs awareness of the on-ledger address, likely other metadata in the case of ETH/ERC-20s, and a connection to the ledger to refresh channel state.

Agreed about knowing the on-ledger address and the contract address, which is basically the asset identifier, for ERC20s. What kind of channel state are you referring to? I would imagine that the settlement engine would be watching the ledger and would update the database with any relevant state changes. The Rust code would operate completely based on what's in the DB.

I'd also emphasize that all this is based on unidirectional payment channels, and there may be other complications with bidirectional channels or settlement engines that talk to an external payment channel manager (e.g. LND).

I'm fine designing for unidirectional payment channels. I am very skeptical about the utility of bidirectional channels, because it seems so unlikely that you would have a situation where you both have balanced enough flows to net out a meaningful amount and super low trust. It makes more sense if you're coming from a world in which on-ledger transactions are prohibitively expensive, but I think that's always going to lead to a lousy layer 2 and 3 experience, so I'd rather not design around that assumption too much.

What would the additional complication be around interacting with an external payment channel manager? I would have assumed that that would take care of a lot of the complicated logic for you.

Connector/balance middleware just says to the settlement engine, "send some money," or settlement engine says "got some money."

I think there are 3 types of logic to consider:

When forwarding packets, does the account have a high enough balance that we should send the packet?
How to update the balance based on incoming/outgoing settlements
How often it's actually worth sending money based on things like transaction fees and latency

Right now I'm leaning towards the split where the connector code has a simple check to see whether there is sufficient balance to forward a packet and either does or doesn't forward it as a result. The settlement engine would be responsible for sending outgoing settlements if our balance with a peer is going above/below some threshold.
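
Roughly, the two checks could look like the sketch below. The names `canForward` and `settlementAmount` and the parameters `minBalance`, `settleThreshold`, and `settleTo` are assumptions for illustration; fee and latency considerations would show up in how the threshold is chosen.

```typescript
// Connector side: forward only if the peer's balance stays at or above the
// minimum allowed after deducting the packet amount.
function canForward(peerBalance: number, packetAmount: number, minBalance: number): boolean {
  return peerBalance - packetAmount >= minBalance;
}

// Settlement engine side: once what we owe reaches the threshold, settle
// down to `settleTo`; otherwise send nothing yet.
function settlementAmount(owed: number, settleThreshold: number, settleTo: number): number {
  return owed >= settleThreshold ? owed - settleTo : 0;
}
```

A `minBalance` below zero would correspond to extending the peer some credit.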

I think it's simpler to reimplement the whole settlement engine in Rust

I thought about that and would have liked to, but ran into the massive issue that there's no SDK for XRP (or many other blockchains) in Rust. I had trouble compiling and linking to the C++ library, and implementing the serialization and protocols for each blockchain is a huge project. I spent a good chunk of a day starting to write (or even figure out) the serialization and trying to compile rippled and link to it from Rust before giving up, deciding to build something simple in JS, and creating a framework to allow a non-Rust settlement engine to work with the rest of Interledger.rs.

Just some examples of settlement-related messages that would need to be reimplemented: exchanging Ethereum/XRP addresses, a request that a peer closes a channel, Lightning peering information, and Lightning invoices.

Those all seem doable to me.

if tons of settlement messages are coming in from a particular account, the Rust connector could look up the balance for that account, and if it's not changing/increasing (e.g. the settlement engine wasn't crediting settlements from any of those packets), then it could stop forwarding packets from that account to the settlement engine.

The Rust connector is intended to be stateless so I'm not sure how it would track this

(I'm not sure what the best implementation looks like, I'm just saying there's probably a better DoS solution than preventing the JS settlement engine from handling incoming messages directly).

Since the settlement engine is responsible for watching channels I actually think it's super important that that process should never handle messages sent by an external party directly. Anything you can talk to directly can be DoSed, and the settlement engine should never ever go down. I think it would be more robust to have the settlement engine only send outgoing messages, interact with the DB, and interact with the blockchain. Incoming messages should be handled by the same infrastructure that already needs to scale to handle any potential volume of messages from external parties.

In JS we already have at least 4 different XRP plugins, only 2 of which are compatible with one another (not to mention an XRP on-ledger plugin, if that exists)

But is this a feature, inevitability, or bug? We should probably think about the bilateral settlement protocols as proper protocols that are versioned but if I'm building something new now I'd prefer to have one thing responsible for XRP. It may handle multiple versions of the bilateral protocol, on-ledger and payment channel settlement, etc -- but it's responsible for settling XRP balances.

An incoming payment channel claim cannot be credited when the channel is being closed

I'm not sure this is a problem for unidirectional channels? Incoming claims are always advancing the state in my favor, and my peer can't unilaterally close. The worst my peer can do is initiate a close at state i and then start sending me claims for state i + n. In that case, I can just use those latest claims to claim the channel during dispute. I do agree that you then need to flag the channel as 'closing' at some point and stop accepting claims before the channel watcher submits the latest claim.

That would be safe, yes, but that approach has a liveness requirement (e.g. if the settlement engine flags a channel as closing, decides not to close due to a fee or the transaction fails, and subsequently goes down, then the channel is deadlocked as closing). It'd require a more robust distributed lock mechanism to resolve those situations.
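
One way to picture the liveness problem: if the closing flag carries an expiry, a crashed settlement engine can't leave the channel stuck as closing forever. This is only a sketch of that idea with hypothetical names (`ClosingFlag`, `markClosing`, `isClosing`). An expiry alone does not cover the case where the close transaction is still pending when the flag lapses, which is where a more robust lock would come in.

```typescript
// Hypothetical closing flag with an expiry, so it resets itself if the
// settlement engine goes down before clearing it.
interface ClosingFlag {
  setAtMs: number; // when the flag was set
  ttlMs: number;   // how long it stays in effect without being renewed
}

function markClosing(nowMs: number, ttlMs: number): ClosingFlag {
  return { setAtMs: nowMs, ttlMs };
}

// Claims should be rejected while this returns true.
function isClosing(flag: ClosingFlag | undefined, nowMs: number): boolean {
  return flag !== undefined && nowMs - flag.setAtMs < flag.ttlMs;
}
```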

Closing this issue because we hashed out more of the design in https://forum.interledger.org/t/settlement-architecture/545

interledger / interledger-rs

Design for payment channel-based settlement engine #57