Raiden should ask PFS for and use the provided path

LefterisJP commented 5 years ago

Problem Definition

Task

[ ] Raiden should ask the PFS for paths whenever it needs to make a transfer
[ ] It should process the PFS response and use the provided routes in the transfer
[ ] The above basically means that the routing logic needs to change from backtracking to trying the provided paths one by one.
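
A minimal sketch of the last point, assuming the PFS response has already been parsed into an ordered list of candidate routes. `Route`, `try_routes_in_order` and the `attempt_transfer` callback are hypothetical names used for illustration only; they are not Raiden's actual API.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a route is an ordered list of node addresses.
Route = List[str]


def try_routes_in_order(
    routes: Sequence[Route],
    attempt_transfer: Callable[[Route], bool],
) -> Optional[Route]:
    """Try the provided routes one by one, in the order given.

    Returns the first route on which the transfer succeeds, or None
    if every provided route fails. No backtracking is performed.
    """
    for route in routes:
        if attempt_transfer(route):
            return route
    return None
```

The order of `routes` is taken as given by the PFS; the sketch does not re-rank or extend them.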

Timeline

As discussed with Rakan, this has high priority for the PFS team, and an implementation by the end of January would be good.

palango commented 5 years ago

Please let me know how I can help here.

rakanalh commented 5 years ago

Raiden Routing Meeting notes:

Participants: @ulope, @palango, @konradkonrad, @LefterisJP and @rakanalh

The purpose of the meeting was to discuss two topics:

Using source routing vs querying the PFS for routes at every hop.
Whether the changes from implementing the above are backwards-compatible.

Source routing vs query-at-every-hop approach

Source routing is an approach where the initiator attaches the suggested route to the message. The alternative is to query the PFS at every hop, so that each hop gets a freshly calculated route from the PFS until the target is reached.

Source Routing

Pros:

Route has to be calculated once
Route calculation fees will only be paid once

Cons:

Calculated route could become unusable during transfer (nodes going offline or out of capacity)
Cannot be enforced: the route list is only metadata of the transfer that suggests a potential route, and mediators can ignore it.

Query PFS

Pros:

Provides an up-to-date usable route.
Relatively easier to develop.

Cons:

Increased latency of the transfer, given that each hop has to calculate the route to the next hop towards the target.
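
To make the trade-off concrete, here is a rough sketch of both approaches against a toy PFS. Everything here is an assumption for illustration: the `PfsQuery` callable, the function names, and the `line_pfs` helper (a fixed A-B-C-D line topology) are hypothetical and do not reflect the real PFS interface.

```python
from typing import Callable, List, Tuple

# Hypothetical PFS interface: (source, target) -> full path including both ends.
PfsQuery = Callable[[str, str], List[str]]


def source_routed(pfs: PfsQuery, initiator: str, target: str) -> Tuple[List[str], int]:
    """The initiator queries once and attaches the whole path to the transfer."""
    return pfs(initiator, target), 1


def query_at_every_hop(pfs: PfsQuery, initiator: str, target: str) -> Tuple[List[str], int]:
    """Each hop asks the PFS again and only uses the next hop of the answer."""
    path = [initiator]
    queries = 0
    while path[-1] != target:
        route = pfs(path[-1], target)
        queries += 1
        if len(route) < 2:
            raise ValueError("PFS returned no usable route")
        path.append(route[1])
    return path, queries


def line_pfs(source: str, target: str) -> List[str]:
    """Toy PFS over a fixed line topology A-B-C-D (illustration only)."""
    nodes = ["A", "B", "C", "D"]
    start, end = nodes.index(source), nodes.index(target)
    return nodes[start:end + 1]
```

On this toy topology both functions arrive at the same path from A to D, but the second one issues one PFS query per hop instead of a single query, which is where the extra latency and the extra query fees come from.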

@konradkonrad shared an opinion about fees:

I assume all fees operate on a minimal cost calculation. Now the problem with a source routing protocol (initiator pays for PFS and passes route along) is: Mediation fees set the incentives for forwarding without querying PFS; they pay for the capital cost of locking tokens/providing capacity and electricity for running the node, but they don’t set incentives for paying PFS as a Mediator. I believe the fees should pay every node for knowing how to best route. Also, there is a second fee incentive+routing issue that I did not bring up in the call: Doing RefundTransfers cannot pay out additional fees (afaict), but they lead to further capital cost/locked tokens. The only reason for doing a RefundTransfer can be to minimize my losses: I already paid capital cost, so refunding lowers the probability of the transfer timing out/expiring entirely. Again, that only works if I expect the network to do optimal routing (source routing isn’t necessarily optimal over the whole path of a transfer). Worst case scenario: Initiators “find out” that the best way for getting a transfer through is “fan out”: do 5 simultaneous transfers and only reveal the secret of the first successful transfer. Therefore I believe we should align the incentives so that mediators' knowledge about good routes is priced in, i.e. an implementation that does avoid any “Dead end routes” at the cost of more PFS queries…

TL;DR: Participants agree that we should implement the query-at-every-hop approach.

Backward compatibility

Since implementing this will only introduce additional state changes / events, and provided a proper upgrade mechanism is implemented for this (see #3227 and #3275), the implementation here should be backwards compatible. The change, as we see it right now, would be to replace the internal Raiden routing module with a set of events / state changes that would eventually provide a list of RouteStates for a given transfer to be used to forward the transfer.

If the transport message format needs to be changed (unlikely), then the change becomes backwards-incompatible unless a "version handshake" is implemented (currently unplanned).

@hackaugusto please provide us with your opinion on the notes.

konradkonrad commented 5 years ago

To add to my quote:

Cons:

Calculated route could become unusable during transfer (nodes going offline or out of capacity)

I am afraid that using the source-provided route is not in the best interest of all mediators:

A) A mediator will need to assess the probability of the provided route to fail/succeed anyway, because this is the probability to gain mediation fees vs having capital locked up.
B) wealthy attackers can use source routes to lock/drain certain parts of the network by providing dead-end routes that touch as many routes as possible (mitigation may be possible by enforcing max lengths)

hackaugusto commented 5 years ago

Since implementing this will only introduce additional state changes / events, and provided a proper upgrade mechanism is implemented for this (see #3227 and #3275), the implementation here should be backwards compatible.

Using "query-at-every-hop approach" doesn't need any new state changes or events. The only thing needed is to change the get_best_routes function to query the PFS

hackaugusto commented 5 years ago

To add to what konrad said: I would actually expect mediators to run the bundle, which includes the PFS, and the mediator can simply not charge itself a fee for its own query.

palango commented 5 years ago

Using "query-at-every-hop approach" doesn't need any new state changes or events. The only thing needed is to change the get_best_routes function to query the PFS

This is true, but the request will be a context switch and so has to be taken out of the state machine.

hackaugusto commented 5 years ago

This is true, but the request will be a context switch and so has to be taken out of the state machine.

get_best_routes is outside of the state machine; it is called by the Raiden service while the state change is being created.
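
A rough sketch of that separation, under stated assumptions: the route lookup (which does I/O) runs while the state change is being built, and the state machine only sees the resulting data. `InitTransfer`, `create_init_transfer`, `next_hop` and the `PfsClient` callable are hypothetical stand-ins, not the actual Raiden classes or the real signature of get_best_routes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Route = Tuple[str, ...]
# Hypothetical PFS client: (source, target, amount) -> candidate routes.
PfsClient = Callable[[str, str, int], List[Route]]


@dataclass(frozen=True)
class InitTransfer:
    """Hypothetical state change carrying already-resolved routes."""
    target: str
    amount: int
    routes: Tuple[Route, ...]


def create_init_transfer(
    query_pfs: PfsClient, our_address: str, target: str, amount: int
) -> InitTransfer:
    # The network request happens here, outside the state machine.
    routes = query_pfs(our_address, target, amount)
    return InitTransfer(target=target, amount=amount, routes=tuple(routes))


def next_hop(state_change: InitTransfer) -> Optional[str]:
    # Pure function: it only reads data that is already in the state change.
    for route in state_change.routes:
        if len(route) >= 2:
            return route[1]
    return None
```

Because the PFS result is embedded in the state change, the transition logic stays deterministic and needs no context switch of its own.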

konradkonrad commented 5 years ago

I actually would expect mediators to run the bundle

I guess that depends on the throughput...

palango commented 5 years ago

@rakanalh @hackaugusto I can prepare a PR for this.

LefterisJP commented 5 years ago

As discussed today, @palango, I think it is a good idea for you to start on this, as it will free others for the other problems we are seeing. I will assign you, and if the situation changes we can re-discuss.

konradkonrad commented 5 years ago

B) wealthy attackers can use source routes to lock/drain certain parts of the network by providing dead-end routes that touch as many routes as possible (mitigation may be possible by enforcing max lengths)

Again: source routing allows for amplified lock/drain attacks:

Attacker controls Alice and Zulu
Attacker creates route through the network that touches as many hops as possible from Alice->B->C..Y->Zulu, A_provided_route (say B..Y are H hops)
Alice sends X tokens with F tokens in mediation fee over A_provided_route.
Once/If locked transfer reaches Zulu, the secret will never be requested/revealed.
Now the Attacker has invested X + F locked tokens, to lock (X + F / 2) * H tokens throughout the network.
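
The arithmetic in the last step can be sketched as follows. This reads the expression with ordinary operator precedence, i.e. each of the H hops locks on average X + F/2; that reading, the function names and the sample numbers are assumptions for illustration.

```python
def attacker_locked(amount: float, fee: float) -> float:
    """Tokens the attacker has to lock: X + F."""
    return amount + fee


def network_locked(amount: float, fee: float, hops: int) -> float:
    """Tokens locked across the H mediating hops: (X + F / 2) * H."""
    return (amount + fee / 2) * hops


def amplification(amount: float, fee: float, hops: int) -> float:
    """Ratio of tokens locked in the network to tokens the attacker locks."""
    return network_locked(amount, fee, hops) / attacker_locked(amount, fee)
```

For example, with X = 100, F = 10 and H = 20, the attacker locks 110 tokens while the mediators lock (100 + 5) * 20 = 2100 tokens, roughly 19 times the attacker's own stake.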

There are probably a couple of variations of this attack, some of which could be mitigated by client-side offline checks (e.g. checking for circular paths), but I hope the above illustrates why I am convinced that you cannot trust other nodes' routing information and you will always want to check for yourself for the best route from your mediation hop to the target.

palango commented 5 years ago

@heikoheiko We just discussed this during lunch again. While some of the incentivisation issues belong to the discussion of mediation fees there is this one issue with source routing that @konradkonrad laid out above in detail (after mentioning it in https://github.com/raiden-network/raiden/issues/3236#issuecomment-454853988 already).

konradkonrad commented 5 years ago

I guess there is one reasonable mitigation, that would need to be included in the PFS message (edit:) and offchain payment message spec:

source route needs to be signed by the PFS [needs to be included]
all nodes along the path that have the originating PFS in their "trusted services" whitelist [needs to be added] can trust the source provided route.
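
A minimal sketch of how a mediator might apply these two points, assuming the signature scheme allows recovering the signer's address from the payload. The route encoding, `can_trust_source_route` and the injected `recover_signer` callable are all hypothetical; no message format or signature scheme has been specified in this thread.

```python
from typing import Callable, Collection, Sequence

# Hypothetical: recovers the signer's address from (payload, signature),
# raising ValueError if the signature is malformed.
SignerRecovery = Callable[[bytes, bytes], str]


def encode_route(route: Sequence[str]) -> bytes:
    """Illustrative canonical encoding of a route; not a real wire format."""
    return ",".join(route).encode("utf-8")


def can_trust_source_route(
    route: Sequence[str],
    signature: bytes,
    recover_signer: SignerRecovery,
    trusted_pfs: Collection[str],
) -> bool:
    """Trust the route only if it was signed by a whitelisted PFS."""
    try:
        signer = recover_signer(encode_route(route), signature)
    except ValueError:
        return False
    return signer in trusted_pfs
```

A mediator that gets `False` here would fall back to its own route lookup rather than following the attached route.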

edit: this comes with a number of drawbacks in regards to potential partitioning of the network:

mediation fee schedule should reflect with/without trusted-PFS-source-route
trust set needs to be determined for clients (and needs "reputation" mechanisms in case some trusted PFS turn rogue)

heikoheiko commented 5 years ago

u will always want to check for yourself for the best route from your mediation hop to the target

don't think so. this is just one of a bag of possible attacks and malfunctions. mediating nodes will need to do some risk evaluation of transfers in general and then decide on the cost and willingness to mediate a transfer. in above case a quick "is provided path reasonably short" could be one check. note the default strategy for a healthy network is by nodes monitoring their neighbours. e.g. in above case Y would disconnect Z if it is answering pings, but not revealing secrets.

so in brief, yes there are attack vectors and we'll need to deal with them. but imho this doesn't lead to the conclusion, that all mediating nodes always need to check with a PFS if a provided route is the best.
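
The quick offline checks mentioned in this thread (maximum length, no circular paths) could look roughly like this. The function name and the `max_hops` default are assumptions chosen for illustration, not agreed-upon values.

```python
from typing import Sequence


def is_route_plausible(
    route: Sequence[str],
    our_address: str,
    target: str,
    max_hops: int = 10,  # arbitrary illustrative limit
) -> bool:
    """Cheap sanity checks on a source-provided route; no PFS query needed."""
    # A route needs at least two nodes and must be reasonably short.
    if len(route) < 2 or len(route) - 1 > max_hops:
        return False
    # A node appearing twice means the path is circular.
    if len(set(route)) != len(route):
        return False
    # The route must end at the target and actually pass through us.
    if route[-1] != target or our_address not in route:
        return False
    return True
```

Checks like these do not prove a route is good; they only reject routes that are obviously unreasonable before any tokens are locked.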
