interledgerjs / ilp-connector

Reference implementation of an Interledger connector.
Other
138 stars 53 forks source link

Drastically simplify routing tables? #363

Open michielbdejong opened 7 years ago

michielbdejong commented 7 years ago

When trying to connect a full mesh of 8 or 9 connectors, the burden of processing 7 incoming route updates, and producing 7 outgoing route updates, becomes more than a default-size VPS can handle per 30 seconds. I'm now doing a test with sending route updates only every 5 minutes, but it's still a root problem we probably need to address. Having 7 peers is not such a weird thing, and the connector should not use up 100% cpu when it does not even handle any payments or quote requests.

I think the routing table currently contains sourceLedger -> finalLedger routes. This means in a fully connected network with 8 nodes (so 8 destinations, and each node has 7 peers), from the point of view of one node, whenever a peer tells you that it can reach all 8 destinations, you discard the route to your own ledger, then for the remaining 7 create a route from your own ledger to there, as well as a route from each of the 6 other peer ledgers to there, so (1+6)*7=49 routes. These are stored per source ledger, and then combined per destination ledger.

Then, when it's time to broadcast routes, you just tell each peer which combined routes you have stored for them.

It's probably the combine step that's most expensive, because it requires the combination of an existing curve and a new curve, and I think it's executed 49 times for each incoming broadcast message, so that's 7*49 times per 30 seconds = 11.5 times per second.

A simpler routing table would change two things:

michielbdejong commented 7 years ago

Actually, it seems that it's worse, from log lines like:

[api] 2017-07-06T11:19:37.880Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.Mfqyd.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.880Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.FusfV.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.881Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.ZNi8B.eur.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.881Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.ZNi8B.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.881Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.egZYp.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.882Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.ehA0x.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.882Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.xeyPH.eur.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.883Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: test.ilp-kit.michielbdejong.com.connector
[api] 2017-07-06T11:19:37.883Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9.  ledgerB: g.driskill.  nextHop: peer.KQp9t.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw

It looks like it's storing numDest * numPeers^2 routes. I think only numDest routes are necessary (would just need to pick a default currency in which the "center" of the connector is denoted.

Also, https://github.com/interledgerjs/ilp-connector/issues/364 is a separate issue.

michielbdejong commented 7 years ago

confirmed that the broadcast task takes 500ms when peered with 3 nodes, but over a minute when peered with 7. I guess the problem escalates from the point where the broadcast task takes longer than 30 seconds.

michielbdejong commented 7 years ago

I’m considering a refactor of the connector where it just keeps just a table, with for each finalLedger: nextLedger, exchangeRateToReferenceCurrency, maxAmount, numHops. If a new route broadcast comes in, it decides whether it's better than the rate it has, without requiring more hops, and if so, it updates this super-simple routing table with the new values for nextLedger, exchangeRateToReferenceCurrency, maxAmount, numHops.

momerath42 commented 7 years ago

Overall, I'm in favor of this refactor in the short term, but would note the sacrifices it makes. This change would more thoroughly cement the shortest-path-preference heuristic, which I don't think is ideal in the long term. It also assumes that a connector's own exchange rates, between the ledgers it supports, are unimportant to it in route-selection (or else they have to be checked at selection time). Finally, the curve-simplification further pushes in the direction of assuming slow-to-change FX rates, and/or relative price-insensitivity.

michielbdejong commented 7 years ago

Relaying remarks from today's research team meeting: