michielbdejong opened this issue 7 years ago
Actually, it seems that it's worse than that, judging from log lines like:
```
[api] 2017-07-06T11:19:37.880Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.Mfqyd.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.880Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.FusfV.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.881Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.ZNi8B.eur.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.881Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.ZNi8B.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.881Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.egZYp.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.882Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.ehA0x.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.882Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.xeyPH.eur.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
[api] 2017-07-06T11:19:37.883Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: test.ilp-kit.michielbdejong.com.connector
[api] 2017-07-06T11:19:37.883Z ilp-routing:routing-tables debug removing expired route ledgerA: peer.MGf69.usd.9. ledgerB: g.driskill. nextHop: peer.KQp9t.usd.9.P7M1USYMnrGlYIOyzQ4fh1D74HJUEKAoV_ZG1SR8xBw
```
It looks like it's storing `numDest * numPeers^2` routes. I think only `numDest` routes are necessary (we would just need to pick a default currency in which the "center" of the connector is denominated).
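As a rough sanity check, here is what that difference means for a full mesh of 8 nodes. This is a sketch, not connector code; `currentTableSize` and `proposedTableSize` are hypothetical names:

```js
// Back-of-the-envelope route-table sizes for a full mesh of n connectors,
// i.e. numDest = n destinations and numPeers = n - 1 peers per node.
function currentTableSize (numDest, numPeers) {
  return numDest * numPeers * numPeers // numDest * numPeers^2
}

function proposedTableSize (numDest) {
  return numDest // one best route per destination
}

console.log(currentTableSize(8, 7)) // 392 routes per node today
console.log(proposedTableSize(8))   // 8 routes per node after the refactor
```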
Also, https://github.com/interledgerjs/ilp-connector/issues/364 is a separate issue.
Confirmed that the broadcast task takes 500ms when peered with 3 nodes, but over a minute when peered with 7. I guess the problem escalates once the broadcast task takes longer than the 30-second broadcast interval, because new work then arrives faster than it can be processed.
I'm considering a refactor of the connector where it keeps just a single table, with, for each finalLedger: nextLedger, exchangeRateToReferenceCurrency, maxAmount, numHops. When a new route broadcast comes in, the connector decides whether the new route is better than the one it has, without requiring more hops, and if so, it updates this super-simple routing table with the new values for nextLedger, exchangeRateToReferenceCurrency, maxAmount, and numHops.
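A minimal sketch of what that could look like, assuming a plain `Map` keyed by `finalLedger`; `RoutingTable` and `updateRoute` are illustrative names, not the connector's actual API:

```js
// One entry per finalLedger: { nextLedger, exchangeRateToReferenceCurrency,
// maxAmount, numHops }. A candidate route replaces the stored one only if it
// offers a better rate without requiring more hops.
class RoutingTable {
  constructor () {
    this.routes = new Map()
  }

  updateRoute (finalLedger, candidate) {
    const current = this.routes.get(finalLedger)
    const better = !current ||
      (candidate.numHops <= current.numHops &&
       candidate.exchangeRateToReferenceCurrency >
         current.exchangeRateToReferenceCurrency)
    if (better) this.routes.set(finalLedger, candidate)
    return better
  }
}
```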
Overall, I'm in favor of this refactor in the short term, but would note the sacrifices it makes. This change would more thoroughly cement the shortest-path-preference heuristic, which I don't think is ideal in the long term. It also assumes that a connector's own exchange rates, between the ledgers it supports, are unimportant to it in route-selection (or else they have to be checked at selection time). Finally, the curve-simplification further pushes in the direction of assuming slow-to-change FX rates, and/or relative price-insensitivity.
Relaying remarks from today's research team meeting:
When trying to connect a full mesh of 8 or 9 connectors, the burden of processing 7 incoming route updates and producing 7 outgoing route updates every 30 seconds becomes more than a default-size VPS can handle. I'm now doing a test with sending route updates only every 5 minutes, but it's still a root problem we probably need to address. Having 7 peers is not such a weird thing, and the connector should not use up 100% CPU when it isn't even handling any payments or quote requests.
I think the routing table currently contains sourceLedger -> finalLedger routes. This means that in a fully connected network of 8 nodes (so 8 destinations, and each node has 7 peers), whenever a peer tells you it can reach all 8 destinations, you discard the route to your own ledger, and then for each of the remaining 7 destinations you create a route from your own ledger, as well as a route from each of the 6 other peer ledgers, so (1+6)*7 = 49 routes. These are stored per source ledger, and then combined per destination ledger.
Then, when it's time to broadcast routes, you just tell each peer which combined routes you have stored for them.
It's probably the combine step that's most expensive, because it requires combining an existing curve with a new curve, and I think it's executed 49 times for each incoming broadcast message; that's 7*49 = 343 times per 30 seconds, or roughly 11.4 times per second.
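The arithmetic, spelled out under the assumptions above (a full mesh of 8 nodes, so 7 peers each, and one broadcast per peer every 30 seconds; the variable names are illustrative only):

```js
// Rough cost model for the combine step.
const peers = 7
const routesPerBroadcast = (1 + 6) * 7                  // 49 routes per incoming broadcast
const combinesPerInterval = peers * routesPerBroadcast  // 343 curve combinations per 30s
console.log(combinesPerInterval / 30)                   // ≈ 11.4 combines per second
```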
A simpler routing table would change two things:
- Represent each curve by just three points: `[0,0]`, then the curve's best point `[s, d]` (highest destination amount per source amount), and then `[Infinity, d]`. When combining two curves, pick the one with the shorter path; if their path lengths are equal, pick the one with the highest ratio `d / s`, and simply discard the other one. When joining two curves, simply multiply the exchange rates, so `d_joined / s_joined = (d_1 / s_1) * (d_2 / s_2)` (see the sketch below).
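A sketch of those three operations under the assumptions above, with routes represented as `{ s, d, numHops }`; `simplifyCurve`, `combineRoutes`, and `joinRoutes` are illustrative names:

```js
// Reduce a full liquidity curve to [0,0], its best point [s, d], and
// [Infinity, d]. Assumes the curve has at least one point with s > 0.
function simplifyCurve (points) {
  let best = null
  for (const [s, d] of points) {
    if (s > 0 && (!best || d / s > best[1] / best[0])) best = [s, d]
  }
  const [s, d] = best
  return [[0, 0], [s, d], [Infinity, d]]
}

// Combining two alternative routes to the same destination: prefer the
// shorter path; on a tie, prefer the higher rate d/s. The loser is discarded.
function combineRoutes (a, b) {
  if (a.numHops !== b.numHops) return a.numHops < b.numHops ? a : b
  return (a.d / a.s >= b.d / b.s) ? a : b
}

// Joining two consecutive hops: the rates multiply, so
// d_joined / s_joined = (d_1 / s_1) * (d_2 / s_2).
function joinRoutes (first, second) {
  return {
    s: first.s,
    d: first.d * (second.d / second.s),
    numHops: first.numHops + second.numHops
  }
}
```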