weaveworks / mesh

A tool for building distributed applications.
Apache License 2.0

In a fully connected mesh topology, topology update gossip can get chatty #116

Closed: murali-reddy closed this issue 4 years ago

murali-reddy commented 5 years ago

Whenever a connection is added, deleted, or established, the peer's mesh router broadcasts a topology update to its peers. In a fully connected topology, that broadcast goes to every node in the mesh.
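To make the fan-out concrete, here is a minimal Go sketch of the behaviour described above, under simplifying assumptions. The `router` type and the `onConnectionEvent` and `newFullMeshRouter` helpers are hypothetical and are not part of the weaveworks/mesh API; the model only counts messages. In a fully connected mesh of n peers, one connection event at a single router yields n - 1 topology update messages.

```go
package main

import "fmt"

// router is a hypothetical, stripped-down model of a mesh router (NOT the
// weaveworks/mesh Router type). It only records which peers it is directly
// connected to and counts the topology update messages it sends.
type router struct {
	peers map[string]bool // directly connected peers
	sent  int             // topology update messages sent so far
}

// onConnectionEvent models the behaviour described in the issue: any
// connection add/delete/established event triggers a topology update
// broadcast to every directly connected peer.
func (r *router) onConnectionEvent() {
	for range r.peers {
		r.sent++ // one message per connected peer
	}
}

// newFullMeshRouter returns the router for peer 0 in a fully connected mesh
// of n peers, i.e. one that is connected to peers 1..n-1.
func newFullMeshRouter(n int) *router {
	r := &router{peers: make(map[string]bool)}
	for i := 1; i < n; i++ {
		r.peers[fmt.Sprintf("peer-%d", i)] = true
	}
	return r
}

func main() {
	r := newFullMeshRouter(150)
	r.onConnectionEvent()
	fmt.Println("topology update messages for one event:", r.sent) // 149
}
```

Every router in a full mesh behaves the same way, so a burst of connection churn multiplies this per-event cost by the number of events.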

A received topology gossip message is relayed further to the router's peers if it carries a new update. While this should not be a concern in a stable topology, it can be problematic in some use cases.
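The relay-if-new rule can be modelled as flooding with duplicate suppression. The Go sketch below is a naive model, not mesh's implementation: it assumes every peer relays a new update to all of its peers except the one it received it from, and that duplicates are counted on arrival but not relayed. The real relay path may use computed broadcast routes and send fewer messages; `floodFullMesh` and `message` are hypothetical names.

```go
package main

import "fmt"

// message is one topology update in flight between two peers.
type message struct {
	from, to int
}

// floodFullMesh simulates one new topology update originating at peer 0 in a
// fully connected mesh of n peers. Each peer relays the update to all peers
// except the sender, but only the first time it sees it ("relay only if
// new"). It returns the total number of messages sent.
func floodFullMesh(n int) int {
	seen := make([]bool, n)
	seen[0] = true // the originator already has the update

	var queue []message
	for p := 1; p < n; p++ {
		queue = append(queue, message{from: 0, to: p})
	}

	sent := 0
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		sent++
		if seen[m.to] {
			continue // duplicate: not new, so it is not relayed
		}
		seen[m.to] = true
		for p := 0; p < n; p++ {
			if p != m.to && p != m.from {
				queue = append(queue, message{from: m.to, to: p})
			}
		}
	}
	return sent
}

func main() {
	for _, n := range []int{10, 50, 150} {
		fmt.Printf("n=%d peers: %d messages for one update\n", n, floodFullMesh(n))
	}
}
```

Under these assumptions it reports 81, 2401 and 22201 messages for 10, 50 and 150 peers: almost every message after the first wave is a duplicate that the receiver discards.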

For example, #114 and #115 result in high CPU usage; combined with chatty topology gossip, this produces a cascading effect.

As the number of peers in the mesh increases, this significantly impacts scalability.
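Under a naive flooding assumption (each peer relays a new update once, to every peer except the one it came from), the number of messages per topology update in a fully connected mesh of $n$ peers grows quadratically. This is a back-of-the-envelope bound for that assumed model, not a measurement of mesh:

$$
M(n) = \underbrace{(n-1)}_{\text{from the originator}} + \underbrace{(n-1)(n-2)}_{\text{relays}} = (n-1)\bigl(1 + (n-2)\bigr) = (n-1)^2
$$

For $n = 150$, the cluster size used for the metrics in this issue, that is $149^2 = 22{,}201$ messages per update in the worst case, compared with $149$ if updates were not relayed at all.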

The following metrics were gathered with an instrumented mesh on a 150-node Kubernetes cluster running weave-net, which uses mesh. "rx gossip broadcast" is the number of topology gossip messages received per second.

===================================================================
2019-09-17 7:22:0 Peers.garbageCollect(): 365
2019-09-17 7:22:0 routes.calculate()         -> routes.calculateBroadcast(): 59
2019-09-17 7:22:0 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 335
2019-09-17 7:22:0 routes.calculateUnicast(): 119
2019-09-17 7:22:0 connectionMaker.refresh(): 63
2019-09-17 7:22:0 rx gossip unicast: 0
2019-09-17 7:22:0 rx gossip broadcast: 325
2019-09-17 7:22:0 gossip broadcast - relay broadcasts: 345
2019-09-17 7:22:0 gossip broadcast - topology updates: 1
===================================================================
2019-09-17 7:22:1 Peers.garbageCollect(): 347
2019-09-17 7:22:1 routes.calculate()         -> routes.calculateBroadcast(): 68
2019-09-17 7:22:1 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 328
2019-09-17 7:22:1 routes.calculateUnicast(): 135
2019-09-17 7:22:1 connectionMaker.refresh(): 70
2019-09-17 7:22:1 rx gossip unicast: 0
2019-09-17 7:22:1 rx gossip broadcast: 316
2019-09-17 7:22:1 gossip broadcast - relay broadcasts: 324
2019-09-17 7:22:1 gossip broadcast - topology updates: 0
===================================================================
2019-09-17 7:22:2 Peers.garbageCollect(): 369
2019-09-17 7:22:2 routes.calculate()         -> routes.calculateBroadcast(): 61
2019-09-17 7:22:2 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 313
2019-09-17 7:22:2 routes.calculateUnicast(): 124
2019-09-17 7:22:2 connectionMaker.refresh(): 64
2019-09-17 7:22:2 rx gossip unicast: 0
2019-09-17 7:22:2 rx gossip broadcast: 315
2019-09-17 7:22:2 gossip broadcast - relay broadcasts: 343
2019-09-17 7:22:2 gossip broadcast - topology updates: 0
===================================================================
2019-09-17 7:22:3 Peers.garbageCollect(): 336
2019-09-17 7:22:3 routes.calculate()         -> routes.calculateBroadcast(): 75
2019-09-17 7:22:3 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 327
2019-09-17 7:22:3 routes.calculateUnicast(): 148
2019-09-17 7:22:3 connectionMaker.refresh(): 75
2019-09-17 7:22:3 rx gossip unicast: 0
2019-09-17 7:22:3 rx gossip broadcast: 322
2019-09-17 7:22:3 gossip broadcast - relay broadcasts: 326
2019-09-17 7:22:3 gossip broadcast - topology updates: 1
===================================================================
2019-09-17 7:22:4 Peers.garbageCollect(): 353
2019-09-17 7:22:4 routes.calculate()         -> routes.calculateBroadcast(): 69
2019-09-17 7:22:4 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 344
2019-09-17 7:22:4 routes.calculateUnicast(): 138
2019-09-17 7:22:4 connectionMaker.refresh(): 71
2019-09-17 7:22:4 rx gossip unicast: 0
2019-09-17 7:22:4 rx gossip broadcast: 339
2019-09-17 7:22:4 gossip broadcast - relay broadcasts: 337
2019-09-17 7:22:4 gossip broadcast - topology updates: 1
===================================================================
2019-09-17 7:22:5 Peers.garbageCollect(): 323
2019-09-17 7:22:5 routes.calculate()         -> routes.calculateBroadcast(): 68
2019-09-17 7:22:5 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 330
2019-09-17 7:22:5 routes.calculateUnicast(): 136
2019-09-17 7:22:5 connectionMaker.refresh(): 70
2019-09-17 7:22:5 rx gossip unicast: 0
2019-09-17 7:22:5 rx gossip broadcast: 328
2019-09-17 7:22:5 gossip broadcast - relay broadcasts: 311
2019-09-17 7:22:5 gossip broadcast - topology updates: 3
===================================================================
2019-09-17 7:22:6 Peers.garbageCollect(): 340
2019-09-17 7:22:6 routes.calculate()         -> routes.calculateBroadcast(): 78
2019-09-17 7:22:6 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 320
2019-09-17 7:22:6 routes.calculateUnicast(): 156
2019-09-17 7:22:6 connectionMaker.refresh(): 82
2019-09-17 7:22:6 rx gossip unicast: 0
2019-09-17 7:22:6 rx gossip broadcast: 321
2019-09-17 7:22:6 gossip broadcast - relay broadcasts: 322
2019-09-17 7:22:6 gossip broadcast - topology updates: 0
===================================================================
2019-09-17 7:22:7 Peers.garbageCollect(): 321
2019-09-17 7:22:7 routes.calculate()         -> routes.calculateBroadcast(): 85
2019-09-17 7:22:7 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 300
2019-09-17 7:22:7 routes.calculateUnicast(): 172
2019-09-17 7:22:7 connectionMaker.refresh(): 90
2019-09-17 7:22:7 rx gossip unicast: 0
2019-09-17 7:22:7 rx gossip broadcast: 296
2019-09-17 7:22:7 gossip broadcast - relay broadcasts: 309
2019-09-17 7:22:7 gossip broadcast - topology updates: 0
===================================================================
2019-09-17 7:22:8 Peers.garbageCollect(): 313
2019-09-17 7:22:8 routes.calculate()         -> routes.calculateBroadcast(): 81
2019-09-17 7:22:8 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 308
2019-09-17 7:22:8 routes.calculateUnicast(): 161
2019-09-17 7:22:8 connectionMaker.refresh(): 85
2019-09-17 7:22:8 rx gossip unicast: 0
2019-09-17 7:22:8 rx gossip broadcast: 309
2019-09-17 7:22:8 gossip broadcast - relay broadcasts: 291
2019-09-17 7:22:8 gossip broadcast - topology updates: 1
===================================================================
2019-09-17 7:22:9 Peers.garbageCollect(): 316
2019-09-17 7:22:9 routes.calculate()         -> routes.calculateBroadcast(): 84
2019-09-17 7:22:9 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 307
2019-09-17 7:22:9 routes.calculateUnicast(): 167
2019-09-17 7:22:9 connectionMaker.refresh(): 88
2019-09-17 7:22:9 rx gossip unicast: 0
2019-09-17 7:22:9 rx gossip broadcast: 302
2019-09-17 7:22:9 gossip broadcast - relay broadcasts: 306
2019-09-17 7:22:9 gossip broadcast - topology updates: 0
===================================================================
2019-09-17 7:22:10 Peers.garbageCollect(): 312
2019-09-17 7:22:10 routes.calculate()         -> routes.calculateBroadcast(): 83
2019-09-17 7:22:10 routes.lookupOrCalculate() -> routes.calculateBroadcast(): 278
2019-09-17 7:22:10 routes.calculateUnicast(): 166
2019-09-17 7:22:10 connectionMaker.refresh(): 85
2019-09-17 7:22:10 rx gossip unicast: 0
2019-09-17 7:22:10 rx gossip broadcast: 275
2019-09-17 7:22:10 gossip broadcast - relay broadcasts: 300
2019-09-17 7:22:10 gossip broadcast - topology updates: 2
===================================================================
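To summarise output in this format, a small Go parser can aggregate each metric across the one-second samples. This sketch assumes every data line has the shape `<date> <time> <metric>: <count>` exactly as above; `summarize` and `stat` are hypothetical names, and the embedded `sample` holds only the `rx gossip broadcast` lines copied from the log.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// stat accumulates the samples seen for one metric.
type stat struct {
	n, sum, min, max int
}

func (s stat) mean() float64 { return float64(s.sum) / float64(s.n) }

// summarize parses lines of the form "<date> <time> <metric>: <count>" and
// aggregates each metric. Separator lines and unparsable lines are skipped.
func summarize(log string) map[string]stat {
	out := make(map[string]stat)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		parts := strings.SplitN(strings.TrimSpace(sc.Text()), " ", 3)
		if len(parts) < 3 {
			continue // e.g. the "=====" separator lines
		}
		rest := parts[2]
		i := strings.LastIndex(rest, ":")
		if i < 0 {
			continue
		}
		v, err := strconv.Atoi(strings.TrimSpace(rest[i+1:]))
		if err != nil {
			continue
		}
		// Collapse the column-alignment padding inside metric names.
		name := strings.Join(strings.Fields(rest[:i]), " ")
		s, ok := out[name]
		if !ok {
			s = stat{min: v, max: v}
		}
		s.n++
		s.sum += v
		if v < s.min {
			s.min = v
		}
		if v > s.max {
			s.max = v
		}
		out[name] = s
	}
	return out
}

const sample = `===================================================================
2019-09-17 7:22:0 rx gossip broadcast: 325
2019-09-17 7:22:1 rx gossip broadcast: 316
2019-09-17 7:22:2 rx gossip broadcast: 315
2019-09-17 7:22:3 rx gossip broadcast: 322
2019-09-17 7:22:4 rx gossip broadcast: 339
2019-09-17 7:22:5 rx gossip broadcast: 328
2019-09-17 7:22:6 rx gossip broadcast: 321
2019-09-17 7:22:7 rx gossip broadcast: 296
2019-09-17 7:22:8 rx gossip broadcast: 309
2019-09-17 7:22:9 rx gossip broadcast: 302
2019-09-17 7:22:10 rx gossip broadcast: 275
`

func main() {
	stats := summarize(sample)
	names := make([]string, 0, len(stats))
	for name := range stats {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		s := stats[name]
		fmt.Printf("%s: n=%d mean=%.1f min=%d max=%d\n", name, s.n, s.mean(), s.min, s.max)
	}
}
```

For the 11 `rx gossip broadcast` samples it reports a minimum of 275, a maximum of 339 and a mean of about 313.5 received topology gossip messages per second. Pasting the full log into `sample` summarises every metric the same way.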