rkeene closed this 1 year ago.
Lee looked into this at a cursory level and indicated that a change may not be needed; let's confirm and close this if no change is needed.
I think we should first see whether other bandwidth reductions lower packet loss. I don't have exact statistics, but packet loss on the network seems relatively high right now, which would make reducing fanout too dangerous.
One way to capture this statistic would be to compare the number of duplicate publish messages received for a block with the fanout. With no packet loss, the two would be the same on average.
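A minimal sketch of that measurement, assuming every node forwards each new block to `fanout` peers (so without loss a node receives about `fanout` copies per block on average) and assuming a hypothetical log holding one block hash per received publish message. The function names are illustrative, not part of nano-node:

```python
from collections import Counter


def estimate_delivery_ratio(publish_hashes, fanout):
    """Average copies received per block, divided by the fanout.

    Assumption: each node forwards every new block to `fanout` peers, so
    with no loss the average copies per block equals `fanout`.
    `publish_hashes` is a hypothetical log: one block hash per publish message.
    """
    if fanout <= 0:
        raise ValueError("fanout must be positive")
    copies = Counter(publish_hashes)  # block hash -> copies received
    if not copies:
        raise ValueError("no publish messages logged")
    avg_copies = sum(copies.values()) / len(copies)
    return avg_copies / fanout


def estimate_loss_rate(publish_hashes, fanout):
    """Rough packet-loss estimate: the fraction of expected copies that never arrived."""
    return max(0.0, 1.0 - estimate_delivery_ratio(publish_hashes, fanout))


# Example: block A arrived 8 times, block B 6 times, with an assumed fanout of 10.
log = ["A"] * 8 + ["B"] * 6
print(round(estimate_delivery_ratio(log, 10), 2))  # 0.7
print(round(estimate_loss_rate(log, 10), 2))       # 0.3
```

The estimate only holds on average across the network; a single node's count can sit above or below `fanout` depending on how many peers happen to pick it.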
I’m curious what the plans are for this. I’ve been experimenting with capturing vote traffic from the websocket to see the delay across different reps and noticed that for most reps I get 4-6+ messages per rep for each block hash from the websocket.
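The per-rep duplicate count described above can be summarized with a short sketch. It assumes the websocket vote messages have already been parsed into hypothetical `(representative, block_hash)` pairs, one pair per received message. The parsing step and the names used here are illustrative only:

```python
from collections import Counter, defaultdict


def avg_copies_per_rep(vote_records):
    """Average number of messages received per (rep, block hash) pair, grouped by rep.

    `vote_records` is a hypothetical iterable of (representative, block_hash)
    tuples, one per vote message seen on the websocket.
    """
    pair_counts = Counter(vote_records)  # (rep, hash) -> messages received
    per_rep = defaultdict(list)
    for (rep, _block_hash), n in pair_counts.items():
        per_rep[rep].append(n)
    return {rep: sum(ns) / len(ns) for rep, ns in per_rep.items()}


records = (
    [("rep1", "h1")] * 4
    + [("rep1", "h2")] * 6
    + [("rep2", "h1")] * 1
)
print(avg_copies_per_rep(records))  # {'rep1': 5.0, 'rep2': 1.0}
```

A value near 1.0 means a rep's votes arrive once; values in the 4-6 range match the duplication observed above.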
If 300,000 blocks are sent and 20 reps vote on each, that's 6M votes. If the node also receives rebroadcasts of each vote from 3 other reps, that becomes 24M (4 copies per vote). At the higher end of 5 rebroadcasts, it's 36M (6 copies per vote). That's a lot of extra data to process.
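The arithmetic above follows a simple model: each rep votes once per block, and the node receives the original vote plus some number of rebroadcast copies. A small sketch of that model (the function name is illustrative):

```python
def total_vote_messages(blocks, reps, rebroadcasts):
    """Vote messages a node receives under a simple model:
    each rep votes once per block, and each vote arrives once
    directly plus `rebroadcasts` additional copies."""
    return blocks * reps * (1 + rebroadcasts)


for r in (0, 3, 5):
    print(r, total_vote_messages(300_000, 20, r))
```

With 0, 3, and 5 rebroadcasts this gives 6M, 24M, and 36M messages respectively.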
My assumption is that each duplicate comes from vote rebroadcasting, but I'm curious whether some could come from other actions, such as a rebroadcast block triggering another round of votes, or something else.
I think the extra traffic could have a significant impact on block and vote processing. With the move to TCP, packet loss should be significantly lower, so rebroadcasting/fanout could probably be reduced somewhat.
I don't know the best structure to make this more efficient, but one thought is to set up a hybrid mesh/ring/star network to optimize how traffic is distributed. I need to spend more time in the code to understand how block and vote propagation currently work, so any existing flow charts or workflow descriptions would be helpful.
Are there plans in the works or areas of data capture that would be helpful to evaluate this further?
Should this issue be closed? It is old, it was partially addressed in V21, and there are several more recent efforts in the same vein:
Adjusting block rebroadcasting on arrival: https://github.com/nanocurrency/nano-node/issues/3419
Continuous backlog population: https://github.com/nanocurrency/nano-node/pull/3999
Adjusting vote broadcast intervals: https://github.com/nanocurrency/nano-node/pull/4010
Hinted elections redesign: https://github.com/nanocurrency/nano-node/pull/3944
Elections-Up branch: https://github.com/pwojcikdev/nano-node/tree/elections-up
CC: @dsiganos
Investigate whether the real-time network can be optimized by reducing a node's fanout when distributing messages. Currently it may be sqrt(nodeCount) or 40+; it's not clear which. Validate that it's reasonable and reduce it if it isn't.
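To compare the two candidate values, here is a sketch of a square-root fanout rule next to a fixed fanout of 40. `sqrt_fanout` and its `scale` parameter are hypothetical. This is an illustration of a sqrt(nodeCount) rule, not code taken from nano-node:

```python
import math


def sqrt_fanout(peer_count, scale=1.0):
    """Hypothetical rule: forward to ceil(scale * sqrt(peer_count)) peers,
    never more than the number of peers available."""
    if peer_count <= 0:
        return 0
    return min(peer_count, math.ceil(scale * math.sqrt(peer_count)))


FIXED_FANOUT = 40  # the "40+" figure mentioned above, used here for comparison

for peers in (100, 200, 1600):
    print(peers, sqrt_fanout(peers), min(peers, FIXED_FANOUT))
```

For 100 and 200 peers the square-root rule sends 10 and 15 copies per message versus 40 for the fixed value; the two only meet at 1,600 peers. This is why it matters which of the two the node actually uses.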