jsdelivr / globalping

A global network of probes to run network tests like ping, traceroute and DNS resolve
https://www.jsdelivr.com/globalping

perf: reduce data sent in probe sync #514

Closed MartinKolarik closed 5 months ago

MartinKolarik commented 5 months ago

#501 improved the overall performance of the app but resulted in a higher Redis load than expected. Most likely, this is because the previous pull-based mechanism exchanged data only when an instance was active (that is, when a request needed the data), while the new sync runs all the time (as if each instance received at least one request every second). The new approach is therefore more efficient in theory, but only in high-load scenarios.

This PR further improves the sync by storing the data in Redis and notifying other instances only when there are changes (detected per probe; a similar idea was previously suggested in https://github.com/jsdelivr/globalping/issues/419#issuecomment-1711557404). This reduces the size of the exchanged messages, though not their number, because of the "keep alive" messages.
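To illustrate the per-probe change detection, here is a minimal in-memory sketch, not the code from this PR. The `Probe` shape, the `SyncMessage` type, and the `ProbeDiffer` class are hypothetical names chosen for the example, and comparing `JSON.stringify` output stands in for whatever fingerprinting the real sync uses. It shows why the message count stays the same (a "keep alive" is still produced every cycle) while the payload shrinks to the probes that actually changed.

```typescript
// Hypothetical probe shape; the real probe objects carry more fields.
type Probe = { id: string; city: string; tags: string[] };

// One message is produced per sync cycle, so the count does not drop.
type SyncMessage =
  | { type: 'update'; changed: Probe[]; removed: string[] }
  | { type: 'keep-alive' };

class ProbeDiffer {
  // Serialized form of each probe as it was last announced, keyed by probe id.
  private lastSent: Record<string, string> = {};

  // Compare the current probes with the last announced state.
  nextMessage(probes: Probe[]): SyncMessage {
    const changed: Probe[] = [];
    const seen: Record<string, boolean> = {};

    for (const probe of probes) {
      seen[probe.id] = true;
      const serialized = JSON.stringify(probe);
      if (this.lastSent[probe.id] !== serialized) {
        this.lastSent[probe.id] = serialized;
        changed.push(probe);
      }
    }

    // Probes announced earlier but missing now are reported as removed.
    const removed: string[] = [];
    for (const id of Object.keys(this.lastSent)) {
      if (!seen[id]) {
        delete this.lastSent[id];
        removed.push(id);
      }
    }

    if (changed.length === 0 && removed.length === 0) {
      return { type: 'keep-alive' };
    }
    return { type: 'update', changed, removed };
  }
}
```

Calling `nextMessage` twice with identical probes yields a full `update` first and a `keep-alive` second, which is the behaviour described above: the same number of messages, but far less data when nothing changes.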

I replaced the pub/sub communication with Redis streams, which are essentially used as "pub/sub with history", so the sync doesn't break even if some messages occasionally get lost. The mechanism also handles evicted/flushed keys.
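The "pub/sub with history" idea can be sketched with an in-memory stand-in for a capped stream. This is an illustration under stated assumptions, not the PR's implementation: `CappedStream` and `StreamReader` are hypothetical, no Redis client is involved, and sequential integer ids replace real Redis stream ids so that a gap is easy to detect. A reader remembers the last id it processed. If the history it still needs has been trimmed away, it falls back to a full resync instead of silently missing updates.

```typescript
// Sequential integer ids are a simplification; real Redis stream ids are
// "<milliseconds>-<sequence>" strings and are not contiguous.
type Entry = { id: number; payload: string };

// In-memory stand-in for a capped stream (hypothetical, not a Redis client).
class CappedStream {
  private entries: Entry[] = [];
  private nextId = 1;
  private maxLen: number;

  constructor(maxLen: number) {
    this.maxLen = maxLen;
  }

  // Append an entry and trim the oldest one once the cap is exceeded.
  add(payload: string): number {
    const id = this.nextId++;
    this.entries.push({ id, payload });
    if (this.entries.length > this.maxLen) {
      this.entries.shift();
    }
    return id;
  }

  // Return the retained entries that are newer than lastId.
  readAfter(lastId: number): Entry[] {
    return this.entries.filter((entry) => entry.id > lastId);
  }

  oldestId(): number | null {
    const first = this.entries[0];
    return first ? first.id : null;
  }

  newestId(): number | null {
    const last = this.entries[this.entries.length - 1];
    return last ? last.id : null;
  }
}

type PollResult = { kind: 'entries'; entries: Entry[] } | { kind: 'resync' };

class StreamReader {
  private lastId = 0;
  private stream: CappedStream;

  constructor(stream: CappedStream) {
    this.stream = stream;
  }

  poll(): PollResult {
    const oldest = this.stream.oldestId();
    const newest = this.stream.newestId();

    // Entries between lastId and the oldest retained one are gone, so the
    // reader cannot catch up incrementally and must reload the full state.
    if (oldest !== null && newest !== null && oldest > this.lastId + 1) {
      this.lastId = newest;
      return { kind: 'resync' };
    }

    const entries = this.stream.readAfter(this.lastId);
    for (const entry of entries) {
      this.lastId = entry.id;
    }
    return { kind: 'entries', entries };
  }
}
```

With plain pub/sub, a reader that misses a message has no way to notice. Here the retained history lets a slow reader either catch up or detect that it has to resync.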

There was one issue with the "sync only changed probes" approach: we have load stats that, unlike the rest of the probe data, change often. We don't actually use the stats anywhere, but we might in the future, so I kept them and added a separate mechanism for them: they are exchanged directly via the stream in a very minimalistic form. If we eventually store them elsewhere, we might remove that part later.
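As a rough idea of what a "very minimalistic form" could look like, the sketch below packs stats into positional arrays instead of keyed objects, so field names are not repeated for every probe. The field names `cpuLoad` and `jobsCount`, the rounding, and both helper functions are assumptions made for this example. The actual wire format used by this PR is not described here.

```typescript
// Hypothetical stats shape; the real stats fields may differ.
type ProbeStats = { id: string; cpuLoad: number; jobsCount: number };

// Positional form: [probe id, rounded CPU load, job count].
type CompactStats = [string, number, number];

// Pack stats into a compact JSON string suitable for a single stream field.
function encodeStats(stats: ProbeStats[]): string {
  const rows: CompactStats[] = [];
  for (const entry of stats) {
    rows.push([entry.id, Math.round(entry.cpuLoad), entry.jobsCount]);
  }
  return JSON.stringify(rows);
}

// Restore the keyed form on the receiving instance.
function decodeStats(encoded: string): ProbeStats[] {
  const rows = JSON.parse(encoded) as CompactStats[];
  const result: ProbeStats[] = [];
  for (const row of rows) {
    const [id, cpuLoad, jobsCount] = row;
    result.push({ id, cpuLoad, jobsCount });
  }
  return result;
}
```

Because the stats travel in the stream message itself, they bypass the per-probe change detection and do not trigger a reload of the stored probe data on every update.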

The downside here is that the sync code got a lot more complex, but it should be worth the load reduction.