turt2live / matrix-monitor-bot

A bot to measure latency between homeservers, as perceived by users.
Apache License 2.0

Test scaling capabilities (and fix it) #19

Open · opened by turt2live 6 years ago

turt2live commented 6 years ago

There hasn't been any major load testing done yet; however, there are already growing pains with 5 servers in the room. We need documented performance characteristics for quite a few scenarios:

It's worth noting that HQ only has ~1000 servers; however, we need to make sure this can scale well in case of overnight success.

Other scenarios to test:

turt2live commented 6 years ago

This is already fairly concerning with just 6 servers. Here's the number of messages sent (as seen by t2bot.io) by particular senders in the last 24 hours:

             sender                | count
-----------------------------------+-------
 @<redacted>                       | 14179
 @monitor:t2bot.io                 |  7080
 @monitor:t2l.io                   |  6941
 @monitor:maunium.net              |  5696
 @matrix-monitor-bot:kamax.io      |  5696
 @monitor:homeserver.today         |  5695
 @monitor:poddery.com              |  5679
 @<redacted>                       |  2744
 @monitor:dev.t2bot.io             |  1386
 @iot-weather:t2bot.io             |  1357
 @<redacted>                       |   988
 @<redacted>                       |   812
 @<redacted>                       |   658
 @<redacted>                       |   522
 @<redacted>                       |   486

~6000 events a day per bot is quite a lot. Even at 5-minute intervals we're sending 288 messages a day (24 × 60 / 5), and can expect an additional 1728 pongs for 6 servers (288 × 6), totaling 2016 events a day.
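The arithmetic above can be captured in a small estimator. This is a minimal sketch, not the bot's actual logic: it assumes (inferred only from the quoted 1728 = 288 × 6 figure) that every ping produces one pong per server in the room. `daily_events` is a hypothetical helper name.

```python
# Minimal sketch of the per-bot daily event estimate quoted above.
# Assumption (inferred from the figures, NOT taken from the bot's code):
# each ping is answered by one pong per server in the room, so
# pongs per day = pings per day * servers.

MINUTES_PER_DAY = 24 * 60


def daily_events(interval_minutes: int, servers: int) -> dict:
    """Hypothetical helper: estimate pings, pongs and total events per day."""
    if interval_minutes <= 0 or servers < 0:
        raise ValueError("interval must be positive and servers non-negative")
    pings = MINUTES_PER_DAY // interval_minutes  # one ping per interval
    pongs = pings * servers                      # assumed pong fan-out
    return {"pings": pings, "pongs": pongs, "total": pings + pongs}


if __name__ == "__main__":
    print(daily_events(5, 6))  # {'pings': 288, 'pongs': 1728, 'total': 2016}
    print(daily_events(5, 1000))
```

Under the same assumption, a room the size of HQ (~1000 servers) would push the estimate to 288,288 events a day per bot at a 5-minute interval, which is why the pong fan-out, more than the ping interval, dominates at scale.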

The IoT user isn't redacted and is included for perspective: it sends a weather event every minute (mostly). Most humans send ~100 messages a day.

turt2live commented 6 years ago

Note: many of these metrics were counted before the bot's time tracking was improved. The number of events has since been drastically reduced (although it is still in the hundreds); however, this scalability testing is interesting beyond the monitoring bot.

The data collected from these tests (logs, Prometheus dumps, etc.) will be made public.