Work in progress: sinerider-scoring - support at minimum 20 RPM

alhardwarehyde commented 1 year ago

Description

what it says above

grymmy commented 1 year ago

The sinerider-scoring service should support at least 20 RPM (requests per minute), through vertical and/or horizontal scaling or by using some sort of scalable third-party solution.

grymmy commented 1 year ago

we should look into the following:

adding a load balancer on DO
identify the vertical scale of the machine instance type we want to upgrade to (more cpus, SSD disk)
(optional) improve our deployment process if possible (dockerization/machine images/scripts if need be)
provision new instances behind load balancer
perform scoring load test / collect perf numbers

NOTE - this is one potential plan for scaling. Please refer to @maxwofford's prototype and evaluate it as a potential path forward.

JosiasAurel commented 1 year ago

I could get 20 concurrent requests locally using browserless. On DO I couldn't get past 10, and I think that's due to hardware limits there. I suspect it's ffmpeg exhausting the RAM.

So our best bet is browserless and we could get even more.

grymmy commented 1 year ago

Assigned myself this issue alongside Josias. Progress thus far:

Provisioned 3 new DO droplets (note - these are 1VCPU, 1GB RAM instances - original scoring service was 2GB RAM)
Added a load balancer and configured it to point at the new cluster.

This afternoon I will be implementing a script to efficiently deploy to these machines as manually updating the code (like we've been doing w/ the one instance) will waste time & run us into errors.

grymmy commented 1 year ago

Update - we have now deployed the sinerider-scoring service as an App on DO. Instance sizes are 1 GB RAM, 1 vCPU (xsmall). Running our load-testing script with the settings below gave the following results:

Settings:

10 instances in cluster
Load-testing script: python3 hailstorm.py https://sinerider-scoring-od9e5.ondigitalocean.app -d -r 5 -n 100 -t 32 (5 req/sec, 100 requests total, at most 32 requests in flight at once; rate-limited requests are requeued)
sinerider-scoring ENV vars: RATE_LIMIT_MAX_REQUESTS=1, RATE_LIMIT_WINDOW_MS=15000, TICK_RATE=120, DRAW_MODULO=3
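
As a rough sanity check on these settings, the sketch below computes the throughput ceiling the rate limiter would allow. It assumes (this is not confirmed anywhere in the thread) that the limit is enforced independently on each instance and that requests are spread evenly across the cluster. The function name is hypothetical.

```python
def rate_limit_ceiling_rpm(max_requests: int, window_ms: int, instances: int) -> float:
    """Upper bound on accepted requests per minute across the cluster.

    Assumption: every instance enforces its own limit of `max_requests`
    per `window_ms`, and load is spread evenly over `instances`.
    """
    windows_per_minute = 60_000 / window_ms
    return max_requests * windows_per_minute * instances


# Values taken from the settings above.
ceiling = rate_limit_ceiling_rpm(max_requests=1, window_ms=15_000, instances=10)
print(ceiling)  # 40.0
```

Under those assumptions the limiter alone would cap the cluster at 40 RPM, while the script offers 5 req/sec (300 RPM), so a large share of requests would be rate-limited and requeued.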

Results: 21.59 RPM
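
To put that number in terms of wall-clock time, here is a small sketch that assumes RPM means completed requests divided by elapsed minutes. The thread does not say how hailstorm.py actually computes it, and both helper names are hypothetical.

```python
def requests_per_minute(completed: int, elapsed_seconds: float) -> float:
    """Throughput as completed requests per elapsed minute (assumed definition)."""
    return completed / (elapsed_seconds / 60.0)


def elapsed_seconds_for(completed: int, rpm: float) -> float:
    """Invert the definition: how long a run took, given its RPM."""
    return completed / rpm * 60.0


# 100 requests at the reported 21.59 RPM.
elapsed = elapsed_seconds_for(100, 21.59)
print(round(elapsed, 1))  # 277.9
```

So, under that definition, the 100-request run took a bit over four and a half minutes end to end.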

Results from this test (which checked accuracy as well as perf) showed a small percentage of failed (incorrect) responses. These problems are tracked by issue #534. The test also does not properly simulate the load expected from the bot services. At the time of this writing, neither the Twitter bot nor the Reddit bot sends parallel requests to the sinerider-scoring service, and this needs to change for the bots to leverage the increased throughput in this service. This issue is tracked by #565.
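
For illustration, here is a minimal sketch of what sending scoring requests in parallel from a bot could look like. It is not taken from the bot code: `score_submission` is a hypothetical stand-in that sleeps instead of making an HTTP call to sinerider-scoring, and `max_parallel=8` is an arbitrary value.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def score_submission(submission: str) -> dict:
    """Hypothetical stand-in for an HTTP call to sinerider-scoring."""
    time.sleep(0.05)  # simulate scoring latency
    return {"submission": submission, "ok": True}


def score_all(submissions: list[str], max_parallel: int = 8) -> list[dict]:
    """Score submissions concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(score_submission, submissions))


pending = [f"submission-{i}" for i in range(16)]
start = time.perf_counter()
results = score_all(pending)
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")
```

With a bounded worker pool, the bot keeps several requests in flight without exceeding a chosen limit. A real client would also need to handle rate-limited responses, for example by retrying them later.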
