mitodl / ol-infrastructure

Infrastructure automation code for use by MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Use Redis serverless for Open edX deployments #2356

Open blarghmatey opened 2 months ago

blarghmatey commented 2 months ago

Description/Context

The combined load of caching and Celery tasks for Open edX systems periodically exhausts the configured Redis cluster in ElastiCache. ElastiCache now offers a serverless deployment of Redis that removes the fixed upper and lower capacity limits, so we no longer need to statically allocate a cluster sized for peak load. The serverless offering only supports Redis 7.1 and higher, which is already supported, judging by the default Redis image used in Tutor.

Plan/Design

blarghmatey commented 2 months ago

Relevant discussion in Open edX forum - https://discuss.openedx.org/t/redis-memory-max-memory-page-load-times-and-useability-suffer-dramatically/12782/16

blarghmatey commented 2 months ago

The key eviction policy in Redis has been updated to use LRU on all keys, not just keys that have a TTL set. https://github.com/mitodl/ol-infrastructure/commit/e5d23ad8d1702b92113be205968672de08667971
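
To illustrate why this change matters, here is a toy model of the two behaviours. It is only a sketch: the `ToyCache` class is hypothetical, capacity is counted in keys rather than bytes, and real Redis uses an approximated (sampled) LRU rather than an exact ordering. The policy names `volatile-lru` and `allkeys-lru` are the Redis `maxmemory-policy` values for "LRU among keys with a TTL" and "LRU among all keys".

```python
from collections import OrderedDict


class ToyCache:
    """Toy capacity-bounded cache; NOT how Redis is implemented."""

    def __init__(self, max_keys, policy):
        if policy not in ("volatile-lru", "allkeys-lru"):
            raise ValueError(f"unsupported policy: {policy}")
        self.max_keys = max_keys
        self.policy = policy
        # key -> has_ttl, ordered from least to most recently used
        self.data = OrderedDict()

    def set(self, key, has_ttl=False):
        if key in self.data:
            self.data.pop(key)
        elif len(self.data) >= self.max_keys:
            self._evict_one()
        self.data[key] = has_ttl

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return True
        return False

    def _evict_one(self):
        # Pick the least recently used key that the policy allows us to evict.
        victim = next(
            (k for k, has_ttl in self.data.items()
             if self.policy == "allkeys-lru" or has_ttl),
            None,
        )
        if victim is None:
            # With volatile-lru and no TTL'd keys, nothing can be evicted,
            # so the write fails instead.
            raise MemoryError("no evictable key: every key lacks a TTL")
        del self.data[victim]


if __name__ == "__main__":
    cache = ToyCache(max_keys=2, policy="volatile-lru")
    cache.set("session:a")
    cache.set("session:b")
    try:
        cache.set("session:c")
    except MemoryError as exc:
        print("volatile-lru:", exc)

    cache = ToyCache(max_keys=2, policy="allkeys-lru")
    for key in ("session:a", "session:b", "session:c"):
        cache.set(key)
    print("allkeys-lru keeps:", list(cache.data))
```

With `volatile-lru`, keys written without a TTL can fill the cache and block new writes; with `allkeys-lru`, the least recently used key is dropped regardless of TTL.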

Ardiea commented 1 month ago

I’m looking at Redis serverless, just from a high level, to see if it makes sense. I’m not sure it does in all instances. I can’t find a good reference on ‘ECPUs’ and how to estimate them. I suspect the best way to get a feel for it is to just do it in a CI environment and see how far off my naive estimate is from reality (in either direction, up or down).

Outstanding question: When the new all-keys LRU eviction config makes it to production (I don’t believe it is there yet), will the ~15GB drop to something more reasonable? I suspect 15GB represents near-max usage on the node simply because it doesn’t evict or expire anything until it absolutely needs to.

edxapp-redis-mitx-ci

Current Env:

Node Costs:
cache.t3.small x 3 = 25.30 * 3 = $75.90/m

Data: ~ 40MB
Total Network Traffic (In + Out): 29GB

Serverless:
Storage: 1GB (minimum) -> $90.00/m

ECPUs:
Naive Calc based on network traffic alone: 29,000,000,000 / 1,000,000 * 0.0034 = $98.6/m

Serverless Costs: ~$188/m

edxapp-redis-mitx-production
Current Env:

Node Costs: 
cache.r7g.4xlarge x 3 = 1273.85 x 3 = $3821.55/month

Data: ~ 15GB
Total Network Traffic (In + Out): 639GB

Serverless:
Storage: ~15-16GB * $0.125/GB-hour = between $1350.00 and $1440.00/m

ECPUs:
Naive Calc based on network traffic alone: 639,000,000,000 / 1,000,000 * 0.0034 = $2172.6/m

Serverless Costs: $3522.60 - $3612.60/m
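
The arithmetic above can be reproduced with a short script, which makes it easy to plug in other environments. This is a sketch that mirrors the naive estimate, not ElastiCache's actual metering: it assumes a 720-hour month (implied by the $90.00/m figure for 1GB at $0.125/GB-hour), treats every byte of network traffic as one ECPU, and uses the $0.0034 per million ECPUs rate from the calculations above. The function and constant names are made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30          # assumption: 720-hour month
STORAGE_USD_PER_GB_HOUR = 0.125    # rate used in the estimate above
ECPU_USD_PER_MILLION = 0.0034      # rate used in the estimate above
MIN_STORAGE_GB = 1.0               # serverless storage minimum used above


def serverless_monthly_cost(stored_gb, traffic_bytes):
    """Return (storage, ecpu, total) in USD per month, naive estimate."""
    billed_gb = max(stored_gb, MIN_STORAGE_GB)
    storage = billed_gb * STORAGE_USD_PER_GB_HOUR * HOURS_PER_MONTH
    # Naive assumption: one ECPU per byte of traffic (in + out).
    ecpu = traffic_bytes / 1_000_000 * ECPU_USD_PER_MILLION
    return storage, ecpu, storage + ecpu


def node_monthly_cost(usd_per_node, node_count):
    """Monthly cost of a statically sized cluster."""
    return usd_per_node * node_count


if __name__ == "__main__":
    environments = {
        # name: (stored GB, traffic bytes, USD per node, node count)
        "edxapp-redis-mitx-ci": (0.04, 29_000_000_000, 25.30, 3),
        "edxapp-redis-mitx-production": (15.0, 639_000_000_000, 1273.85, 3),
    }
    for name, (gb, traffic, node_usd, nodes) in environments.items():
        storage, ecpu, total = serverless_monthly_cost(gb, traffic)
        current = node_monthly_cost(node_usd, nodes)
        print(f"{name}: nodes ${current:.2f}/m, serverless ${total:.2f}/m "
              f"(storage ${storage:.2f} + ECPUs ${ecpu:.2f})")
```

If the ECPU assumption turns out to be wrong after measuring in CI, only the `ecpu` line needs to change.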