sciencehistory / scihist_digicoll

Science History Institute Digital Collections

Change to different heroku redis provider for ActiveJob support, one with much higher redis connection limit #2296

Closed jrochkind closed 1 month ago

jrochkind commented 1 year ago

We are currently using the redis provider from heroku itself, but there are VARIOUS third-party ones listed on heroku marketplace.

The heroku redis provider we are using treats "number of redis connections" as the main differentiator between its pricing plans.

The plan we are using in production has an 80 connection limit, and 300MB of memory (we aren't close to using all of it), for $30/month.

But for reasons we don't entirely understand, resque workers make very inefficient use of redis connections -- something like 2-4 connections per worker, or even more. The 80 connection limit is keeping us from ramping up past ~30 workers (15 dynos with two workers each), even when we have a lot of temporary bulk work to do.
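As a rough illustration of the arithmetic, here is a minimal sketch of how a connection limit translates into a worker ceiling. The `max_workers` helper and its `reserved` parameter (connections held back for web dynos, consoles, etc.) are hypothetical names for this example, not anything in the app; the 2-4 connections-per-worker figures are the estimates from the comment above.

```ruby
# Hypothetical helper: estimate how many workers fit under a connection limit.
# connections_per_worker is an observed estimate, not a guaranteed figure.
def max_workers(connection_limit:, connections_per_worker:, reserved: 0)
  unless connections_per_worker.positive?
    raise ArgumentError, "connections_per_worker must be positive"
  end

  # Integer division rounds down: a partial worker does not fit.
  [(connection_limit - reserved) / connections_per_worker, 0].max
end

max_workers(connection_limit: 80, connections_per_worker: 2) # => 40
max_workers(connection_limit: 80, connections_per_worker: 4) # => 20
```

The ~30 workers we top out at sits between those two bounds, which is consistent with workers averaging somewhere between 2 and 4 connections each.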

I'm not sure why, and it seems like a bug... but we don't have to worry about it if we just switch to a different third-party heroku marketplace redis provider that isn't limiting redis connections like this. There are several that will give you 256-512 connections even in low-cost plans.

We'd have to look carefully to make sure we have everything else we need, and the plan looks good.

Some possibilities:

jrochkind commented 1 year ago

At some point the number of postgres connections we have in our Heroku PG plan will become the limiting factor, instead of redis.

We still won't be able to have more workers than pg connections available -- although hopefully they'll use a more reasonable one pg connection per worker, instead of the 2-4+ redis connections they seem to consume! Although I'm not certain of that.

Our heroku standard-0 postgres plan has a 120 connection limit.
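To see which resource becomes the bottleneck after a switch, here is a small sketch. The `worker_ceiling` helper is hypothetical, and the per-worker figures are assumptions: 3 redis connections per worker is just a midpoint of the 2-4 estimate, and 1 pg connection per worker is the hoped-for case, not a measured one. It also ignores connections used by web dynos.

```ruby
# Hypothetical helper: given per-resource connection limits and an assumed
# number of connections each worker uses, find the tightest constraint.
def worker_ceiling(limits)
  # How many workers each resource could support on its own.
  ceilings = limits.transform_values { |r| r[:limit] / r[:per_worker] }
  name, ceiling = ceilings.min_by { |_name, count| count }
  { bottleneck: name, max_workers: ceiling, ceilings: ceilings }
end

result = worker_ceiling({
  redis:    { limit: 512, per_worker: 3 }, # assumed: 512-connection plan
  postgres: { limit: 120, per_worker: 1 }  # standard-0 limit from above
})
result[:bottleneck]  # => :postgres
result[:max_workers] # => 120
```

Under these assumptions a 512-connection redis plan moves the ceiling to postgres; with a 256-connection plan and the same 3 connections per worker, redis would still be the tighter limit (85 workers).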

jrochkind commented 7 months ago

Oh, we won't save much money, because we were using a cheaper redis on staging before. Close to a wash.

eddierubeiz commented 7 months ago

@jrochkind, this afternoon I had to heroku config:unset REDIS_PROVIDER_ENV (previous value was STACKHERO_REDIS_URL_TLS). See Slack for more discussion.

jrochkind commented 7 months ago

OK, we tried putting stackhero redis on staging, and leaving it there for a few days to look at it... and it was actually pretty disastrous.

Apparently the connection to redis stopped working -- or maybe never was working? Although I'm pretty sure I tested it. OR our own app logic was connecting to an incorrect connection string, even if the right one was available.

This resulted in 3.5k redis connection errors over three days, which, once reported to honeybadger, exceeded our HB quota.

https://app.honeybadger.io/projects/58989/faults/103953014

Oh wait, the actual error is:

Redis::CannotConnectError: Connection refused - connect(2) for 127.0.0.1:6379

That is not the stackhero connection string! So it was our logic that was faulty, for some reason falling through to the default localhost that we mean to be just in dev!

https://github.com/sciencehistory/scihist_digicoll/blob/e1c1d91a2a0e7d2068e249c3176f2e06c7b7315f/lib/scihist_digicoll/env.rb#L210

OK... I think this is actually our fault for getting confused with the code -- I'm not sure what branch was actually deployed on staging when these errors happened. Now I'm confused about whether I ever even PR'd the redis config changes to allow stackhero; I'm having trouble finding them!

If it was using REDIS_PROVIDER_ENV, then it should have been my new branch... but there was a bug in the code that made it not find a redis url and default to localhost (which it should never do in production!).
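One way to guard against this kind of silent fallback is sketched below. This is not the actual code in lib/scihist_digicoll/env.rb; `resolve_redis_url` and `DEV_DEFAULT_REDIS_URL` are hypothetical names, and falling back to `REDIS_URL` when `REDIS_PROVIDER_ENV` is unset is an assumption about how we'd want it to behave. The idea is that `REDIS_PROVIDER_ENV` names another env variable holding the connection string, and that in production a missing URL raises instead of defaulting to localhost.

```ruby
# Hypothetical sketch, NOT the real implementation in env.rb.
DEV_DEFAULT_REDIS_URL = "redis://localhost:6379"

# env can be ENV or any Hash-like object, which keeps this easy to test.
def resolve_redis_url(env: ENV, production: false)
  provider_key = env["REDIS_PROVIDER_ENV"].to_s
  # Assumption: with no provider override, look at REDIS_URL.
  key = provider_key.empty? ? "REDIS_URL" : provider_key
  url = env[key].to_s
  return url unless url.empty?

  if production
    # Fail loudly rather than quietly connecting to 127.0.0.1:6379.
    raise KeyError, "no redis URL found in #{key}; refusing to default to localhost"
  end

  DEV_DEFAULT_REDIS_URL
end

# Dev with nothing configured falls back to localhost:
resolve_redis_url(env: {}) # => "redis://localhost:6379"
```

With a check like this, a misconfigured `REDIS_PROVIDER_ENV` would show up as one clear error at boot instead of thousands of `Redis::CannotConnectError` reports.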

Needs more investigation. For now I am going to remove stackhero redis from staging, until we return to this!

jrochkind commented 1 month ago

Good time to switch redis, email from heroku:

Hello 👋

Running up-to-date software versions is essential for maintaining a highly-available and secure fleet of Heroku Data for Redis instances.

Your Redis database redis-triangular-75789 on scihist-digicoll-production is running a deprecated version (6.2.14) and will not be supported after 02 Dec, 2024.

We urge you to upgrade to version 7.2 as we’re disallowing provisioning of the deprecated version from 02 Oct, 2024. If no action is taken, then we’ll perform forced upgrades after 31 Oct, 2024.

We perform heavy testing and validation work on all our Redis releases, including testing the upgrade procedure, and are careful to only recommend versions that we trust.

You can upgrade your Redis database to the latest default version by using heroku redis:upgrade command. You can read more about this procedure in our Dev Center article.

...

jrochkind commented 1 month ago

OK, deploy on production.

  1. Add $19/month smallest plan with:  heroku addons:create ah-redis-stackhero:ist-m4euc0 -r production

  2. Check dashboard or use provided CLI to wait for it to become available (~2 minutes). Verify the connect URL is set with heroku config:get STACKHERO_REDIS_URL_TLS

  3. Ensure there are no jobs currently in the resque queue, and then...

  4. heroku config:set REDIS_PROVIDER_ENV=STACKHERO_REDIS_URL_TLS -r production

  5. Wait ~6 minutes for production deploy to take effect -- you can tell it's pointed at new redis when resque admin shows fewer queues

  6. Do an ingest, make sure it works and that jobs run

  7. Remove Heroku Data for Redis from the app, now that we're using stackhero

jrochkind commented 1 month ago

updated some wiki docs at https://sciencehistory.atlassian.net/l/cp/2vBv3GmV