jenkins-infra / helpdesk

Open your Infrastructure related issues here for the Jenkins project
https://github.com/jenkins-infra/helpdesk/issues/new/choose

[get.jenkins.io, azure.updates.jenkins.io] MaxMind GeoIP Rate Limit hit when redeploying/upgrading `mirrorbits` chart #4240

Open dduportal opened 1 month ago

dduportal commented 1 month ago

Service(s)

get.jenkins.io, mirrors.jenkins.io, Other

Summary

We have recently been hit by the MaxMind GeoIP API rate limit.

Recently, @timja received alerts about this on his own account (which, we realized, we were using in production; fixed in #4195).

We continue to receive these alerts by email every day when we perform more than 2 deployments per day of mirrorbits (either get.jenkins.io or the new azure.updates.jenkins.io Update Center system).

These rate limits block our production mirrorbits instances and put the service at risk of an outage.


The root cause lies in the additional "GeoIP" containers running in each mirrorbits pod:

Ref. https://support.maxmind.com/hc/en-us/articles/4408216129947-Download-and-Update-Databases
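To illustrate why a couple of deployments per day is enough to trigger the alerts, here is a rough back-of-the-envelope sketch in Python. It assumes that every deployment re-creates every pod and that each new pod downloads the database once on start. The replica count, number of services, and `DAILY_LIMIT` are hypothetical placeholders, not values from our setup or from MaxMind's documentation.

```python
def downloads_per_day(deployments: int, replicas: int, services: int = 1) -> int:
    """Estimate daily GeoIP database downloads caused by pod re-creation.

    Assumes every deployment re-creates every pod and each new pod
    downloads the database exactly once when it starts.
    """
    return deployments * replicas * services


if __name__ == "__main__":
    # Hypothetical numbers: check the MaxMind documentation for the real limit.
    DAILY_LIMIT = 30
    total = downloads_per_day(deployments=3, replicas=6, services=2)
    print(total, "downloads/day; over limit:", total > DAILY_LIMIT)
```

The count grows with the product of deployments, replicas, and services, so downloading once into a shared location would make it independent of the number of pods.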

Reproduction steps

No response

dduportal commented 1 month ago

The idea would be to have a persistent volume to store the GeoIP data and remove the init + side containers.

=> Sharing it between mirrorbits instances avoids duplicating the downloads, and we keep the data instead of downloading it from MaxMind on each pod restart/re-creation.

Sharing a database such as this one between pods means we have to mount it as read-only in mirrorbits to avoid any write attempt.

=> It's already the case for the emptyDir, but we should also set up the PV/PVC as ReadOnlyMany.

This means we need a way to populate and update the PV data content: the GeoIP side containers should no longer run, duplicated, in each instance.

=> We need to run it as a deployment separate from mirrorbits, with only 1 replica and with the PV mounted read+write. This separate deployment would take care of initializing and updating the database, replacing the init and side containers in our pods.
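A minimal sketch of that layout, written as Python dicts that mirror the Kubernetes Deployment fields (this is not the actual chart). The image names, the claim and volume names, and the mirrorbits replica count are hypothetical placeholders; only the single updater replica and the read-only vs. read+write split come from the idea above. The PVC definition itself and its access mode are left out.

```python
import json

# Hypothetical names: none of these come from the real charts.
VOLUME_NAME = "geoipdata"
CLAIM_NAME = "geoipdata"
MOUNT_PATH = "/usr/share/GeoIP"  # assumed database directory


def deployment(name: str, image: str, replicas: int, read_only: bool) -> dict:
    """Build a minimal apps/v1 Deployment that mounts the shared GeoIP volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # placeholder image reference
                        "volumeMounts": [{
                            "name": VOLUME_NAME,
                            "mountPath": MOUNT_PATH,
                            "readOnly": read_only,
                        }],
                    }],
                    "volumes": [{
                        "name": VOLUME_NAME,
                        "persistentVolumeClaim": {
                            "claimName": CLAIM_NAME,
                            "readOnly": read_only,
                        },
                    }],
                },
            },
        },
    }


# Readers: any number of mirrorbits replicas, volume mounted read-only.
mirrorbits = deployment("mirrorbits", "example.invalid/mirrorbits:latest",
                        replicas=3, read_only=True)
# Single writer: one updater replica, volume mounted read+write.
updater = deployment("geoipupdate", "example.invalid/geoipupdate:latest",
                     replicas=1, read_only=False)

if __name__ == "__main__":
    print(json.dumps([mirrorbits, updater], indent=2))
```

Keeping a single writer means only one process ever downloads from MaxMind, whatever the number of mirrorbits pods.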

dduportal commented 1 month ago

Two other challenges:

dduportal commented 1 month ago

Proposal: Let's start with a PV in non-Premium Azurefile and see how it behaves. If it costs too much, then we'll have to move to Premium.

dduportal commented 1 month ago

Update:

=> Tested manually and it worked (for populating the data). We still need to validate the mirrorbits 4.x chart once installed.

dduportal commented 1 month ago

=> Manual test on updates.jenkins.io did work 👍 Let's roll!

dduportal commented 1 month ago

Update: let's roll for updates.jenkins.io first: https://github.com/jenkins-infra/kubernetes-management/pull/5565

dduportal commented 1 month ago

Update:

dduportal commented 4 weeks ago

Caused https://github.com/jenkins-infra/helpdesk/issues/4261 due to the PVC errors:

The geoipupdate pod had been stuck in CrashLoopBackOff since yesterday's cluster upgrade https://github.com/jenkins-infra/helpdesk/issues/4161, but it was also failing every 72 hours when trying to update the database.

The database files were stuck with SMB file handles in delete/concurrent writes 😡 . Visible in the Azure Storage Explorer with:

Screenshot 2024-08-24 at 10 30 17, Screenshot 2024-08-24 at 10 35 21

and in the geoipdata Linux container, with weird errors such as `cp: can't create '/usr/share/GeoIP/GeoLite2-City.mmdb': No such file or directory`

dduportal commented 4 weeks ago

The geoipdata updater has been uninstalled as per https://github.com/jenkins-infra/helpdesk/issues/4261#issuecomment-2308234709