dduportal opened 1 month ago
The idea would be to have a persistent volume to store the GeoIP data and remove the init + sidecar containers.
=> Sharing it between mirrorbits instances avoids duplicate downloads, and we keep the data instead of downloading it from MaxMind on each pod restart/re-creation.
Sharing a database such as this one between pods means we have to mount it read-only in mirrorbits to avoid any write attempt.
=> This is already the case for the emptyDir, but we should also set up the PV/PVC as ReadOnlyMany.
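To make the access modes concrete, here is a minimal Python sketch that builds the mirrorbits-side PersistentVolumeClaim and pod volume as plain dicts and prints them as JSON (kubectl also accepts JSON manifests). The claim name, the storage class (azurefile-csi) and the size are hypothetical placeholders; only the /usr/share/GeoIP mount path comes from this issue.

```python
import json
from typing import Tuple

# Assumption: mount path taken from the error message quoted in this issue.
# Claim name, storage class and size used below are hypothetical placeholders.
GEOIP_MOUNT_PATH = "/usr/share/GeoIP"


def geoip_pvc(name: str, access_mode: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def geoip_volume(claim_name: str, read_only: bool) -> Tuple[dict, dict]:
    """Return the (volume, volumeMount) pair to add to a pod spec."""
    volume = {
        "name": "geoip-data",
        "persistentVolumeClaim": {"claimName": claim_name, "readOnly": read_only},
    }
    mount = {"name": "geoip-data", "mountPath": GEOIP_MOUNT_PATH, "readOnly": read_only}
    return volume, mount


if __name__ == "__main__":
    # mirrorbits side: the share is claimed as ReadOnlyMany and mounted read-only.
    pvc = geoip_pvc("mirrorbits-geoip", "ReadOnlyMany", "azurefile-csi", "1Gi")
    volume, mount = geoip_volume("mirrorbits-geoip", read_only=True)
    print(json.dumps({"pvc": pvc, "volume": volume, "volumeMount": mount}, indent=2))
```

The updater side would reuse the same helpers with ReadWriteMany and read_only=False. How both claims end up bound to the same Azure file share (for example with statically provisioned PVs) is left open in this sketch.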
This means we need a way to populate and update the PV content: the GeoIP sidecar containers should no longer run duplicated in each instance.
=> We need to run it as a deployment separate from mirrorbits, with only 1 replica and the PV mounted read-write. This separate deployment would take care of initializing and updating the database, replacing the init and sidecar containers in our pods.
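The separate updater only needs to download when the shared volume is empty or the data is stale, which is what lets the data survive pod restarts. A minimal Python sketch of that check, assuming freshness is judged by file modification time and assuming a 72-hour interval (the period mentioned later in this issue); the function name is hypothetical:

```python
import os
import time
from typing import Optional

# Assumption: 72 hours matches the update period mentioned in this issue.
UPDATE_INTERVAL_SECONDS = 72 * 3600


def needs_download(path: str,
                   max_age_seconds: float = UPDATE_INTERVAL_SECONDS,
                   now: Optional[float] = None) -> bool:
    """Return True when the database file is missing or older than max_age_seconds."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        return True  # first run: the persistent volume is still empty
    return age >= max_age_seconds
```

For example, an updater loop could call needs_download("/usr/share/GeoIP/GeoLite2-City.mmdb") before each attempt and skip the MaxMind request when it returns False.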
Two other challenges:
- Read*Many (…) but either we use a Premium share (which requires provisioning 100 GiB) or a Standard one (pay per request).
Proposal: Let's start with a PV on a non-Premium Azure Files share and see how it behaves. If it costs too much, then we'll have to move to Premium.
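The Premium/Standard trade-off can be framed as a small cost model. This Python sketch assumes Premium is billed on provisioned capacity with the 100 GiB minimum, and Standard on used capacity plus transactions counted per 10,000. It uses a single blended transaction price, which is a simplification, and every price is a parameter to fill in from the Azure pricing page, not a real figure.

```python
# Hypothetical cost model: every price is a placeholder parameter, not real pricing.
PREMIUM_MIN_PROVISIONED_GIB = 100


def premium_monthly_cost(needed_gib: float, price_per_provisioned_gib: float) -> float:
    """Premium bills provisioned capacity, never less than the 100 GiB minimum."""
    return max(needed_gib, PREMIUM_MIN_PROVISIONED_GIB) * price_per_provisioned_gib


def standard_monthly_cost(used_gib: float, price_per_used_gib: float,
                          transactions: int,
                          price_per_10k_transactions: float) -> float:
    """Standard bills used capacity plus transactions, priced per 10,000."""
    storage = used_gib * price_per_used_gib
    requests = transactions / 10_000 * price_per_10k_transactions
    return storage + requests


def cheaper_tier(premium: float, standard: float) -> str:
    """Pick the tier with the lower estimated monthly cost."""
    return "Premium" if premium < standard else "Standard"
```

Since the GeoIP databases are small, the Premium minimum dominates its cost, so the comparison mostly hinges on how many transactions the mirrorbits pods generate against the Standard share.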
Update:
=> Tested manually and it worked (for populating the data). Need to validate the mirrorbits 4.x chart once installed.
=> Manual test on updates.jenkins.io did work 👍 Let's roll!
Update: let's roll for updates.jenkins.io first: https://github.com/jenkins-infra/kubernetes-management/pull/5565
Update:
This caused https://github.com/jenkins-infra/helpdesk/issues/4261 due to PVC errors:
The geoipupdate pod had been stuck in CrashLoopBackOff since yesterday's cluster upgrade (https://github.com/jenkins-infra/helpdesk/issues/4161), but it was also failing every 72 hours when trying to update the database.
The database files were stuck with SMB file handles in delete/concurrent writes 😡. This was visible in Azure Storage Explorer, and in the geoipdata Linux container with weird errors such as: cp: can't create '/usr/share/GeoIP/GeoLite2-City.mmdb': No such file or directory
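The stuck handles came from deletes and concurrent writes on the share. One common pattern for updating a file that readers may hold open is to write a temporary copy in the same directory and rename it over the target. The Python sketch below shows that pattern under stated assumptions: it is not taken from the updater used here, the helper name is hypothetical, and whether a rename over an open file behaves well on an Azure Files SMB mount was not tested in this thread.

```python
import os
import tempfile


def replace_atomically(target_path: str, data: bytes) -> None:
    """Write data next to target_path, then swap it in with a single rename."""
    directory = os.path.dirname(target_path) or "."
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX local filesystems; SMB semantics may differ.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Remove the partial temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```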
The geoipdata updater has been uninstalled as per https://github.com/jenkins-infra/helpdesk/issues/4261#issuecomment-2308234709
(… azcopy) => an infra.ci job?
Service(s)
get.jenkins.io, mirrors.jenkins.io, Other
Summary
We have recently been hitting the MaxMind GeoIP API rate limit.
Recently, @timja received alerts about this on his own account (which we realized we were using in production; fixed in #4195).
We keep receiving these alerts by email every day when we perform more than 2 deployments per day of mirrorbits (either get.jenkins.io or the new azure.updates.jenkins.io Update Center system).
These rate limits block our production mirrorbits instances and put the service at risk of an outage.
Root cause is located in the "GeoIP" additional containers running in each mirrorbits pod:
- (…) InitError state. If it succeeds, it downloads the database once and stops, and then the other pod containers start.
- (…) Error state.
Ref. https://support.maxmind.com/hc/en-us/articles/4408216129947-Download-and-Update-Databases
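The rate-limit arithmetic behind the root cause can be sketched in Python: with init containers, every pod start downloads every database, whereas a single shared updater downloads once per update cycle. Pods per deployment, the number of databases and MaxMind's daily limit are hypothetical parameters here (the actual limit is documented on the MaxMind page referenced above), and the sidecars' periodic updates are ignored.

```python
# Hypothetical model: every input is a parameter, not a figure from this issue.


def downloads_with_init_containers(deployments_per_day: int,
                                   pods_per_deployment: int,
                                   databases: int) -> int:
    """Each (re)created pod runs the init container, which fetches every database."""
    return deployments_per_day * pods_per_deployment * databases


def downloads_with_shared_updater(updates_per_day: float, databases: int) -> float:
    """A single updater replica fetches each database once per update cycle."""
    return updates_per_day * databases


def exceeds_limit(downloads: float, daily_limit: int) -> bool:
    """True when the daily download count goes over the account's limit."""
    return downloads > daily_limit


if __name__ == "__main__":
    # Example inputs only: 3 deployments/day, 2 pods each, 2 databases.
    per_pod = downloads_with_init_containers(3, 2, 2)
    shared = downloads_with_shared_updater(1 / 3, 2)  # one update every 72 hours
    print(per_pod, round(shared, 2))
```

The per-pod count grows with every deployment and every replica, while the shared-updater count depends only on the update interval, which is the motivation for the single-replica updater proposed above.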
Reproduction steps
No response