jenkins-infra / helpdesk

Open your Infrastructure related issues here for the Jenkins project
https://github.com/jenkins-infra/helpdesk/issues/new/choose

AKS: add cluster `publick8s` and migrate `prodpublick8s` public services on it #3351

Closed dduportal closed 1 year ago

dduportal commented 1 year ago

This issue tracks the work for spawning a new "public" AKS cluster for production to replace the former prodpublick8s.

Goals:

This issue is the "twin" of https://github.com/jenkins-infra/helpdesk/issues/2844 but for public network.


Some notes:

lemeurherve commented 1 year ago

With this migration we'll be able to close https://github.com/jenkins-infra/helpdesk/issues/3209

dduportal commented 1 year ago

Putting on hold: #2844 tracks the migration of release.ci from prodpublick8s to privatek8s, which has to be completed before proceeding here.

lemeurherve commented 1 year ago

weekly.jenkins.io migrated:

We noticed that while LDAP was not accessible to Jenkins, Jenkins did not render the HTML set as the welcome message.

dduportal commented 1 year ago

Next steps: migrating the services relying on a PostgreSQL database.

To avoid any propagation of the network overlap, we need a flexible Postgres instance that the public-vnet network can access.

A new instance has to be created:

Alas, Azure Flexible servers do not support IPv6 virtual networks, so we'll have to find another solution. The current plan is to create a dedicated, IPv4-only virtual network and study the methods to access it privately.
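One way to sanity-check a candidate address range for such a dedicated IPv4-only virtual network is to test it against the ranges already in use. The sketch below relies only on Python's standard `ipaddress` module. All network names and CIDR values are hypothetical placeholders, not the actual Jenkins infrastructure ranges.

```python
import ipaddress


def find_overlaps(candidate, existing):
    """Return the names of existing networks that overlap the candidate CIDR."""
    cand = ipaddress.ip_network(candidate)
    overlaps = []
    for name, cidr in existing.items():
        net = ipaddress.ip_network(cidr)
        # An IPv4 range and an IPv6 range can never overlap, so skip mixed pairs.
        if net.version == cand.version and net.overlaps(cand):
            overlaps.append(name)
    return overlaps


# Hypothetical address spaces (assumptions for illustration only).
EXISTING = {
    "public-vnet-ipv4": "10.244.0.0/14",
    "public-vnet-ipv6": "fd00:db8:deca::/48",
    "private-vnet": "10.248.0.0/14",
}

if __name__ == "__main__":
    for candidate in ("10.245.0.0/24", "10.252.0.0/24"):
        print(candidate, "overlaps:", find_overlaps(candidate, EXISTING))
```

A candidate that returns an empty list could then be used for the dedicated database network without propagating the overlap mentioned above.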

dduportal commented 1 year ago

As per https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-networking:

=> As we want a managed database with no public endpoint, we need a dedicated vnet + (delegated) subnet for the new Postgres flexible instance. The only unknown is the peering behavior with IPv6 / IPv4.

dduportal commented 1 year ago

Keycloak is migrated with success to the new cluster:

Removed the former ingress to confirm: everything looks good for both @smerle33 and me with the private VPN connected.

Next step: gotta remove the migrated resources from prodpublick8s to be really sure that nothing runs on the former cluster (starting with keycloak only, as it would only impact the infra team: gotta wait for the announcement for public services).

dduportal commented 1 year ago

Update:

Keycloak cleanup:

Removed the jenkins-weekly, javadoc, jenkinsisthewayio-redirector and wiki namespaces from `prodpublick8s` (already migrated)

Removed the (empty) namespace archives from prodpublick8s

Next candidates: Plugin Health Score and rating

dduportal commented 1 year ago

Migration of Plugin Health Score done, in a mob-programming session with the help of @smerle33 and @alecharp

✅ 🚀

dduportal commented 1 year ago

Update:

[Screenshot, 2023-05-26 at 17:49:31]

[Screenshot, 2023-05-26 at 17:50:01]

dduportal commented 1 year ago

Migration of the Incremental Publisher service (https://github.com/jenkins-infra/iep/tree/master/iep-009):

dduportal commented 1 year ago

Migration of the Rating service

dduportal commented 1 year ago

Migration of the Uplink service

The main challenge is the database size: 88% of the available 85 GB of storage is used on the current PostgreSQL Single Server:
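To put these figures in absolute terms, the short calculation below converts the reported percentage into used and remaining storage. It only uses the two numbers quoted above; the function name is illustrative.

```python
def storage_usage(total_gb, used_percent):
    """Return (used, free) storage in GB for a given capacity and usage percentage."""
    used = total_gb * used_percent / 100
    return used, total_gb - used


# Figures quoted in the comment: 85 GB available, 88% used.
used_gb, free_gb = storage_usage(85, 88)
print(f"used: {used_gb:.1f} GB, free: {free_gb:.1f} GB")
# prints "used: 74.8 GB, free: 10.2 GB"
```

Roughly 75 GB of data would therefore have to be moved, with only about 10 GB of headroom left on the old instance.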

dduportal commented 1 year ago

Update:

dduportal commented 1 year ago

Update "Uplink":

on the (new.)ci.jenkins.io:

dduportal commented 1 year ago

Update in uplink:

dduportal commented 1 year ago

Update on uplink: as the "old" PostgreSQL is reachable from the new cluster and has too much data, I've opened https://github.com/jenkins-infra/helpdesk/issues/3609 for the database migration.

Also, it seems that the "old" database is a Postgres 10 instance while our public-db is version 13.

Gotta migrate uplink today, keeping the same database as before to avoid too many changes.

dduportal commented 1 year ago

(updated) Plan for migrating Uplink:

lemeurherve commented 1 year ago

Migration of uplink.jenkins.io completed, no service interruption.

lemeurherve commented 1 year ago

Migration of the Reports service:

lemeurherve commented 1 year ago

Migration of https://reports.jenkins.io completed, no service interruption.

lemeurherve commented 1 year ago

Migration of accountapp service:

lemeurherve commented 1 year ago

Migration of https://accounts.jenkins.io completed, no service interruption.

lemeurherve commented 1 year ago

Migration of the LDAP service:

dduportal commented 1 year ago

Update on the LDAP: https://github.com/jenkins-infra/azure/pull/385#issuecomment-1577251779

=> there are missing elements to allow managing storage accounts in some of the publick8s networks

lemeurherve commented 1 year ago

All preliminary steps for the LDAP migration are completed, I'll proceed to the switch tomorrow.

lemeurherve commented 1 year ago

Although the intended redirections from accounts.jenkins.io & accounts.jenkins-ci.org to status.jenkins.io didn’t work as expected (SAN cert issue?), the LDAP migration has been successfully completed, no service interruption.
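The "SAN cert issue?" hypothesis could be checked by comparing the hostnames involved with the Subject Alternative Names of the certificate actually served. The sketch below is only an illustration using Python's standard `ssl` and `socket` modules: `fetch_sans` and `covered_by` are hypothetical helper names, the wildcard matching is deliberately simplified, and the SAN list in the example is made up rather than taken from the real certificate.

```python
import socket
import ssl


def fetch_sans(host, port=443, timeout=5.0):
    """Fetch the DNS entries of the SAN extension from the certificate served for `host`."""
    ctx = ssl.create_default_context()
    # Disable hostname checking so the SANs can be inspected even on a mismatch;
    # the certificate chain itself is still verified.
    ctx.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def covered_by(hostname, sans):
    """True if hostname matches a SAN exactly or through a single-label wildcard."""
    hostname = hostname.lower()
    for san in sans:
        san = san.lower()
        if san == hostname:
            return True
        # "*.example.org" covers "a.example.org" but not "a.b.example.org".
        if san.startswith("*.") and "." in hostname:
            if hostname.split(".", 1)[1] == san[2:]:
                return True
    return False


# Hypothetical SAN list (an assumption, not the real certificate content).
example_sans = ["status.jenkins.io", "*.jenkins.io"]
print(covered_by("accounts.jenkins.io", example_sans))
print(covered_by("accounts.jenkins-ci.org", example_sans))
```

In practice one would call `fetch_sans("accounts.jenkins-ci.org")` against the redirecting endpoint and pass the result to `covered_by` for each hostname expected to redirect.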

lemeurherve commented 1 year ago

Migration of mirrorbits (https://get.jenkins.io, https://mirrors.jenkins.io, https://mirrors.jenkins-ci.org, https://fallback.get.jenkins.io) service:

lemeurherve commented 1 year ago

Migration of plugin-site-issues service:

lemeurherve commented 1 year ago

Migration of mirrorbits and plugin-site-issues completed, no service interruption.

As a precaution, we'll delete the mirrorbits namespace from prodpublick8s tomorrow or Wednesday.

lemeurherve commented 1 year ago

Migration of plugin-site service:

lemeurherve commented 1 year ago

Migration of jenkins.io service:

lemeurherve commented 1 year ago

plugin-site migration completed, no service interruption.

lemeurherve commented 1 year ago

jenkins.io migration completed, no service interruption.

lemeurherve commented 1 year ago

Cleanup of unused DNS records found while working on this issue

A records pointing to 52.167.253.43 (prodpublick8s public IP)

CNAME records pointing to publick.aks.jenkins.io (i.e. prodpublick8s cluster):

A records pointing to 10.0.2.5 (prodpublick8s private IP)

CNAME records pointing to private.aks.jenkins.io (i.e. prodpublick8s cluster):

Miscellaneous

*: need additional cleanup in:
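For reference, stale records of the kinds listed in this comment could be spotted by filtering a zone export on the four old-cluster targets. The sketch below assumes records are available as `(name, type, target)` tuples. The record names are hypothetical placeholders; only the four targets come from the comment itself.

```python
# Targets quoted in the comment, all belonging to the former prodpublick8s cluster.
STALE_TARGETS = {
    "52.167.253.43": "prodpublick8s public IP",
    "10.0.2.5": "prodpublick8s private IP",
    "publick.aks.jenkins.io": "prodpublick8s public CNAME target",
    "private.aks.jenkins.io": "prodpublick8s private CNAME target",
}


def find_stale_records(records):
    """Return (name, type, target, reason) for A/CNAME records pointing at the old cluster."""
    stale = []
    for name, rtype, target in records:
        # Tolerate a trailing dot and mixed case in CNAME targets.
        key = target.rstrip(".").lower()
        if rtype in ("A", "CNAME") and key in STALE_TARGETS:
            stale.append((name, rtype, target, STALE_TARGETS[key]))
    return stale


# Hypothetical zone export (placeholder names, for illustration only).
records = [
    ("old-service.example.org", "A", "52.167.253.43"),
    ("internal.example.org", "A", "10.0.2.5"),
    ("alias.example.org", "CNAME", "publick.aks.jenkins.io."),
    ("live.example.org", "A", "203.0.113.10"),
]

for name, rtype, target, reason in find_stale_records(records):
    print(f"{name} {rtype} -> {target} ({reason})")
```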

lemeurherve commented 1 year ago

As we've noticed quite a lot of requests are still being sent to mirrorbits on prodpublick8s, we'll postpone the cluster deletion to next week, and @dduportal will look into publishing a blog post announcing the migration of this service to the new cluster.

dduportal commented 1 year ago

Namespaces removal: @lemeurherve and I paired and removed the following namespaces from prodpublick8s:

Remaining namespaces are required until https://github.com/jenkins-infra/helpdesk/issues/3351#issuecomment-1591672053 is fixed.

dduportal commented 1 year ago

> As we've noticed quite a lot of requests are still being sent to mirrorbits on prodpublick8s, we'll postpone the cluster deletion to next week, and @dduportal will look into publishing a blog post announcing the migration of this service to the new cluster.

As discussed during the last infrastructure meeting:

dduportal commented 1 year ago

Update:

lemeurherve commented 1 year ago

Additional monitors added: https://github.com/jenkins-infra/datadog/pull/195

Potential improvements for later:

After 3 years and 27 days of good and faithful service, prodpublick8s is no more. Closing this issue 🤗