improve upstream CI mirror reliability

artificial-intelligence commented 2 months ago

Problem description:

We spend a lot of time in the OpenStack kolla-ansible upstream CI dealing with external mirror breakage: some mirrors have quotas, others are simply unreliable. We therefore need to retry downloads, but this is not done consistently throughout the CI codebase.

This results in CI runs failing spuriously when we hit a code path that has no retries or too few retries, or when retries cannot help because the mirror is completely unavailable at that time.
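To illustrate what "consistent retries" could look like, here is a minimal sketch of a shared retry helper with capped exponential backoff and jitter. It is not taken from the kolla or kolla-ansible codebase; the function name `retry` and all of its parameters are made up for this example.

```python
import random
import time


def retry(func, attempts=5, base_delay=1.0, max_delay=30.0,
          exceptions=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are exhausted.

    Waits roughly base_delay * 2**n seconds (capped at max_delay, with
    jitter) between tries and re-raises the last error if all tries fail.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel jobs from hitting a mirror in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

A single helper like this would at least make the retry count and the worst-case wait time explicit in one place. It does not help when a mirror is down for longer than the total backoff window.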

This in turn wastes developer time on inspecting the error behind a CI failure, retriggering CI, and possibly rewriting CI code to increase timeouts and retry counters. CI also generally takes longer because of the timers and timeouts that get introduced. That lengthens the feedback loop for finding out whether a code change passes CI, since not all CI checks can be run locally.

A complete CI run can take over 2 hours, partly as a result of this.

The idea is to mirror more of the packages we currently install from the internet directly on OpenStack infrastructure, to improve the reliability and speed of installation as well as control and security.

As a first step, a list of externally mirrored packages needs to be compiled so we can decide what we can and want to move to OpenStack infrastructure.
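One rough way to compile such a list is to scan the relevant files for URLs and group them by hostname. The sketch below assumes the file content is available as plain text and does not rely on any particular YAML layout. `external_hosts` and `trusted_hosts` are hypothetical names, and the sample lines only imitate a `name: url` listing.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Matches http(s) URLs up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def external_hosts(text, trusted_hosts=()):
    """Group every URL found in text by hostname, skipping trusted hosts."""
    hosts = defaultdict(set)
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname
        if host and host not in trusted_hosts:
            hosts[host].add(url)
    # Sort hosts and URLs so the report is stable between runs.
    return {host: sorted(urls) for host, urls in sorted(hosts.items())}


sample = """
grafana: https://apt.grafana.com
opensearch: https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt/
"""
for host, urls in external_hosts(sample).items():
    print(host, len(urls))
```

Hosts that are already mirrored could be passed in via `trusted_hosts`, so the output only shows what is still fetched from outside. The regex is deliberately simple and may pick up trailing punctuation in prose-like files.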

artificial-intelligence commented 2 months ago

List of externally mirrored packages that are not installed from a distribution mirror or pypi.org, ordered by distribution (taken from https://github.com/openstack/kolla/blob/master/kolla/template/repos.yaml):

Debian/Ubuntu:

erlang: https://ppa.launchpadcontent.net/rabbitmq/rabbitmq-erlang/ubuntu
fluentd: https://packages.treasuredata.com/lts/5/debian/bookworm
grafana: https://apt.grafana.com
influxdb: https://repos.influxdata.com/ubuntu
mariadb: https://dlm.mariadb.com/repo/mariadb-server/10.11/repo/debian
opensearch: https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt/
opensearch-dashboards: https://artifacts.opensearch.org/releases/bundle/opensearch-dashboards/2.x/apt/
proxysql: https://repo.proxysql.com/ProxySQL/proxysql-2.6.x/bookworm/
rabbitmq: https://ppa1.novemberain.com/rabbitmq/rabbitmq-server/deb/debian
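To get some data on which of these mirrors are actually flaky, a small probe could be run periodically against them. This is only a sketch: `http_head` and `probe` are hypothetical helpers, and a repository root may not answer a HEAD request with a success status even when the repository is healthy, so probing a known file inside each repository would likely be more meaningful.

```python
import time
import urllib.request


def http_head(url, timeout=10.0):
    """Default fetcher: send a HEAD request and return the HTTP status."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def probe(urls, fetch=http_head, clock=time.monotonic):
    """Return {url: (ok, seconds)} for each URL, never raising on failure."""
    results = {}
    for url in urls:
        start = clock()
        try:
            ok = 200 <= fetch(url) < 400
        except OSError:  # URLError, HTTPError and timeouts are all OSError
            ok = False
        results[url] = (ok, clock() - start)
    return results
```

Calling `probe(["https://apt.grafana.com"])` would return whether the request succeeded and how long it took. `fetch` and `clock` are injectable so the function can be exercised without network access.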

For completeness, here is a list of packages we install from GitHub (https://github.com/openstack/kolla/blob/master/kolla/common/sources.py), though I haven't personally observed any issues pulling these from GitHub:

etcd: https://github.com/etcd-io/etcd
gnocchi-base: https://github.com/gnocchixyz/gnocchi
letsencrypt-lego: https://github.com/go-acme/lego
prometheus-alertmanager: https://github.com/prometheus/alertmanager
prometheus-blackbox-exporter: https://github.com/prometheus/blackbox_exporter
prometheus-cadvisor: https://github.com/google/cadvisor
prometheus-elasticsearch-exporter: https://github.com/prometheus-community/elasticsearch_exporter
prometheus-libvirt-exporter: https://github.com/inovex/prometheus-libvirt-exporter
prometheus-memcached-exporter: https://github.com/prometheus/memcached_exporter
prometheus-msteams: https://github.com/prometheus-msteams/prometheus-msteams
prometheus-mtail: https://github.com/google/mtail
prometheus-mysqld-exporter: https://github.com/prometheus/mysqld_exporter
prometheus-node-exporter: https://github.com/prometheus/node_exporter
prometheus-openstack-exporter: https://github.com/openstack-exporter/openstack-exporter
prometheus-ovn-exporter: https://github.com/greenpau/ovn_exporter
prometheus-v2-server: https://github.com/prometheus/prometheus

artificial-intelligence commented 2 months ago

As a starting point, build failures can be analyzed via the OpenSearch dashboard, which can be accessed via: https://docs.openstack.org/project-team-guide/testing.html#checking-status-of-other-job-results

improve upstream CI mirror reliability #1111