osism / issues

This repository is used for bug reports that are cross-project or not bound to a specific repository (or to an unknown repository).
https://www.osism.tech

improve upstream CI mirror reliability #1111

Open artificial-intelligence opened 2 weeks ago

artificial-intelligence commented 2 weeks ago

Problem description:

We spend a lot of time in the OpenStack kolla-ansible upstream CI dealing with external mirror breakage: some mirrors have quotas, others are simply unreliable. We therefore need retries, but they are not applied consistently throughout the CI codebase.

This results in CI runs failing spuriously when we hit a code path that has no retries or too few retries, or when retries cannot help because the mirrors are not reachable at all at that time.
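To illustrate what consistent retry handling could look like, here is a minimal Python sketch of retries with exponential backoff. The helper name `retry_with_backoff`, its parameters, and its defaults are hypothetical and not taken from the kolla-ansible CI code; it only shows the general technique.

```python
import time


def retry_with_backoff(fetch, attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are exhausted.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    (capped at max_delay) between failed attempts and re-raises the last
    error once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:  # network errors such as URLError derive from OSError
            if attempt == attempts - 1:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

A helper like this only smooths over transient failures. It does not help when a mirror is unreachable for the whole run, and every retry adds to the total CI time.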

This in turn wastes developer time on inspecting the errors behind CI failures, retriggering CI, and possibly rewriting CI code to increase timeouts and retry counters. CI also generally takes longer because of the added timers and timeouts, which lengthens the feedback loop on whether a code change passes CI, since not all CI checks can be run locally.

A complete CI run can take over 2 hours, partly as a result of this.

The idea is to mirror more of the packages we currently install from the internet directly on OpenStack infrastructure, to improve the reliability and speed of installation as well as control and security.

As a first step, a list of externally mirrored packages needs to be compiled so we can decide what we can and want to move to OpenStack infrastructure.
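Compiling such a list could start from the URLs already recorded in the repository definitions. Below is a rough Python sketch. It assumes the definitions have been parsed into nested dicts and lists (for example with a YAML loader) and that repository locations appear as `http(s)://` strings somewhere in that data. The function name `external_hosts` and the sample data are made up for illustration and do not reflect the real file contents.

```python
from collections import defaultdict
from urllib.parse import urlparse


def external_hosts(data):
    """Map each external hostname to the sorted key paths that reference it."""
    hosts = defaultdict(set)

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + (str(key),))
        elif isinstance(node, (list, tuple)):
            for item in node:
                walk(item, path)
        elif isinstance(node, str):
            # A field may hold a whole repo line, so inspect each token.
            for token in node.split():
                if token.startswith(("http://", "https://")):
                    host = urlparse(token).hostname
                    if host:
                        hosts[host].add("/".join(path))

    walk(data, ())
    return {host: sorted(paths) for host, paths in sorted(hosts.items())}


# Made-up sample input, not the real contents of any Kolla file.
sample = {
    "debian": {"example-repo": {"url": "https://packages.example.org/debian"}},
    "rpm": {"example-repo": {"baseurl": "https://packages.example.org/rpm"}},
}
print(external_hosts(sample))
```

Grouping by hostname rather than by package keeps the walk independent of the exact file layout and shows directly which external hosts the builds depend on.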

artificial-intelligence commented 2 weeks ago

List of externally mirrored packages that are not installed from a distribution mirror or pypi.org, ordered by distribution (taken from https://github.com/openstack/kolla/blob/master/kolla/template/repos.yaml):

Debian/Ubuntu:

For completeness, here is a list of packages we install from GitHub (https://github.com/openstack/kolla/blob/master/kolla/common/sources.py), though I haven't personally observed any issues pulling these from GitHub:

artificial-intelligence commented 2 weeks ago

As a starting point, build failures can be analyzed via the OpenSearch dashboard, which can be accessed via: https://docs.openstack.org/project-team-guide/testing.html#checking-status-of-other-job-results