suse-edge / edge-image-builder

Tool for creating and configuring a set of images to automate the deployment of Edge environments
Apache License 2.0

Possible? failure when the upstream registry is down #318

e-minguez closed 7 months ago

e-minguez commented 7 months ago

After a successful execution of EIB, I found my system wasn't deployed correctly because MetalLB wasn't working:

36m         Warning   Failed                 pod/metallb-controller-7cbd6cf47d-2qrh6    Failed to pull image "registry.opensuse.org/isv/suse/edge/metallb/images/metallb-controller:v0.13.10": rpc error: code = Unknown desc = failed to pull and unpack image "registry.opensuse.org/isv/suse/edge/metallb/images/metallb-controller:v0.13.10": failed to resolve reference "registry.opensuse.org/isv/suse/edge/metallb/images/metallb-controller:v0.13.10": failed to do request: Head "http://localhost:6545/v2/isv/suse/edge/metallb/images/metallb-controller/manifests/v0.13.10?ns=registry.opensuse.org": dial tcp [::1]:6545: connect: connection refused

It turns out Hauler wasn't serving anything:

# journalctl -u eib-embedded-registry
Mar 14 12:27:20 host1rke2 systemd[1]: Starting Load and Serve Embedded Registry...
Mar 14 12:27:21 host1rke2 hauler[2047]: 12:27PM INF loading content from [embedded-registry.tar.zst] to [store]
Mar 14 12:27:21 host1rke2 systemd[1]: Started Load and Serve Embedded Registry.
Mar 14 12:27:21 host1rke2 hauler[2091]: 12:27PM ERR Error: signed image index: stat /opt/hauler/store/index.json: no such file or directory
Mar 14 12:27:21 host1rke2 hauler[2091]: 12:27PM ERR main.go:74: error during command execution: signed image index: stat /opt/hauler/store/index.json: no such file or directory
Mar 14 12:27:21 host1rke2 hauler[2091]: Error: exit status 1
Mar 14 12:27:21 host1rke2 hauler[2091]: Usage:
Mar 14 12:27:21 host1rke2 hauler[2091]:   hauler store serve registry [flags]
Mar 14 12:27:21 host1rke2 hauler[2091]: Flags:
Mar 14 12:27:21 host1rke2 hauler[2091]:   -c, --config string      Path to a config file, will override all other configs
Mar 14 12:27:21 host1rke2 hauler[2091]:       --directory string   Directory to use for backend.  Defaults to $PWD/registry (default "registry")
Mar 14 12:27:21 host1rke2 hauler[2091]:   -h, --help               help for registry
Mar 14 12:27:21 host1rke2 hauler[2091]:   -p, --port int           Port to listen on. (default 5000)
Mar 14 12:27:21 host1rke2 hauler[2091]: Global Flags:
Mar 14 12:27:21 host1rke2 hauler[2091]:       --cache string       Location of where to store cache data (defaults to $XDG_CACHE_DIR/hauler)
Mar 14 12:27:21 host1rke2 hauler[2091]:   -l, --log-level string    (default "info")
Mar 14 12:27:21 host1rke2 hauler[2091]:   -s, --store string       Location to create store at (default "store")
Mar 14 12:27:21 host1rke2 hauler[2091]: 12:27PM ERR exit status 1
Mar 14 12:27:21 host1rke2 systemd[1]: eib-embedded-registry.service: Deactivated successfully.
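
The `stat /opt/hauler/store/index.json` error above suggests the store was never populated before `hauler store serve registry` ran. As a minimal sketch of a pre-flight check, assuming the store path shown in the log, something like the following could be run before starting the service. `store_has_index` is a hypothetical helper name, not part of Hauler or EIB.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Hauler or EIB):
# succeed only if the store directory contains an index.json.
store_has_index() {
    # Default path is taken from the journal output; adjust if your store lives elsewhere.
    store_dir="${1:-/opt/hauler/store}"

    if [ -f "$store_dir/index.json" ]; then
        return 0
    fi

    echo "no index.json in $store_dir; the store looks empty" >&2
    return 1
}
```

For example, `store_has_index /opt/hauler/store || exit 1` would stop a script early instead of leaving the registry silently unavailable.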

The embedded-registry.tar.zst file was effectively empty (only 88 bytes):

# ls -lh embedded-registry.tar.zst
-rw-r--r--. 1 root root 88 Mar 14 12:26 embedded-registry.tar.zst
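
An 88-byte archive cannot hold any container images, so a size check would have flagged this case. The sketch below is one possible guard, not something EIB or Hauler provides: `check_registry_archive` is a hypothetical helper, and the 1024-byte default threshold is an arbitrary assumption chosen only to reject near-empty files.

```shell
#!/bin/sh
# Hypothetical helper (not part of EIB or Hauler): fail when the registry
# archive is missing or too small to plausibly contain any images.
check_registry_archive() {
    archive="$1"
    # Assumed threshold in bytes; tune it to your own image set.
    min_bytes="${2:-1024}"

    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi

    # wc -c reports the size in bytes; tr strips padding some platforms add.
    size=$(wc -c < "$archive" | tr -d ' ')
    if [ "$size" -lt "$min_bytes" ]; then
        echo "archive $archive is only $size bytes (expected at least $min_bytes)" >&2
        return 1
    fi
    return 0
}
```

Called as `check_registry_archive embedded-registry.tar.zst`, it would return a non-zero status for the file listed above.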

Yet the build itself reported success, including the Embedded Artifact Registry step:

$ eib build --config-dir ${CI_PROJECT_DIR}/eib-temp --config-file eib.yaml --build-dir ${CI_PROJECT_DIR}/eib-temp/_build
SELinux is enabled in the Kubernetes configuration. The necessary RPM packages will be downloaded.
Downloading file: rancher-public.key 100% |█████| (2.4/2.4 kB, 14 MB/s)
Generating image customization components...
Identifier ................... [SUCCESS]
Custom Files ................. [SUCCESS]
Time ......................... [SKIPPED]
Network ...................... [SUCCESS]
Groups ....................... [SKIPPED]
Users ........................ [SUCCESS]
Proxy ........................ [SKIPPED]
Resolving package dependencies...
Rpm .......................... [SUCCESS]
Systemd ...................... [SUCCESS]
Elemental .................... [SKIPPED]
Suma ......................... [SKIPPED]
Embedded Artifact Registry ... [SUCCESS]
Keymap ....................... [SUCCESS]
Configuring Kubernetes component...
Downloading file: rke2-images-core.linux-amd64.tar.zst 100% || (776/776 MB, 45 MB/s)
Downloading file: rke2-images-cilium.linux-amd64.tar.zst 100% || (392/392 MB, 50 MB/s)
Downloading file: rke2-images-multus.linux-amd64.tar.zst 100% || (322/322 MB, 71 MB/s)
Downloading file: rke2.linux-amd64.tar.gz 100% |██| (34/34 MB, 81 MB/s)
Downloading file: sha256sum-amd64.txt 100% |████| (3.6/3.6 kB, 19 MB/s)
Kubernetes ................... [SUCCESS]
Certificates ................. [SKIPPED]
Building ISO image...
Kernel Params ................ [SKIPPED]
Image build complete!

I suspect this is related to the upstream registry, where the images were mirrored, being down at build time, but I'm not 100% sure about this.

I'm opening this just in case we want to investigate more.

dbw7 commented 7 months ago

Hello, this is a known issue with the old version of Hauler. The new version has a fix for it, so we just need to upgrade the package in OBS. Hauler should then fail properly when it isn't able to pull the images during the EIB build.

atanasdinov commented 7 months ago

Hauler is now upgraded to v1.0.1 which should resolve this issue.

Both OBS and IBS builds are updated and the respective EIB container images are rebuilt. This should no longer be a problem.