Revert "Dont use arch in container image tag:"

rpardini commented 3 months ago

Revert "Dont use arch in container image tag:"

This reverts commit 69a4c8725cd0a3585e40188ac82b262984c991f4:

"The service containers we build shouldnt use arch in the tags. This way the same tag will have mulitple archs at the registry layer."
this isn't really true, as can be seen from the --platform argument to buildx; also the GHA matrix'es are split per-arch
effectively, we had random arch (but not both) in each tag, depending on build speed/order in GHA
causes Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error (and many others) depending on luck or lack thereof

jacobweinstock commented 3 months ago

Hey @rpardini. Sorry, I'm not following here. If you look at the image below of quay.io for hook-docker (for example - https://quay.io/repository/tinkerbell/hook-docker?tab=tags), the first three tags have the arch in the tag. The second three don't have the arch in the tag but have 2 linux penguins representing the architectures of amd64 and arm64. https://github.com/tinkerbell/hook/commit/69a4c8725cd0a3585e40188ac82b262984c991f4 gets us the second 3.

Where are you seeing these Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error errors? if we're not seeing them in our CI then that's an issue.

rpardini commented 3 months ago

Hey @rpardini. Sorry, I'm not following here. If you look at the image below of quay.io for hook-docker (for example - quay.io/repository/tinkerbell/hook-docker?tab=tags), the first three tags have the arch in the tag.

Not really. Focus on the first one, feb72d82, it doesn't have penguins (thus is a single-arch image, without the arch in the name). That's what's being used before this PR...

Either way, take a look at https://github.com/tinkerbell/hook/blob/main/bash/hook-lk-containers.sh#L53 -- that spells out that container's --platform is specific, and not multi arch.

This change (adding arch to tag) was deliberate, to allow us to build the amd64 and arm64 separately and on native iron for each.

I understand we might want to offer a multi-arch image at the end, but just removing the arch from the tag just causes mismatches; we'd need to add a GHA step that unifies both after both are individually built, or change the GHA matrix to use qemu to build both arches at once. But until that happens we'd be randomly failing in one case or another.

Where are you seeing these Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error errors? if we're not seeing them in our CI then that's an issue.

I see those (and many more similar ones) just by running ./build.sh run (either on arm64 or amd64); this is just linuxkit run qemu. Same happens on a physical machine.

rpardini commented 3 months ago

Where are you seeing these Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error errors? if we're not seeing them in our CI then that's an issue.

Yeah AFAIK the CI doesn't actually run/test the built Hooks. Do you know if the self-hosted x64/arm64 runners support KVM? If so, we could come up with a test harness to test in GHA: start up mock syslog & tink-grpc servers, leave them running, start Hook under qemu, and wait/timeout on a TCP connection arriving at the mock grpc server. That would prove that the kernel & linuxkit works, and that tink-worker started.

jacobweinstock commented 3 months ago

@mergify queue

mergify[bot] commented 3 months ago

queue

🛑 The pull request has been removed from the queue `default`

The queue conditions cannot be satisfied due to failing checks. You can take a look at `Queue: Embarked in merge queue` check runs for more details. In case of a failure due to a flaky test, you should first retrigger the CI. Then, re-embark the pull request into the merge queue by posting the comment `@mergifyio refresh` on the pull request.

jacobweinstock commented 3 months ago

@mergifyio rebase

mergify[bot] commented 3 months ago

rebase

☑️ Nothing to do

- [ ] `queue-position=-1` [📌 rebase requirement] - [X] `-closed` [📌 rebase requirement] - [X] `-conflict` [📌 rebase requirement] - [X] any of: - [X] `#commits-behind>0` [📌 rebase requirement] - [ ] `#commits>1` [📌 rebase requirement] - [ ] `-linear-history` [📌 rebase requirement]

tinkerbell / hook