tinkerbell / hook

In-memory Operating System Installation Environment for Executing Tinkerbell Workflows
Apache License 2.0
101 stars 48 forks source link

Revert "Dont use arch in container image tag:" #219

Closed rpardini closed 3 months ago

rpardini commented 3 months ago

Revert "Dont use arch in container image tag:"

jacobweinstock commented 3 months ago

Hey @rpardini. Sorry, I'm not following here. If you look at the image below of quay.io for hook-docker (for example - https://quay.io/repository/tinkerbell/hook-docker?tab=tags), the first three tags have the arch in the tag. The second three don't have the arch in the tag but have 2 linux penguins representing the architectures of amd64 and arm64. https://github.com/tinkerbell/hook/commit/69a4c8725cd0a3585e40188ac82b262984c991f4 gets us the second 3.

image

Where are you seeing these Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error errors? if we're not seeing them in our CI then that's an issue.

rpardini commented 3 months ago

Hey @rpardini. Sorry, I'm not following here. If you look at the image below of quay.io for hook-docker (for example - quay.io/repository/tinkerbell/hook-docker?tab=tags), the first three tags have the arch in the tag.

Not really. Focus on the first one, feb72d82, it doesn't have penguins (thus is a single-arch image, without the arch in the name). That's what's being used before this PR...

Either way, take a look at https://github.com/tinkerbell/hook/blob/main/bash/hook-lk-containers.sh#L53 -- that spells out that container's --platform is specific, and not multi arch.

This change (adding arch to tag) was deliberate, to allow us to build the amd64 and arm64 separately and on native iron for each.

I understand we might want to offer a multi-arch image at the end, but just removing the arch from the tag just causes mismatches; we'd need to add a GHA step that unifies both after both are individually built, or change the GHA matrix to use qemu to build both arches at once. But until that happens we'd be randomly failing in one case or another.

Where are you seeing these Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error errors? if we're not seeing them in our CI then that's an issue.

I see those (and many more similar ones) just by running ./build.sh run (either on arm64 or amd64); this is just linuxkit run qemu. Same happens on a physical machine.

rpardini commented 3 months ago

Where are you seeing these Error creating 000-rngd1: fork/exec /usr/bin/runc: exec format error errors? if we're not seeing them in our CI then that's an issue.

Yeah AFAIK the CI doesn't actually run/test the built Hooks. Do you know if the self-hosted x64/arm64 runners support KVM? If so, we could come up with a test harness to test in GHA: start up mock syslog & tink-grpc servers, leave them running, start Hook under qemu, and wait/timeout on a TCP connection arriving at the mock grpc server. That would prove that the kernel & linuxkit works, and that tink-worker started.

jacobweinstock commented 3 months ago

@mergify queue

mergify[bot] commented 3 months ago

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks. You can take a look at `Queue: Embarked in merge queue` check runs for more details. In case of a failure due to a flaky test, you should first retrigger the CI. Then, re-embark the pull request into the merge queue by posting the comment `@mergifyio refresh` on the pull request.
jacobweinstock commented 3 months ago

@mergifyio rebase

mergify[bot] commented 3 months ago

rebase

☑️ Nothing to do

- [ ] `queue-position=-1` [📌 rebase requirement] - [X] `-closed` [📌 rebase requirement] - [X] `-conflict` [📌 rebase requirement] - [X] any of: - [X] `#commits-behind>0` [📌 rebase requirement] - [ ] `#commits>1` [📌 rebase requirement] - [ ] `-linear-history` [📌 rebase requirement]