temporalio / docker-builds

Temporal service Docker images build
https://hub.docker.com/r/temporaliotest/auto-setup
MIT License

[Bug] Incorrect temporal-server binary architecture for linux/arm64 images #213

Closed michalkurzeja closed 5 months ago

michalkurzeja commented 5 months ago

What are you really trying to do?

We're running temporal server version 1.22.7 on AWS ECS, with Linux/ARM64 arch. Today, I've been trying to upgrade the server image to version 1.24.1.

Describe the bug

The container starts, then almost immediately fails when trying to execute /usr/local/bin/temporal-server binary with the error:

/etc/temporal/start-temporal.sh: line 16: /usr/local/bin/temporal-server: cannot execute binary file: Exec format error

This indicates that the host and the binary have incompatible architectures.

After some debugging, I found that, while the docker image is built for the right architecture (linux/arm64), the temporal-server binary it contains is compiled for x86-64.

Minimal Reproduction

I investigated this on my M1 Mac (an arm64 CPU):

I ran the same steps for temporalio/server:1.23.1 and got the same result. The binary inside temporalio/server:1.22.7, however, returns the expected output:

temporal-server: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=lMB36VsHl13o_zbRkAOX/KLFSmh-4BWu9xrJPZ8xV/D5mleHXajMoHkVUh0E_8/4zHnEAh1M4P-I87BNyNE, with debug_info, not stripped

Here, we can clearly see the arch is ARM aarch64.
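For anyone who wants to script this check, the same information can be read directly from the ELF header. Below is a minimal sketch in Python, not part of the original report: the helper names `elf_machine` and `binary_arch` are hypothetical, and it assumes the binary has been copied out of the image to a place where Python is available. It reads the `e_machine` field (a 16-bit value at byte offset 18) and maps a few well-known values from the ELF specification to names.

```python
import struct

# A few e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 40: "arm", 62: "x86-64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in an ELF header.

    `header` must contain at least the first 20 bytes of the file.
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_machine is a 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(fmt, header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


def binary_arch(path: str) -> str:
    """Read the ELF header of the file at `path` and name its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Calling `binary_arch` on a copy of `temporal-server` taken from a linux/arm64 image should return `"aarch64"`; a result of `"x86-64"` would indicate the mismatch described above.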

A side note: The bugged images seem to run on M1 Macs just fine, probably thanks to Rosetta.

Environment/Versions

michalkurzeja commented 5 months ago

I can see that there was an attempt to fix this issue but it seems it has not been included in the latest releases: https://github.com/temporalio/docker-builds/pull/208.

robholland commented 5 months ago

What's strange is that only temporal-server is affected. The others are correct. Looking into this.

robholland commented 5 months ago

Ok, this is a timing issue. The release process doesn't actually build new images; it copies images from temporaliotest. So although the release was done after the cross-compile fix was merged, the images that were released had been built before the fix was merged, because of the release validation process.

I've confirmed that the temporaliotest images being built currently have the correct arch for temporal-server, so the next release should fix this issue. I will check internally about re-building and releasing fixed images for the versions that were affected.

michalkurzeja commented 5 months ago

Confirmed, the re-released tags work on arm64 hosts now. Thanks!