knadh / listmonk

High performance, self-hosted, newsletter and mailing list manager with a modern dashboard. Single binary app.
https://listmonk.app
GNU Affero General Public License v3.0

ARM64: exec /usr/local/bin/docker-entrypoint.sh: exec format error #2114

Open mnbro opened 3 days ago

mnbro commented 3 days ago

Version:

thueske commented 3 days ago

Same here (arm64)

pbence commented 3 days ago

Same here, I inspected the docker image and it contains amd64 binaries (except the listmonk binary itself).

knadh commented 3 days ago

@lmmendes (ref: https://github.com/knadh/listmonk/pull/1892), @mr-karan would you be able to help with this?

lmmendes commented 3 days ago

Will take a look at it today (end of day).

@pbence is correct: the error message suggests that the wrong binaries are packed inside the ARM64 Docker image (as he demonstrated).

The question is whether we published the x86 image to Docker Hub with the wrong tags, or whether the build process built the ARM64 image but placed x86 binaries in it.

@pbence, @thueske and @mnbro, are you using Docker or Podman, and are you on a MacBook or running this on Linux (e.g. AWS Graviton)?

knadh commented 3 days ago

The goreleaser build config hasn't changed between the last version and this one, though. Building the binaries, packaging them, posting them to GitHub releases, and uploading and tagging them on Docker Hub are all done automatically by goreleaser.

lmmendes commented 3 days ago

Initial debug.

I'm using a MacBook M2 running Docker version 27.1.1, build 6312585.

Pulling listmonk/listmonk:v4.0.1 without forcing the "platform", so that the Docker daemon resolves it for me:

$ docker pull listmonk/listmonk:v4.0.1

The image pulled seems to match the right architecture, since I'm running an ARM64 CPU (M2):

$  docker inspect listmonk/listmonk | grep Architecture
"Architecture": "arm64",

Running the listmonk/listmonk:v4.0.1 image pulled above, the system starts as expected (I didn't configure anything, but the binary runs without issues):

$ docker run listmonk/listmonk:v4.0.1
Launching listmonk with user=[root] group=[root] PUID=[0] PGID=[0]
2024/10/29 08:21:10.120634 main.go:107: v4.0.1 (f5dfb0c 2024-10-28T07:48:58Z, linux/arm64)
2024/10/29 08:21:10.120733 init.go:162: reading config: config.toml
2024/10/29 08:21:10.120846 init.go:302: connecting to db: localhost:5432/listmonk
2024/10/29 08:21:10.121556 init.go:306: error connecting to DB: dial tcp [::1]:5432: connect: connection refused

At the end of the day I will dig a bit deeper into this, but it seems to be running OK. Meanwhile, I will ask someone else with an ARM-powered PC to repeat these instructions and see if they work for them.

@pbence and @mnbro, are you running this on your local machine using Docker, or are you trying to deploy this to a Kubernetes cluster via a Helm chart?

cc/ @knadh

thueske commented 2 days ago

Using Alpine Linux with Docker on Hetzner ARM64 servers.

/home/thueske [thueske@server] [9:40]
> docker inspect listmonk/listmonk:v4.0.1 | grep Architecture
        "Architecture": "arm64",

/home/thueske [thueske@server] [9:40]
> docker run listmonk/listmonk:v4.0.1
exec /usr/local/bin/docker-entrypoint.sh: exec format error

Additional details:

/home/thueske [thueske@server] [9:41]
> uname -a
Linux server.myserver.de 6.6.54-0-virt #1-Alpine SMP PREEMPT_DYNAMIC 2024-10-04 16:47:58 aarch64 Linux

/home/thueske [thueske@server] [9:41]
> docker -v
Docker version 26.1.5, build a72d7cdbeb991662bf954bfb8d02274124af21e3

(On my MacBook M3 Pro everything works.)
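
The outputs above use two names for the same architecture: uname reports aarch64 while docker inspect reports arm64. A minimal sketch of a helper that translates one into the other, covering only the two architectures discussed in this thread. The function name docker_arch is made up for illustration and is not part of Docker or listmonk:

```shell
#!/bin/sh
# docker_arch (hypothetical helper): translate a `uname -m` machine name
# into the architecture name Docker uses in "linux/<arch>".
# Only the architectures mentioned in this thread are covered.
docker_arch() {
    case "$1" in
        aarch64|arm64) echo "arm64" ;;
        x86_64|amd64)  echo "amd64" ;;
        *)             echo "unknown"; return 1 ;;
    esac
}
```

Calling docker_arch "$(uname -m)" on the Hetzner host above should print arm64, which is the value to compare against the "Architecture" field of the image.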

lmmendes commented 2 days ago

@thueske can you run the following on your MacBook and Hetzner instance please:

Start the listmonk/listmonk:v4.0.1 container with a shell (/bin/sh), overriding the defined entrypoint:

$ docker run -ti --rm --entrypoint /bin/sh listmonk/listmonk:v4.0.1

You should land inside the Docker container, in the listmonk folder, and see the following files:

$  ls -la
total 16888
drwxr-xr-x    1 root     root          4096 Oct 28 07:55 .
drwxr-xr-x    1 root     root          4096 Oct 29 08:52 ..
-rw-r--r--    1 root     root           695 Oct 28 07:55 config.toml
-rwxr-xr-x    1 root     root      17278744 Oct 28 07:55 listmonk

Now install the file utility so that we can check the architecture of the binary:

$ apk add file

Now run file listmonk to see the architecture of the binary itself. Mine, on the MacBook, is ARM aarch64 as expected:

$ file listmonk
listmonk: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=ygcIUfpAhsJMmgmT3iCT/uBMxQx8g7XSCnMHe4JO2/SG6wnHwzpWl3V4L9o51H/mVC54gUGRZRDbxP6GhjB, stripped

I will try to run this on a Linux machine, and will probably register on Hetzner to debug later.
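
The architecture that file reports comes from the e_machine field of the ELF header, so it can also be read without installing anything. A rough sketch, assuming a POSIX od is available and the file is a little-endian ELF (true for both aarch64 and x86-64 Linux binaries). The function name elf_machine is hypothetical:

```shell
#!/bin/sh
# elf_machine FILE (hypothetical helper): print the CPU architecture recorded
# in an ELF header, using only `od`.
# Assumes a little-endian ELF file ("LSB" in the output of `file`).
elf_machine() {
    # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # e_machine is a 2-byte field at offset 18; read its bytes as decimals.
    set -- $(od -An -tu1 -j18 -N2 "$1")
    machine=$(( ${1:-0} + 256 * ${2:-0} ))
    case "$machine" in
        183) echo "aarch64" ;;            # EM_AARCH64
        62)  echo "x86-64" ;;             # EM_X86_64
        *)   echo "unknown($machine)" ;;  # anything else
    esac
}
```

Because it needs no shell inside the container, a check like this can run on the host against any file copied out of the image.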

thueske commented 2 days ago

Hetzner

The funny thing is that I can't even execute the command...

/home/thueske [thueske@server] [9:59]
> docker run -ti --rm --entrypoint /bin/sh listmonk/listmonk:v4.0.1
exec /bin/sh: exec format error

lmmendes commented 2 days ago

@thueske can you delete any existing docker image from your Hetzner host and "force" the pull of the linux/arm64 being explicit with the architecture?

docker pull --platform=linux/arm64 listmonk/listmonk:v4.0.1

The error is strange. I can't be sure whether the issue is with the installed Docker version or with the pulled Docker image.

thueske commented 2 days ago

Same error as before.

lmmendes commented 2 days ago

@thueske last question for the day: can you please check whether the listmonk/listmonk image you are running has the ID 943d02b009bb?

$ docker images | grep listmonk/listmonk
listmonk/listmonk             v4.0.1          943d02b009bb   25 hours ago    28.6MB
listmonk/listmonk             latest          b4602d730116   8 months ago    25.3MB

lmmendes commented 2 days ago

The issue seems to be with the build (docker image) and not the binary inside the docker container.

I created a server in Hetzner (Ampere with ARM64 architecture) using Ubuntu 24.04.1 LTS:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
Release:    24.04
Codename:   noble

Running on an ARM64 architecture:

$ uname -a
Linux ubuntu-4gb-nbg1-1 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:26:41 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Checking the docker version (client=24.0.7 server=24.0.7):

$ docker version
Client:
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.22.2
 Git commit:        24.0.7-0ubuntu4.1
 Built:             Fri Aug  9 02:33:20 2024
 OS/Arch:           linux/arm64
 Context:           default

Server:
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.22.2
  Git commit:       24.0.7-0ubuntu4.1
  Built:            Fri Aug  9 02:33:20 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.12
  GitCommit:
 runc:
  Version:          1.1.12-0ubuntu3.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

Pulling the docker image

$ docker pull listmonk/listmonk:v4.0.1

Trying to run the Docker image, we can see that it fails:

$ docker run -ti --rm --entrypoint /bin/sh listmonk/listmonk:v4.0.1
exec /bin/sh: exec format error

The image is the expected one:

$ docker images
REPOSITORY          TAG       IMAGE ID       CREATED        SIZE
listmonk/listmonk   v4.0.1    943d02b009bb   26 hours ago   28.6MB

Now I'm going to copy the binary out of the Docker image and run it locally, to see whether the issue is with the built Docker image (the Linux base or the entrypoint) or with the binary:

$ docker create --name temp_container listmonk/listmonk:v4.0.1
$ docker cp temp_container:/listmonk/listmonk .

Running the listmonk binary extracted from the Docker image with ID 943d02b009bb directly on the Ubuntu Linux host, outside Docker:

$ ./listmonk
2024/10/29 09:33:48.323613 main.go:107: v4.0.1 (f5dfb0c 2024-10-28T07:48:58Z, linux/arm64)
2024/10/29 09:33:48.323732 init.go:162: reading config: config.toml
2024/10/29 09:33:48.323775 init.go:165: config file not found. If there isn't one yet, run --new-config to generate one.

The issue is with the packaging of the Docker image. We need to review the CI/CD pipeline.

Jalmeida1994 commented 2 days ago

Hey everyone, hope you're doing well!

After further investigation, it appears that the issue is with the Docker image packaging rather than the listmonk binary itself. Specifically, the Docker image tagged for arm64 contains amd64 system binaries, which leads to the exec format error when running on ARM64 systems.

Here's what we found inside the Docker container:

/bin/sh: symbolic link to /bin/busybox
/bin/busybox: ELF 64-bit LSB pie executable, **x86-64**, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped

This output indicates that /bin/busybox (and therefore /bin/sh) is an x86-64 binary, not an ARM64 binary. As a result, the container fails to start on ARM64 architectures because it tries to execute incompatible binaries.

While the listmonk binary is arm64:

listmonk: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, Go BuildID=ygcIUfpAhsJMmgmT3iCT/uBMxQx8g7XSCnMHe4JO2/SG6wnHwzpWl3V4L9o51H/mVC54gUGRZRDbxP6GhjB, stripped

The Dockerfile might be using the wrong base image architecture. Even though the image is tagged as arm64, the base image (alpine:latest) defaults to amd64 unless specified otherwise.

The build process may not be correctly passing the target platform information, causing it to pull the amd64 base image. The --platform=$BUILDPLATFORM directive tells Docker to pull the base image for the build platform, not the target platform. We would need to investigate further how Buildx works.
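
The difference between the build platform and the target platform can be shown with a toy model. This is only a sketch of the rule described above, not the actual listmonk build: the helper name base_platform is hypothetical, and an amd64 builder producing an arm64 image is an assumption about the CI setup. It also assumes a BuildKit/Buildx build, where a FROM line without --platform uses the target platform of the build request:

```shell
#!/bin/sh
# base_platform (hypothetical helper): a toy model of which platform a
# Dockerfile FROM line pulls its base image for.
#   $1 = value of the FROM line's --platform flag ("" if the flag is absent)
#   $2 = build platform (the machine running the build)
#   $3 = target platform (the platform the image is being built for)
base_platform() {
    case "$1" in
        '$BUILDPLATFORM')  echo "$2" ;;  # pinned to the builder's platform
        '$TARGETPLATFORM') echo "$3" ;;  # pinned to the requested platform
        '')                echo "$3" ;;  # no flag: the target platform is used
        *)                 echo "$1" ;;  # a literal such as linux/arm64
    esac
}

# Assumed scenario: an amd64 builder producing an arm64 image.
base_platform '$BUILDPLATFORM' linux/amd64 linux/arm64   # prints linux/amd64
base_platform ''               linux/amd64 linux/arm64   # prints linux/arm64
```

Under these assumptions, the first call reproduces the symptom seen here: an image labelled arm64 whose base layer (busybox, /bin/sh) is amd64.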

lmmendes commented 2 days ago

The issue is fixed in pull request #2123, with the help of @Jalmeida1994 <3

cc/ @knadh

mnbro commented 2 days ago

@lmmendes @Jalmeida1994 Thank you very much for fixing the bug!

Could you provide an estimate on when the new Docker image with the fix will be available in the registry?

knadh commented 2 days ago

Merged, thank you!

I'll wait until this weekend to see if there are any more issues with the last release (there are a couple minor, but annoying ones that have already been fixed) and publish v4.0.2.