kubernetes / node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.
Apache License 2.0

Intel binary present on the ARM #827

Closed Marta-Panfilova-ASL closed 6 months ago

Marta-Panfilova-ASL commented 9 months ago

We are getting the following error from NPD on our nodes:

/bin/sh: 1: exec: /node-problem-detector: Exec format error

It crashes because an Intel/AMD 64-bit binary is present in the aarch64/ARM64 Docker image.

root@:/# file node-problem-detector
node-problem-detector: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=bd9bd59069612b94e84e1c60d73ebf61d618cd44, for GNU/Linux 3.2.0, with debug_info, not stripped

root@:/# uname -a
Linux 654213c1a942 6.4.16-orbstack-00105-g14094bfeec09 #1 SMP Mon Sep 18 21:45:38 UTC 2023 aarch64 GNU/Linux

root@:/# file /bin/bash
/bin/bash: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=96c22dfc3c74ccf4ba77d9cce6fc2c5e635456c1, for GNU/Linux 3.7.0, stripped
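For anyone who wants to automate the check above, the same comparison can be sketched in Python by reading the `e_machine` field of the ELF header directly instead of parsing `file` output. This is only a minimal sketch: the function names are made up for illustration, and the machine table covers just a few common values.

```python
import struct

# A few e_machine values from the ELF specification (not exhaustive).
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}

# Common aliases for the same architectures (Docker/Go style names).
ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture named in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA) is 1 for little-endian, 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def normalize(machine: str) -> str:
    """Map names such as 'arm64' or 'amd64' to the uname-style spelling."""
    name = machine.lower()
    return ALIASES.get(name, name)


def binary_matches_host(path: str, host_machine: str) -> bool:
    """True if the ELF binary at `path` targets `host_machine`."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20)) == normalize(host_machine)
```

Inside the affected container, `binary_matches_host("/node-problem-detector", platform.machine())` would be expected to return `False`, since the binary reports x86-64 while the kernel reports aarch64.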

autarchprinceps commented 9 months ago

I get the same issue with at least registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.14. It seems the binary wasn't built for each architecture this time. This used to work, but I can't say exactly which version broke it. Maybe it was a change in your CI system instead, unrelated to the particular version. It is definitely a straight-up bug: this image will not work on ARM like this. It still works on x86 of course, but as it stands there is no point in publishing it as a multi-arch image with an ARM variant, which it is:

Name:   registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.14 (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:76da9ce953a0d50e9e14fa29acdc44852a828710c49af6df04ee131cf3c0c7e2
 * Contains 2 manifest references (2 images, 0 attestation):
[1]     Type: application/vnd.docker.distribution.manifest.v2+json
[1]   Digest: sha256:98034f47cdfa331105265f025c0dbff781e8c15436f726eceacec52644a7c014
[1]   Length: 1488
[1] Platform:
[1]    -      OS: linux
[1]    -    Arch: amd64
[1] # Layers: 6
     layer 01: digest = sha256:c4e25f9975ade78ed0838b6cbf3f4fe1c6bf817811f3c8e676b1cbb6a7104950
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 02: digest = sha256:22cdf6360626d401a884ae910323267a7d7876e5717ed11d9d13bbc6cf8cb0db
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 03: digest = sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 04: digest = sha256:a19a5af3e67a96f03e92287598725e391cf7df49799ceb0f1abc1c88fdfb8e82
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 05: digest = sha256:f239dff246d22a7de3be61ff3679722e66982089630b0591a057d01889f0c986
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 06: digest = sha256:cad51d6b0d0b6b9ac3b3f1a0051eb93e35d1b84c5e25ad2482f8b71a49e58a01
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip

[2]     Type: application/vnd.docker.distribution.manifest.v2+json
[2]   Digest: sha256:38ddabcad5a96bfb4af3bf6e61348198f7da4441b08a17c9ad0ea9bdb3a8689a
[2]   Length: 1488
[2] Platform:
[2]    -      OS: linux
[2]    -    Arch: arm64
[2] # Layers: 6
     layer 01: digest = sha256:f91cc52e2c97c480189bb2da87653dd9abfe2679c36fc41b9db68878827748b4
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 02: digest = sha256:86c650b37460bdd568db653fe0ea8d628fe802bb15b47c5c2879fb2fd80b1d50
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 03: digest = sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 04: digest = sha256:a19a5af3e67a96f03e92287598725e391cf7df49799ceb0f1abc1c88fdfb8e82
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 05: digest = sha256:f239dff246d22a7de3be61ff3679722e66982089630b0591a057d01889f0c986
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
     layer 06: digest = sha256:cad51d6b0d0b6b9ac3b3f1a0051eb93e35d1b84c5e25ad2482f8b71a49e58a01
                 type = application/vnd.docker.image.rootfs.diff.tar.gzip
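
One thing the listing above makes visible is that layers 03 to 06 have identical digests in the amd64 and arm64 images. Sharing layers is normal for architecture-independent content, and the listing alone does not say which layer holds the binary, but a shared layer containing a compiled binary would explain the symptom. A minimal Python sketch for spotting shared layers, with the digests from the listing shortened to their first 12 hex characters for readability (the function name is hypothetical):

```python
def shared_layers(platform_layers: dict[str, list[str]]) -> list[str]:
    """Return digests present in every platform's layer list, in the first platform's order."""
    lists = list(platform_layers.values())
    if not lists:
        return []
    common = set(lists[0]).intersection(*lists[1:])
    return [digest for digest in lists[0] if digest in common]


# Layer digests copied from the manifest listing (truncated to 12 hex chars).
V0_8_14_LAYERS = {
    "linux/amd64": [
        "c4e25f9975ad", "22cdf6360626", "4f4fb700ef54",
        "a19a5af3e67a", "f239dff246d2", "cad51d6b0d0b",
    ],
    "linux/arm64": [
        "f91cc52e2c97", "86c650b37460", "4f4fb700ef54",
        "a19a5af3e67a", "f239dff246d2", "cad51d6b0d0b",
    ],
}
```

Calling `shared_layers(V0_8_14_LAYERS)` returns the four digests common to both platforms; each of those layers would then need to be inspected for architecture-specific files.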

runningman84 commented 9 months ago

The problem does not exist for older versions like registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.13

hakman commented 9 months ago

This was fixed in https://github.com/kubernetes/node-problem-detector/pull/801, just that a new release will be needed.

runningman84 commented 9 months ago

@hakman do you have any ETA for the release? We use flux to auto apply minor updates and this broke a lot of clusters...

hakman commented 9 months ago

There are very few people who can do releases. This is why https://github.com/kubernetes/node-problem-detector/pull/819 exists.

disconn3ct commented 9 months ago

Are you saying that releases are on hold until that feature ticket is resolved? This fix has been accepted, so there shouldn't be much of a process block. Releases happened before somehow, and this is a major breakage.

tuxpeople commented 9 months ago

I would suggest thinking about fixing the release process. A release process that keeps an important fix for a major breakage from being published for two weeks is undoubtedly broken.

disconn3ct commented 7 months ago

Two months without a release. This takes "breaking the deploy and going home" to a whole new level.

hakman commented 6 months ago

@vteratipally @Random-Liu Can we do something about an NPD release?

hakman commented 6 months ago

NPD v0.8.15 has been released:

% crane digest registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.15
sha256:15a8bad79ae26124109e6fdb696fd8536b2043f4ece8ff48f03073157d6a2c2d

/close
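
For readers pinning the image by digest: the value printed by `crane digest` above has the form `sha256:<hex>`, which for registry content is the SHA-256 of the raw manifest bytes. A minimal sketch of how such a digest could be recomputed and compared against a pinned value, assuming the manifest bytes have already been fetched by some other means (both function names are hypothetical):

```python
import hashlib


def oci_digest(blob: bytes) -> str:
    """Return the content digest of `blob` in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def matches_pinned(blob: bytes, pinned: str) -> bool:
    """True if `blob` hashes to the pinned digest string."""
    return oci_digest(blob) == pinned.strip().lower()
```

Here `pinned` would be the string from the comment above, and `blob` the exact bytes the registry serves for that tag; any re-serialization of the JSON changes the digest.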

k8s-ci-robot commented 6 months ago

@hakman: Closing this issue.

In response to [this](https://github.com/kubernetes/node-problem-detector/issues/827#issuecomment-1881738779):

>NPD v0.8.15 has been released:
>```
>% crane digest registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.15
>sha256:15a8bad79ae26124109e6fdb696fd8536b2043f4ece8ff48f03073157d6a2c2d
>```
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

vteratipally commented 6 months ago

@hakman

Could you please publish the release notes?

hakman commented 6 months ago

>Could you please publish the release notes?

@vteratipally I think only approvers have permission to publish releases.

disconn3ct commented 6 months ago

>NPD v0.8.15 has been released:

Is it a release if it is only partially done? Can this be reopened until the release is actually complete?