Dynamically provision stateful, persistent, replicated, cluster-wide fabric volumes & filesystems for Kubernetes, provisioned from an optimized NVMe SPDK backend data storage stack.
Apache License 2.0
Mayastor 2.7.0 docker images will not start (`exec format error`) #1697
Mayastor 2.7.0 docker images issue an `exec format error` when starting. This was not the case for the 2.6.1 images. This issue is present on only 1 of our cluster nodes (total: 3). All systems are identical, all running Xeon CPUs.
To Reproduce
(prod) worker1 ~ ❱ ctr image pull docker.io/openebs/mayastor-io-engine:v2.7.0
docker.io/openebs/mayastor-io-engine:v2.7.0: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:b5633ab59f26e0e54870b779f7c6d0a6349bf12bc2c353d472f411d3028ebcc8: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f20fde36ee60f29d5a216d65c9ddd077b032c7959d9e06a3db1924bd55d517a8: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:8d057d7d555c3319fd6fe5d467ecf54799773f8754be857d1871570fe0b82341: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.8 s total: 0.0 B (0.0 B/s)
unpacking linux/amd64 sha256:b5633ab59f26e0e54870b779f7c6d0a6349bf12bc2c353d472f411d3028ebcc8...
done: 6.168198ms
(prod) worker1 ~ ❱ ctr run docker.io/openebs/mayastor-io-engine:v2.7.0 test
exec /bin/io-engine: exec format error
I've only ever seen this error in relation to incompatible docker image architectures, but I cannot see how that could be the case here.
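One way to rule out an architecture mismatch is to compare the host architecture against what the image manifest and the entrypoint binary itself report. A diagnostic sketch, assuming containerd's `ctr` CLI and root access; `/tmp/img` is an arbitrary mount point:

```shell
# Host architecture (should be x86_64 for these Xeon nodes)
uname -m

# Platform(s) recorded for the pulled image; the PLATFORMS column
# should include linux/amd64
ctr images ls | grep mayastor-io-engine

# Mount the image read-only and inspect the entrypoint binary directly
mkdir -p /tmp/img
ctr images mount docker.io/openebs/mayastor-io-engine:v2.7.0 /tmp/img
file /tmp/img/bin/io-engine   # expect "ELF 64-bit LSB ... x86-64"
ctr images unmount /tmp/img
```

If `file` reports a non-x86-64 binary despite the manifest saying linux/amd64, the unpacked content on that node is suspect rather than the image in the registry.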
Expected behavior
Image should start as it did with the 2.6.1 image:
(prod) worker1 ~ ❱ ctr run docker.io/openebs/mayastor-io-engine:v2.6.1 test
[2024-07-17T19:12:38.298253713+00:00 INFO io_engine:io-engine.rs:242] Engine responsible for managing I/Os version 1.0.0, revision 58b7ecc18b2f (v2.6.1+0)
[2024-07-17T19:12:38.298351803+00:00 INFO io_engine:io-engine.rs:221] free_pages 2MB: 7729 nr_pages 2MB: 8192
... etc etc ...
OS Info (erroring system)
Distro: Ubuntu 22.04
Kernel version: 5.15.0-116-generic
MayaStor revision or container image: 2.7.0
Linux worker1 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
OS Info (working systems)
Distro: Ubuntu 22.04
Kernel version: 5.15.0-113-generic  <- Note a slightly different kernel version
MayaStor revision or container image: 2.7.0
Linux worker1 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Updates
Update 1: I removed the image, pruned, and re-pulled a fresh download. Same error. Doesn't seem to be a corruption issue.
Update 2: I rebooted the 5.15.0-113-generic node into kernel 5.15.0-116-generic and the container still starts. So it looks like something weird is up with this first node.
Update 3: Drained the affected node of pods and ran `nerdctl system prune --all`. The image still wouldn't start.
Update 4: Pulling a development sha from the Docker registry instead. No idea what has happened to this system, but I think it is related to this containerd issue. My system did suffer an unexpected reboot (caused, I'm fairly sure, by Mayastor/nvme), which may be the cause. In any case, time to close this issue.
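For anyone hitting a similar `exec format error` on a single otherwise-identical node, a stale binfmt_misc handler (e.g. a leftover qemu user-mode emulation entry) is another thing worth checking before blaming the image, since such a handler can redirect execution of valid native binaries to a missing interpreter. A sketch; requires root and a kernel with binfmt_misc mounted:

```shell
# Whether binfmt_misc is enabled at all
cat /proc/sys/fs/binfmt_misc/status 2>/dev/null

# List registered handlers; on an x86_64 host, any entry whose magic
# matches x86-64 ELF binaries is suspicious and worth inspecting
ls /proc/sys/fs/binfmt_misc/ 2>/dev/null
```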