Closed crobby closed 1 year ago
Another change was https://github.com/kserve/modelmesh-serving/pull/151 where the quickstart etcd version was updated to `v3.5.4` from the `latest` tag, which actually corresponds to quite an old version (~`v3.3.8`). If you swap to an older version or even just the `latest` tag again, does the deployment come up properly? I am wondering if some change with the newer version of the etcd image caused this.
I do see the same problem when using the latest tag of the etcd image. It looks like the non-deployment (naked pod) version in 0.8.0 was running as root, but when running as part of a deployment, it runs as non-root. When I look in the container, it seems to be trying to create the data directory in /, which will always fail for non-root.
@crobby you could try adding `workingDir: $HOME` to the container spec. Alternatively you could add a `--data-dir $HOME/etcd` cmd line flag.
That doesn't seem to do the trick. At runtime, it is running as the (randomized) user: 1000670000
Here is the entry from /etc/passwd for that user inside the container
1000670000:x:1000670000:0:1000670000 user:/:/sbin/nologin
Looks like / is the $HOME, which doesn't seem to have much of a chance of being writable by anything other than root.
OK how about adding
- --data-dir
- /tmp/etcd.data
to the container `args`? The standalone container is only intended for dev/temporary use anyhow.
I will give that a try on Monday, thanks. What sort of setup would you recommend for production use?
using /tmp/"whatever" does appear to work. Thanks.
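For anyone landing here later, the change that worked can be sketched roughly as below. This is only a fragment of the etcd Deployment's pod template, not the actual quickstart manifest: the container name, image reference, and `command` are assumptions, and only the two `args` entries come from this thread.

```yaml
# Sketch only: fragment of the etcd Deployment's pod template.
# Container name, image reference, and command are assumptions;
# the --data-dir args are the part suggested in this thread.
spec:
  template:
    spec:
      containers:
        - name: etcd
          image: quay.io/coreos/etcd:v3.5.4
          command: ["etcd"]
          args:
            # Write etcd data to /tmp instead of the working directory (/),
            # which was not writable by the randomized non-root UID in this thread.
            - --data-dir
            - /tmp/etcd.data
```

Data under /tmp is lost when the pod is replaced, which fits the dev/temporary use mentioned above.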
What sort of setup would you recommend for production use?
@crobby a small multi-member etcd cluster with TLS configured. I'm not sure whether there's an OpenShift operator for this apart from the one managing the Kube-backing etcd. There is a public operator https://github.com/improbable-eng/etcd-cluster-operator but that does not appear to be actively maintained. I think there are also example Helm charts out there which could be used.
Note that the system will recover fine if data in etcd is lost so persistence isn't critical. The recommendation here is for stability/scalability/security.
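As a rough illustration of what "multi-member with TLS" could look like, here is a sketch of the container `args` for one member of a three-member cluster. The flags are standard etcd options, but the member names, hostnames, data directory, and certificate paths are all hypothetical, and this has not been tested against ModelMesh Serving.

```yaml
# Sketch only: args for one member (etcd-0) of a three-member etcd cluster with TLS.
# Member names, hostnames, and file paths below are hypothetical placeholders.
args:
  - --name=etcd-0
  - --data-dir=/var/lib/etcd
  - --initial-cluster=etcd-0=https://etcd-0.etcd:2380,etcd-1=https://etcd-1.etcd:2380,etcd-2=https://etcd-2.etcd:2380
  - --initial-cluster-state=new
  - --listen-peer-urls=https://0.0.0.0:2380
  - --initial-advertise-peer-urls=https://etcd-0.etcd:2380
  - --listen-client-urls=https://0.0.0.0:2379
  - --advertise-client-urls=https://etcd-0.etcd:2379
  # TLS for client connections; clients must present a cert signed by the CA
  - --cert-file=/etc/etcd/tls/server.crt
  - --key-file=/etc/etcd/tls/server.key
  - --trusted-ca-file=/etc/etcd/tls/ca.crt
  - --client-cert-auth
  # TLS for member-to-member (peer) traffic
  - --peer-cert-file=/etc/etcd/tls/peer.crt
  - --peer-key-file=/etc/etcd/tls/peer.key
  - --peer-trusted-ca-file=/etc/etcd/tls/ca.crt
  - --peer-client-cert-auth
```

The other two members would use the same flags with their own `--name` and advertise URLs. The data directory needs to be a volume that the pod's (possibly non-root) user can write to.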
The etcd deployment/pod fails to start on OpenShift
To Reproduce
Steps to reproduce the behavior:
Expected behavior
All pods come up without error.
Environment (please complete the following information):
Client Version: 4.11.0-202207191902.p0.g7075089.assembly.stream-7075089
Kustomize Version: v4.5.4
Server Version: 4.11.0-rc.6
Kubernetes Version: v1.24.0+9546431
Additional context
I'm guessing this is likely due to OpenShift pods being run as a user other than root, but I'm not sure why the old version 0.8.0, which just ran a pod instead of a deployment, did not have this problem.