[Closed] profhase closed this issue 3 years ago
Experiencing the same issue with this latest version.
The issue appears to be a result of the following change:
Directories created for local-path-provisioner now have more restrictive permissions (#3548)
As a result, local-path persistent volumes now appear to be writable only by containers running as root, unless you explicitly change their permissions out-of-band or with a root initContainer.
Further, the local-path provisioner deliberately does not support securityContext.fsGroup, in order to mitigate a possible privilege escalation (see: https://github.com/rancher/local-path-provisioner/issues/7#issuecomment-466609004), so we can't simply tell it to create the volume with the correct permissions.
Unless I am missing something, this latest feature introduces a hard requirement of running a root container (either as an initContainer or as the service container itself) in order to use local-path persistent volumes, which is not ideal.
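For reference, the root-initContainer workaround looks roughly like this. This is a hedged sketch: all names here (pod, image, claim name, mount path, and the UID 1000) are illustrative, not taken from this thread.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-nonroot          # illustrative name
spec:
  initContainers:
    - name: fix-volume-perms
      image: busybox
      # Runs once as root before the main container, to hand the volume
      # directory over to the non-root user (UID 1000 is an assumption).
      command: ["sh", "-c", "chown -R 1000:1000 /data"]
      securityContext:
        runAsUser: 0
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: app
      image: example/app:latest  # illustrative image
      securityContext:
        runAsUser: 1000
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc   # illustrative claim name
```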
Yeah, it looks like the 0700 permissions are applied to every volume directory, even though the original plan was to apply them only to the parent storage folder (--default-local-storage-path).
https://github.com/k3s-io/k3s/issues/2348#issuecomment-811446826
I'm pretty sure that ensuring /var/lib/rancher/k3s/storage (and maybe /var/lib/rancher/k3s/data?) has permissions 700 would prevent non-root users from accessing the volumes while still allowing them to be used by containers, no matter what user the container runs as.
https://github.com/rancher/local-path-provisioner/issues/182
So the volume itself is 0777, but the parent directory is secured with 0700 and accessible only by root.
I wonder why the PR that got merged didn't implement it this way.
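The intended layout described above can be sketched locally (the paths here are illustrative, not the real K3s storage path): the parent is 0700 so only root can list or traverse it, while each per-volume subdirectory is 0777 so any container user can write into its bind mount.

```shell
# Illustrative paths only; K3s actually uses /var/lib/rancher/k3s/storage.
STORAGE=/tmp/demo-storage
mkdir -p "$STORAGE/pvc-example"
chmod 700 "$STORAGE"               # parent: root/owner only
chmod 777 "$STORAGE/pvc-example"   # per-volume dir: world-writable
stat -c %a "$STORAGE"              # prints 700
stat -c %a "$STORAGE/pvc-example"  # prints 777
```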
@dereknola can you take a look at this? It appears that with the permissions change, LocalStorage no longer supports containers that don't run as root.
Yeah, I'll take a look.
I can confirm this behavior. This problem cripples all deployments with PVCs and non-root-containers, which make up about 60% of my complete workload.
Is the only workaround ATM to use an init-container?
@georglauterbach You could also downgrade to K3s 1.21.2 until this is fixed.
@ChristianCiach How do I do that in the best way possible? :)
Btw, thanks for the fast reply :D
PS: I figured it out. Thanks for the hint nevertheless :)
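For anyone else landing here: pinning a specific K3s version through the install script works like this. The INSTALL_K3S_VERSION variable is a documented installer option; the exact tag below is an assumption, so check the releases page for the right one.

```shell
# Reinstall K3s pinned to a pre-change release (tag is an assumption).
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.21.2+k3s1" sh -
```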
338f9cae3f5004e8a00489bf865025b76484b510
$ stat -c %a /var/lib/rancher/k3s/storage/
701
$ sudo stat -c %a /var/lib/rancher/k3s/storage/pvc-35801d3f-b6fc-45a8-b3e3-e7aba21343ba_default_postgres-awx-demo-postgres-0
777
ubuntu@maxnode:/var/lib/rancher/k3s/storage$ cd /var/lib/rancher/k3s/storage/pvc-35801d3f-b6fc-45a8-b3e3-e7aba21343ba_default_postgres-awx-demo-postgres-0 && ls
data
ubuntu@maxnode:/var/lib/rancher/k3s/storage/pvc-35801d3f-b6fc-45a8-b3e3-e7aba21343ba_default_postgres-awx-demo-postgres-0$ cd /var/lib/rancher/k3s/storage && ls
ls: cannot open directory '.': Permission denied
$ kubectl get pods -l "app.kubernetes.io/managed-by=awx-operator"
NAME READY STATUS RESTARTS AGE
awx-demo-postgres-0 1/1 Running 0 4m25s
awx-demo-9975db9b6-x9zdw 4/4 Running 0 4m15s
$ kubectl get svc -l "app.kubernetes.io/managed-by=awx-operator"
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
awx-demo-postgres ClusterIP None <none> 5432/TCP 4m29s
awx-demo-service NodePort 10.43.2.11 <none> 80:30474/TCP 4m20s
$ k get pv,pvc -A
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
persistentvolume/pvc-a762866b-aa11-4477-a4c9-1e55a8a7767c   8Gi        RWO            Delete           Bound    default/postgres-awx-demo-postgres-0   local-path              43s

NAMESPACE   NAME                                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     persistentvolumeclaim/postgres-awx-demo-postgres-0   Bound    pvc-a762866b-aa11-4477-a4c9-1e55a8a7767c   8Gi        RWO            local-path     45s
- Confirmed root containers continue to work as before but now the subdirectories also have 777 permissions as expected
I take it that this will be available on v1.21.4 and not on v1.21.3? How would we get this fix before .4 release?
I am a bit surprised about that, too. I think this bug is bad enough to justify an early v1.21.3+k3s2 bugfix release.
I personally don't care too much whether it ends up in a v1.21.3+k3s2 or a v1.21.4 release, but right now, since the release 18 days ago, the (only) default storage class is broken, so everyone not explicitly pinning a version will get a broken cluster.
Upstream is putting out new patches (v1.21.4) this Wednesday, so we're going to wait for that instead of doing a whole extra release cycle just for this one issue.
Upstream meaning Kubernetes itself?
When can we expect a solution?
@samip5 Yes, upstream meaning Kubernetes.
So how does it help us with broken provisioner on k3s?
The k3s release schedule is generally in lock-step with upstream Kubernetes. So k3s v1.21.4 release comes after Kubernetes has released v1.21.4. K3s releases integrate the changes made to upstream.
You can revert to the previous release, or wait until v1.21.4 is released within the next day or two.
K3s v1.21.4 is now out with the fix for this issue. https://github.com/k3s-io/k3s/releases/tag/v1.21.4%2Bk3s1
Is there a matching k3d release? k3d seems to also be affected.
K3d can run any k3s release, just use the --image flag to specify the image and tag you want.
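For example (the cluster name is illustrative; the image tag follows the rancher/k3s naming used on Docker Hub, where the `+` in the release name becomes a `-`):

```shell
k3d cluster create demo --image rancher/k3s:v1.21.4-k3s1
```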
@brandond Fantastic, I will use that workaround, thanks. Usually I just like to use the latest version k3d recommends as default; this is what ultimately broke our CI.
Hi, I now have what seems to be the same issue with v1.22.2-rc1+k3s2.
My cluster is a k3s, and inside it I deploy a virtual cluster with vcluster, which is again another k3s. This cluster deploys a PVC, and local-path creates the PV, but as read-only.
Here is the error I get:
vcluster time="2021-12-28xxxxxxxx" level=fatal msg="failed to evacuate root cgroup: mkdir /sys/fs/cgroup/init: read-only file system"
More details in this other issue: https://github.com/loft-sh/vcluster/issues/264#issuecomment-1002281877
Any idea if it will be fixed in the next releases?
Thank you
@antonioberben that appears to be a completely different problem, related to cgroups. Can you open a new issue?
Hi,
I'm running into the same problem as noted in this issue's OP. I could fix the permissions issue by running:
chmod 777 /var/lib/rancher/k3s/storage/*
The permissions were previously set to 755.
I'm running v1.25.12+k3s1.
Is this a regression? If not, what could cause the storage to be set to 755 instead of 777?
Scott
The /var/lib/rancher/k3s/storage/ directory should be 700. Subdirectories should be 777. These permissions are set when the LocalPath volume is created, and older releases of K3s used different permissions. Confirm that you're on an up-to-date release of K3s and that your local-path-config ConfigMap shows the correct permissions in the setup script.
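To check, something like the following should print the setup script embedded in the ConfigMap. The namespace and key name are what K3s ships by default to the best of my knowledge; adjust if your deployment differs.

```shell
kubectl -n kube-system get configmap local-path-config -o jsonpath='{.data.setup}'
```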
@brandond - Yeah, I meant the folders under /storage/. They were all set to 755. As I also noted, I'm running v1.25.12+k3s1. In the end it was a fresh install: I wiped the node, reinstalled Ubuntu 22.04, and then installed k3s.
This is what is in the local-path-config ConfigMap:
#!/bin/sh
while getopts "m:s:p:" opt
do
    case $opt in
        p)
            absolutePath=$OPTARG
            ;;
        s)
            sizeInBytes=$OPTARG
            ;;
        m)
            volMode=$OPTARG
            ;;
    esac
done
mkdir -m 0777 -p ${absolutePath}
chmod 700 ${absolutePath}/..
Looks correct to me?
What would happen if the user doing the k3s install isn't root? Could that also possibly cause this issue? I didn't want to mess with testing it, as everything is working now, thus my "laziness". :shrug:
Scott
If you're not root when running the install script, the script will use sudo to become root. You can check out the docs section on rootless operation if you are curious about running K3s as an unprivileged user, but what you're asking about wouldn't cause this.
Old releases of K3s used different permissions. As I mentioned, the permissions are set when the volumes are created, so it's likely they were created on a different version of K3s?
they were created on a different version of K3s
Yes, I believe they were originally created with v1.23.x.
Scott
Environmental Info: K3s Version:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration: Single node
Describe the bug: Postgres does not come up due to
mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied
Steps To Reproduce:
Expected behavior: postgres comes up
Actual behavior: postgres crashes
Additional context / logs:
mkdir: cannot create directory ‘/var/lib/postgresql/data’: Permission denied