kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Setting defaultMode is not Fully Respected When Pod.spec.securityContext.runAsUser is Set #57923

Open pnovotnak opened 6 years ago

pnovotnak commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

Setting Pod.spec.securityContext.runAsUser causes the group read permission bit to be set on secrets exposed via volumes, even if Pod.spec.volumes[x].defaultMode is set to 256.

See also: https://github.com/openshift/origin/issues/16424

What you expected to happen:

Given a defaultMode of 256, the file mode should be 0400, but it is 0440 instead.
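(For reference: defaultMode takes a decimal integer, so 256 here is octal 0400, and the observed 0440 is decimal 288. Below is a restatement of the repro volume with the octal values as comments; the names match the manifests that follow.)

volumes:
- name: test-secret
  secret:
    secretName: test-secret
    defaultMode: 256   # decimal 256 == octal 0400 (owner read only)
    # observed: files end up 0440 (decimal 288), i.e. group read is added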

How to reproduce it (as minimally and precisely as possible):

Create the following objects and observe the logs of the created pod:

---
apiVersion: v1
data:
  test: dGVzdA==
kind: Secret
metadata:
  name: test-secret
type: Opaque

---
apiVersion: v1
kind: Pod
metadata:
  generateName: issue-repro-
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  containers:
  - image: busybox
    name: busybox
    imagePullPolicy: IfNotPresent
    args:
    - "ls"
    - "-alR"
    - "/tmp/dummy-secret"
    volumeMounts:
    - mountPath: /tmp/dummy-secret
      name: test-secret

  volumes:
  - name: test-secret
    secret:
      defaultMode: 256
      secretName: test-secret

Anything else we need to know?:

Environment:

liggitt commented 6 years ago

/sig storage

liggitt commented 6 years ago

potential fix in https://github.com/kubernetes/kubernetes/pull/57935

liggitt commented 6 years ago

hmm, reading further, it looks like this is working as designed for fsGroup, though that design doesn't play nicely with the mode options (the API doc for those mode fields explicitly says you might end up with different options when combined with fsGroup).

Different containers can run as different uids, yet still share a single secret volume. The only way for them all to be able to read the data from that volume when running as different uids is to have a common fsGroup set and group read permissions applied to the data in that volume. That means expanding the file permissions to include group permissions, even when that was not requested or desired.

The only real way I see to fix this is to change the 1-volume:*-containers relationship and start making per-container (or per-uid) volumes for these types, setting the file owner to the resolved uid of the container (which is not always possible to resolve), but that's a major redesign/departure for the volume subsystem.
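To make that concrete, here is a minimal (hypothetical) sketch of the scenario: two containers running as different uids sharing one secret volume. A common fsGroup plus group-read bits is the only way both can read the files, which is exactly what widens a requested 0400 to 0440:

apiVersion: v1
kind: Pod
metadata:
  name: shared-secret-example   # hypothetical
spec:
  securityContext:
    fsGroup: 2000               # common group so both uids can read the files
  containers:
  - name: app-a
    image: busybox
    args: ["sleep", "3600"]
    securityContext:
      runAsUser: 1000
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
  - name: app-b
    image: busybox
    args: ["sleep", "3600"]
    securityContext:
      runAsUser: 2000
    volumeMounts:
    - name: creds
      mountPath: /etc/creds
  volumes:
  - name: creds
    secret:
      secretName: creds         # hypothetical secret
      defaultMode: 256          # requested 0400; the fsGroup pass adds group read (0440)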

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

sebasmannem commented 6 years ago

I'm also running into an issue with ownership / permissions on a Kubernetes secret. I have a container (run by a k8s CronJob, but that's beside the point) that runs as uid=999 (pod security: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups). I want to use a secret containing an SSL-certificate private key for client cert auth. Client certificates require permissions of 0600, which means ownership must also be set to 999 (uid 999 can only read a file with mode 0600 if it is the owner). I can't seem to get it to work. For now I have worked around the issue by reading the file and writing the contents to a new file on tmpfs with the proper ownership / permissions. But copying the contents of secrets to other locations feels a bit contradictory to what we want to achieve (higher security).

Is this the same issue / related? Or should I submit a new one?

jrnt30 commented 6 years ago

This is very similar to an issue we are experiencing post-upgrade to 1.8.11, which changes some of the mount options from RW to RO. We are using a projected volume to group several disparate secrets/configmaps into a logical volume for the underlying container.

We had fiddled with permissions before but couldn't get things "just right", so we had been running a chown in our entrypoint and then using su-exec to drop the privileges of the process that actually runs. The changes introduced in #58720 made this option no longer viable, as that projected volume is mounted read-only now.
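For anyone looking at the same pattern, here is a rough sketch of that entrypoint approach adapted to a read-only projected mount (copy to a writable emptyDir first, then drop privileges with su-exec; the image, secret name, paths and uid below are made up for illustration):

containers:
- name: app
  image: example/app:latest          # hypothetical image with su-exec installed
  command:
  - sh
  - -c
  - |-
    # the container starts as root so the chown works, then su-exec switches to uid 1000
    # copy the read-only projected files somewhere writable and fix ownership/mode
    cp -rL /etc/projected/. /var/run/app-secrets/
    chown -R 1000:1000 /var/run/app-secrets
    chmod 0600 /var/run/app-secrets/*
    exec su-exec 1000:1000 /usr/local/bin/app
  volumeMounts:
  - name: projected-config
    mountPath: /etc/projected
    readOnly: true
  - name: writable-secrets
    mountPath: /var/run/app-secrets
volumes:
- name: projected-config
  projected:
    sources:
    - secret:
        name: app-secrets              # hypothetical secret
- name: writable-secrets
  emptyDir:
    medium: Memory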

We attempted addressing this by:

Unfortunately we have yet to find a permutation that allows us to do what is probably "the right thing" which is:

Are other people doing essentially the suggestion above and just skirting around the issue by moving files around in their entrypoint?

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

sebasmannem commented 6 years ago

/remove-lifecycle rotten

sebasmannem commented 6 years ago

@liggitt, @pnovotnak, can you answer the question from @jrnt30?

> Are other people doing essentially the suggestion above and just skirting around the issue by moving files around in their entrypoint?

liggitt commented 6 years ago

> Are other people doing essentially the suggestion above and just skirting around the issue by moving files around in their entrypoint?

I'll have to defer to @kubernetes/sig-storage-misc for that question

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/57923#issuecomment-437706158):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

gladiatr72 commented 5 years ago

/remove-lifecycle rotten

drewhemm commented 5 years ago

> I'm also running into an issue with ownership / permissions on a Kubernetes secret. I have a container (run by a k8s CronJob, but that's beside the point) that runs as uid=999 (pod security: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups). I want to use a secret containing an SSL-certificate private key for client cert auth. Client certificates require permissions of 0600, which means ownership must also be set to 999 (uid 999 can only read a file with mode 0600 if it is the owner). I can't seem to get it to work. For now I have worked around the issue by reading the file and writing the contents to a new file on tmpfs with the proper ownership / permissions. But copying the contents of secrets to other locations feels a bit contradictory to what we want to achieve (higher security).
>
> Is this the same issue / related? Or should I submit a new one?

Hi @sebasmannem, I have run into the exact same issue, also only in a cron job. The same code works for a regular pod. Did you manage to find a more elegant solution?

jingxu97 commented 5 years ago

@sebasmannem @jrnt30, sorry for the delay. Could you describe the problems in more detail? When you use defaultMode and/or fsGroup in the Pod spec, what are the resulting file permissions? Thanks!

benhartley commented 5 years ago

I have hit this issue too. The following config excerpt results in files with 0440 instead of 0400 as expected:

kind: Deployment
...
spec:
  template:
    spec:
      securityContext:
        runAsUser: 472
        fsGroup: 472
...
      containers:
        - name: fip
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: foo
            mountPath: /etc/bip
...
      volumes:
        - name: foo
          secret:
            secretName: bar
            defaultMode: 256

jsafrane commented 5 years ago

IMO, fsGroup and defaultMode conflict in Secrets (and other AtomicWriter volumes). AtomicWriter.Write prepares the volume with the right defaultMode first, i.e. a file is chmodded to 0400, and only after that is SetVolumeOwnership called to apply fsGroup:

https://github.com/kubernetes/kubernetes/blob/657a1a1a3457bc599005b1ca30c338c03e9d4aa0/pkg/volume/secret/secret.go#L251-L261

SetVolumeOwnership then chmods the file to 0660 (or 0440 for read-only files), ignoring any defaultMode:

https://github.com/kubernetes/kubernetes/blob/7a9f21bbb828a0f58e6c51234c1ba0e16efb6727/pkg/volume/volume_linux.go#L76-L86

jsafrane commented 5 years ago

/reopen

k8s-ci-robot commented 5 years ago

@jsafrane: Reopened this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/57923#issuecomment-489007146):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

msau42 commented 5 years ago

@jsafrane what do you think the solution should be? Fail if both fsGroup and defaultMode are set? Or have defaultMode override fsGroup?

Or do we need to come up with a new API to specify volume ownership that is:

gnufied commented 5 years ago

The API documentation of DefaultMode, fwiw, explicitly says it only guarantees that files are created with the given DefaultMode, and that the specified DefaultMode may conflict with fsGroup:

    // Mode bits to use on created files by default. Must be a value between
    // 0 and 0777.
    // Directories within the path are not affected by this setting.
    // This might be in conflict with other options that affect the file
    // mode, like fsGroup, and the result can be other mode bits set.

I do not think we should change the meaning of DefaultMode just yet.

jingxu97 commented 5 years ago

Discussed with @mesaugat: one proposal to fix this is that for volume types like secrets, configMaps, etc., we should set up the ownership together with the mode while writing the files (see code https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/atomic_writer.go#L396), so that there is no need to call SetVolumeOwnership after the data is written at all. This avoids SetVolumeOwnership overwriting the defaultMode that was set up earlier.

liggitt commented 5 years ago

I explored something similar in https://github.com/kubernetes/kubernetes/pull/57935/files and couldn't get it to work with containers running as different uids with an fsGroup

jingxu97 commented 5 years ago

@liggitt thanks for the information. I think we might still honor the fsGroup and defaultMode settings with the following restrictions:

  1. If the user has containers that run as different uids, they should not set defaultMode to 0400. This could be considered an invalid setting; could it be validated and disallowed?
  2. Otherwise, the user can set fsGroup and defaultMode freely, although fsGroup does not seem to add much in this case.

liggitt commented 5 years ago

> If the user has containers that run as different uids, they should not set defaultMode to 0400. This could be considered an invalid setting; could it be validated and disallowed?

We can't currently tighten validation in a way that would invalidate data already allowed to be persisted in etcd. xref https://github.com/kubernetes/kubernetes/issues/64841

Also, we can't detect in the API validation what the natural uid is for containers that didn't specify a uid in the pod spec.

If we change how this is handled, I think it could only be done at runtime.

jsafrane commented 5 years ago

There should be a way to provide e.g. an ssh key as 0600.

IMO, if the user specifies Mode or DefaultMode, we should honor it and not overwrite it when applying fsGroup. I know the Mode description says "This might be in conflict with other options that affect the file mode, like fsGroup, and the result can be other mode bits set.", and I know it may break existing apps, but why would a user write 0600 and expect 0640 instead? To me it looks like a bug in the Pod object.

We can change AtomicWriter to do chown and chmod using fsGroup if Mode/DefaultMode is not set.
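To make that concrete, here is a hypothetical fragment (not current behavior) showing what honoring DefaultMode would mean for the ssh-key case; the secret name is made up:

spec:
  securityContext:
    runAsUser: 999
    fsGroup: 999
  volumes:
  - name: ssh-key
    secret:
      secretName: my-ssh-key     # hypothetical
      defaultMode: 384           # decimal 384 == octal 0600
      # proposal: keep the file at 0600 and use fsGroup only for ownership;
      # today the fsGroup pass adds group permission bits instead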

jingxu97 commented 5 years ago

Just noticed another issue https://github.com/kubernetes/kubernetes/issues/34982, maybe also relevant

wjam commented 4 years ago

@jsafrane I've been looking through the code, and it appears that all uses of AtomicWriter.Write - secrets, configmaps, downward APIs, projections & cephfs - either specify a file mode directly, as with cephfs, or fall back to the default used when the user specifies no mode (i.e. staging/src/k8s.io/api/core/v1/types SecretVolumeSourceDefaultMode), and those defaults are all 0644. Also, the behaviour of pkg/volume/SetVolumeOwnership for configmaps, downward APIs & projections is to enforce that all files have at least 0440 permissions.

If we accept that when the user asks for something they should get it, then the behaviour should be:

  1. Perform chown on file to ensure it's owned by the correct group
  2. If the user has specified a Mode in the KeyToPath struct then use that.
  3. If the user has specified a DefaultMode in the *VolumeSource structs, then use that.
  4. If a user hasn't specified a DefaultMode, then the value is automatically set to the default for that source, which is currently 0644 everywhere - 0644 is already wider than what volume.SetVolumeOwnership would have enforced.

zrss commented 4 years ago

/cc

myidpt commented 4 years ago

As Istio is moving to SDS by default, which uses the projected volume with the "defaultMode" set, this issue breaks our customers. It would be great to get this resolved.

incfly commented 4 years ago

/cc @incfly

mgxian commented 4 years ago

Does somebody track this issue? How can this bug be resolved?

devlinyvonne commented 4 years ago

Is there a plan to resolve this ticket? Our organisation is currently using Istio in our Kubernetes cluster, and we have cronjobs that are broken because the permissions on our id_rsa file have been changed and we are no longer able to clone a git repo. Since the cronjobs can't run, code changes are blocked from being promoted to our production clusters, which is in turn impacting our customers. I'm sure our company isn't the only consumer of Istio/Kubernetes that is impacted by this problem.

howardjohn commented 4 years ago

Any updates on this? This continues to break users that are attempting to use features like projected volumes with sidecar containers: https://github.com/istio/istio/issues/26882

pboushy commented 4 years ago

I've now run into this issue in 2 different scenarios that are frustrating:

  1. Mounting jmx.password into a Tomcat Docker image to provide the usernames/passwords for JMX remote functionality.
  2. Mounting certificate files into jamfllc/jamf-pki-proxy, which requires the key/pem to be 0600.

Is there a timeline on this issue being fixed? It appears it has existed without any resolution since January 2018.

karthick-kk commented 3 years ago

Bump. I need to mount ssh key pairs onto a pod running as an elevated-privilege user, to use along with a glusterfs mount.

rblaine95 commented 3 years ago

The workaround I used for this was to run an initContainer which mounts an SSH private key in a temp directory, then copies it to an emptyDir and chmods it appropriately. The container that needs the private key then mounts the same emptyDir volume:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-example-deployment
spec:
  selector:
    matchLabels:
      app: my-example
  template:
    metadata:
      labels:
        app: my-example
    spec:
      initContainers:
        - name: prep-id-rsa
          image: busybox:1.30
          command:
            - sh
            - -c
            - |-
              cp /tmp/id_rsa /well/known/dir/id_rsa
              chmod 0600 /well/known/dir/id_rsa
          volumeMounts:
            - name: id-rsa
              mountPath: /tmp/id_rsa
              subPath: id_rsa
            - name: empty-dir
              mountPath: /well/known/dir
      containers:
        - name: my-container
          image: alpine:3
          command:
            - sh
            - -c
            - |-
              ls -la /tmp/id_rsa
              ls -la /root/.ssh
          volumeMounts:
            - name: id-rsa
              mountPath: /tmp/id_rsa
              subPath: id_rsa
            - name: empty-dir
              mountPath: /root/.ssh
          resources:
            requests:
              cpu: 1m
              memory: 1Mi
            limits:
              cpu: 1m
              memory: 1Mi
          securityContext:
            runAsUser: 1000
      volumes:
        - name: id-rsa
          secret:
            secretName: my-ssh-private-key
        - name: empty-dir
          emptyDir:
            medium: Memory

msau42 commented 3 years ago

I think the next step in investigating a solution for this issue is to see if we can generalize the work done for projected service account tokens to other volume types.

Jiawei0227 commented 3 years ago

/cc

abhisheksinghbaghel commented 3 years ago

/cc

Jiawei0227 commented 3 years ago

For anyone who is following this thread: we are planning to resolve this issue in 1.22 release. Here is the enhancement: https://github.com/kubernetes/enhancements/pull/2606. Feel free to take a look if you are interested. Thanks!

andrewchen5678 commented 2 years ago

I saw that the pull request above was closed rather than merged; does that mean this is not resolved?

xing-yang commented 1 year ago

/triage accepted

rata commented 1 year ago

The problem is not that defaultMode is not respected when runAsUser is set; it is not respected when fsGroup is set. On one hand, that is documented. On the other hand, it would be nice for the most specific setting to take precedence.

The issue here is that setting fsGroup means all volumes should have that GID and group read permissions, while setting defaultMode means that mode is used. When both are set, it is not obvious which should take precedence.

This was documented in the KEP and in the defaultMode field documentation. But I do think the behaviour might be better if the setting "closest" to the volume took precedence.
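A small fragment to make the precedence question concrete (hypothetical names):

spec:
  securityContext:
    fsGroup: 1000          # pod-level: group ownership + group read on volume files
  volumes:
  - name: key
    secret:
      secretName: key      # hypothetical
      defaultMode: 256     # volume-level: octal 0400
      # today the fsGroup behaviour wins and files end up 0440; the suggestion
      # above is that this "closer" volume-level setting take precedence instead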

k8s-triage-robot commented 8 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

msau42 commented 6 months ago

/triage accepted

lin-crl commented 1 month ago

The copy-key workaround is fine as a short-term fix. We have a situation where a TLS cert and private key are mounted from Secrets, copied to another directory, and chmodded to 400. This becomes a problem when the cert is rotated: since the Secret files are no longer read directly from the mount, rotation requires extra work to copy them over again. Is anyone else aware of this? I would like to see the root cause addressed.