kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

"CreateContainerConfigError: failed to prepare subPath for volumeMount" error with configMap volume #61076

Closed: Silvenga closed this issue 6 years ago

Silvenga commented 6 years ago

/kind bug

Fix status:

What happened: Upgraded from v1.9.2 to v1.9.4. I began to bring up the cluster (still experimenting with different upgrade strategies) and noticed that every pod that mounts a configMap via subPath would throw an error similar to the one below:

Mar 12 22:30:27 node02 kubelet[1124]: E0312 22:30:27.537327    1124 kubelet_pods.go:248] failed to prepare subPath for volumeMount "config" of container "mumble": subpath "/var/lib/kubelet/pods/66fa673c-266d-11e8-8ebf-00155d00a406/volumes/kubernetes.io~configmap/config/..2018_03_13_03_19_55.572152209/mumble.ini" not within volume path "/var/lib/kubelet/pods/66fa673c-266d-11e8-8ebf-00155d00a406/volumes/kubernetes.io~configmap/config"
Mar 12 22:30:27 node02 kubelet[1124]: E0312 22:30:27.537452    1124 kuberuntime_manager.go:734] container start failed: CreateContainerConfigError: failed to prepare subPath for volumeMount "config" of container "mumble"
Mar 12 22:30:27 node02 kubelet[1124]: E0312 22:30:27.537548    1124 pod_workers.go:186] Error syncing pod 66fa673c-266d-11e8-8ebf-00155d00a406 ("mumble-74798bc4c-xjwrn_default(66fa673c-266d-11e8-8ebf-00155d00a406)"), skipping: failed to "StartContainer" for "mumble" with CreateContainerConfigError: "failed to prepare subPath for volumeMount \"config\" of container \"mumble\""

What you expected to happen: To mount the configMap.

Existing behavior:
https://stackoverflow.com/questions/48561338/how-to-correctly-mount-configmap-with-subpath-in-kubernetes-not-update-configs
https://stackoverflow.com/questions/44325048/kubernetes-configmap-only-one-file

How to reproduce it (as minimally and precisely as possible):

kind: ConfigMap
apiVersion: v1
metadata:
  name: mumble-config
data:
  mumble.ini: |
    # Murmur configuration file.
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: mumble
  labels:
    app: mumble
spec:
  template:
    metadata:
      labels:
        app: mumble
    spec:
      containers:
      - image: custom-image
        name: mumble
        volumeMounts:
        - name: config
          mountPath: /data/mumble.ini
          subPath: mumble.ini
      volumes:
      - name: config
        configMap:
          name: mumble-config

Anything else we need to know?:

Looking further into it, it looks like a possible regression caused by #60813?

https://github.com/kubernetes/kubernetes/pull/61045/files#diff-16665fc8caff20aa7d63896dc4e3dd7fR295

Environment:

Silvenga commented 6 years ago

Downgraded to v1.9.3; the problem does not appear to exist there.

andreimc commented 6 years ago

I had the same issue. I am not sure if this behaviour is intended and whether we should use Pod Security Policies to allow the mounting of configMaps?

liggitt commented 6 years ago

@kubernetes/sig-storage-bugs /assign @jsafrane @msau42

liggitt commented 6 years ago

it looks like a possible regression caused by #60813?

thanks for the report. yes, it is a regression related to the fix for #60813: a backstep-detection function hits a false positive with the directory structure used by configMap/secret volumes

a fix is in progress in https://github.com/kubernetes/kubernetes/pull/61080

andyzhangx commented 6 years ago

I need to mention that this issue also exists on Windows, and the PR covers Windows as well.

jberkus commented 6 years ago

@liggitt if this is a 1.9.4 issue, why the 1.10 milestone?

liggitt commented 6 years ago

it's a recently (yesterday) introduced regression that is release-blocking and needs cherry-picking to 1.7.x, 1.8.x, and 1.9.x

dims commented 6 years ago

@jberkus this problem was triggered by the change made for the CVE yesterday (the patch landed in 1.10/master and was backported to the 1.9, 1.8, and 1.7 branches). When we shipped 1.9.4, someone noticed it. So this problem exists in 1.10/master as well; we start here and then do the backports again (I believe @liggitt has filed the backports already).

saitejar commented 6 years ago

Is there a workaround for this issue?

liggitt commented 6 years ago

Is there a workaround for this issue?

There is not.

edit: actually, you can mount the entire configMap (or secret) elsewhere in your container and symlink the path you care about to the location you want (in an initContainer, etc.). As @joelsmith noted above, that's actually a better approach, since it lets you receive updates to the configMap data that propagate into the mounted volume.
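
To make that concrete, here is a minimal sketch of that workaround applied to the mumble spec above (the busybox init image, the /config and /data mount paths, and the pod name are placeholders, and it assumes the container does not need any other files from the image under /data): the whole configMap is mounted at one path, and an initContainer drops a symlink into a shared emptyDir at the location the app expects.

apiVersion: v1
kind: Pod
metadata:
  name: mumble-workaround
spec:
  initContainers:
  - name: link-config
    image: busybox
    # Symlink the file we care about into the shared emptyDir; the link
    # resolves because both containers mount the configMap at /config.
    command: ["sh", "-c", "ln -sf /config/mumble.ini /data/mumble.ini"]
    volumeMounts:
    - name: config
      mountPath: /config
    - name: data
      mountPath: /data
  containers:
  - name: mumble
    image: custom-image
    volumeMounts:
    - name: config
      mountPath: /config   # entire configMap; updates keep propagating here
    - name: data
      mountPath: /data     # /data/mumble.ini is the symlink created above
  volumes:
  - name: config
    configMap:
      name: mumble-config
  - name: data
    emptyDir: {}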

liggitt commented 6 years ago

keeping this open until release branches are fixed as well

liggitt commented 6 years ago

@jberkus fix merged into master, moving this back to the v1.9 milestone as this issue is no longer release blocking for 1.10

Joseph-Irving commented 6 years ago

Hi. I've upgraded to the fixed version of 1.9.4, which has solved the configMap subPath problem, but I'm encountering the same issue when using subPaths with an emptyDir.

Warning  Failed                 1m (x2 over 2m)  kubelet, ip-172-23-10-171.eu-west-1.compute.internal  Error: failed to prepare subPath for volumeMount "flannel-net-conf" of container "kube-flannel"

where the volume looks like:

volumes:
- emptyDir: {}
  name: flannel-net-conf

and the volume mount is this:

 volumeMounts:
  - mountPath: /etc/kube-flannel/net-conf.json
    name: flannel-net-conf
    subPath: net-conf.json

This works fine on 1.9.3. Should I open a new issue about this?

jsafrane commented 6 years ago

@Joseph-Irving, I can't reproduce the issue. This pod starts and creates an empty directory /etc/kube-flannel/net-conf.json in the container.

apiVersion: v1
kind: Pod
metadata:
  name: volumetest
spec:
  containers:
  - name: container-test
    image: busybox
    args:
    - sleep
    - "86400"
    volumeMounts:
    - mountPath: /etc/kube-flannel/net-conf.json
      name: flannel-net-conf
      subPath: net-conf.json

  volumes:
  - emptyDir: {}
    name: flannel-net-conf

net-conf.json is a directory because the subPath net-conf.json does not exist in the emptyDir, so it is assumed to be a directory. Do you have an init container that fills net-conf.json before the real container starts? Can you please open a new issue and post a full pod spec that reproduces the problem there, so we can track it separately?

Joseph-Irving commented 6 years ago

@jsafrane Yeah, exactly that: we have an init container which creates a file and puts it there before flannel boots up. Sure, I'll create a new issue.
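
For reference, a minimal sketch of that pattern (the busybox images, the pod name, and the placeholder JSON are assumptions, not the actual flannel manifest): an init container writes net-conf.json into the shared emptyDir first, and the main container then mounts that file via subPath, which is the combination that still fails on the patched 1.9.4.

apiVersion: v1
kind: Pod
metadata:
  name: subpath-emptydir-repro
spec:
  initContainers:
  - name: write-conf
    image: busybox
    # Create the file inside the emptyDir before the main container starts.
    command: ["sh", "-c", "echo '{}' > /conf/net-conf.json"]
    volumeMounts:
    - name: flannel-net-conf
      mountPath: /conf
  containers:
  - name: main
    image: busybox
    args:
    - sleep
    - "86400"
    volumeMounts:
    - mountPath: /etc/kube-flannel/net-conf.json
      name: flannel-net-conf
      subPath: net-conf.json
  volumes:
  - name: flannel-net-conf
    emptyDir: {}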

Joseph-Irving commented 6 years ago

issue created @jsafrane https://github.com/kubernetes/kubernetes/issues/61178

k8s-github-robot commented 6 years ago

[MILESTONENOTIFIER] Milestone Issue: Up-to-date for process

@Silvenga @jsafrane @liggitt @msau42

Issue Labels:
- `sig/storage`: Issue will be escalated to these SIGs if needed.
- `priority/critical-urgent`: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
- `kind/bug`: Fixes a bug discovered during the current release.

guillelb commented 6 years ago

Same issue upgrading from v1.7.11 to v1.7.14

Mounting a configMap:

failed to prepare subPath for volumeMount

liggitt commented 6 years ago

yes, this applies to 1.7.14, 1.8.9, and 1.9.4. Point releases to address it are scheduled for 3/19.

alvaroaleman commented 6 years ago

FYI, in GKE this doesn't seem to interfere with deploying new pods, but it keeps existing pods stuck in status "Terminating" after they were requested to be deleted:

Mar 19 09:19:53 gke-app-cluster-app-cluster-pool-78a17572-lxk8 kubelet[1331]: E0319 09:19:53.528768    1331 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/secret/c89a1204-2b4f-11e8-aca8-42010a9c0114-<redacted>\" (\"c89a1204-2b4f-11e8-aca8-42010a9c0114\")" failed. No retries permitted until 2018-03-19 09:21:55.528737036 +0000 UTC m=+3052.192633373 (durationBeforeRetry 2m2s). Error: "error cleaning subPath mounts for volume \"<redacted>\" (UniqueName: \"kubernetes.io/secret/c89a1204-2b4f-11e8-aca8-42010a9c0114-<redacted>\") pod \"c89a1204-2b4f-11e8-aca8-42010a9c0114\" (UID: \"c89a1204-2b4f-11e8-aca8-42010a9c0114\") : error checking /var/lib/kubelet/pods/c89a1204-2b4f-11e8-aca8-42010a9c0114/volume-subpaths/<redacted>/cipher/0 for mount: lstat /var/lib/kubelet/pods/c89a1204-2b4f-11e8-aca8-42010a9c0114/volume-subpaths/<redacted>/cipher/0/..: not a directory

liggitt commented 6 years ago

FYI, in GKE this doesn't seem to interfere with deploying new pods, but it keeps existing pods stuck in status "Terminating" after they were requested to be deleted:


The subpath cleanup issue is tracked in https://github.com/kubernetes/kubernetes/issues/61178 and will also be fixed in the point releases planned for today.

msau42 commented 6 years ago

GKE already released last week with the configMap patch. I am looking into the cleanup issue.

msau42 commented 6 years ago

Ah sorry I can't read. https://github.com/kubernetes/kubernetes/issues/61178 should take care of it.

liggitt commented 6 years ago

/close

obriensystems commented 6 years ago

We are seeing this under Rancher 1.6.13 and 1.6.14 in ONAP's master branch, ahead of the ONS conference: https://jira.onap.org/browse/OOM-813

obriensystems commented 6 years ago

I have a question about why the 1.8.9 upgrade was backported into Rancher 1.6.14 and 1.6.13, which were OK running 1.8.5. The workaround in ONAP is to use Rancher 1.6.12 until 1.6.14 is re-fixed (this occurred 5 days ago during the release of 1.6.15).

Wrong project; I meant to post to Rancher on https://github.com/rancher/rancher/issues/12178, which is working around this.

prune998 commented 6 years ago

I had the cleanup issue on GKE for a few days (Google support is looking into it; I hope they will find this thread), but I'm now facing the 'failed to prepare subPath for volumeMount' error. This is weird, as some of my pods started fine and some others did not.

I'm on 1.9.4-gke.1. Will 1.9.5 be released soon?

msau42 commented 6 years ago

@prune998 this issue could show up during a container restart too. This fix is planned to roll out in GKE this week.

rmorrise commented 6 years ago

Please make sure the fixed version is available in minikube on Windows! :heart:

msau42 commented 6 years ago

@rmorrise you should probably notify minikube maintainers to make sure they update their kubernetes versions.

dhawal55 commented 6 years ago

@liggitt is this fixed in GKE release 1.8.10-gke.0? I don't see any mention of it in the release notes.

msau42 commented 6 years ago

@dhawal55 yes gke 1.8.10-gke.0 has the fix.

lcortess commented 6 years ago

Hi guys, I think this is still happening. I am using Kubernetes on Google Cloud; I've updated to 1.9.6-gke.1 and I got this problem. The solution was to downgrade to 1.9.3-gke.0 :s

msau42 commented 6 years ago

@lcortess can you paste your Pod spec?

lcortess commented 6 years ago

Hi @msau42, this is my pod spec:

{
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
        "name": "myserver-deployment-xxxxx-xxxxx",
        "generateName": "myserver-deployment-xxxxx-",
        "namespace": "default",
        "labels": {
            "app": "myserver",
            "stage": "production",
            "tier": "backend"
        },
        "ownerReferences": [{
            "apiVersion": "extensions/v1beta1",
            "kind": "ReplicaSet",
            "name": "myserver-deployment-xxxxx",
            "controller": true,
            "blockOwnerDeletion": true
        }]
    },
    "spec": {
        "volumes": [{
            "name": "default-token-xxx",
            "secret": {
                "secretName": "default-token-xxx",
                "defaultMode": 420
            }
        }],
        "containers": [{
            "name": "myserver",
            "image": "myserver:v1.0.0",
            "ports": [{
                "containerPort": 5000,
                "protocol": "TCP"
            }],
            "env": [],
            "volumeMounts": [{
                "name": "default-token-xxx",
                "readOnly": true,
                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
            }],
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
        }],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "serviceAccountName": "default",
        "serviceAccount": "default",
        "securityContext": {},
        "imagePullSecrets": [{
            "name": "docker-secrets"
        }],
        "schedulerName": "default-scheduler",
    }
}

msau42 commented 6 years ago

Ah, @lcortess I think you are hitting this issue with read-only volumes: https://github.com/kubernetes/kubernetes/issues/62752

But actually, after 1.9.4, all secret volumes are mounted read-only, so you don't need to explicitly specify the readOnly flag for secret volumes.
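
For illustration, a minimal sketch with the readOnly flag omitted (the names are taken from the redacted spec above and are placeholders): on 1.9.4 and later the secret volume itself is mounted read-only, so the mount behaves the same without the explicit flag.

apiVersion: v1
kind: Pod
metadata:
  name: myserver
spec:
  containers:
  - name: myserver
    image: myserver:v1.0.0
    volumeMounts:
    - name: default-token-xxx
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      # no explicit readOnly: true; the secret volume is already read-only
  volumes:
  - name: default-token-xxx
    secret:
      secretName: default-token-xxx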

hightoxicity commented 6 years ago

Still the same issue in 1.10.2 with a DaemonSet (it has 2 running pods; it works for the first one but not the second one).

msau42-tmp commented 6 years ago

@hightoxicity can you post your pod spec?

rossedman commented 6 years ago

Hitting this on 1.8.9-rancher1.

msau42 commented 6 years ago

@rossedman there is a subpath cleanup/container restart issue that was fixed in 1.8.10

hightoxicity commented 6 years ago

@msau42-tmp Hi, I restarted all the control-plane daemons, and it seems that fixed the issue. Thanks.

nazisangg commented 6 years ago

Hitting this on "OpenShift Master: v3.7.23; Kubernetes Master: v1.7.6+a08f5eeb62"

The spec is: spec: containers:

BrendanThompson commented 6 years ago

I too am facing this problem whilst trying to use subPath with volumeMounts.

App versions:
- Kubernetes v1.11.1
- Docker v1.13.1
- Ubuntu 16.04.4

msau42 commented 6 years ago

@BrendanThompson can you open a new issue and paste your pod spec into it?

Aisuko commented 5 years ago

Is there any information that actually shows which version fixed this issue? I still have this problem in my zookeeper chart.

Is the fix only included in 1.9.5 and later? Looking at my Kubernetes version below, I think I need to upgrade.

➜  zookeeper git:(dev) ✗ kubectl version 
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-03-01T23:34:27Z", GoVersion:"go1.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T20:55:30Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
➜  zookeeper git:(dev) ✗ kubectl describe po  dev-zookeeper-server-0 --namespace zookeeper
Name:           dev-zookeeper-server-0
Namespace:      zookeeper
Node:           node7/10.116.18.76
Start Time:     Tue, 02 Apr 2019 21:34:32 +0800
Labels:         app=zookeeper
                chart=zookeeper-1.4.2
                controller-revision-hash=dev-zookeeper-server-87495d7f7
                heritage=Tiller
                release=dev
                statefulset.kubernetes.io/pod-name=dev-zookeeper-server-0
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  StatefulSet/dev-zookeeper-server
Containers:
  dev-zookeeper:
    Container ID:  
    Image:         docker.io/bitnami/zookeeper:3.4.13
    Image ID:      
    Ports:         2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -ec
      # Execute entrypoint as usual after obtaining ZOO_SERVER_ID based on POD hostname
      HOSTNAME=`hostname -s`
      if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
        ORD=${BASH_REMATCH[2]}
        export ZOO_SERVER_ID=$((ORD+1))
      else
        echo "Failed to get index from hostname $HOST"
        exit 1
      fi
      . /opt/bitnami/base/functions
      . /opt/bitnami/base/helpers
      print_welcome_page
      . /init.sh
      nami_initialize zookeeper
      exec tini -- /run.sh

    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      250m
      memory:   256Mi
    Liveness:   tcp-socket :client delay=30s timeout=5s period=10s #success=1 #failure=6
    Readiness:  tcp-socket :client delay=5s timeout=5s period=10s #success=1 #failure=6
    Environment:
      ZOO_PORT_NUMBER:        2181
      ZOO_TICK_TIME:          2000
      ZOO_INIT_LIMIT:         10
      ZOO_SYNC_LIMIT:         5
      ZOO_MAX_CLIENT_CNXNS:   60
      ZOO_SERVERS:            dev-zookeeper-0.dev-zookeeper-headless.zookeeper.svc.cluster.local:2888:3888
      ZOO_ENABLE_AUTH:        yes
      ZOO_CLIENT_USER:        quantex
      ZOO_CLIENT_PASSWORD:    <set to the key 'client-password' in secret 'dev-zookeeper'>  Optional: false
      ZOO_SERVER_USERS:       quantex
      ZOO_SERVER_PASSWORDS:   <set to the key 'server-password' in secret 'dev-zookeeper'>  Optional: false
      ZOO_HEAP_SIZE:          1024
      ALLOW_ANONYMOUS_LOGIN:  yes
    Mounts:
      /bitnami/zookeeper from data (rw)
      /opt/bitnami/zookeeper/conf/zoo.cfg from config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8r6xd (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-dev-zookeeper-server-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      dev-zookeeper
    Optional:  false
  default-token-8r6xd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8r6xd
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                From               Message
  ----     ------                 ----               ----               -------
  Normal   Scheduled              68s                default-scheduler  Successfully assigned dev-zookeeper-server-0 to node7
  Normal   SuccessfulMountVolume  67s                kubelet, node7     MountVolume.SetUp succeeded for volume "config"
  Normal   SuccessfulMountVolume  67s                kubelet, node7     MountVolume.SetUp succeeded for volume "default-token-8r6xd"
  Normal   SuccessfulMountVolume  67s                kubelet, node7     MountVolume.SetUp succeeded for volume "pvc-d221a9ce-5527-11e9-a07c-0050569e1842"
  Warning  Failed                 50s (x7 over 65s)  kubelet, node7     Error: failed to prepare subPath for volumeMount "config" of container "dev-zookeeper"
  Normal   SandboxChanged         49s (x7 over 65s)  kubelet, node7     Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled                 48s (x8 over 65s)  kubelet, node7     Container image "docker.io/bitnami/zookeeper:3.4.13" already present on machine