longhorn / longhorn

Cloud-Native distributed storage built on and for Kubernetes
https://longhorn.io
Apache License 2.0

[BUG] RWX volume attachment takes a long time #2879

Open shuo-wu opened 3 years ago

shuo-wu commented 3 years ago

Describe the bug Launching a workload that requires a new RWX volume takes a long time (4~5 minutes). The direct cause is the unexpectedly slow RWX volume attachment. Here is the original issue report: https://rancher-users.slack.com/archives/CC2UQM49Y/p1628589991044200?thread_ts=1628252452.025500&cid=CC2UQM49Y

To Reproduce Steps to reproduce the behavior:

Expected behavior The RWX volume can typically be attached within 5~30 seconds.

Log

Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Normal   Scheduled           5m34s                default-scheduler        Successfully assigned monitoring/promotheus-prometheus-alertmanager-669c59cc84-jpb8k to X
  Warning  FailedMount         3m31s                kubelet                  Unable to attach or mount volumes: unmounted volumes=[storage-volume], unattached volumes=[promotheus-prometheus-alertmanager-token-gpjg4 config-volume storage-volume]: timed out waiting for the condition
  Warning  FailedAttachVolume  116s (x9 over 4m4s)  attachdetach-controller  AttachVolume.Attach failed for volume "alertmanager-data" : rpc error: code = DeadlineExceeded desc = volume alertmanager-data failed to attach to node X
  Warning  FailedMount         74s                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[storage-volume], unattached volumes=[storage-volume promotheus-prometheus-alertmanager-token-gpjg4 config-volume]: timed out waiting for the condition

Environment:

Additional context
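
For reference, the rpc DeadlineExceeded in the events above comes from the CSI ControllerPublishVolume call, so the matching request can usually be traced in the Longhorn CSI attacher, CSI plugin, and manager logs. A minimal sketch for pulling them, assuming a default install in the longhorn-system namespace (the label selectors and container name are the usual Longhorn ones; adjust if your deployment differs):

kubectl -n longhorn-system logs -l app=csi-attacher --since=1h | grep alertmanager-data
kubectl -n longhorn-system logs -l app=longhorn-manager --since=1h | grep alertmanager-data
kubectl -n longhorn-system logs -l app=longhorn-csi-plugin -c longhorn-csi-plugin --since=1h | grep ControllerPublishVolume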

shuo-wu commented 3 years ago

I have checked the support bundle (the user linked it to issue #2829 in the email). The attachment looks fine on the Longhorn side. Here is the analysis:

For the RWX volume alertmanager-data:

  1. Kubernetes requests volume attachment via CSI attacher
    2021-08-10T09:58:44.518418375Z I0810 09:58:44.517594       1 connection.go:183] GRPC request: {"node_id":"sky-210015-ds","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":5}},"volume_context":{"diskSelector":"","nodeSelector":"","numberOfReplicas":"3","staleReplicaTimeout":"20"},"volume_id":"alertmanager-data"}
  2. Longhorn CSI Plugin receives attachment call from Kubernetes
    2021-08-10T09:58:44.519989195Z time="2021-08-10T09:58:44Z" level=info msg="ControllerPublishVolume: req: {\"node_id\":\"sky-210015-ds\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":5}},\"volume_context\":{\"diskSelector\":\"\",\"nodeSelector\":\"\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"20\"},\"volume_id\":\"alertmanager-data\"}"
  3. Longhorn manager starts attaching the volume and then launches a share manager pod; the volume serves as its data store, and the share manager pod is effectively an NFS server. Once the share manager pod is up and running, the RWX volume attachment is considered successful (see the commands after this list):
    2021-08-10T09:58:49.404877319Z time="2021-08-10T09:58:49Z" level=info msg="Event(v1.ObjectReference{Kind:\"Volume\", Namespace:\"longhorn-system\", Name:\"alertmanager-data\", UID:\"6de827c9-4482-49d6-b308-2ae7b919d6d2\", APIVersion:\"longhorn.io/v1beta1\", ResourceVersion:\"21360477\", FieldPath:\"\"}): type: 'Normal' reason: 'Attached' volume alertmanager-data has been attached to sky-210004-ds"
    2021-08-10T09:58:44.657043679Z time="2021-08-10T09:58:44Z" level=info msg="Created pod for share manager" controller=longhorn-share-manager node=sky-210004-ds owner=sky-210004-ds pod=share-manager-alertmanager-data shareManager=alertmanager-data state=starting volume=alertmanager-data
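
Each step above can be spot-checked with kubectl. A minimal sketch, assuming cluster access and a default install in the longhorn-system namespace (resource and pod names follow the usual Longhorn conventions):

# 1. the Kubernetes-side attachment object created via the CSI attacher
kubectl get volumeattachments | grep alertmanager-data
# 2./3. the Longhorn volume state and the share manager pod backing the NFS export
kubectl -n longhorn-system get volumes.longhorn.io alertmanager-data
kubectl -n longhorn-system get pod share-manager-alertmanager-data -o wide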

The problem is that your cluster somehow spent about 4 minutes starting this share manager pod (from 2021-08-10T09:58:44 to 2021-08-10T10:02:55). There is no error log for this share manager pod on the Longhorn side, which means the share manager workflow ran as expected.

Here is the pod state. No failure or error during initialization:

  status:
    phase: Running
    conditions:
    - type: Initialized
      status: "True"
      lastprobetime: "0001-01-01T00:00:00Z"
      lasttransitiontime: "2021-08-10T10:02:05Z"
      reason: ""
      message: ""
    - type: Ready
      status: "True"
      lastprobetime: "0001-01-01T00:00:00Z"
      lasttransitiontime: "2021-08-10T10:02:55Z"
      reason: ""
      message: ""
    - type: ContainersReady
      status: "True"
      lastprobetime: "0001-01-01T00:00:00Z"
      lasttransitiontime: "2021-08-10T10:02:55Z"
      reason: ""
      message: ""
    - type: PodScheduled
      status: "True"
      lastprobetime: "0001-01-01T00:00:00Z"
      lasttransitiontime: "2021-08-10T10:02:05Z"
      reason: ""
      message: ""
    message: ""
    reason: ""
    nominatednodename: ""
    hostip: 172.20.0.2
    podip: 10.42.3.220
    podips:
    - ip: 10.42.3.220
    starttime: "2021-08-10T10:02:05Z"
    initcontainerstatuses: []
    containerstatuses:
    - name: share-manager
      state:
        waiting: null
        running:
          startedat: "2021-08-10T10:02:06Z"
        terminated: null
      lastterminationstate:
        waiting: null
        running: null
        terminated: null
      ready: true
      restartcount: 0
      image: longhornio/longhorn-share-manager:v1_20210416
      imageid: docker-pullable://longhornio/longhorn-share-manager@sha256:c0528766fa0fd8c21632e76b0a79844fa484b3f58c53c855ac4575f4076abad1
      containerid: docker://c40ef6a3b5a25665e5a3a19af96bcdb2241027324cdd3d69cc587ebcfbf63208
      started: true
    qosclass: BestEffort
    ephemeralcontainerstatuses: []
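
The status above only shows that the pod was scheduled and started at 10:02:05 and became Ready at 10:02:55, not what happened between its creation at 09:58:44 and that point. The pod events and node-side view are the next place to look for where the time went (scheduling, image pull, sandbox/CNI setup, etc.). A rough sketch, assuming kubectl access while the issue is being reproduced:

kubectl -n longhorn-system describe pod share-manager-alertmanager-data
kubectl -n longhorn-system get events --sort-by=.lastTimestamp | grep share-manager-alertmanager-data
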
shuo-wu commented 3 years ago

The next steps are:

  1. Try to reproduce the issue, and monitor the resource usage while it is happening
  2. Provide the kubelet logs from when the issue happens, or at least the logs covering roughly 2021-08-10T09:58 to 2021-08-10T10:03 (a sketch of how to pull these follows the list)
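
For the kubelet logs: RKE runs the kubelet as a Docker container, so its logs live in Docker rather than journald. A minimal sketch for collecting both items, assuming an RKE node, UTC timestamps, and metrics-server for kubectl top:

# resource usage while reproducing
kubectl top nodes
kubectl top pods -n longhorn-system
# kubelet logs on the node for the window in question
docker logs kubelet --timestamps --since 2021-08-10T09:55:00Z --until 2021-08-10T10:05:00Z 2>&1 | less
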
shuo-wu commented 3 years ago

cc @qdupuy

qdupuy commented 3 years ago

Environment:

Longhorn version: v1.1.2
Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: RKE v1.20.9-rancher1-1
    Number of management nodes in the cluster: 3
    Number of worker nodes in the cluster: 1
Node config
    OS type and version: Ubuntu 21.04
    CPU per node: 12 CPUs
    Memory per node: 32 GB
    Disk type (e.g. SSD/NVMe): NVMe
    Network bandwidth between the nodes: 1000 Mbps
Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Bare-metal
Number of Longhorn volumes in the cluster: 20

qdupuy commented 3 years ago

I was able to reproduce the bug with the RWO mode.

I set up 2 replicas and migrated my pods to another node via a "nodeName: server" constraint.

Events:
  Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Warning  FailedAttachVolume  14m                  attachdetach-controller  Multi-Attach error for volume "pvc-a11eda9a-12ac-401e-96ec-1664690ce302" Volume is already used by pod(s) X-8658649d77-76jpb
  Warning  FailedMount         11m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[X], unattached volumes=[default-token-n56xz X]: timed out waiting for the condition
  Warning  FailedMount         71s (x3 over 9m12s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[X], unattached volumes=[X default-token-n56xz]: timed out waiting for the condition
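
The Multi-Attach error means Kubernetes still holds a VolumeAttachment for the old node while the rescheduled pod requests the same RWO volume on another node. One way to see which node still owns the attachment (a sketch, assuming kubectl access):

kubectl get volumeattachments -o wide | grep pvc-a11eda9a-12ac-401e-96ec-1664690ce302
kubectl -n longhorn-system get volumes.longhorn.io | grep pvc-a11eda9a-12ac-401e-96ec-1664690ce302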

What more do you need from us so that we can fix the problem?

shuo-wu commented 3 years ago

The case you reproduced (pod migration) is different from the original issue... We can check the pod migration case later. For now, can you provide the kubelet log on node sky-210004-ds? The log should contain the messages from around 2021-08-10T09:58Z to 2021-08-10T10:03Z.

qdupuy commented 3 years ago

Some logs:

Aug 10 06:01:05 eu-west-1 kernel: [1525043.747111] EXT4-fs warning (device sdg): htree_dirblock_to_tree:993: inode #12: lblock 0: comm kubelet: error -5 reading directory block
Aug 10 06:01:05 eu-west-1 kernel: [1525043.747245] EXT4-fs warning (device sde): htree_dirblock_to_tree:993: inode #12: lblock 0: comm kubelet: error -5 reading directory block
Aug 10 06:01:05 eu-west-1 kernel: [1525043.751945] EXT4-fs warning (device sde): htree_dirblock_to_tree:993: inode #12: lblock 0: comm kubelet: error -5 reading directory block
Aug 10 06:01:05 eu-west-1 kernel: [1525043.752429] EXT4-fs warning (device sdg): htree_dirblock_to_tree:993: inode #12: lblock 0: comm kubelet: error -5 reading directory block
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-e07c8dbc\x2d665f\x2d4e25\x2d9da4\x2da5af096b7674-volume\x2dsubpaths-pvc\x2d2074432e\x2d05d2\x2d4c6e\x2db30c\x2de3c5c28bce67-volume\x2dpermissions-0.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-8aba8446\x2d5bc8\x2d4381\x2d99a2\x2d5bc1aa8c1ee6-volume\x2dsubpaths-pvc\x2dd62d4874\x2d0769\x2d4603\x2d9b29\x2dce89fa4c57fe-volume\x2dpermissions-0.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-8aba8446\x2d5bc8\x2d4381\x2d99a2\x2d5bc1aa8c1ee6-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dsj4cz.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-e07c8dbc\x2d665f\x2d4e25\x2d9da4\x2da5af096b7674-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dw6xkl.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-e07c8dbc\x2d665f\x2d4e25\x2d9da4\x2da5af096b7674-volume\x2dsubpaths-pvc\x2d2074432e\x2d05d2\x2d4c6e\x2db30c\x2de3c5c28bce67-wordpress-0.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-8aba8446\x2d5bc8\x2d4381\x2d99a2\x2d5bc1aa8c1ee6-volume\x2dsubpaths-pvc\x2dd62d4874\x2d0769\x2d4603\x2d9b29\x2dce89fa4c57fe-wordpress-0.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-e07c8dbc\x2d665f\x2d4e25\x2d9da4\x2da5af096b7674-volumes-kubernetes.io\x7ecsi-pvc\x2d2074432e\x2d05d2\x2d4c6e\x2db30c\x2de3c5c28bce67-mount.mount: Succeeded.
Aug 10 06:01:05 eu-west-1 systemd[1]: var-lib-kubelet-pods-8aba8446\x2d5bc8\x2d4381\x2d99a2\x2d5bc1aa8c1ee6-volumes-kubernetes.io\x7ecsi-pvc\x2dd62d4874\x2d0769\x2d4603\x2d9b29\x2dce89fa4c57fe-mount.mount: Succeeded.
Aug 10 06:02:41 eu-west-1 systemd[1]: var-lib-kubelet-pods-8aba8446\x2d5bc8\x2d4381\x2d99a2\x2d5bc1aa8c1ee6-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dsj4cz.mount: Succeeded.
Aug 10 06:03:30 eu-west-1 systemd[1]: var-lib-kubelet-pods-e07c8dbc\x2d665f\x2d4e25\x2d9da4\x2da5af096b7674-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dw6xkl.mount: Succeeded.
Aug 10 06:18:45 eu-west-1 systemd[1]: var-lib-kubelet-pods-22ad2309\x2dfaf1\x2d4651\x2d84af\x2de20132fcffc6-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dsj4cz.mount: Succeeded.
Aug 10 06:19:33 eu-west-1 systemd[1]: var-lib-kubelet-pods-d975b0c3\x2de4e8\x2d4090\x2d99c5\x2dcd6f03f8bf70-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dw6xkl.mount: Succeeded.
Aug 10 06:25:10 eu-west-1 systemd[1]: var-lib-kubelet-pods-c086c029\x2d81ab\x2d4f79\x2dbd98\x2db9456ded9468-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dw6xkl.mount: Succeeded.
Aug 10 06:25:10 eu-west-1 systemd[1]: var-lib-kubelet-pods-e3cd18cc\x2db300\x2d41d5\x2da7d5\x2d8d909498dc07-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dsj4cz.mount: Succeeded.
Aug 10 06:34:00 eu-west-1 systemd[1]: var-lib-kubelet-pods-11b4bc63\x2d3486\x2d47ab\x2db1f9\x2d0bb4935aad90-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dsj4cz.mount: Succeeded.
Aug 10 06:35:36 eu-west-1 systemd[1]: var-lib-kubelet-pods-48c17ee7\x2dd214\x2d495d\x2db252\x2d260ec4614d17-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dw6xkl.mount: Succeeded.
Aug 10 06:42:01 eu-west-1 systemd[1]: var-lib-kubelet-pods-1a2d8a07\x2d8eea\x2d404a\x2db755\x2d937cdedec814-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dsj4cz.mount: Succeeded.
Aug 10 06:43:38 eu-west-1 systemd[1]: var-lib-kubelet-pods-24d48432\x2d7379\x2d4de9\x2d9614\x2d5688127cea6d-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2dw6xkl.mount: Succeeded.
Aug 10 10:37:51 eu-west-1 systemd[1]: var-lib-kubelet-pods-88aefca1\x2d25d8\x2d403b\x2d94f7\x2d9a102fd2e4e6-volumes-kubernetes.io\x7esecret-grafana\x2dtoken\x2dmhprw.mount: Succeeded.
Aug 10 10:37:51 eu-west-1 systemd[1]: var-lib-kubelet-pods-88aefca1\x2d25d8\x2d403b\x2d94f7\x2d9a102fd2e4e6-volumes-kubernetes.io\x7esecret-grafana\x2dauth\x2dgoogle\x2dsecret\x2dmount.mount: Succeeded.
Aug 10 10:38:44 eu-west-1 systemd[1]: var-lib-kubelet-pods-88aefca1\x2d25d8\x2d403b\x2d94f7\x2d9a102fd2e4e6-volumes-kubernetes.io\x7ecsi-grafana\x2ddata-mount.mount: Succeeded.
Aug 10 10:58:43 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volumes-kubernetes.io\x7esecret-grafana\x2dtoken\x2dmhprw.mount: Succeeded.
Aug 10 10:58:43 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volumes-kubernetes.io\x7esecret-grafana\x2dauth\x2dgoogle\x2dsecret\x2dmount.mount: Succeeded.
Aug 10 10:58:43 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volume\x2dsubpaths-config-grafana-0.mount: Succeeded.
Aug 10 10:58:43 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volumes-kubernetes.io\x7ecsi-grafana\x2ddata-mount.mount: Succeeded.
Aug 10 10:58:43 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volume\x2dsubpaths-config-grafana-2.mount: Succeeded.
Aug 10 11:00:20 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volumes-kubernetes.io\x7esecret-grafana\x2dauth\x2dgoogle\x2dsecret\x2dmount.mount: Succeeded.
Aug 10 11:00:20 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volumes-kubernetes.io\x7esecret-grafana\x2dtoken\x2dmhprw.mount: Succeeded.
Aug 10 11:00:20 eu-west-1 systemd[1]: var-lib-kubelet-pods-2a6bcf49\x2d7d44\x2d427d\x2dbce6\x2dd9d810557429-volumes-kubernetes.io\x7ecsi-grafana\x2ddata-mount.mount: Succeeded.
Aug 10 11:43:06 eu-west-1 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/pods/5c2e193a-a608-441e-bf93-50525a35c509/volumes/kubernetes.io~csi/pvc-82910441-daa5-4179-8da7-e1283a8030b6/mount.
Aug 10 11:44:27 eu-west-1 systemd[1]: var-lib-kubelet-pods-2bc35293\x2d588d\x2d43c9\x2d8f7d\x2da360ba0700c2-volumes-kubernetes.io\x7esecret-longhorn\x2dservice\x2daccount\x2dtoken\x2dvg5sn.mount: Succeeded.

sky-210004-ds is a worker node.

The command docker logs kubelet --since 2021-08-10T09:58Z --until 2021-08-10T18:03Z returns nothing on sky-210004-ds.

However, on the other nodes I do get results. Do you want them?

shuo-wu commented 3 years ago

The timestamps I provided may not match the format your logs use. There should be kubelet logs during pod creation.
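
If the requested window really is empty, it may just be a timezone or format mismatch between the range passed to Docker and what it records. A quick sanity check (a sketch, assuming the kubelet runs as the RKE Docker container):

# confirm which timestamps/offset Docker actually records
docker logs kubelet --timestamps --tail 20
# then widen the window around the pod creation time
docker logs kubelet --timestamps --since 2021-08-10T09:00:00Z --until 2021-08-10T11:00:00Z 2>&1 | grep -i alertmanager-data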

qdupuy commented 3 years ago

In the logs on sky-210004-ds, I see this very often:

E0816 07:35:21.574295 1811601 reconciler.go:193] operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume "storage" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^grafana-data") pod "63382382-76a5-4165-a514-b0628fddb153" (UID: "63382382-76a5-4165-a514-b0628fddb153") : UnmountVolume.NewUnmounter failed for volume "storage" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^grafana-data") pod "63382382-76a5-4165-a514-b0628fddb153" (UID: "63382382-76a5-4165-a514-b0628fddb153") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/63382382-76a5-4165-a514-b0628fddb153/volumes/kubernetes.io~csi/grafana-data/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/63382382-76a5-4165-a514-b0628fddb153/volumes/kubernetes.io~csi/grafana-data/vol_data.json]: open /var/lib/kubelet/pods/63382382-76a5-4165-a514-b0628fddb153/volumes/kubernetes.io~csi/grafana-data/vol_data.json: no such file or directory
shuo-wu commented 3 years ago

Based on the error message, the issue triggered by volume storage is related to https://github.com/kubernetes/kubernetes/issues/101911. I will check what happened and whether it gets fixed later.

BTW, volume storage does not exist in the support bundle, so I am not sure what happened to that volume. Hence I prefer to check the volume I mentioned above.
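
The error above shows the kubelet trying to tear down a CSI volume whose vol_data.json is already gone, which typically leaves an orphaned pod directory behind. A quick way to confirm that on the node (a sketch, assuming an RKE node where the kubelet runs as a Docker container; the pod UID and path are taken from the error message):

docker logs kubelet 2>&1 | grep -iE "orphaned pod|vol_data.json" | tail -20
ls -l /var/lib/kubelet/pods/63382382-76a5-4165-a514-b0628fddb153/volumes/kubernetes.io~csi/grafana-data/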

qdupuy commented 3 years ago

Hello,

Do you want me to generate a new support bundle?

shuo-wu commented 3 years ago

Does your current cluster still contain the info/logs about volume storage? If YES, please re-generate it.

qdupuy commented 3 years ago

I have rebuilt a node, and I still see several Longhorn errors that I have already reported in this issue, in the other topic, and on Slack.

I will send you a new email.

qdupuy commented 3 years ago

etcd logs:

2021-08-16 13:47:30.632868 W | etcdserver: read-only range request "key:\"/registry/leases/longhorn-system/external-attacher-leader-driver-longhorn-io\" " with result "range_response_count:1 size:538" took too long (196.88205ms) to execute
2021-08-16 13:47:31.153097 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-controller-manager\" " with result "range_response_count:1 size:522" took too long (222.49227ms) to execute
2021-08-16 13:47:31.381512 W | etcdserver: read-only range request "key:\"/registry/podsecuritypolicy/\" range_end:\"/registry/podsecuritypolicy0\" count_only:true " with result "range_response_count:0 size:9" took too long (260.703295ms) to execute
2021-08-16 13:47:31.381646 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-scheduler\" " with result "range_response_count:1 size:494" took too long (390.475513ms) to execute
2021-08-16 13:47:31.907976 W | etcdserver: read-only range request "key:\"/registry/events/longhorn-system/pvc-a11eda9a-12ac-401e-96ec-1664690ce302-s1h-c-1629121440-t7xqd.169bcd717a313867\" " with result "range_response_count:1 size:977" took too long (222.982466ms) to execute
2021-08-16 13:47:31.914962 W | etcdserver: read-only range request "key:\"/registry/crd.projectcalico.org/globalnetworksets/\" range_end:\"/registry/crd.projectcalico.org/globalnetworksets0\" count_only:true " with result "range_response_count:0 size:7" took too long (143.578683ms) to execute
2021-08-16 13:47:31.914979 W | etcdserver: read-only range request "key:\"/registry/leases/kube-system/kube-scheduler\" " with result "range_response_count:1 size:494" took too long (227.134138ms) to execute
2021-08-16 13:49:40.912039 I | mvcc: store.index: compact 24058185
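
The "took too long" warnings above mean individual etcd requests took hundreds of milliseconds, which usually points at disk or CPU pressure on the etcd/management nodes rather than at Longhorn itself. A quick way to gauge etcd health on an RKE control-plane node (a sketch, assuming the standard RKE etcd container, which ships etcdctl preconfigured with the member's certificates):

docker exec etcd etcdctl endpoint health
docker exec etcd etcdctl endpoint status --write-out=table
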
qdupuy commented 3 years ago

up

shuo-wu commented 3 years ago

In the support bundle, there is neither volume storage nor the attachment/mount error. Please:

  1. Check that the errors or events still exist before generating & sending the support bundle (see the sketch after this list)
  2. Specify the problematic volume name and the time slot (UTC time, a rough range is enough)
  3. Attach the kubelet log for the period when the issue happens
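
For item 1, a quick way to confirm the attach/mount errors are still occurring before capturing a new bundle (a sketch, assuming kubectl access):

kubectl get events -A --sort-by=.lastTimestamp | grep -Ei "FailedAttachVolume|FailedMount|Multi-Attach"
kubectl -n longhorn-system get events --sort-by=.lastTimestamp | tail -50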