openebs / mayastor

Dynamically provision stateful, persistent, replicated, cluster-wide fabric volumes and filesystems for Kubernetes, backed by an optimized NVMe/SPDK data storage stack.
Apache License 2.0

rpc error: code = NotFound (node not found) #1083

Closed: orboan closed this issue 1 year ago

orboan commented 2 years ago

Describe the bug

When the client pod starts, its events show the following:

Events:
  Type     Reason              Age                  From                       Message
  ----     ------              ----                 ----                       -------
  Normal   Scheduled           3m11s                jupyterhub-user-scheduler  Successfully assigned labs/jupyter-orboan to kube-07
  Warning  FailedMount         69s                  kubelet                    Unable to attach or mount volumes: unmounted volumes=[docker-data mayastor-home], unattached volumes=[docker-data mayastor-home docker-sock]: timed out waiting for the condition
  Warning  FailedAttachVolume  59s (x9 over 3m11s)  attachdetach-controller    AttachVolume.Attach failed for volume "pvc-8811c307-79e8-4ddf-b71a-fc2db83137c1" : rpc error: code = NotFound desc = error in response: status code '404 Not Found', content: 'RestJsonError { details: "Node 'kube-07' not found", kind: NotFound }'
  Warning  FailedAttachVolume  59s (x9 over 3m11s)  attachdetach-controller    AttachVolume.Attach failed for volume "pvc-946fff09-ab3b-40b0-90eb-d39474888382" : rpc error: code = NotFound desc = error in response: status code '404 Not Found', content: 'RestJsonError { details: "Node 'kube-07' not found", kind: NotFound }'

To Reproduce

The Kubernetes cluster is a 10-node on-premises cluster managed with kubeadm. Only one node has been configured as a Mayastor storage node (MSN), with a Mayastor pool (MSP).

I followed all the instructions in the documentation, with these few minor adjustments:

All the verification commands given in the documentation report that everything is fine: the MSP is online, etc.

When the client app pod is started, the volume is created (or reused) as expected but is never attached, and the error shown above appears in the client pod's events.

With the previous instructions (using moac, etc.), everything worked fine.

Expected behavior: the volume should be attached and the client app should be able to use it.

tiagolobocastro commented 2 years ago

It seems the node is not seen by the control plane. Could you please get the kubectl-mayastor plugin and run kubectl-mayastor get nodes?

bluke commented 2 years ago

I am not the original poster, but I'm experimenting with a similar use case.

kubectl-mayastor get nodes only lists MSNs, i.e. nodes labeled openebs.io/engine=mayastor that are currently running the Mayastor pod. It does not list any of the other nodes in the cluster, from which we are hoping to access storage.
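If only labeled nodes that run the Mayastor pod are known to the control plane, then a pod scheduled onto any other node (such as kube-07 in the original report) would be expected to hit the "Node ... not found" attach error. A minimal sketch, assuming the input has the shape of `kubectl get nodes -o json` output, that lists nodes lacking the openebs.io/engine=mayastor label. Caveat: carrying the label is only a proxy for "runs the Mayastor pod"; the function name and the sample node names are hypothetical.

```python
import json

# Label that marks Mayastor storage nodes (MSNs), as quoted in this thread.
MAYASTOR_LABEL_KEY = "openebs.io/engine"
MAYASTOR_LABEL_VALUE = "mayastor"


def nodes_without_mayastor(nodes_json: str) -> list[str]:
    """Return the names of nodes that lack openebs.io/engine=mayastor.

    Assumes the shape of `kubectl get nodes -o json` output:
    {"items": [{"metadata": {"name": ..., "labels": {...}}}, ...]}.
    """
    unlabeled = []
    for item in json.loads(nodes_json)["items"]:
        meta = item["metadata"]
        labels = meta.get("labels", {})
        # A node without the label is assumed not to run the Mayastor pod,
        # so the control plane would not list it as an MSN.
        if labels.get(MAYASTOR_LABEL_KEY) != MAYASTOR_LABEL_VALUE:
            unlabeled.append(meta["name"])
    return unlabeled


if __name__ == "__main__":
    # Hypothetical two-node cluster: one MSN and one workload-only node.
    sample = json.dumps({"items": [
        {"metadata": {"name": "kube-01",
                      "labels": {"openebs.io/engine": "mayastor"}}},
        {"metadata": {"name": "kube-07", "labels": {}}},
    ]})
    print(nodes_without_mayastor(sample))  # ['kube-07']
```

Saving real `kubectl get nodes -o json` output to a file and passing its contents to the function gives the same kind of list for an actual cluster.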

vharsh commented 2 years ago

I believe this is happening because the Kubernetes node names and the Kubernetes hostnames are different. @bluke or @orboan, can you confirm whether this is the case on your systems?
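To check this hypothesis, one can compare each node's name with its hostname. A minimal sketch, assuming `kubectl get nodes -o json` output: the hostname is read from the node's `status.addresses` entry of type `Hostname`, falling back to the well-known `kubernetes.io/hostname` label. Which of these names Mayastor actually uses to register a node is not stated in this thread, so the sketch only flags mismatches. The function name and sample values are hypothetical.

```python
import json


def name_hostname_mismatches(nodes_json: str) -> dict[str, str]:
    """Map node name -> hostname for every node where the two differ.

    Assumes the shape of `kubectl get nodes -o json` output.
    """
    mismatches = {}
    for item in json.loads(nodes_json)["items"]:
        meta = item["metadata"]
        name = meta["name"]
        # Prefer the "Hostname" address reported in the node status;
        # fall back to the kubernetes.io/hostname label if it is absent.
        hostname = next(
            (addr["address"]
             for addr in item.get("status", {}).get("addresses", [])
             if addr.get("type") == "Hostname"),
            meta.get("labels", {}).get("kubernetes.io/hostname"),
        )
        if hostname is not None and hostname != name:
            mismatches[name] = hostname
    return mismatches


if __name__ == "__main__":
    # Hypothetical nodes: kube-07 reports a fully qualified hostname.
    sample = json.dumps({"items": [
        {"metadata": {"name": "kube-01"},
         "status": {"addresses": [{"type": "Hostname",
                                   "address": "kube-01"}]}},
        {"metadata": {"name": "kube-07"},
         "status": {"addresses": [{"type": "Hostname",
                                   "address": "kube-07.lab.local"}]}},
    ]})
    print(name_hostname_mismatches(sample))  # {'kube-07': 'kube-07.lab.local'}
```

An empty result suggests the names agree, which would point away from this explanation.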

orboan commented 2 years ago

Hello,

I'm so sorry, but I moved to Longhorn. I no longer have Mayastor deployed, so I cannot try out or check your suggestions.

vholer commented 2 years ago

I'm experiencing the same issue. Like @bluke, I'm trying to use the volumes on nodes that are not labeled with openebs.io/engine=mayastor and are not even listed by kubectl-mayastor get nodes; i.e., those nodes run a Mayastor CSI pod but no Mayastor pod. I guess the disaggregated deployment, where dedicated storage nodes are separated from the workload nodes and the workload nodes access the data entirely remotely from the storage nodes, is not supported.

I am basing this assumption on https://github.com/openebs/mayastor#data-plane:

Data plane: Each node you wish to use for storage or storage services will have to run a Mayastor daemon set. Mayastor itself has three major components: the Nexus, a local storage component, and the mayastor-csi plugin.

and on the fact that non-pinned volumes were recently disabled: https://github.com/openebs/mayastor-control-plane/commit/615fd405364d0ff8f615ca7c0157642b3049fa5f

GlennBullingham commented 2 years ago

@vholer

I guess the disaggregated deployment, where dedicated storage nodes are separated from the workload nodes and the workload nodes access the data entirely remotely from the storage nodes, is not supported.

That is correct. Support for the disaggregated deployment mode is planned and will target a release after version 2.0.0.