openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes that are provisioned from an optimized NVMe SPDK backend data storage stack.
Apache License 2.0

mayastor-csi fails to start on hosts with FQDN #1081

Closed bbockelm closed 1 year ago

bbockelm commented 2 years ago

Describe the bug

On a host where the Kubernetes hostname uses the FQDN, mayastor-csi fails to launch due to a disagreement between the CSI pod and the node label.

Here's the relevant log line:

[root@svc-1 ~]# kubectl logs mayastor-csi-l26nr -n mayastor --container csi-driver-registrar
I0127 22:39:40.259075       1 main.go:113] Version: v2.1.0-0-g80d42f24
I0127 22:39:40.357826       1 connection.go:153] Connecting to unix:///csi/csi.sock
I0127 22:39:40.653876       1 node_register.go:52] Starting Registration Server at: /registration/io.openebs.csi-mayastor-reg.sock
I0127 22:39:40.654065       1 node_register.go:61] Registration Server started at: /registration/io.openebs.csi-mayastor-reg.sock
I0127 22:39:40.654229       1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0127 22:39:41.780296       1 main.go:80] Received GetInfo call: &InfoRequest{}
I0127 22:39:42.150507       1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"svc-1" but existing label is "kubernetes.io/hostname":"svc-1.example.com",}

(log line altered to remove the actual hostname)
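For anyone triaging similar logs, here is a minimal sketch that pulls the two conflicting values out of the collision message quoted above. It is only an illustration: the regular expression and the `parse_collision` helper are hypothetical names written against the message format shown in this log, not part of csi-node-driver-registrar or Mayastor.

```python
import re

# Pattern for the "topology value collision" text seen in the registrar log above.
# Assumption: the message keeps the format shown in this issue.
COLLISION_RE = re.compile(
    r'driver reported "(?P<key>[^"]+)":"(?P<reported>[^"]*)" '
    r'but existing label is "(?P=key)":"(?P<existing>[^"]*)"'
)


def parse_collision(log_line):
    """Return (label key, driver-reported value, existing label value), or None."""
    match = COLLISION_RE.search(log_line)
    if match is None:
        return None
    return match.group("key"), match.group("reported"), match.group("existing")


line = (
    'detected topology value collision: driver reported '
    '"kubernetes.io/hostname":"svc-1" but existing label is '
    '"kubernetes.io/hostname":"svc-1.example.com",}'
)
print(parse_collision(line))
```

Here the driver-reported value is the short name `svc-1`, while the existing node label carries the FQDN `svc-1.example.com`.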

To Reproduce

Tried with csi-node-driver-registrar:v2.4.0 as well; same behavior. If I relabel the host to just "svc-1", then the CSI is happy (but other internal operations fail because some components use the short name and others use the FQDN).

Haven't found any workaround except installing the host from scratch with --hostname-override.

Expected behavior

mayastor-csi pod should start up cleanly.


hansh0801 commented 2 years ago

I am also experiencing the same issue.

kpoos commented 2 years ago

Unfortunately, this bug is still present in 1.0.1, and our systems are affected. As far as I remember, it was also present around version 0.8.1...

csi-driver-registrar E0421 07:54:33.904677 1 main.go:92] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating Node object with CSI driver node info: error updating node: timed out waiting for the condition; caused by: detected topology value collision: driver reported "kubernetes.io/hostname":"storage-int-04" but existing label is "kubernetes.io/hostname":"storage-int-04.k8s-int.sub.domain", restarting registration container.

Is there any workaround for this other than reinstalling the node with --hostname-override?

adityanmishra commented 2 years ago

We are also facing the same issue. I am using a KOPS cluster in AWS where kubernetes.io/hostname is actually the hostname of the node. Due to k8s hardening standards, hostname-override is out of the question for us. Is there any other temporary solution available to fix this issue?

Abhinandan-Purkait commented 2 years ago

Hi, we have made a fix for this, and the change is currently on develop. For managed services where the hostname differs from the node name, you can remove the flags at the linked lines from csi-ds ( https://github.com/openebs/mayastor/blob/78c39719064c4600070f617756d957f69027061e/deploy/csi-daemonset.yaml#L51 ) and mayastor-ds ( https://github.com/openebs/mayastor/blob/78c39719064c4600070f617756d957f69027061e/deploy/mayastor-daemonset.yaml#L56 ), and they should then be able to pick up the hostnames. FYI: with these flags removed, Mayastor registers itself with the hostname, so you need to pass the hostname as the node name when creating a pool.

tz-torchai commented 2 years ago

Hi, reporting the same issue for 1.0.2: https://github.com/openebs/mayastor/issues/1144#issuecomment-1173278964

Would be happy to provide any info to help this get resolved

tz-torchai commented 2 years ago

So for managed services where hostname differs from node name

Which hostname is referred to here? The output of hostname on the server, or the one in /etc/hosts? Sometimes they are different.

Winter-Guerra commented 2 years ago

I'm going to try the v1.0.2 patch and report back. However, this commit ( https://github.com/openebs/mayastor/commit/5f3e6bcc52f95d68cb3a1fd55fb14ea1d4c4f1ba ) seems related. It appears to be responsible for keeping only the first component of an FQDN hostname (or of any hostname containing . characters).

This issue is similar (if not identical) to issue #1144
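To make the suspected behavior concrete, here is a small sketch of how truncating a hostname at its first `.` yields a value that no longer matches the node's `kubernetes.io/hostname` label. It illustrates the effect described above and is not Mayastor's actual code (which is written in Rust); `short_name` and `collides_with_label` are hypothetical helper names.

```python
def short_name(hostname):
    """Keep only the text before the first '.' (assumed stand-in for the driver's behavior)."""
    return hostname.split(".", 1)[0]


def collides_with_label(node_label):
    """True when the truncated name differs from the existing node label."""
    return short_name(node_label) != node_label


# Label values taken from the logs quoted in this issue, plus a plain short name.
for label in ("svc-1", "svc-1.example.com", "storage-int-04.k8s-int.sub.domain"):
    status = "collides" if collides_with_label(label) else "matches"
    print(f"{label} -> {short_name(label)} ({status})")
```

Only nodes whose label already is a short name pass this check, which is consistent with the reports above from hosts labeled with an FQDN.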

Abhinandan-Purkait commented 2 years ago

@Winter-Guerra Yes, rightly pointed out. The fix has been made: we now register using the node name, and the csi-driver code has been moved to mayastor-control-plane. You can check the code from here.

Huweicai commented 2 years ago

Still facing the problem following the official deployment guide: https://mayastor.gitbook.io/introduction/quickstart/deploy-mayastor

Any progress on it? It's really frustrating for people willing to try mayastor.

Abhinandan-Purkait commented 2 years ago

Still facing the problem following the official deployment guide: https://mayastor.gitbook.io/introduction/quickstart/deploy-mayastor

Any progress on it? It's really frustrating for people willing to try mayastor.

Hi, we will release version 2.0.0 of mayastor soon with this fix. Please try it with that release.

CornWorld commented 1 year ago

Same problem when I used it on the first day, and I felt very sad.

tiagolobocastro commented 1 year ago

This should be working on 2.x, please reopen if otherwise.