kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

CSI `NodeGetId(...)` call should be supported until CSI v1.0 #68688

Closed saad-ali closed 6 years ago

saad-ali commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug /kind feature

What happened: CSI Plugin calls NodeGetInfo(...) to get node ID information (see https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/csi/csi_plugin.go#L139). If the driver does not implement that call, registration with kubelet fails.

What you expected to happen: Per the CSI spec, the deprecated NodeGetId(...) call should be supported until CSI 1.0 is released (see https://github.com/container-storage-interface/spec/blob/master/spec.md#rpc-interface). Therefore, if the NodeGetInfo(...) call made by Kubernetes fails because the driver does not implement it, we should fall back and try the NodeGetId(...) call instead.
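The proposed fallback can be sketched as follows. This is a minimal illustration, not the kubelet code: `nodeClient`, `legacyDriver`, and `errUnimplemented` are hypothetical names, and the sentinel error stands in for the gRPC "Unimplemented" status that a real client would have to inspect.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnimplemented is a hypothetical stand-in for a gRPC "Unimplemented"
// status; a real client would inspect the status code of the RPC error.
var errUnimplemented = errors.New("rpc not implemented by driver")

// nodeClient is a hypothetical, trimmed-down view of a CSI node client.
type nodeClient interface {
	NodeGetInfo() (string, error)
	NodeGetId() (string, error)
}

// nodeID prefers NodeGetInfo and falls back to the deprecated NodeGetId
// only when the driver reports that NodeGetInfo is not implemented.
func nodeID(c nodeClient) (string, error) {
	id, err := c.NodeGetInfo()
	if err == nil {
		return id, nil
	}
	if !errors.Is(err, errUnimplemented) {
		// Any other failure is a real error and is not masked by the fallback.
		return "", fmt.Errorf("error during CSI NodeGetInfo() call: %w", err)
	}
	id, err = c.NodeGetId()
	if err != nil {
		return "", fmt.Errorf("error during CSI NodeGetId() fallback call: %w", err)
	}
	return id, nil
}

// legacyDriver models a driver that implements only NodeGetId.
type legacyDriver struct{ id string }

func (d legacyDriver) NodeGetInfo() (string, error) { return "", errUnimplemented }
func (d legacyDriver) NodeGetId() (string, error)   { return d.id, nil }

func main() {
	id, err := nodeID(legacyDriver{id: "node-1"})
	fmt.Println(id, err)
}
```

Falling back only on the "not implemented" case keeps genuine NodeGetInfo failures visible instead of silently retrying them through the deprecated call.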

How to reproduce it (as minimally and precisely as possible): Use any CSI driver that implements NodeGetId(...) but not NodeGetInfo(...). @xing-yang says the OpenSDS driver is currently hitting this and may be used for repro.

Anything else we need to know?: When CSI moves to 1.0 we can drop this fallback logic. If a fix is available within the 1.12 time frame, we can include it, but the 1.12 release should NOT block on this. /sig storage

Environment:

saad-ali commented 6 years ago

@xing-yang please verify this is the issue you hit. Specifically see if you can find fmt.Errorf("error during CSI NodeGetInfo() call: %v", err) in your kubelet logs.
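One way to run that check is to filter the kubelet log for the message. The sketch below assumes the log has already been read into a string; `matchingLines` is a hypothetical helper, and the sample lines in `main` are made up for illustration, not real kubelet output.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeGetInfoErr is the message prefix quoted in the comment above.
const nodeGetInfoErr = "error during CSI NodeGetInfo() call"

// matchingLines returns every line of log that contains needle.
func matchingLines(log, needle string) []string {
	var hits []string
	for _, line := range strings.Split(log, "\n") {
		if strings.Contains(line, needle) {
			hits = append(hits, line)
		}
	}
	return hits
}

func main() {
	// Made-up sample lines, used only to exercise the helper.
	log := "sample line without the message\n" +
		"sample line: error during CSI NodeGetInfo() call: not implemented\n"
	for _, hit := range matchingLines(log, nodeGetInfoErr) {
		fmt.Println(hit)
	}
}
```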

saad-ali commented 6 years ago

CC @hoegaarden who volunteered to take a look

xing-yang commented 6 years ago

@saad-ali I don't see "error during CSI NodeGetInfo" in my kubelet logs. I don't see glog.Infof(log("Register new plugin with name: %s at endpoint: %s", pluginName, endpoint)) so it seems that RegisterPlugin is not called.

I do see a message: "Plugin Watcher Start at /var/lib/kubelet/plugins". I don't know why RegisterPlugin is not called. I'll continue to check.

xing-yang commented 6 years ago

Here's an update. I implemented NodeGetInfo.

I followed documentation about the Kubelet Plugin Watcher: https://kubernetes-csi.github.io/docs/Setup.html#kubelet-plugin-watcher to get my plugin registered.

After that I don’t see the “driver name csi-opensdsplugin not found in the list of registred CSI drivers” any more. I see that now RegisterPlugin gets called and I see the following message in kubelet.log:

I0914 14:44:50.457102 107038 csi_plugin.go:119] kubernetes.io/csi: Register new plugin with name: csi-opensdsplugin at endpoint: /var/lib/kubelet/plugins/csi-opensdsplugin/csi.sock

So this confirmed that missing NodeGetInfo is the problem.

I still have problems but I think that’s a config issue on my side. I added the following as cluster role rule for attacher and node:

However I’m still getting the following errors in the logs:

kubelet.log:

I0914 16:12:31.294303 18408 reflector.go:169] Listing and watching v1alpha1.CSIDriver from k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117
E0914 16:12:31.296470 18408 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list v1alpha1.CSIDriver: csidrivers.csi.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:attachdetach-controller" cannot list resource "csidrivers" in API group "csi.storage.k8s.io" at the cluster scope

kube-controller-manager.log:

I0914 16:11:51.854821 18436 reflector.go:169] Listing and watching v1alpha1.CSIDriver from k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117
E0914 16:11:51.857504 18436 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list v1alpha1.CSIDriver: csidrivers.csi.storage.k8s.io is forbidden: User "system:node:127.0.0.1" cannot list resource "csidrivers" in API group "csi.storage.k8s.io" at the cluster scope

xing-yang commented 6 years ago

Did more investigation on this. Here's an update.

I set the following alpha feature gates to true on my local cluster: CSICRDAutoInstall, CSISkipAttach, CSIPodInfo.

I got some new errors in kubelet.log and kube-controller-manager.log:

I0916 07:22:06.001084 11879 reflector.go:169] Listing and watching v1alpha1.CSIDriver from k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117
E0916 07:22:06.002077 11879 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list v1alpha1.CSIDriver: the server could not find the requested resource (get csidrivers.csi.storage.k8s.io)

I0916 07:25:24.485984 11841 reflector.go:169] Listing and watching v1alpha1.CSIDriver from k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117
E0916 07:25:24.487405 11841 reflector.go:134] k8s.io/csi-api/pkg/client/informers/externalversions/factory.go:117: Failed to list v1alpha1.CSIDriver: the server could not find the requested resource (get csidrivers.csi.storage.k8s.io)

In addition, I found the following in kube-controller-manager.log:

E0916 07:22:01.872334 11841 attach_detach_controller.go:695] failed to create CSIDrivers CRD: &v1beta1.CustomResourceDefinition{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(time.Location)(nil)}}, DeletionTimestamp:(v1.Time)(nil), DeletionGracePeriodSeconds:(int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1beta1.CustomResourceDefinitionSpec{Group:"", Version:"", Names:v1beta1.CustomResourceDefinitionNames{Plural:"", Singular:"", ShortNames:[]string(nil), Kind:"", ListKind:"", Categories:[]string(nil)}, Scope:"", Validation:(v1beta1.CustomResourceValidation)(nil), Subresources:(v1beta1.CustomResourceSubresources)(nil), Versions:[]v1beta1.CustomResourceDefinitionVersion(nil), AdditionalPrinterColumns:[]v1beta1.CustomResourceColumnDefinition(nil)}, Status:v1beta1.CustomResourceDefinitionStatus{Conditions:[]v1beta1.CustomResourceDefinitionCondition(nil), AcceptedNames:v1beta1.CustomResourceDefinitionNames{Plural:"", Singular:"", ShortNames:[]string(nil), Kind:"", ListKind:"", Categories:[]string(nil)}, StoredVersions:[]string(nil)}}, err: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:""}, Status:"Failure", Message:"customresourcedefinitions.apiextensions.k8s.io is forbidden: User \"system:serviceaccount:kube-system:attachdetach-controller\" cannot create resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the cluster scope", Reason:"Forbidden", Details:(*v1.StatusDetails)(0xc4212af560), Code:403}}

So I made this change to add RBAC rule for customresourcedefinitions: https://github.com/kubernetes/kubernetes/pull/68714

After that, my CSI plugin is functional. I can create/attach/mount volumes successfully now.
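The "forbidden" errors above each name an API group, a resource, and a verb that no rule granted. The sketch below shows how such a check can be reasoned about. It is a simplified illustration, not the Kubernetes authorizer: `policyRule` is a local stand-in type, the rule contents are assumptions derived from the error messages rather than the rules actually applied in this thread, and wildcards and resource names are ignored.

```go
package main

import "fmt"

// policyRule is a local, simplified stand-in for an RBAC policy rule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether want is present in list.
func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in group.
// Wildcards and resource names are ignored to keep the sketch short.
func allows(rules []policyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Assumed rule set: listing and watching CSIDriver objects only.
	rules := []policyRule{{
		APIGroups: []string{"csi.storage.k8s.io"},
		Resources: []string{"csidrivers"},
		Verbs:     []string{"list", "watch"},
	}}
	fmt.Println(allows(rules, "apiextensions.k8s.io", "customresourcedefinitions", "create"))

	// Adding a rule for the CRD resource lets the create request through.
	rules = append(rules, policyRule{
		APIGroups: []string{"apiextensions.k8s.io"},
		Resources: []string{"customresourcedefinitions"},
		Verbs:     []string{"create"},
	})
	fmt.Println(allows(rules, "apiextensions.k8s.io", "customresourcedefinitions", "create"))
}
```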

verult commented 6 years ago

/cc

verult commented 6 years ago

xing-yang commented 6 years ago

Thanks @verult.

Tested the latest code. If NodeGetInfo is implemented, we just need to make the necessary changes when deploying the driver-registrar sidecar container so that KubeletPluginsWatcher can work. NodeGetInfo is needed for kubelet to register CSI plugins.

Also tested with and without "CSINodeInfo=true,CSIDriverRegistry=true", installing the CRDs manually when these gates are enabled.

wnxn commented 6 years ago

I have the same problem when deploying my CSI plugin, based on CSI v0.2.0, on Kubernetes v1.12.0. The plugin's DaemonSet pods fail to start.

Kubelet logs:

Oct 09 11:00:37 master kubelet[2594]: I1009 11:00:37.091415    2594 csi_plugin.go:111] kubernetes.io/csi: Trying to register a new plugin with name: csi-qingcloud endpoint: /var/lib/kubelet/plugins/csi-qingcloud/csi.sock versions: 0.2.0,0.3.0
Oct 09 11:00:37 master kubelet[2594]: I1009 11:00:37.091445    2594 csi_plugin.go:119] kubernetes.io/csi: Register new plugin with name: csi-qingcloud at endpoint: /var/lib/kubelet/plugins/csi-qingcloud/csi.sock
Oct 09 11:00:37 master kubelet[2594]: E1009 11:00:37.152534    2594 plugin_watcher.go:115] error plugin registration failed with err: error updating CSI node info in the cluster: error adding CSI driver node info: driverNodeID must not be empty: rpc error: code = Unavailable desc = transport is closing when handling create event: "/var/lib/kubelet/plugins/csi-qingcloud-reg.sock": CREATE

saad-ali commented 6 years ago

Considering Kubernetes v1.12 is already out, and we planned to deprecate the old call in CSI 1.0/Kubernetes 1.13 anyway, we'll pull off the band-aid and deprecate it now.

If you are running Kubernetes 1.13, make sure your CSI driver implements the NodeGetInfo call.

saad-ali commented 6 years ago

@wnxn looks like you might be running into a different issue: Kubernetes v1.13 also enables Kubelet device plugin registration by default. Before upgrading to v1.13, ensure the driver-registrar CSI sidecar container for your CSI driver is configured to handle plugin registration (set the --kubelet-registration-path parameter on driver-registrar to expose a new unix domain socket to handle Kubelet Plugin Registration).

From https://github.com/kubernetes-csi/driver-registrar/blob/master/cmd/driver-registrar/main.go:

    kubeletRegistrationPath = flag.String("kubelet-registration-path", "",
        `Enables Kubelet Plugin Registration service, and returns the specified path as "endpoint" in "PluginInfo" response.
         If this option is set, the driver-registrar expose a unix domain socket to handle Kubelet Plugin Registration, 
         this socket MUST be surfaced on the host in the kubelet plugin registration directory (in addition to the CSI driver socket). 
         If plugin registration is enabled on kubelet (kubelet flag KubeletPluginsWatcher is set), then this option should be set
         and the value should be the path of the CSI driver socket on the host machine.`)

wnxn commented 6 years ago

Hi @saad-ali, KubeletPluginsWatcher went Beta and is enabled by default in Kubernetes v1.12. After disabling this feature gate, I can run a CSI plugin based on CSI spec v0.2.0 on Kubernetes v1.12.2. When KubeletPluginsWatcher is enabled, Kubelet makes the NodeGetInfo gRPC call to register the CSI plugin. NodeGetInfo was first introduced in CSI spec v0.3.0. Therefore, users should disable the KubeletPluginsWatcher feature gate when running a CSI plugin based on CSI spec v0.2.0 on Kubernetes v1.12.

| CSI plugin | Kubernetes version | Feature gate | CSI interface used for registration |
| --- | --- | --- | --- |
| based on CSI v0.2.0 | v1.12 | KubeletPluginsWatcher=false | NodeGetId |
| based on CSI v0.3.0 | v1.12 | KubeletPluginsWatcher=true | NodeGetInfo |
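The mapping in this table can be sketched in code. The example below parses a comma-separated `name=bool` feature-gate string, the format used in values such as "CSINodeInfo=true,CSIDriverRegistry=true" earlier in this thread, and returns the registration call the table associates with it. `parseFeatureGates` and `registrationRPC` are hypothetical helpers, not kubelet functions, and the default of `true` reflects the statement above that the gate is enabled by default in v1.12.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates parses a comma-separated list of name=bool pairs.
func parseFeatureGates(s string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		name, value, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, fmt.Errorf("missing '=' in %q", pair)
		}
		enabled, err := strconv.ParseBool(value)
		if err != nil {
			return nil, fmt.Errorf("invalid value for %s: %w", name, err)
		}
		gates[name] = enabled
	}
	return gates, nil
}

// registrationRPC returns the CSI call that the table above lists for the
// given gates on Kubernetes v1.12. An unset gate is treated as enabled.
func registrationRPC(gates map[string]bool) string {
	enabled, set := gates["KubeletPluginsWatcher"]
	if !set {
		enabled = true
	}
	if enabled {
		return "NodeGetInfo"
	}
	return "NodeGetId"
}

func main() {
	gates, err := parseFeatureGates("KubeletPluginsWatcher=false")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(registrationRPC(gates))
}
```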

zhucan commented 5 years ago

@saad-ali I have implemented the NodeGetInfo call, but I still hit this issue. My Kubernetes version is 1.14.3.