Closed eyanez111 closed 2 years ago
Try: kubectl -n ntnx-system logs csi-provisioner-ntnx-plugin-0 -c ntnx-csi-plugin
This is what I got:
kubectl -n ntnx-system logs csi-provisioner-ntnx-plugin-0 -c ntnx-csi-plugin.
error: container ntnx-csi-plugin. is not valid for pod csi-provisioner-ntnx-plugin-0
Apparently the container name is not valid (the trailing period in my command was the problem).
OK, I actually made it work... I just needed to wait after I deployed the pods again:
kubectl -n ntnx-system logs csi-provisioner-ntnx-plugin-0 -c ntnx-csi-plugin
I0112 01:19:26.393641 1 ntnx_driver.go:84] Enabling volume access mode: SINGLE_NODE_WRITER
I0112 01:19:26.393716 1 ntnx_driver.go:84] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0112 01:19:26.393720 1 ntnx_driver.go:84] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0112 01:19:26.393723 1 ntnx_driver.go:94] Enabling controller service capability: CREATE_DELETE_VOLUME
I0112 01:19:26.393727 1 ntnx_driver.go:94] Enabling controller service capability: EXPAND_VOLUME
I0112 01:19:26.393729 1 ntnx_driver.go:94] Enabling controller service capability: CLONE_VOLUME
I0112 01:19:26.393732 1 ntnx_driver.go:94] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0112 01:19:26.393738 1 ntnx_driver.go:104] Enabling node service capability: GET_VOLUME_STATS
I0112 01:19:26.393741 1 ntnx_driver.go:104] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0112 01:19:26.393750 1 ntnx_driver.go:104] Enabling node service capability: EXPAND_VOLUME
I0112 01:19:26.393755 1 ntnx_driver.go:145] Driver: csi.nutanix.com
I0112 01:19:26.393963 1 server.go:98] Listening for connections on address: &net.UnixAddr{Name:"//var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
2022-01-12T01:19:26.829Z identity.go:23: [INFO] Using default GetPluginInfo
2022-01-12T01:19:26.83Z identity.go:39: [INFO] Using default GetPluginCapabilities
2022-01-12T01:19:26.902Z identity.go:23: [INFO] Using default GetPluginInfo
2022-01-12T01:19:26.902Z identity.go:39: [INFO] Using default GetPluginCapabilities
2022-01-12T01:19:26.943Z identity.go:23: [INFO] Using default GetPluginInfo
2022-01-12T01:19:27.204Z identity.go:23: [INFO] Using default GetPluginInfo
Does this mean it will work now?
Thanks, Francisco Yanez
I also found a few things when I pointed to another container:
kubectl -n ntnx-system logs csi-provisioner-ntnx-plugin-0 -c csi-resizer
I0112 01:19:25.823514 1 main.go:90] Version : v1.2.0
I0112 01:19:25.823555 1 feature_gate.go:243] feature gates: &{map[]}
I0112 01:19:25.825050 1 connection.go:153] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0112 01:19:26.825774 1 common.go:111] Probing CSI driver for readiness
I0112 01:19:26.825792 1 connection.go:182] GRPC call: /csi.v1.Identity/Probe
I0112 01:19:26.825797 1 connection.go:183] GRPC request: {}
I0112 01:19:26.829425 1 connection.go:185] GRPC response: {}
I0112 01:19:26.829475 1 connection.go:186] GRPC error: <nil>
I0112 01:19:26.829484 1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0112 01:19:26.829487 1 connection.go:183] GRPC request: {}
I0112 01:19:26.829803 1 connection.go:185] GRPC response: {"name":"csi.nutanix.com","vendor_version":"v1.1.0"}
I0112 01:19:26.829845 1 connection.go:186] GRPC error: <nil>
I0112 01:19:26.829852 1 main.go:138] CSI driver name: "csi.nutanix.com"
I0112 01:19:26.829860 1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0112 01:19:26.829863 1 connection.go:183] GRPC request: {}
I0112 01:19:26.830312 1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"VolumeExpansion":{"type":1}}}]}
I0112 01:19:26.830400 1 connection.go:186] GRPC error: <nil>
I0112 01:19:26.830412 1 connection.go:182] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0112 01:19:26.830414 1 connection.go:183] GRPC request: {}
I0112 01:19:26.830788 1 connection.go:185] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":5}}}]}
I0112 01:19:26.830901 1 connection.go:186] GRPC error: <nil>
I0112 01:19:26.831035 1 main.go:166] ServeMux listening at ":9810"
I0112 01:19:26.831184 1 controller.go:251] Starting external resizer csi.nutanix.com
I0112 01:19:26.831424 1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (10m0s) from k8s.io/client-go/informers/factory.go:134
I0112 01:19:26.831437 1 reflector.go:255] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:134
I0112 01:19:26.831523 1 reflector.go:219] Starting reflector *v1.PersistentVolume (10m0s) from k8s.io/client-go/informers/factory.go:134
I0112 01:19:26.831534 1 reflector.go:255] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:134
I0112 01:19:26.931397 1 shared_informer.go:270] caches populated
I0112 01:19:26.931479 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-0"
I0112 01:19:26.931493 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-1"
I0112 01:19:26.931497 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-2"
W0112 01:19:26.931523 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-2 not found
W0112 01:19:26.931496 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-0 not found
W0112 01:19:26.931513 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-1 not found
I0112 01:27:27.839571 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PersistentVolumeClaim total 0 items received
I0112 01:29:08.840254 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PersistentVolume total 0 items received
I0112 01:29:26.839352 1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0112 01:29:26.839428 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-0"
W0112 01:29:26.839438 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-0 not found
I0112 01:29:26.839475 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-1"
W0112 01:29:26.839482 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-1 not found
I0112 01:29:26.839487 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-2"
W0112 01:29:26.839499 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-2 not found
I0112 01:35:46.841463 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PersistentVolumeClaim total 0 items received
I0112 01:36:19.842247 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.PersistentVolume total 0 items received
I0112 01:39:26.840519 1 reflector.go:381] k8s.io/client-go/informers/factory.go:134: forcing resync
I0112 01:39:26.840641 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-0"
W0112 01:39:26.840657 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-0 not found
I0112 01:39:26.840680 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-1"
I0112 01:39:26.840697 1 controller.go:291] Started PVC processing "centralized-logging/data-my-cluster-zookeeper-2"
W0112 01:39:26.840700 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-1 not found
W0112 01:39:26.840707 1 controller.go:318] PV "" bound to PVC centralized-logging/data-my-cluster-zookeeper-2 not found
I then checked those same PVCs that it says were not found, and they are still Pending:
paoc@LAP-FYANEZ:~/csi-driver-nutanix$ kubectl get pvc -n centralized-logging
NAME                          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-cluster-zookeeper-0   Pending                                                     154m
data-my-cluster-zookeeper-1   Pending                                                     154m
data-my-cluster-zookeeper-2   Pending                                                     154m
kubectl -n ntnx-system logs csi-provisioner-ntnx-plugin-0 -c csi-snapshotter
I0112 01:19:25.941125 1 main.go:87] Version: v3.0.3
I0112 01:19:25.942442 1 connection.go:153] Connecting to unix:///csi/csi.sock
W0112 01:19:26.944268 1 metrics.go:333] metrics endpoint will not be started because `metrics-address` was not specified.
I0112 01:19:26.944288 1 common.go:111] Probing CSI driver for readiness
I0112 01:19:26.945008 1 snapshot_controller_base.go:111] Starting CSI snapshotter
kubectl -n ntnx-system logs csi-provisioner-ntnx-plugin-0 -c liveness-probe
I0112 01:19:27.203386 1 main.go:149] calling CSI driver to discover driver name
I0112 01:19:27.204450 1 main.go:155] CSI driver name: "csi.nutanix.com"
I0112 01:19:27.204467 1 main.go:183] ServeMux listening at ":9807"
And then I checked the logs on the other pods and found these entries:
kubectl -n ntnx-system logs csi-node-ntnx-plugin-hxq2p -c driver-registrar
I0112 01:19:25.464520 1 main.go:113] Version: v2.2.0
I0112 01:19:25.465204 1 main.go:137] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0112 01:19:25.465227 1 connection.go:153] Connecting to unix:///csi/csi.sock
I0112 01:19:26.466657 1 main.go:144] Calling CSI driver to discover driver name
I0112 01:19:26.466683 1 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0112 01:19:26.466688 1 connection.go:183] GRPC request: {}
I0112 01:19:26.469112 1 connection.go:185] GRPC response: {"name":"csi.nutanix.com","vendor_version":"v1.1.0"}
I0112 01:19:26.469168 1 connection.go:186] GRPC error: <nil>
I0112 01:19:26.469174 1 main.go:154] CSI driver name: "csi.nutanix.com"
I0112 01:19:26.469206 1 node_register.go:52] Starting Registration Server at: /registration/csi.nutanix.com-reg.sock
I0112 01:19:26.469352 1 node_register.go:61] Registration Server started at: /registration/csi.nutanix.com-reg.sock
I0112 01:19:26.469399 1 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0112 01:19:26.659194 1 main.go:80] Received GetInfo call: &InfoRequest{}
I0112 01:19:26.680026 1 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
kubectl -n ntnx-system logs csi-node-ntnx-plugin-hxq2p -c csi-node-ntnx-plugin
I0112 01:19:25.864461 1 ntnx_driver.go:84] Enabling volume access mode: SINGLE_NODE_WRITER
I0112 01:19:25.864561 1 ntnx_driver.go:84] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0112 01:19:25.864564 1 ntnx_driver.go:84] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0112 01:19:25.864567 1 ntnx_driver.go:94] Enabling controller service capability: CREATE_DELETE_VOLUME
I0112 01:19:25.864570 1 ntnx_driver.go:94] Enabling controller service capability: EXPAND_VOLUME
I0112 01:19:25.864573 1 ntnx_driver.go:94] Enabling controller service capability: CLONE_VOLUME
I0112 01:19:25.864575 1 ntnx_driver.go:94] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0112 01:19:25.864579 1 ntnx_driver.go:104] Enabling node service capability: GET_VOLUME_STATS
I0112 01:19:25.864582 1 ntnx_driver.go:104] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0112 01:19:25.864584 1 ntnx_driver.go:104] Enabling node service capability: EXPAND_VOLUME
I0112 01:19:25.864590 1 ntnx_driver.go:145] Driver: csi.nutanix.com
I0112 01:19:25.865024 1 server.go:98] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
2022-01-12T01:19:26.468Z identity.go:23: [INFO] Using default GetPluginInfo
2022-01-12T01:19:26.66Z node.go:215: [INFO] NodeGetInfo called with req: &csi.NodeGetInfoRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
2022-01-12T01:19:26.698Z identity.go:23: [INFO] Using default GetPluginInfo
kubectl -n ntnx-system logs csi-node-ntnx-plugin-hxq2p -c liveness-probe
I0112 01:19:26.698098 1 main.go:149] calling CSI driver to discover driver name
I0112 01:19:26.699082 1 main.go:155] CSI driver name: "csi.nutanix.com"
I0112 01:19:26.699100 1 main.go:183] ServeMux listening at ":9808"
Thanks, Francisco Yanez
Please follow the example from here https://github.com/nutanix/csi-plugin/tree/master/example/ABS to test your deployment.
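The linked ABS example comes down to creating a StorageClass and then a PVC against it. A minimal PVC sketch (the claim name `test-claim` and the 1Gi size are placeholders; `acs-abs` is the class name used in the example):

```yaml
# Minimal PVC sketch to exercise the CSI driver (name and size are placeholders)
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: acs-abs
  accessModes:
    - ReadWriteOnce   # corresponds to the SINGLE_NODE_WRITER mode in the plugin logs
  resources:
    requests:
      storage: 1Gi
```

If the claim goes to Bound, the provisioner side of the driver is working.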
Sorry, I am a bit confused. Do I need to do this instead of the Helm installation, or after it?
Also, on this example:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: acs-abs
provisioner: csi.nutanix.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: ntnx-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: ntnx-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  csi.storage.k8s.io/controller-expand-secret-name: ntnx-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4 --> **how can I determine this?**
  dataServiceEndPoint: 10.5.65.156:3260
  storageContainer: default-container-30293
  storageType: NutanixVolumes --> **Also how do I know what is the type of my nutanix cluster?**
  #whitelistIPMode: ENABLED
  #chapAuth: ENABLED
allowVolumeExpansion: true
reclaimPolicy: Delete
Thanks, I think we are close. Francisco Yanez
You need to do this after deploying the CSI driver using Helm. Spend some time going through https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_5:csi-csi-plugin-storage-c.html and https://kubernetes.io/docs/concepts/storage/
Hello @eyanez111 you are definitely everywhere :-D
Are you still on your Rancher cluster? How did you install the CSI driver? It is available from the Rancher partner marketplace, and you can create the storage class directly during configuration with a wizard.
Example:
Hello @tuxtof,
Yes, I am pretty active here lately 👍 ... As per your questions:
"Are you still on your Rancher cluster?" Mainly yes; I am doing tons of testing to see if we take this cluster to production. We also have Karbon, but those are our dev, SAT, and UAT environments.
"How did you install the CSI driver?" Using this tutorial: https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_5:CSI-Volume-Driver-v2_5
I installed it manually using Helm; would it be better to uninstall and do it via the marketplace?
I cannot find the marketplace. How do I get there? The UI is a bit different from the one in the Rancher documentation (not sure if the Rancher docs are out of date).
Also, once it is installed (via Helm or via the GUI), do I still need to define the StorageClass?
Do you know if I can install Longhorn, and whether it will work with Nutanix SSDs?
Thanks for the help, really appreciate it :D Francisco Yanez
Hello @eyanez111
Yes, for Rancher the best is to use their marketplace.
You will find it under Explore Cluster -> your cluster, then the Apps & Marketplace / Charts menu on the left.
You can define the storage class directly during the install of the CSI storage chart, so there is no need to worry about the syntax.
Using Longhorn may work, but it is a bit redundant because the Nutanix stack already does approximately the same kind of job.
Hi, I cannot install it. I was able to make it work via Helm (partially; my pods still get errors with the PVC) and I used the same credentials. Here are the parameters I am putting in:
These are the errors: helm-operation-ffgsx_undefined.log
And this is the file with the changes I made: differences-config.txt
Not sure what is wrong there... Is there any way to check if the cluster is on xfs or ext4? Could that be the problem? I am doing it with ext4.
Hi,
What is the output of the chart install? I can't debug without it. Did you start with a clean Rancher cluster? If you try to install it on top of the one where you already installed manually, it can cause trouble. Concerning ext4 or xfs: it is your choice, not related to the cluster or any existing state.
How do I get the output from a fresh install? No, I did a fresh installation 2 days ago for this cluster. Is there a way to add those drivers during the installation?
Thanks Francisco
When you launch the chart install, a window opens with the Helm output; give me its content.
No, it is a two-step process: first you install your cluster, then you deploy the driver.
Ah yes, the output is this: helm-operation-ffgsx_undefined.log
thanks
Two errors here:
you ask to install the service monitor, but you did not install the Rancher monitoring stack
you did not install the nutanix-snapshot chart before the storage one
I did install it; do I need to remove it? Installing the snapshot chart was the first thing I did.
Hello @tuxtof
I think I am getting closer and closer. I found that I was pointing at the SC that I created, but the installation creates a default one named nutanix-volume. So I deleted the app and the PVCs, and set the class name to nutanix-volume instead of acs-abs. Then I checked in the GUI that there were no Mount Options.
After those changes I deployed again and found this:
kubectl describe pvc/data-my-cluster-zookeeper-0 -n centralized-logging
Name: data-my-cluster-zookeeper-0
Namespace: centralized-logging
StorageClass: nutanix-volume
Status: Bound
Volume: pvc-b2415b65-f722-4615-9f6e-91f0d83d9b43
Labels: app.kubernetes.io/instance=my-cluster
app.kubernetes.io/managed-by=strimzi-cluster-operator
app.kubernetes.io/name=zookeeper
app.kubernetes.io/part-of=strimzi-my-cluster
strimzi.io/cluster=my-cluster
strimzi.io/kind=Kafka
strimzi.io/name=my-cluster-zookeeper
Annotations: pv.kubernetes.io/bind-completed: yes
strimzi.io/delete-claim: false
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 100Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: my-cluster-zookeeper-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 16m com.nutanix.csi_worker-nodes4_68c561d9-f786-4a4c-95ab-cdee76332408 External provisioner is provisioning volume for claim "centralized-logging/data-my-cluster-zookeeper-0"
Normal ExternalProvisioning 16m persistentvolume-controller waiting for a volume to be created, either by external provisioner "com.nutanix.csi" or manually created by system administrator
Normal ProvisioningSucceeded 16m com.nutanix.csi_worker-nodes4_68c561d9-f786-4a4c-95ab-cdee76332408 Successfully provisioned volume pvc-b2415b65-f722-4615-9f6e-91f0d83d9b43
So the PVC is working! Now the pods are the problem:
kubectl get pods -n centralized-logging
NAME                                        READY   STATUS              RESTARTS   AGE
my-cluster-zookeeper-0                      0/1     ContainerCreating   0          17m
my-cluster-zookeeper-1                      0/1     ContainerCreating   0          17m
my-cluster-zookeeper-2                      0/1     ContainerCreating   0          17m
strimzi-cluster-operator-76b49577c5-b62ln   1/1     Running             0          3d5h
The pods are stuck in ContainerCreating. I looked at the logs and the describe output and found this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> 0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling <unknown> 0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled <unknown> Successfully assigned centralized-logging/my-cluster-zookeeper-0 to worker-nodes6
Warning FailedMount 17m kubelet, worker-nodes6 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data zookeeper-metrics-and-logging zookeeper-nodes cluster-ca-certs my-cluster-zookeeper-token-hzb9g strimzi-tmp]: timed out waiting for the condition
Warning FailedMount 5m45s (x3 over 19m) kubelet, worker-nodes6 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[cluster-ca-certs my-cluster-zookeeper-token-hzb9g strimzi-tmp data zookeeper-metrics-and-logging zookeeper-nodes]: timed out waiting for the condition
Warning FailedMount 3m30s (x2 over 12m) kubelet, worker-nodes6 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[zookeeper-nodes cluster-ca-certs my-cluster-zookeeper-token-hzb9g strimzi-tmp data zookeeper-metrics-and-logging]: timed out waiting for the condition
Warning FailedMount 74s (x4 over 21m) kubelet, worker-nodes6 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[strimzi-tmp data zookeeper-metrics-and-logging zookeeper-nodes cluster-ca-certs my-cluster-zookeeper-token-hzb9g]: timed out waiting for the condition
Warning FailedMount 53s (x19 over 23m) kubelet, worker-nodes6 MountVolume.SetUp failed for volume "pvc-b2415b65-f722-4615-9f6e-91f0d83d9b43" : rpc error: code = InvalidArgument desc = nutanix: iSCSI portal info is missing for 3d06afb8-231e-4e34-a289-3ce9a8fef581, err: <nil>
paoc@LAP-FYANEZ:~/centralized-logs/strimzi$ kubectl logs pod/my-cluster-zookeeper-0 -n centralized-logging
Error from server (BadRequest): container "zookeeper" in pod "my-cluster-zookeeper-0" is waiting to start: ContainerCreating
It seems this "nutanix: iSCSI portal info is missing" error is what is causing the volumes to fail to mount.
Thanks for all the help. I am close to finishing this and starting to deploy prod clusters with the Nutanix/Rancher driver! Francisco Yanez
Hello Francisco
Prerequisites are missing on your worker nodes; on Ubuntu, for example:
runcmd:
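The snippet above was truncated; a minimal cloud-config sketch of the usual prerequisites, assuming Ubuntu/Debian package names (open-iscsi for the iSCSI initiator and nfs-common, matching what is discussed later in the thread):

```yaml
#cloud-config
# Sketch only: package names assume Ubuntu/Debian worker nodes
packages:
  - open-iscsi   # provides iscsid, needed for Nutanix Volumes (iSCSI)
  - nfs-common   # needed for NFS-backed volumes
runcmd:
  # enable and start the iSCSI initiator daemon
  - systemctl enable --now iscsid
```

The same packages can be installed and `iscsid` enabled manually on already-provisioned nodes.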
Hello @tuxtof,
I did install nfs-common; let me update and enable iscsid.
thanks Francisco
Hello @tuxtof,
I made the changes manually just to make sure everything is installed, but the pods are still stuck in ContainerCreating and I get this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> 0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling <unknown> 0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled <unknown> Successfully assigned centralized-logging/my-cluster-zookeeper-0 to worker-nodes3
Warning FailedMount 42s (x3 over 43s) kubelet, worker-nodes3 MountVolume.MountDevice failed for volume "pvc-bf0d9deb-aaf9-417f-948c-53f2ca8ee806" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name com.nutanix.csi not found in the list of registered CSI drivers
Warning FailedMount 9s (x4 over 39s) kubelet, worker-nodes3 MountVolume.SetUp failed for volume "pvc-bf0d9deb-aaf9-417f-948c-53f2ca8ee806" : rpc error: code = InvalidArgument desc = nutanix: iSCSI portal info is missing for 3c7e0a57-982f-4afa-b353-8449c7a74ed5, err: <nil>
Any idea what else could be missing?
thanks Francisco
Hello @tuxtof,
I continued working on it today and found a few videos. Apparently I am missing a service account named attacher:
kubectl get serviceaccounts -A | grep csi
ntnx-system csi-node-ntnx-plugin 1 4d2h
ntnx-system csi-provisioner 1 4d2h
Do you think I am missing that, or is it just for an older version (the video is from 2018)? And if I am missing it, how do I install it? The Rancher marketplace did not install it.
thanks Francisco
The Nutanix CSI driver no longer uses an attacher. Please open a Nutanix support case to get help with your setup.
Hello @eyanez111
The CSI Helm chart from the Rancher marketplace is fully functional and contains all the needed components.
Looking at your logs, I see multiple strange things.
Can you give me the output of the following commands, please:
kubectl get sc -o yaml
kubectl get csidrivers -o yaml
helm list -A
And, in the namespace where you installed the drivers/charts:
kubectl get pods
helm get values nutanix-csi-storage  <- maybe you need to change the name to the one you used for the deployed chart
helm get values nutanix-csi-snapshot  <- maybe you need to change the name to the one you used for the deployed chart
Hello @tuxtof,
Thanks for the help. Let me get you that:
kubectl get sc -o yaml
apiVersion: v1
items:
- allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
meta.helm.sh/release-name: nutanix-csi-storage
meta.helm.sh/release-namespace: kube-system
storageclass.kubernetes.io/is-default-class: "true"
creationTimestamp: "2022-01-19T18:19:16Z"
labels:
app.kubernetes.io/managed-by: Helm
managedFields:
- apiVersion: storage.k8s.io/v1
fieldsType: FieldsV1
fieldsV1:
f:allowVolumeExpansion: {}
f:metadata:
f:annotations:
.: {}
f:meta.helm.sh/release-name: {}
f:meta.helm.sh/release-namespace: {}
f:storageclass.kubernetes.io/is-default-class: {}
f:labels:
.: {}
f:app.kubernetes.io/managed-by: {}
f:parameters:
.: {}
f:csi.storage.k8s.io/controller-expand-secret-name: {}
f:csi.storage.k8s.io/controller-expand-secret-namespace: {}
f:csi.storage.k8s.io/fstype: {}
f:csi.storage.k8s.io/node-publish-secret-name: {}
f:csi.storage.k8s.io/node-publish-secret-namespace: {}
f:csi.storage.k8s.io/provisioner-secret-name: {}
f:csi.storage.k8s.io/provisioner-secret-namespace: {}
f:isSegmentedIscsiNetwork: {}
f:storageContainer: {}
f:storageType: {}
f:provisioner: {}
f:reclaimPolicy: {}
f:volumeBindingMode: {}
manager: Go-http-client
operation: Update
time: "2022-01-19T18:19:16Z"
name: nutanix-volume
resourceVersion: "2978176"
selfLink: /apis/storage.k8s.io/v1/storageclasses/nutanix-volume
uid: 181146c1-d098-48c4-9d2f-934e0502b5dc
parameters:
csi.storage.k8s.io/controller-expand-secret-name: ntnx-secret
csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
csi.storage.k8s.io/fstype: ext4
csi.storage.k8s.io/node-publish-secret-name: ntnx-secret
csi.storage.k8s.io/node-publish-secret-namespace: kube-system
csi.storage.k8s.io/provisioner-secret-name: ntnx-secret
csi.storage.k8s.io/provisioner-secret-namespace: kube-system
isSegmentedIscsiNetwork: "true"
storageContainer: default-container-67424176636311
storageType: NutanixVolumes
provisioner: csi.nutanix.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
kind: List
metadata:
resourceVersion: ""
selfLink: ""
kubectl get csidrivers -o yaml
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
annotations:
meta.helm.sh/release-name: nutanix-csi-storage
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2022-01-19T18:19:16Z"
labels:
app.kubernetes.io/managed-by: Helm
managedFields:
- apiVersion: storage.k8s.io/v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:meta.helm.sh/release-name: {}
f:meta.helm.sh/release-namespace: {}
f:labels:
.: {}
f:app.kubernetes.io/managed-by: {}
f:spec:
f:attachRequired: {}
f:podInfoOnMount: {}
f:volumeLifecycleModes:
.: {}
v:"Persistent": {}
manager: Go-http-client
operation: Update
time: "2022-01-19T18:19:16Z"
name: csi.nutanix.com
resourceVersion: "2978198"
selfLink: /apis/storage.k8s.io/v1/csidrivers/csi.nutanix.com
uid: a7668568-f50d-4737-b777-5286365d92b3
spec:
attachRequired: false
podInfoOnMount: true
volumeLifecycleModes:
- Persistent
kind: List
metadata:
resourceVersion: ""
selfLink: ""
helm list -A
NAME                     NAMESPACE                  REVISION   UPDATED                                   STATUS     CHART                                     APP VERSION
fleet-agent-c-8zc6z      cattle-fleet-system        3          2022-01-11 22:06:51.611545228 +0000 UTC   deployed   fleet-agent-c-8zc6z-v0.0.0+s-62ff0e388e0cc43b5cd041ae6f39561b05d16a9f361b7501177c11349d72b
nutanix-csi-snapshot     kube-system                1          2022-01-19 18:16:39.274791956 +0000 UTC   deployed   nutanix-csi-snapshot-1.0.0                1.0.0
nutanix-csi-storage      kube-system                1          2022-01-19 18:19:15.989482312 +0000 UTC   deployed   nutanix-csi-storage-2.5.0                 2.5.0
rancher-monitoring       cattle-monitoring-system   1          2022-01-14 19:47:41.55555458 +0000 UTC    deployed   rancher-monitoring-100.1.0+up19.0.3       0.50.0
rancher-monitoring-crd   cattle-monitoring-system   1          2022-01-14 19:47:32.119235294 +0000 UTC   deployed   rancher-monitoring-crd-100.1.0+up19.0.3
kubectl get pods -n kube-system
NAME                                              READY   STATUS    RESTARTS   AGE
calico-kube-controllers-655c554569-k72sc          1/1     Running   0          47h
canal-4bb7s                                       2/2     Running   0          47h
canal-7rpc2                                       2/2     Running   0          47h
canal-87wm2                                       2/2     Running   0          2d
canal-bxj6g                                       2/2     Running   0          47h
canal-cplc8                                       2/2     Running   0          47h
canal-hzv4v                                       2/2     Running   0          47h
canal-l6m46                                       2/2     Running   0          47h
canal-mvgwk                                       2/2     Running   0          47h
canal-zdplh                                       2/2     Running   0          47h
coredns-7cc5cfbd77-225fh                          1/1     Running   0          47h
coredns-7cc5cfbd77-fsqfw                          1/1     Running   0          47h
coredns-autoscaler-76f8869cc9-c8fq4               1/1     Running   0          47h
csi-node-ntnx-plugin-488pb                        3/3     Running   0          27m
csi-node-ntnx-plugin-492xt                        3/3     Running   0          27m
csi-node-ntnx-plugin-4svvx                        3/3     Running   0          27m
csi-node-ntnx-plugin-8trfj                        3/3     Running   0          27m
csi-node-ntnx-plugin-dwfcg                        3/3     Running   0          27m
csi-node-ntnx-plugin-tln9v                        3/3     Running   0          27m
csi-provisioner-ntnx-plugin-0                     5/5     Running   0          27m
metrics-server-54788574fd-gnwff                   1/1     Running   0          47h
snapshot-controller-0                             1/1     Running   0          30m
snapshot-validation-deployment-66849d5586-lslqt   1/1     Running   0          30m
snapshot-validation-deployment-66849d5586-zsrvv   1/1     Running   0          30m
helm get values nutanix-csi-storage:
USER-SUPPLIED VALUES:
defaultStorageClass: volume
fsType: ext4
global:
cattle:
clusterId: c-8zc6z
clusterName: cl-prod-nutanix-rancher-cae
rkePathPrefix: ""
rkeWindowsPathPrefix: ""
systemDefaultRegistry: ""
url: https://rancher.rd.zedev.net
systemDefaultRegistry: ""
networkSegmentation: true
password: #%$&$*(*)
prismEndPoint: 172.22.4.101
servicemonitor:
enabled: true
storageContainer: default-container-67424176636311
username: rancher-user
volumeClass: true
These values are correct.
helm get values nutanix-csi-snapshot
USER-SUPPLIED VALUES:
global:
cattle:
clusterId: c-8zc6z
clusterName: cl-prod-nutanix-rancher-cae
rkePathPrefix: ""
rkeWindowsPathPrefix: ""
systemDefaultRegistry: ""
url: https://rancher.rd.zedev.net
systemDefaultRegistry: ""
One thing I noticed: you mentioned that I needed to install storage first and snapshot later. I could not do it like that; I was getting errors saying I needed snapshot, so I installed the snapshot chart first.
Thanks again for the help, Francisco
The correct order is nutanix-snapshot first and nutanix-storage next; that is what I said before, so we agree.
The potential error I see is the networkSegmentation setting: unless you have made the specific related configuration on the Nutanix side, it needs to be set to false.
All the other parameters seem OK. Can you confirm 172.22.4.101 is your Prism Element VIP? (Not the same one you use for the Rancher node driver, which is Prism Central.)
Last point: be sure to run the entire test on a fresh new cluster. Since you have done a lot of experimentation with and without Helm, I am always afraid some old piece is stuck somewhere.
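If no segmented iSCSI network is configured on the Nutanix side, the fix suggested here is just flipping that one flag in the chart's user-supplied values. A sketch based on the values posted earlier in the thread (keep your own endpoint, container, and credentials):

```yaml
# Values override for the nutanix-csi-storage chart (sketch)
networkSegmentation: false   # was true; only enable with matching Nutanix-side configuration
prismEndPoint: 172.22.4.101  # Prism Element VIP
storageContainer: default-container-67424176636311
fsType: ext4
```

With `networkSegmentation: true` but no segmented iSCSI network on the cluster, the driver cannot resolve the iSCSI portal, which matches the "iSCSI portal info is missing" mount errors above.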
Let me summarize the entire process:
1. deploy a fresh Rancher cluster with the correct node dependencies (iscsi and nfs)
2. deploy the monitoring extension
3. deploy the nutanix-snapshot Helm chart
4. deploy the nutanix-storage Helm chart and use the integrated wizard to create storage class(es)
And that is all; from here you will be able to create PVCs.
ok let me do that from scratch on another cluster and I will report.
Thanks for all the help @tuxtof
Hello @tuxtof,
I just wanted to report back. You were right! Installing the chart manually with Helm, then removing it, then installing it from the marketplace (and, I think, in the wrong order at least the first time) made it unstable. I followed the instructions just as described:
1. deploy a fresh Rancher cluster with the correct node dependencies (iscsi and nfs)
2. deploy the monitoring extension
3. deploy the nutanix-snapshot Helm chart
4. deploy the nutanix-storage Helm chart and use the integrated wizard to create storage class(es)
And it worked. I also worked with support on a new cluster and installed the 2.4 version using kubectl, and it worked too. So it is tested and proven to be a fully working driver.
Thanks so much. I will report once we are done testing and we go live. You have been helping me since the beginning. Francisco Yanez
Good news
Hello, I am trying to assign a storage class to a Rancher cluster. I followed this process: https://portal.nutanix.com/page/documents/details?targetId=CSI-Volume-Driver-v2_5:CSI-Volume-Driver-v2_5
I saw that there were some fields that I needed to edit: https://github.com/nutanix/helm/blob/nutanix-csi-storage-2.5.0/charts/nutanix-csi-storage/values.yaml
So I downloaded the file and added the requested fields, like:
prismEndPoint:
username:
password:
The containers start:
kubectl get pods -A | grep csi
ntnx-system   csi-node-ntnx-plugin-ckh2l      3/3   Running   0   12m
ntnx-system   csi-node-ntnx-plugin-dn5br      3/3   Running   0   12m
ntnx-system   csi-node-ntnx-plugin-h4s9c      3/3   Running   0   12m
ntnx-system   csi-node-ntnx-plugin-kzhn7      3/3   Running   0   12m
ntnx-system   csi-node-ntnx-plugin-slzzw      3/3   Running   0   12m
ntnx-system   csi-node-ntnx-plugin-wftpg      3/3   Running   0   12m
ntnx-system   csi-provisioner-ntnx-plugin-0   5/5   Running   0   12m
But when I check the logs, I get this error on all the pods:
error: a container name must be specified for pod csi-provisioner-ntnx-plugin-0, choose one of: [csi-provisioner csi-resizer csi-snapshotter ntnx-csi-plugin liveness-probe]
Not sure how to fix that, or whether I am missing anything in the installation.
Thanks Francisco Yanez