nutanix / helm

Nutanix Helm Charts repository
https://nutanix.github.io/helm/
MIT License

MountVolume.SetUp failed for volume - the given Volume ID NutanixVolumes-***** already exists #140

Closed (med-jordan closed this issue 4 months ago)

med-jordan commented 4 months ago

Hi Gents,

I'm using the latest Rancher version and the latest Nutanix CSI snapshot and CSI storage charts. All nodes meet the requirements as described. I can create a PVC; it gets status Bound and shows as online in Rancher and in Nutanix Prism Element.

When creating a new pod with the PVC, I get an error that the given volume already exists:

"MountVolume.SetUp failed for volume "pvc-xxxxx-xxxxx-xxxxxx-xxxxx" : rpc error: code = Unavailable desc = An operation with the given Volume ID NutanixVolumes-xxxxx-xxxxxx-xxxxxx-xxxxxxx already exists"

Any ideas how to solve this?

Thanks and BR Andre

tuxtof commented 4 months ago

Hello Andre,

we need more information here.

Also, the message is not that the volume already exists, but that an operation on the volume already exists; this is a transient message during processing.

We need the full CSI logs and the status/events of the PVs/PVCs/Pods.

In any case, the CSI driver is a supported component, so you can also open a support call directly.
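The status/events and CSI logs asked for above can be gathered with kubectl. A minimal sketch; the namespace `ntnx-system`, the label `app=nutanix-csi-node`, and the `NS`/`PVC`/`POD` names are placeholders for a typical chart install, not taken from this thread:

```shell
#!/usr/bin/env bash
# Placeholders: NS/PVC/POD are your own workload's names; the CSI
# namespace and label below are assumptions about a default chart install.
NS=default
PVC=my-pvc
POD=my-pod

gather() {
  kubectl get pv,pvc,pods -A -o wide
  kubectl describe pvc "$PVC" -n "$NS"
  kubectl describe pod "$POD" -n "$NS"
  kubectl get events -n "$NS" --sort-by=.lastTimestamp
  # The node-plugin logs usually contain the iscsiadm errors:
  kubectl logs -n ntnx-system -l app=nutanix-csi-node --all-containers --tail=200
}

# Only run where kubectl is available (e.g. a cluster admin host).
command -v kubectl >/dev/null 2>&1 && gather || echo "kubectl not found; run gather on a cluster host"
```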

med-jordan commented 4 months ago

Hi Christophe,

All the errors I can see in the CSI logs are these:

2024-03-01T10:09:58.587596308-05:00 stderr F (exit status 8)
2024-03-01T10:09:58.587599155-05:00 stderr F E0301 15:09:58.587357 1 utils.go:31] GRPC error: rpc error: code = Internal desc = iscsi/lvm failure, last err seen:
2024-03-01T10:09:58.587601756-05:00 stderr F iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2010-06.com.nutanix:ntnx-k8s-793fec83-fb4c-4f52-88b2-4fc2fe12c188-tgt0, portal: 10.150.138.138,3260].
2024-03-01T10:09:58.587604143-05:00 stderr F iscsiadm: initiator reported error (8 - connection timed out)
2024-03-01T10:09:58.587608699-05:00 stderr F iscsiadm: Could not log into all portals
2024-03-01T10:09:58.587611136-05:00 stderr F Logging in to [iface: default, target: iqn.2010-06.com.nutanix:ntnx-k8s-793fec83-fb4c-4f52-88b2-4fc2fe12c188-tgt0, portal: 10.150.138.138,3260]
2024-03-01T10:09:58.587613419-05:00 stderr F (exit status 8)

I checked port 3260 to the iSCSI Data Services IP and it is open... The credentials secret is also OK... But I think you're right, it's better to open a support call directly.

Thanks Andre

tuxtof commented 4 months ago

Is the iscsid service started on the nodes? There are two ports to open for iSCSI: 3205 and 3260. Even though routing iSCSI traffic is not a good idea, in any case the entire flow is documented here: https://portal.nutanix.com/page/documents/list?type=software&filterKey=software&filterVal=Ports%20and%20Protocols&productType=Nutanix%20Kubernetes%20Engine

Can you also give the status/events of the PVs/PVCs/Pods?
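The checks suggested above (iscsid running, both iSCSI ports reachable) can be sketched from a worker node like this. The Data Services IP is the one from the log excerpt; `/dev/tcp` is a bash feature, so run the script with bash:

```shell
#!/usr/bin/env bash
# Connectivity sketch for the Nutanix CSI iSCSI path, run on a worker node.
# DSIP is a placeholder: use your cluster's Data Services IP.
DSIP="10.150.138.138"

# 1. Is the iSCSI initiator daemon running? (skipped where systemd is absent)
command -v systemctl >/dev/null && systemctl is-active iscsid

# 2. Are both iSCSI ports reachable? /dev/tcp avoids needing nc or nmap.
check_port() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "OK   $1:$2"
  else
    echo "FAIL $1:$2"
  fi
}
for p in 3205 3260; do check_port "$DSIP" "$p"; done

# 3. If both ports answer, a manual discovery should list the target, e.g.:
#    iscsiadm -m discovery -t sendtargets -p "$DSIP:3260"
```

If step 2 prints FAIL while the service is active, the problem is on the network/firewall path rather than in the CSI driver itself.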

tuxtof commented 4 months ago

Some other questions:

Also, please share the support ticket number once it is open.

med-jordan commented 4 months ago

Rancher version is 2.8.1. The k8s cluster is an RKE2 cluster with embedded etcd. The OS on all nodes is CentOS Stream 9.

med-jordan commented 4 months ago

CentOS-Stream-GenericCloud-9-latest

tuxtof commented 4 months ago

Is RKE2 deployed manually or with the Nutanix Rancher node driver?

med-jordan commented 4 months ago

I have 3 nodes (VMs on a Nutanix cluster) with CentOS Stream 9 installed, and on top of that I installed the RKE2 cluster. On top of that runs the Rancher server management interface, where the Nutanix Rancher node driver is installed and activated.

med-jordan commented 4 months ago

So I'm using it with the Nutanix Rancher node driver and it's working well... I can create Kubernetes clusters without any problems. Only the CSI storage driver (installed via the Helm chart in the store) shows the described behavior.

med-jordan commented 4 months ago

I also have a second environment where Rancher runs in a Docker container on a VM... same behavior there.

med-jordan commented 4 months ago

Good news Christophe: after checking ports TCP 7050, 3205, 2379 and UDP 123 with the firewall colleagues, the issue is resolved.

Port TCP 7050 authenticates clients accessing clusters and provides access to NKE when a firewall exists between Prism Central and the workstation.

Thank you for your support and BR Andre

tuxtof commented 4 months ago

ok good news