NicolaiSchmid opened this issue 4 years ago (status: Open)
Wanted to add that this is on Ubuntu 20.04.3 LTS. I checked for any active firewalling just in case, but there is nothing besides the default Docker/Kubernetes rules; every chain is set to ACCEPT. I'm including the complete set here in case the KUBE-FIREWALL drop rules could prove relevant (unlikely, since there are no hits):
Chain INPUT (policy ACCEPT 4002 packets, 2255K bytes)
pkts bytes target prot opt in out source destination
1395K 113M ACCEPT udp -- * * 0.0.0.0/0 169.254.25.10 udp dpt:53
0 0 ACCEPT tcp -- * * 0.0.0.0/0 169.254.25.10 tcp dpt:53
51M 27G KUBE-NODE-PORT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes health check rules */
51M 27G KUBE-FIREWALL all -- * * 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
31M 34G KUBE-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
247K 15M ACCEPT all -- * * 10.233.64.0/18 0.0.0.0/0
0 0 ACCEPT all -- * * 0.0.0.0/0 10.233.64.0/18
Chain OUTPUT (policy ACCEPT 3737 packets, 1570K bytes)
pkts bytes target prot opt in out source destination
1395K 231M ACCEPT udp -- * * 169.254.25.10 0.0.0.0/0 udp spt:53
0 0 ACCEPT tcp -- * * 169.254.25.10 0.0.0.0/0 tcp spt:53
50M 19G KUBE-FIREWALL all -- * * 0.0.0.0/0 0.0.0.0/0
Chain DOCKER (0 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION-STAGE-1 (0 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION-STAGE-2 (0 references)
pkts bytes target prot opt in out source destination
Chain DOCKER-USER (0 references)
pkts bytes target prot opt in out source destination
Chain KUBE-FIREWALL (2 references)
pkts bytes target prot opt in out source destination
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
0 0 DROP all -- * * !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-FORWARD (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
2171 2238K ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
pkts bytes target prot opt in out source destination
Chain KUBE-NODE-PORT (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* Kubernetes health check node port */ match-set KUBE-HEALTH-CHECK-NODE-PORT dst
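To double-check that the KUBE-FIREWALL DROP rules never fire while the mount is retried, the packet counters can be watched live (a sketch, using the chain name from the listing above):

# watch the DROP counters on KUBE-FIREWALL; a non-zero pkts column would implicate the host firewall
watch -n1 "iptables -L KUBE-FIREWALL -v -n"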
Hi all,
I'm also having the same issue on Rook (v1.8.8) with external Ceph (16.2.7):
controller.go:1337] provision "default/pvc-4" class "rc-fs-storage": started
I0422 17:54:44.196780 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-4", UID:"9c09aa0d-cfc3-484c-b536-f3e686e54f52", APIVersion:"v1", ResourceVersion:"71143", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/pvc-4"
W0422 17:54:44.200513 1 controller.go:934] Retrying syncing claim "9c09aa0d-cfc3-484c-b536-f3e686e54f52", failure 5
E0422 17:54:44.200544 1 controller.go:957] error syncing claim "9c09aa0d-cfc3-484c-b536-f3e686e54f52": failed to provision volume with StorageClass "rc-fs-storage": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-9c09aa0d-cfc3-484c-b536-f3e686e54f52 already exists
I0422 17:54:44.200560 1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-4", UID:"9c09aa0d-cfc3-484c-b536-f3e686e54f52", APIVersion:"v1", ResourceVersion:"71143", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "rc-fs-storage": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-9c09aa0d-cfc3-484c-b536-f3e686e54f52 already exists
Please help. I checked, and there is no connectivity issue between rke2 (Rook) and the external Ceph cluster. On the Ceph side, I observed that I can't run:
ceph fs subvolumegroup ls rc-mayank-cc-fs
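When ceph fs subvolumegroup ls hangs or errors, it is worth checking the filesystem and mgr state first, since the subvolume commands are served by the mgr volumes module (a sketch; the filesystem name is taken from the command above):

ceph status
ceph fs status rc-mayank-cc-fs
# failing over a stuck active mgr sometimes unblocks the volumes module
ceph mgr fail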
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
这是来自QQ邮箱的假期自动回复邮件。邮件已收到,我会尽快回复您!
I had the same issues as others in this thread. I'm not sure what exactly triggers it, but I was not able to deploy ACS on OpenShift. To fix it I had to do the following:
sh-4.4$ rbd -p ocs-storagecluster-cephblockpool ls |grep 0002e
csi-vol-32c4d1da-1172-11ed-ba20-0a580a80002e
sh-4.4$ rbd -p ocs-storagecluster-cephblockpool rm csi-vol-32c4d1da-1172-11ed-ba20-0a580a80002e
Removing image: 100% complete...done.
Then create the rbd image manually:
rbd -p ocs-storagecluster-cephblockpool create csi-vol-32c4d1da-1172-11ed-ba20-0a580a80002e --size=100G
Looks like a glitch in the provisioner...
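For anyone hunting for the right image to remove or recreate: instead of grepping the pool, the csi-vol-<uuid> image name can be derived from the PV, since the CSI volume handle ends in the same UUID (a sketch; <pv-name> is a placeholder):

# the trailing UUID of the volumeHandle matches the csi-vol-<uuid> image name in the pool
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'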
I am facing this situation now. Not sure what the solution is. Any further updates?
Hi, in my case the problem was... the firewall! To be more precise: the CSI plugin uses the msgr2 protocol on port TCP/3300 instead of the legacy protocol on TCP/6789. This took me a while to understand, since all the other clients were using the legacy protocol and working smoothly. I was not using Rook but an external Ceph cluster, and got the error... Revelation came when I looked at the firewall logs :-)
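A quick connectivity check from an affected node (a sketch; mon-host is a placeholder for one of your monitor addresses):

# msgr2 port used by the CSI plugin
nc -zv mon-host 3300
# legacy msgr1 port used by older clients
nc -zv mon-host 6789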
I hit this problem today too https://github.com/rook/rook/issues/11617
I had the same issue today. My cluster had a planned power outage last night, so we powered off all nodes gracefully. When we rebooted all the nodes today, I got the same issue.
Still no hint on how to fix it.
This can be caused by the OSD picking the wrong IP. Check ceph osd dump and make sure each OSD has the right IP.
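The registered addresses appear in the osd lines of the dump (a sketch):

# each osd line lists its v2/v1 addresses; verify they match the node's actual IPs
ceph osd dump | grep '^osd\.'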
Any updates?
Please check whether the device is mapped: is there a /dev/rbd* device under /dev?
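A quick way to check on the affected node (a sketch):

# list kernel-mapped RBD block devices
ls -l /dev/rbd*
# or ask the rbd CLI which images are currently mapped
rbd showmapped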
There is a troubleshooting doc at https://www.mrajanna.com/troubleshooting-cephcsi/; maybe it can help you.
I had the exact same error messages (an operation with the given Volume xxx already exists) while also having FS_DEGRADED. After restarting all MDS daemons and a crashed Ganesha NFS pod, plus a few hours of waiting and not knowing what else to do, the FS_DEGRADED warning vanished and all PVCs mounted again.
It can be solved by deleting all the pods prefixed with csi-; see the sketch below.
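In a Rook deployment that would be something like this (a sketch; the rook-ceph namespace is an assumption):

# delete every csi-* pod; their daemonsets/deployments recreate them immediately
kubectl -n rook-ceph get pods -o name | grep pod/csi- | xargs kubectl -n rook-ceph delete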
It may be caused by the k8s node's time being out of sync.
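Clock sync can be verified per node, and skew also surfaces in Ceph's own health output (a sketch):

# "System clock synchronized: yes" indicates NTP is working
timedatectl status
# monitor clock skew shows up as MON_CLOCK_SKEW
ceph health detail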
This is still an issue, FYI. Zero firewalling applied in the test cluster at the moment, running Cilium (non-host networking). I've tried all the suggestions in this issue with zero luck getting this fixed.
Had this issue this morning… what I did: deleted all the csi-* pods (at once), restarted all OSD pods (gracefully), restarted all pods which showed this issue… at some point it started working again (but I have no idea what exactly helped; at some point it looked like a network issue was also involved).
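For reference, the graceful restarts can be done with rollout restarts rather than deleting pods by hand (a sketch; the namespace and labels follow the usual Rook conventions and are assumptions):

# restart the Rook OSD deployments
kubectl -n rook-ceph rollout restart deployment -l app=rook-ceph-osd
# recreate the CSI driver pods
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin
kubectl -n rook-ceph delete pod -l app=csi-cephfsplugin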
Is this a bug report or feature request?
Bug Report
Deviation from expected behavior: Kubernetes tries to attach the PVC to a pod and fails. On other nodes in the cluster, the attach and mount work fine and as expected.
How to reproduce it (minimal and precise): Create an example cluster with an rbd-csi storage class. Create a PVC and a pod attaching the PVC (a minimal sketch follows below). I think the issue lies somewhere in mismatching configuration, software, kernel modules, etc.
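A minimal repro along those lines might look like this (a sketch; the StorageClass name rook-ceph-block is an assumption, substitute your rbd-csi class):

# create a small test PVC against the rbd-csi StorageClass
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
EOF

Then reference rbd-pvc-test from any pod via a persistentVolumeClaim volume and watch whether the attach succeeds on the affected node.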
Environment (of the node trying to mount):
Kernel (uname -a): Linux lb-173 4.15.0-88-generic #88~16.04.1-Ubuntu SMP Wed Feb 12 04:19:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Rook version (use rook version inside of a Rook Pod):
Ceph version (ceph -v): ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus (stable)
Kubernetes version (kubectl version):
Ceph health (ceph health in the Rook Ceph toolbox): HEALTH_OK