kubectl describe pvc -n kube-system:
Name: cephfs-pvc
Namespace: kube-system
StorageClass: rook-cephfs
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: kube-registry-5b677b6c87-86kmp
kube-registry-5b677b6c87-btpsq
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 33m rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-jxkgn_e1687ee9-24d3-40c8-a254-d42cde49edfa failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning ProvisioningFailed 14m (x13 over 33m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-jxkgn_e1687ee9-24d3-40c8-a254-d42cde49edfa failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-ecbc3246-e9cf-4e52-a36e-580891aef6e1 already exists
Normal Provisioning 4m1s (x17 over 35m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-jxkgn_e1687ee9-24d3-40c8-a254-d42cde49edfa External provisioner is provisioning volume for claim "kube-system/cephfs-pvc"
Normal ExternalProvisioning 40s (x142 over 35m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
What should I do now? Can someone help?
@LittleCadet Looks like you have 1 OSD. Please paste the filesystem YAML. If you are planning to test Rook, please choose a replica value of 1 or use filesystem-test.yaml.
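For reference, the test filesystem looks roughly like this; this is a sketch based on the upstream filesystem-test.yaml example, so exact fields may vary by Rook version:

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 1                      # single copy: fine for a 1-OSD test, never for production
      requireSafeReplicaSize: false
  dataPools:
    - name: replicated
      replicated:
        size: 1
        requireSafeReplicaSize: false
  metadataServer:
    activeCount: 1
    activeStandby: false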
@Madhu-1 Sorry, I am new to this. I used filesystem-test.yaml, but it did not fix the problem.
Here is the description of the PVC in the kube-system namespace:
[root@master01 cephfs]# kubectl describe pvc -n kube-system
Name: cephfs-pvc
Namespace: kube-system
StorageClass: rook-cephfs
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: kube-registry-5b677b6c87-87sgc
kube-registry-5b677b6c87-klqdh
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 50m rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-jxkgn_e1687ee9-24d3-40c8-a254-d42cde49edfa failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning ProvisioningFailed 27m (x14 over 50m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-jxkgn_e1687ee9-24d3-40c8-a254-d42cde49edfa failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-06316dd3-5387-40a7-984b-a3b9178e4802 already exists
Normal ExternalProvisioning 3m21s (x203 over 53m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
Normal Provisioning 2m27s (x22 over 53m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-jxkgn_e1687ee9-24d3-40c8-a254-d42cde49edfa External provisioner is provisioning volume for claim "kube-system/cephfs-pvc"
Then the ceph status:
sh-4.4$ ceph status
cluster:
id: 42c5b2f2-0efb-46b5-ad9d-3a3b14b43538
health: HEALTH_WARN
Degraded data redundancy: 32 pgs undersized
1 pool(s) have no replicas configured
OSD count 1 < osd_pool_default_size 3
services:
mon: 1 daemons, quorum a (age 112m)
mgr: a(active, since 111m)
mds: 1/1 daemons up
osd: 1 osds: 1 up (since 111m), 1 in (since 112m)
data:
volumes: 1/1 healthy
pools: 2 pools, 48 pgs
objects: 22 objects, 2.8 KiB
usage: 6.9 MiB used, 20 GiB / 20 GiB avail
pgs: 32 active+undersized
16 active+clean
progress:
Global Recovery Event (107m)
[=========...................] (remaining: 3h)
What should I do next? I would like to run some tests.
I re-created Rook; now the ceph status is:
[rook@rook-ceph-tools-d6d7c985c-mqt2r /]$ ceph status
cluster:
id: f9e7a676-eaeb-4f24-957d-9399f09a5720
health: HEALTH_OK
services:
mon: 1 daemons, quorum a (age 3h)
mgr: a(active, since 3h)
mds: 1/1 daemons up
osd: 1 osds: 1 up (since 3h), 1 in (since 3h)
data:
volumes: 1/1 healthy
pools: 3 pools, 96 pgs
objects: 25 objects, 465 KiB
usage: 29 MiB used, 20 GiB / 20 GiB avail
pgs: 96 active+clean
But the problem is still not fixed: the PVC is Pending. I need some help, please.
What does ceph osd pool ls detail show in the toolbox? It probably shows replica 3, which means you need at least 3 OSDs on different hosts by default. For a test where there is only one OSD, please create filesystem-test.yaml as @Madhu-1 mentioned, which only requires a single OSD.
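If you would rather keep an existing filesystem than recreate it, the pool sizes can also be lowered from the toolbox. This is a sketch for a throwaway test cluster only (pool names are the myfs-* pools shown later in this thread; recent Ceph releases require the extra option and flag to allow size 1):

# Allow single-replica pools (required on recent Ceph releases):
ceph config set global mon_allow_pool_size_one true
# Shrink the filesystem pools to a single replica:
ceph osd pool set myfs-metadata size 1 --yes-i-really-mean-it
ceph osd pool set myfs-replicated size 1 --yes-i-really-mean-it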
@travisn Now I create Rook in this way:
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster-test.yaml
kubectl create -f filesystem-test.yaml
kubectl create -f storageclass.yaml
kubectl create -f kube-registry.yaml
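To confirm the test cluster came up after those steps, checks along these lines are typical (resource kinds from the standard Rook CRDs):

kubectl -n rook-ceph get pods
kubectl -n rook-ceph get cephcluster
kubectl -n rook-ceph get cephfilesystem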
And ceph osd pool ls detail:
sh-4.4$ ceph osd pool ls detail
pool 1 '.mgr' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 33 lfor 0/0/22 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 2 'myfs-metadata' replicated size 1 min_size 1 crush_rule 2 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 52 lfor 0/0/36 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 3 'myfs-replicated' replicated size 1 min_size 1 crush_rule 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 38 lfor 0/0/36 flags hashpspool stripe_width 0 application cephfs
And I found something in csi-cephfsplugin-provisioner-86d7c46746-87cmf:
E0626 00:45:21.807327 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)
E0626 00:45:39.748254 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
E0626 00:46:12.243067 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)
E0626 00:46:16.996278 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
E0626 00:46:51.594086 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)
E0626 00:47:14.984594 1 reflector.go:138] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
And the logs of rook-ceph-operator-785cc8f794-bhbmt:
2022-06-26 00:37:00.706796 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:38:00.706645 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:39:00.707938 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:40:00.707030 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:41:00.706926 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:42:00.709996 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:43:00.707759 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:44:00.709079 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:45:00.706893 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:46:00.707395 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:47:00.707031 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
2022-06-26 00:48:00.707187 I | op-osd: waiting... 1 of 2 OSD prepare jobs have finished processing and 1 of 1 OSDs have been updated
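The repeating "waiting... 1 of 2 OSD prepare jobs" line suggests one OSD prepare job never finished, usually because it found no usable disk on its node; the job logs normally say why. A quick way to check (the label selector is an assumption based on the standard Rook deployment):

kubectl -n rook-ceph get jobs
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=20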
And describe pvc/cephfs-pvc:
[root@master01 cephfs]# kubectl describe pvc/cephfs-pvc
Name: cephfs-pvc
Namespace: default
StorageClass: rook-cephfs
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 50m (x546 over 35h) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-stq62_a7824160-1233-4311-b870-51a729c70e19 External provisioner is provisioning volume for claim "default/cephfs-pvc"
Warning ProvisioningFailed 24m (x99 over 48m) persistentvolume-controller storageclass.storage.k8s.io "rook-cephfs" not found
Warning ProvisioningFailed 15m rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-gdjqw_9970df94-db32-4f87-a6e7-31842f641586 failed to provision volume with StorageClass "rook-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal ExternalProvisioning 4m24s (x8426 over 35h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
Normal Provisioning 8s (x5 over 17m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-gdjqw_9970df94-db32-4f87-a6e7-31842f641586 External provisioner is provisioning volume for claim "default/cephfs-pvc"
Warning ProvisioningFailed 8s (x4 over 11m) rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-86d7c46746-gdjqw_9970df94-db32-4f87-a6e7-31842f641586 failed to provision volume with StorageClass "rook-cephfs": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-e820eb04-7e91-4c31-a779-4acdc69af2d5 already exists
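One thing the events above point at: the storageclass.storage.k8s.io "rook-cephfs" not found warning means the StorageClass did not exist when the PVC was first processed (or was deleted and recreated). It is worth confirming the class exists and that its parameters (clusterID, fsName, pool) match the filesystem that was actually created, e.g.:

kubectl get storageclass rook-cephfs -o yaml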
Some questions here:
github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
an operation with the given Volume ID pvc-e820eb04-7e91-4c31-a779-4acdc69af2d5 already exists
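On the second error: "an operation with the given Volume ID ... already exists" generally means a previous CreateVolume call for the same PVC is still in flight or stuck, so the provisioner refuses to start a duplicate; it is usually a symptom of the earlier DeadlineExceeded rather than a separate bug. The driver container in the provisioner pod usually shows the stuck call (container name is an assumption based on the standard Rook CSI deployment):

kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-86d7c46746-87cmf -c csi-cephfsplugin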
I checked the CRD; it is OK:
[root@master01 ~]# kubectl get crd -n rook-ceph | grep volumesnapshotclasses.snapshot.storage.k8s.io
volumesnapshotclasses.snapshot.storage.k8s.io 2022-06-10T09:31:32Z
I have no idea about this error:
github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)
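A note on that error: the snapshotter sidecar tries to watch the VolumeSnapshotClass and VolumeSnapshotContent CRDs and logs this when they are missing or only served at a different API version (for example v1beta1 when the sidecar wants v1); it does not block plain PVC provisioning. Also, CRDs are cluster-scoped, so the -n rook-ceph flag in the command above has no effect. A quick check of the served versions:

kubectl get crd volumesnapshotcontents.snapshot.storage.k8s.io -o jsonpath='{.spec.versions[*].name}'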
Now the problem is solved! Actually, everything was fine in Rook and in Ceph, but my case was special: the real cause was that my Kubernetes networking was not right. My cluster did not use any network plugin, so the kubelet logs looked like this:
Jun 26 14:24:30 a-slave01 kubelet[1586]: I0626 14:24:30.793784 1586 docker_sandbox.go:402] "Failed to read pod IP from plugin/docker" err="Couldn't find network status for rook-ceph/csi-cephfsplugin-provisioner-659bf8dfcb-rxmrw through plugin: in>
Jun 26 14:24:31 a-slave01 kubelet[1586]: I0626 14:24:31.028303 1586 docker_sandbox.go:402] "Failed to read pod IP from plugin/docker" err="Couldn't find network status for rook-ceph/csi-cephfsplugin-provisioner-659bf8dfcb-rxmrw through plugin: in>
Finally, the fix was to re-create the Kubernetes cluster and install the flannel network plugin. Then the PVC went to Bound status:
[root@master01 cephfs]# kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
kube-system cephfs-pvc Bound pvc-ed13a5b9-9ecc-4cb3-aebc-767a3aa23529 1Gi RWX rook-cephfs 6s
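For reference, flannel is typically installed with a single manifest; the URL below is the usual upstream location but may change, so check the flannel project README:

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml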
haha
@travisn @Madhu-1 @LittleCadet Can you please tell me whether db_session_id is a secret or just an identifier? Thanks in advance!
@ira-gordin-sap Can you be more specific about which db_session_id you are talking about and where you found it? :)
@Madhu-1 In the logs.
@ira-gordin-sap In which logs? Is it in the CSI logs? If yes, which pod and which container? Without that, I am not sure which session_id you are referring to.
@Madhu-1 In the logs attached to this issue, for example.
@Madhu-1 In addition, we had the following message in this pod:
@ira-gordin-sap It's in a Ceph pod, not in a CSI pod. Thank you. @travisn @BlaineEXE might know about it.
@ira-gordin-sap The db_session_id is just an identifier, not a secret.
Is this a bug report or feature request?
Deviation from expected behavior: PVC status is Pending
Expected behavior: PVC status is Bound
How to reproduce it (minimal and precise):
Yesterday everything was fine, but today I re-created the CephCluster, and then the PVC status was Pending.
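For completeness, the PVC in question is presumably the one from the Rook kube-registry example; a sketch matching the describe output above (RWX, 1Gi, rook-cephfs):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
  namespace: kube-system
spec:
  accessModes:
    - ReadWriteMany       # RWX, as shown in the kubectl get pvc output
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs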
File(s) to submit:
rook-ceph-mon-a-54c944bfc9-p6hm9 :
rook-ceph-mgr-a-ddc449956-r9lfx :
rook-ceph-operator-785cc8f794-pdpg7:
csi-cephfsplugin-provisioner-86d7c46746-7vrkt :
csi-cephfsplugin-kgtfj : no logs
Environment:
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"