uthark opened this issue 1 year ago
A few questions.
Are you able to reproduce this with kubectl 1.27?
If you set the timeout longer does it eventually resolve?
What does the CRD look like? Mainly asking about the status and meta fields - don't need the whole spec.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Reproduced it today with
$ kubectl version
Client Version: v1.28.8
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3
while waiting for a CephCluster created by Rook:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr2 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{},"name":"my-cluster","namespace":"rook-ceph"},"spec":{"cephConfig":{"global":{"bdev_flock_retry":"20","bluefs_buffered_io":"false","mon_data_avail_warn":"10","mon_warn_on_pool_no_redundancy":"false","osd_pool_default_size":"1"}},"cephVersion":{"allowUnsupported":true,"image":"quay.io/ceph/ceph:v18"},"crashCollector":{"disable":true},"dashboard":{"enabled":true},"dataDirHostPath":"/data/rook","disruptionManagement":{"managePodBudgets":true},"healthCheck":{"daemonHealth":{"mon":{"interval":"45s","timeout":"600s"}}},"mgr":{"allowMultiplePerNode":true,"count":1},"mon":{"allowMultiplePerNode":true,"count":1},"monitoring":{"enabled":false},"network":{"provider":"host"},"priorityClassNames":{"all":"system-node-critical","mgr":"system-cluster-critical"},"storage":{"useAllDevices":true,"useAllNodes":true}}}
creationTimestamp: "2024-04-07T19:10:31Z"
finalizers:
- cephcluster.ceph.rook.io
generation: 3
name: my-cluster
namespace: rook-ceph
resourceVersion: "6553"
uid: a4dfa8d1-e371-4a4f-a4f5-764155c80774
spec:
cephConfig:
global:
bdev_flock_retry: "20"
bluefs_buffered_io: "false"
mon_data_avail_warn: "10"
mon_warn_on_pool_no_redundancy: "false"
osd_pool_default_size: "1"
cephVersion:
allowUnsupported: true
image: quay.io/ceph/ceph:v18
cleanupPolicy:
sanitizeDisks: {}
crashCollector:
disable: true
csi:
cephfs: {}
readAffinity:
enabled: false
dashboard:
enabled: true
dataDirHostPath: /data/rook
disruptionManagement:
managePodBudgets: true
external: {}
healthCheck:
daemonHealth:
mon:
interval: 45s
timeout: 600s
osd: {}
status: {}
logCollector: {}
mgr:
allowMultiplePerNode: true
count: 1
mon:
allowMultiplePerNode: true
count: 1
monitoring:
enabled: false
network:
multiClusterService: {}
provider: host
priorityClassNames:
all: system-node-critical
mgr: system-cluster-critical
security:
keyRotation:
enabled: false
kms: {}
storage:
flappingRestartIntervalHours: 0
store: {}
useAllDevices: true
useAllNodes: true
status:
ceph:
capacity:
bytesAvailable: 53636202496
bytesTotal: 53687091200
bytesUsed: 50888704
lastUpdated: "2024-04-07T19:49:56Z"
fsid: 735d962d-e186-4926-9a77-219aedd41c29
health: HEALTH_OK
lastChecked: "2024-04-07T19:49:56Z"
versions:
mgr:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
mon:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
osd:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
overall:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 4
rbd-mirror:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
conditions:
- lastHeartbeatTime: "2024-04-07T19:49:56Z"
lastTransitionTime: "2024-04-07T19:12:20Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
message: Cluster created successfully
observedGeneration: 2
phase: Ready
state: Created
storage:
osd:
storeType:
bluestore: 1
version:
image: quay.io/ceph/ceph:v18
version: 18.2.2-0
$ kubectl wait cephcluster my-cluster --for=condition=Ready -n rook-ceph --context dr2 --timeout 10s -v8
I0407 22:58:54.680107 922416 loader.go:395] Config loaded from file: /home/nsoffer/.kube/config
I0407 22:58:54.680718 922416 cert_rotation.go:137] Starting client certificate rotation controller
I0407 22:58:54.683401 922416 round_trippers.go:463] GET https://192.168.122.101:8443/apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters/my-cluster
I0407 22:58:54.683583 922416 round_trippers.go:469] Request Headers:
I0407 22:58:54.683597 922416 round_trippers.go:473] Accept: application/json
I0407 22:58:54.683605 922416 round_trippers.go:473] User-Agent: kubectl/v1.28.8 (linux/amd64) kubernetes/fc11ff3
I0407 22:58:54.690629 922416 round_trippers.go:574] Response Status: 200 OK in 7 milliseconds
I0407 22:58:54.690650 922416 round_trippers.go:577] Response Headers:
I0407 22:58:54.690657 922416 round_trippers.go:580] Audit-Id: 11d6e5c4-6e5f-4e08-8131-7f3889688b47
I0407 22:58:54.690663 922416 round_trippers.go:580] Cache-Control: no-cache, private
I0407 22:58:54.690671 922416 round_trippers.go:580] Content-Type: application/json
I0407 22:58:54.690675 922416 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: d44c3b2b-7793-4a13-ae31-4ec41bcea39d
I0407 22:58:54.690680 922416 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: f26e2a00-ac90-4bd8-8fa0-90880816ccc8
I0407 22:58:54.690684 922416 round_trippers.go:580] Date: Sun, 07 Apr 2024 19:58:54 GMT
I0407 22:58:54.690766 922416 request.go:1212] Response Body: {"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ceph.rook.io/v1\",\"kind\":\"CephCluster\",\"metadata\":{\"annotations\":{},\"name\":\"my-cluster\",\"namespace\":\"rook-ceph\"},\"spec\":{\"cephConfig\":{\"global\":{\"bdev_flock_retry\":\"20\",\"bluefs_buffered_io\":\"false\",\"mon_data_avail_warn\":\"10\",\"mon_warn_on_pool_no_redundancy\":\"false\",\"osd_pool_default_size\":\"1\"}},\"cephVersion\":{\"allowUnsupported\":true,\"image\":\"quay.io/ceph/ceph:v18\"},\"crashCollector\":{\"disable\":true},\"dashboard\":{\"enabled\":true},\"dataDirHostPath\":\"/data/rook\",\"disruptionManagement\":{\"managePodBudgets\":true},\"healthCheck\":{\"daemonHealth\":{\"mon\":{\"interval\":\"45s\",\"timeout\":\"600s\"}}},\"mgr\":{\"allowMultiplePerNode\":true,\"count\":1},\"mon\":{\"allowMultiplePerNode\":true,\"count\":1},\"monitoring\":{\"enabled\":false},\"network\":{\"provider\":\"host\"},\"priorityClassNames\":{\"all\":\ [truncated 5321 chars]
I0407 22:58:54.691117 922416 reflector.go:289] Starting reflector *unstructured.Unstructured (0s) from vendor/k8s.io/client-go/tools/watch/informerwatcher.go:146
I0407 22:58:54.691128 922416 reflector.go:325] Listing and watching *unstructured.Unstructured from vendor/k8s.io/client-go/tools/watch/informerwatcher.go:146
I0407 22:58:54.691193 922416 round_trippers.go:463] GET https://192.168.122.101:8443/apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters?fieldSelector=metadata.name%3Dmy-cluster&limit=500&resourceVersion=0
I0407 22:58:54.691199 922416 round_trippers.go:469] Request Headers:
I0407 22:58:54.691205 922416 round_trippers.go:473] Accept: application/json
I0407 22:58:54.691210 922416 round_trippers.go:473] User-Agent: kubectl/v1.28.8 (linux/amd64) kubernetes/fc11ff3
I0407 22:58:54.692121 922416 round_trippers.go:574] Response Status: 200 OK in 0 milliseconds
I0407 22:58:54.692129 922416 round_trippers.go:577] Response Headers:
I0407 22:58:54.692136 922416 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: d44c3b2b-7793-4a13-ae31-4ec41bcea39d
I0407 22:58:54.692143 922416 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: f26e2a00-ac90-4bd8-8fa0-90880816ccc8
I0407 22:58:54.692150 922416 round_trippers.go:580] Date: Sun, 07 Apr 2024 19:58:54 GMT
I0407 22:58:54.692155 922416 round_trippers.go:580] Audit-Id: 31bee88f-4318-4f8a-8166-81ac77c50e61
I0407 22:58:54.692159 922416 round_trippers.go:580] Cache-Control: no-cache, private
I0407 22:58:54.692164 922416 round_trippers.go:580] Content-Type: application/json
I0407 22:58:54.692223 922416 request.go:1212] Response Body: {"apiVersion":"ceph.rook.io/v1","items":[{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ceph.rook.io/v1\",\"kind\":\"CephCluster\",\"metadata\":{\"annotations\":{},\"name\":\"my-cluster\",\"namespace\":\"rook-ceph\"},\"spec\":{\"cephConfig\":{\"global\":{\"bdev_flock_retry\":\"20\",\"bluefs_buffered_io\":\"false\",\"mon_data_avail_warn\":\"10\",\"mon_warn_on_pool_no_redundancy\":\"false\",\"osd_pool_default_size\":\"1\"}},\"cephVersion\":{\"allowUnsupported\":true,\"image\":\"quay.io/ceph/ceph:v18\"},\"crashCollector\":{\"disable\":true},\"dashboard\":{\"enabled\":true},\"dataDirHostPath\":\"/data/rook\",\"disruptionManagement\":{\"managePodBudgets\":true},\"healthCheck\":{\"daemonHealth\":{\"mon\":{\"interval\":\"45s\",\"timeout\":\"600s\"}}},\"mgr\":{\"allowMultiplePerNode\":true,\"count\":1},\"mon\":{\"allowMultiplePerNode\":true,\"count\":1},\"monitoring\":{\"enabled\":false},\"network\":{\"provider\":\" [truncated 5441 chars]
I0407 22:58:54.692631 922416 round_trippers.go:463] GET https://192.168.122.101:8443/apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dmy-cluster&resourceVersion=7743&timeoutSeconds=458&watch=true
I0407 22:58:54.692638 922416 round_trippers.go:469] Request Headers:
I0407 22:58:54.692644 922416 round_trippers.go:473] Accept: application/json
I0407 22:58:54.692649 922416 round_trippers.go:473] User-Agent: kubectl/v1.28.8 (linux/amd64) kubernetes/fc11ff3
I0407 22:58:54.693292 922416 round_trippers.go:574] Response Status: 200 OK in 0 milliseconds
I0407 22:58:54.693302 922416 round_trippers.go:577] Response Headers:
I0407 22:58:54.693308 922416 round_trippers.go:580] Audit-Id: d1d844d0-4e09-4991-98f5-9b48274875d8
I0407 22:58:54.693313 922416 round_trippers.go:580] Cache-Control: no-cache, private
I0407 22:58:54.693324 922416 round_trippers.go:580] Content-Type: application/json
I0407 22:58:54.693330 922416 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: d44c3b2b-7793-4a13-ae31-4ec41bcea39d
I0407 22:58:54.693335 922416 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: f26e2a00-ac90-4bd8-8fa0-90880816ccc8
I0407 22:58:54.693340 922416 round_trippers.go:580] Date: Sun, 07 Apr 2024 19:58:54 GMT
I0407 22:58:54.791437 922416 shared_informer.go:341] caches populated
I0407 22:59:04.682780 922416 reflector.go:295] Stopping reflector *unstructured.Unstructured (0s) from vendor/k8s.io/client-go/tools/watch/informerwatcher.go:146
error: timed out waiting for the condition on cephclusters/my-cluster
The CRD is here: https://github.com/rook/rook/blob/8c8844e70d7ac0225ec42a38414361b32acdb6ff/deploy/examples/crds.yaml#L918
The interesting detail: I have two clusters with Ceph, both with the same version, and this works on the second cluster:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr1 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{},"name":"my-cluster","namespace":"rook-ceph"},"spec":{"cephConfig":{"global":{"bdev_flock_retry":"20","bluefs_buffered_io":"false","mon_data_avail_warn":"10","mon_warn_on_pool_no_redundancy":"false","osd_pool_default_size":"1"}},"cephVersion":{"allowUnsupported":true,"image":"quay.io/ceph/ceph:v18"},"crashCollector":{"disable":true},"dashboard":{"enabled":true},"dataDirHostPath":"/data/rook","disruptionManagement":{"managePodBudgets":true},"healthCheck":{"daemonHealth":{"mon":{"interval":"45s","timeout":"600s"}}},"mgr":{"allowMultiplePerNode":true,"count":1},"mon":{"allowMultiplePerNode":true,"count":1},"monitoring":{"enabled":false},"network":{"provider":"host"},"priorityClassNames":{"all":"system-node-critical","mgr":"system-cluster-critical"},"storage":{"useAllDevices":true,"useAllNodes":true}}}
creationTimestamp: "2024-04-07T19:12:01Z"
finalizers:
- cephcluster.ceph.rook.io
generation: 3
name: my-cluster
namespace: rook-ceph
resourceVersion: "7993"
uid: 3b1f9adc-eeb6-4bde-a2d0-c9850aa008d7
spec:
cephConfig:
global:
bdev_flock_retry: "20"
bluefs_buffered_io: "false"
mon_data_avail_warn: "10"
mon_warn_on_pool_no_redundancy: "false"
osd_pool_default_size: "1"
cephVersion:
allowUnsupported: true
image: quay.io/ceph/ceph:v18
cleanupPolicy:
sanitizeDisks: {}
crashCollector:
disable: true
csi:
cephfs: {}
readAffinity:
enabled: false
dashboard:
enabled: true
dataDirHostPath: /data/rook
disruptionManagement:
managePodBudgets: true
external: {}
healthCheck:
daemonHealth:
mon:
interval: 45s
timeout: 600s
osd: {}
status: {}
logCollector: {}
mgr:
allowMultiplePerNode: true
count: 1
mon:
allowMultiplePerNode: true
count: 1
monitoring:
enabled: false
network:
multiClusterService: {}
provider: host
priorityClassNames:
all: system-node-critical
mgr: system-cluster-critical
security:
keyRotation:
enabled: false
kms: {}
storage:
flappingRestartIntervalHours: 0
store: {}
useAllDevices: true
useAllNodes: true
status:
ceph:
capacity:
bytesAvailable: 53632000000
bytesTotal: 53687091200
bytesUsed: 55091200
lastUpdated: "2024-04-07T20:02:34Z"
fsid: 0d014e73-153a-4ee1-ae59-c9f8b568b910
health: HEALTH_OK
lastChecked: "2024-04-07T20:02:34Z"
versions:
mgr:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
mon:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
osd:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
overall:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 4
rbd-mirror:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
conditions:
- lastHeartbeatTime: "2024-04-07T19:13:50Z"
lastTransitionTime: "2024-04-07T19:13:50Z"
message: Processing OSD 0 on node "dr1"
reason: ClusterProgressing
status: "True"
type: Progressing
- lastHeartbeatTime: "2024-04-07T20:02:35Z"
lastTransitionTime: "2024-04-07T19:13:53Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
message: Cluster created successfully
phase: Ready
state: Created
storage:
osd:
storeType:
bluestore: 1
version:
image: quay.io/ceph/ceph:v18
version: 18.2.2-0
$ kubectl wait cephcluster my-cluster --for=condition=Ready -n rook-ceph --context dr1 --timeout 10s
cephcluster.ceph.rook.io/my-cluster condition met
The only difference is that the working cluster also reports a "Progressing" condition; not sure why it is not reported for the broken cluster.
Ceph works fine on both clusters otherwise.
@travisn Maybe this is related to rook?
@nirs So you are waiting for the condition "Ready", and the kubectl wait command is timing out even though the condition is as expected? As long as the condition is what you expect, it sounds like this is not a Rook issue.
Yes, the condition looks valid, but kubectl wait times out:
- lastHeartbeatTime: "2024-04-07T19:49:56Z"
lastTransitionTime: "2024-04-07T19:12:20Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
This does not seem like a Rook issue. But it times out on the cluster that does not report the "Progressing" condition, which is reported on the other cluster. Both clusters should have the same state, so something is going on with Rook.
I can confirm the same issue with a Kafka CRD instance and kubectl v1.29.4.
kubectl get Kafka test
status:
conditions:
- lastTransitionTime: "2024-03-24T15:06:43.925578714Z"
status: "True"
type: Ready
kubectl wait kafkas test --for=condition=Ready --timeout=1m
error: timed out waiting for the condition on kafkas/test
The jsonpath workaround succeeds:
kubectl wait kafkas test --for=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True --timeout=5m
I hit the same issue with Ceph as well (two clusters with the same conditions); on one, "kubectl wait --for=condition=Ready" works, on the other the same command times out.
conditions:
- lastHeartbeatTime: "2024-04-26T22:01:05Z"
lastTransitionTime: "2023-12-15T11:14:11Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
@jnt2007: Thanks for the workaround --for=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True, it helped (note that you just need a recent enough kubectl).
Looking at the kubectl wait code, it ignores the condition if status.observedGeneration does not match the object's metadata.generation: https://github.com/kubernetes/kubectl/blob/5ff591adc68e5016618d45332422916af4489dc2/pkg/cmd/wait/wait.go#L583
Which looks right to me. Looking at the resources I posted here: https://github.com/kubernetes/kubectl/issues/1414#issuecomment-2041587821
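For reference, here is a simplified sketch of that check. It is not the exact kubectl source (the real code also honors a per-condition observedGeneration field and more edge cases), but it shows the behavior hit here: a matching condition is ignored whenever status.observedGeneration lags metadata.generation.

package wait

import (
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// conditionMet reports whether the named condition has the wanted status,
// but treats the condition as unmet while status.observedGeneration is
// behind metadata.generation, i.e. while the status is stale.
func conditionMet(obj *unstructured.Unstructured, name, want string) (bool, error) {
	conditions, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil || !found {
		return false, err
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		condType, _, _ := unstructured.NestedString(cond, "type")
		if !strings.EqualFold(condType, name) {
			continue
		}
		// The step that matters for this issue: stale status means
		// the condition is not trusted, so wait keeps waiting.
		generation, hasGen, _ := unstructured.NestedInt64(obj.Object, "metadata", "generation")
		observed, hasObs, _ := unstructured.NestedInt64(obj.Object, "status", "observedGeneration")
		if hasGen && hasObs && observed < generation {
			return false, nil
		}
		status, _, _ := unstructured.NestedString(cond, "status")
		return strings.EqualFold(status, want), nil
	}
	return false, nil
}

With the dr2 resource above (generation: 3, status.observedGeneration: 2) this returns false even though the Ready condition is "True", which is exactly the observed timeout.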
The resource with the issue:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr2 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
...
generation: 3
...
status:
...
conditions:
- lastHeartbeatTime: "2024-04-07T19:49:56Z"
lastTransitionTime: "2024-04-07T19:12:20Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
...
observedGeneration: 2
The resource without the issue:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr1 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
generation: 3
...
status:
conditions:
- lastHeartbeatTime: "2024-04-07T19:13:50Z"
lastTransitionTime: "2024-04-07T19:13:50Z"
message: Processing OSD 0 on node "dr1"
reason: ClusterProgressing
status: "True"
type: Progressing
- lastHeartbeatTime: "2024-04-07T20:02:35Z"
lastTransitionTime: "2024-04-07T19:13:53Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
The second resource does not have observedGeneration, so the condition is respected.
So at least in my case kubectl wait is doing the right thing, and the workaround of waiting for jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True may be incorrect: it waits on stale status.
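If a jsonpath-based wait is still needed, here is a sketch of a variant that at least refuses to act on stale status. It assumes the controller eventually updates status.observedGeneration; if it never does (as on dr2 above), the first wait times out too, which matches kubectl's own behavior. There is also a small race if the generation changes between the two commands:

$ GEN=$(kubectl get cephcluster my-cluster -n rook-ceph --context dr2 -o jsonpath='{.metadata.generation}')
$ kubectl wait cephcluster my-cluster -n rook-ceph --context dr2 --for=jsonpath='{.status.observedGeneration}'="$GEN" --timeout=5m
$ kubectl wait cephcluster my-cluster -n rook-ceph --context dr2 --for=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True --timeout=5m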
/remove-lifecycle rotten
/triage accepted
Something does seem wonky. We need to get a minimal and reliable way to reproduce this so we can step through a debugger.
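One possible starting point for such a repro, based on the observedGeneration finding above. Everything here is hypothetical (a made-up Widget CRD, not from this thread), and it needs kubectl >= v1.24 for patch --subresource=status:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Widget
    plural: widgets
    singular: widget
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
          status:
            type: object
            properties:
              observedGeneration:
                type: integer
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
EOF
$ kubectl wait crd/widgets.example.com --for condition=Established
$ kubectl apply -f - <<'EOF'
apiVersion: example.com/v1
kind: Widget
metadata:
  name: demo
spec:
  replicas: 1
EOF
# Mark Ready=True while status.observedGeneration matches generation 1.
$ kubectl patch widget demo --subresource=status --type=merge \
    -p '{"status":{"observedGeneration":1,"conditions":[{"type":"Ready","status":"True"}]}}'
# Bump the spec: metadata.generation becomes 2, the status stays at 1.
$ kubectl patch widget demo --type=merge -p '{"spec":{"replicas":2}}'
# This should now time out, reproducing the reports above.
$ kubectl wait widget demo --for=condition=Ready --timeout=10s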
jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True
Just as a note to future viewers of this issue, this workaround looks to see if this object has ever been ready, not whether it is currently ready.
What happened: I have a CRD with conditions. The status of the conditions:
When I wait for the condition, I expect it to return successfully, but instead I get a timed-out error.
What you expected to happen:
Both services report the same status, but kubectl wait only works for one of them. I captured the kubectl output at -v=10 debug level, but saw no issues there; the condition responses look the same for the object that times out and for the other object:
OK:
Failed:
The failing one also contains severity, but I'm not sure if it's related. The field comes from https://github.com/knative/pkg/blob/main/apis/condition_types.go#L67-L70
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: These are KServe CRDs, but I'm not sure if it's relevant.
Environment:
- Kubernetes client and server versions (use kubectl version):
- OS (e.g: cat /etc/os-release): COS