uthark opened this issue 1 year ago
A few questions.
Are you able to reproduce this with kubectl 1.27?
If you set the timeout longer does it eventually resolve?
What does the CRD look like? Mainly asking about the status and meta fields - don't need the whole spec.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
Reproduced it today with
$ kubectl version
Client Version: v1.28.8
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3
while waiting for a CephCluster created by Rook:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr2 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{},"name":"my-cluster","namespace":"rook-ceph"},"spec":{"cephConfig":{"global":{"bdev_flock_retry":"20","bluefs_buffered_io":"false","mon_data_avail_warn":"10","mon_warn_on_pool_no_redundancy":"false","osd_pool_default_size":"1"}},"cephVersion":{"allowUnsupported":true,"image":"quay.io/ceph/ceph:v18"},"crashCollector":{"disable":true},"dashboard":{"enabled":true},"dataDirHostPath":"/data/rook","disruptionManagement":{"managePodBudgets":true},"healthCheck":{"daemonHealth":{"mon":{"interval":"45s","timeout":"600s"}}},"mgr":{"allowMultiplePerNode":true,"count":1},"mon":{"allowMultiplePerNode":true,"count":1},"monitoring":{"enabled":false},"network":{"provider":"host"},"priorityClassNames":{"all":"system-node-critical","mgr":"system-cluster-critical"},"storage":{"useAllDevices":true,"useAllNodes":true}}}
creationTimestamp: "2024-04-07T19:10:31Z"
finalizers:
- cephcluster.ceph.rook.io
generation: 3
name: my-cluster
namespace: rook-ceph
resourceVersion: "6553"
uid: a4dfa8d1-e371-4a4f-a4f5-764155c80774
spec:
cephConfig:
global:
bdev_flock_retry: "20"
bluefs_buffered_io: "false"
mon_data_avail_warn: "10"
mon_warn_on_pool_no_redundancy: "false"
osd_pool_default_size: "1"
cephVersion:
allowUnsupported: true
image: quay.io/ceph/ceph:v18
cleanupPolicy:
sanitizeDisks: {}
crashCollector:
disable: true
csi:
cephfs: {}
readAffinity:
enabled: false
dashboard:
enabled: true
dataDirHostPath: /data/rook
disruptionManagement:
managePodBudgets: true
external: {}
healthCheck:
daemonHealth:
mon:
interval: 45s
timeout: 600s
osd: {}
status: {}
logCollector: {}
mgr:
allowMultiplePerNode: true
count: 1
mon:
allowMultiplePerNode: true
count: 1
monitoring:
enabled: false
network:
multiClusterService: {}
provider: host
priorityClassNames:
all: system-node-critical
mgr: system-cluster-critical
security:
keyRotation:
enabled: false
kms: {}
storage:
flappingRestartIntervalHours: 0
store: {}
useAllDevices: true
useAllNodes: true
status:
ceph:
capacity:
bytesAvailable: 53636202496
bytesTotal: 53687091200
bytesUsed: 50888704
lastUpdated: "2024-04-07T19:49:56Z"
fsid: 735d962d-e186-4926-9a77-219aedd41c29
health: HEALTH_OK
lastChecked: "2024-04-07T19:49:56Z"
versions:
mgr:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
mon:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
osd:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
overall:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 4
rbd-mirror:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
conditions:
- lastHeartbeatTime: "2024-04-07T19:49:56Z"
lastTransitionTime: "2024-04-07T19:12:20Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
message: Cluster created successfully
observedGeneration: 2
phase: Ready
state: Created
storage:
osd:
storeType:
bluestore: 1
version:
image: quay.io/ceph/ceph:v18
version: 18.2.2-0
$ kubectl wait cephcluster my-cluster --for=condition=Ready -n rook-ceph --context dr2 --timeout 10s -v8
I0407 22:58:54.680107 922416 loader.go:395] Config loaded from file: /home/nsoffer/.kube/config
I0407 22:58:54.680718 922416 cert_rotation.go:137] Starting client certificate rotation controller
I0407 22:58:54.683401 922416 round_trippers.go:463] GET https://192.168.122.101:8443/apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters/my-cluster
I0407 22:58:54.683583 922416 round_trippers.go:469] Request Headers:
I0407 22:58:54.683597 922416 round_trippers.go:473] Accept: application/json
I0407 22:58:54.683605 922416 round_trippers.go:473] User-Agent: kubectl/v1.28.8 (linux/amd64) kubernetes/fc11ff3
I0407 22:58:54.690629 922416 round_trippers.go:574] Response Status: 200 OK in 7 milliseconds
I0407 22:58:54.690650 922416 round_trippers.go:577] Response Headers:
I0407 22:58:54.690657 922416 round_trippers.go:580] Audit-Id: 11d6e5c4-6e5f-4e08-8131-7f3889688b47
I0407 22:58:54.690663 922416 round_trippers.go:580] Cache-Control: no-cache, private
I0407 22:58:54.690671 922416 round_trippers.go:580] Content-Type: application/json
I0407 22:58:54.690675 922416 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: d44c3b2b-7793-4a13-ae31-4ec41bcea39d
I0407 22:58:54.690680 922416 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: f26e2a00-ac90-4bd8-8fa0-90880816ccc8
I0407 22:58:54.690684 922416 round_trippers.go:580] Date: Sun, 07 Apr 2024 19:58:54 GMT
I0407 22:58:54.690766 922416 request.go:1212] Response Body: {"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ceph.rook.io/v1\",\"kind\":\"CephCluster\",\"metadata\":{\"annotations\":{},\"name\":\"my-cluster\",\"namespace\":\"rook-ceph\"},\"spec\":{\"cephConfig\":{\"global\":{\"bdev_flock_retry\":\"20\",\"bluefs_buffered_io\":\"false\",\"mon_data_avail_warn\":\"10\",\"mon_warn_on_pool_no_redundancy\":\"false\",\"osd_pool_default_size\":\"1\"}},\"cephVersion\":{\"allowUnsupported\":true,\"image\":\"quay.io/ceph/ceph:v18\"},\"crashCollector\":{\"disable\":true},\"dashboard\":{\"enabled\":true},\"dataDirHostPath\":\"/data/rook\",\"disruptionManagement\":{\"managePodBudgets\":true},\"healthCheck\":{\"daemonHealth\":{\"mon\":{\"interval\":\"45s\",\"timeout\":\"600s\"}}},\"mgr\":{\"allowMultiplePerNode\":true,\"count\":1},\"mon\":{\"allowMultiplePerNode\":true,\"count\":1},\"monitoring\":{\"enabled\":false},\"network\":{\"provider\":\"host\"},\"priorityClassNames\":{\"all\":\ [truncated 5321 chars]
I0407 22:58:54.691117 922416 reflector.go:289] Starting reflector *unstructured.Unstructured (0s) from vendor/k8s.io/client-go/tools/watch/informerwatcher.go:146
I0407 22:58:54.691128 922416 reflector.go:325] Listing and watching *unstructured.Unstructured from vendor/k8s.io/client-go/tools/watch/informerwatcher.go:146
I0407 22:58:54.691193 922416 round_trippers.go:463] GET https://192.168.122.101:8443/apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters?fieldSelector=metadata.name%3Dmy-cluster&limit=500&resourceVersion=0
I0407 22:58:54.691199 922416 round_trippers.go:469] Request Headers:
I0407 22:58:54.691205 922416 round_trippers.go:473] Accept: application/json
I0407 22:58:54.691210 922416 round_trippers.go:473] User-Agent: kubectl/v1.28.8 (linux/amd64) kubernetes/fc11ff3
I0407 22:58:54.692121 922416 round_trippers.go:574] Response Status: 200 OK in 0 milliseconds
I0407 22:58:54.692129 922416 round_trippers.go:577] Response Headers:
I0407 22:58:54.692136 922416 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: d44c3b2b-7793-4a13-ae31-4ec41bcea39d
I0407 22:58:54.692143 922416 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: f26e2a00-ac90-4bd8-8fa0-90880816ccc8
I0407 22:58:54.692150 922416 round_trippers.go:580] Date: Sun, 07 Apr 2024 19:58:54 GMT
I0407 22:58:54.692155 922416 round_trippers.go:580] Audit-Id: 31bee88f-4318-4f8a-8166-81ac77c50e61
I0407 22:58:54.692159 922416 round_trippers.go:580] Cache-Control: no-cache, private
I0407 22:58:54.692164 922416 round_trippers.go:580] Content-Type: application/json
I0407 22:58:54.692223 922416 request.go:1212] Response Body: {"apiVersion":"ceph.rook.io/v1","items":[{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ceph.rook.io/v1\",\"kind\":\"CephCluster\",\"metadata\":{\"annotations\":{},\"name\":\"my-cluster\",\"namespace\":\"rook-ceph\"},\"spec\":{\"cephConfig\":{\"global\":{\"bdev_flock_retry\":\"20\",\"bluefs_buffered_io\":\"false\",\"mon_data_avail_warn\":\"10\",\"mon_warn_on_pool_no_redundancy\":\"false\",\"osd_pool_default_size\":\"1\"}},\"cephVersion\":{\"allowUnsupported\":true,\"image\":\"quay.io/ceph/ceph:v18\"},\"crashCollector\":{\"disable\":true},\"dashboard\":{\"enabled\":true},\"dataDirHostPath\":\"/data/rook\",\"disruptionManagement\":{\"managePodBudgets\":true},\"healthCheck\":{\"daemonHealth\":{\"mon\":{\"interval\":\"45s\",\"timeout\":\"600s\"}}},\"mgr\":{\"allowMultiplePerNode\":true,\"count\":1},\"mon\":{\"allowMultiplePerNode\":true,\"count\":1},\"monitoring\":{\"enabled\":false},\"network\":{\"provider\":\" [truncated 5441 chars]
I0407 22:58:54.692631 922416 round_trippers.go:463] GET https://192.168.122.101:8443/apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dmy-cluster&resourceVersion=7743&timeoutSeconds=458&watch=true
I0407 22:58:54.692638 922416 round_trippers.go:469] Request Headers:
I0407 22:58:54.692644 922416 round_trippers.go:473] Accept: application/json
I0407 22:58:54.692649 922416 round_trippers.go:473] User-Agent: kubectl/v1.28.8 (linux/amd64) kubernetes/fc11ff3
I0407 22:58:54.693292 922416 round_trippers.go:574] Response Status: 200 OK in 0 milliseconds
I0407 22:58:54.693302 922416 round_trippers.go:577] Response Headers:
I0407 22:58:54.693308 922416 round_trippers.go:580] Audit-Id: d1d844d0-4e09-4991-98f5-9b48274875d8
I0407 22:58:54.693313 922416 round_trippers.go:580] Cache-Control: no-cache, private
I0407 22:58:54.693324 922416 round_trippers.go:580] Content-Type: application/json
I0407 22:58:54.693330 922416 round_trippers.go:580] X-Kubernetes-Pf-Flowschema-Uid: d44c3b2b-7793-4a13-ae31-4ec41bcea39d
I0407 22:58:54.693335 922416 round_trippers.go:580] X-Kubernetes-Pf-Prioritylevel-Uid: f26e2a00-ac90-4bd8-8fa0-90880816ccc8
I0407 22:58:54.693340 922416 round_trippers.go:580] Date: Sun, 07 Apr 2024 19:58:54 GMT
I0407 22:58:54.791437 922416 shared_informer.go:341] caches populated
I0407 22:59:04.682780 922416 reflector.go:295] Stopping reflector *unstructured.Unstructured (0s) from vendor/k8s.io/client-go/tools/watch/informerwatcher.go:146
error: timed out waiting for the condition on cephclusters/my-cluster
The CRD is here: https://github.com/rook/rook/blob/8c8844e70d7ac0225ec42a38414361b32acdb6ff/deploy/examples/crds.yaml#L918
The interesting detail: I have two clusters with Ceph, both with the same version, and this works on the second cluster:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr1 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"ceph.rook.io/v1","kind":"CephCluster","metadata":{"annotations":{},"name":"my-cluster","namespace":"rook-ceph"},"spec":{"cephConfig":{"global":{"bdev_flock_retry":"20","bluefs_buffered_io":"false","mon_data_avail_warn":"10","mon_warn_on_pool_no_redundancy":"false","osd_pool_default_size":"1"}},"cephVersion":{"allowUnsupported":true,"image":"quay.io/ceph/ceph:v18"},"crashCollector":{"disable":true},"dashboard":{"enabled":true},"dataDirHostPath":"/data/rook","disruptionManagement":{"managePodBudgets":true},"healthCheck":{"daemonHealth":{"mon":{"interval":"45s","timeout":"600s"}}},"mgr":{"allowMultiplePerNode":true,"count":1},"mon":{"allowMultiplePerNode":true,"count":1},"monitoring":{"enabled":false},"network":{"provider":"host"},"priorityClassNames":{"all":"system-node-critical","mgr":"system-cluster-critical"},"storage":{"useAllDevices":true,"useAllNodes":true}}}
creationTimestamp: "2024-04-07T19:12:01Z"
finalizers:
- cephcluster.ceph.rook.io
generation: 3
name: my-cluster
namespace: rook-ceph
resourceVersion: "7993"
uid: 3b1f9adc-eeb6-4bde-a2d0-c9850aa008d7
spec:
cephConfig:
global:
bdev_flock_retry: "20"
bluefs_buffered_io: "false"
mon_data_avail_warn: "10"
mon_warn_on_pool_no_redundancy: "false"
osd_pool_default_size: "1"
cephVersion:
allowUnsupported: true
image: quay.io/ceph/ceph:v18
cleanupPolicy:
sanitizeDisks: {}
crashCollector:
disable: true
csi:
cephfs: {}
readAffinity:
enabled: false
dashboard:
enabled: true
dataDirHostPath: /data/rook
disruptionManagement:
managePodBudgets: true
external: {}
healthCheck:
daemonHealth:
mon:
interval: 45s
timeout: 600s
osd: {}
status: {}
logCollector: {}
mgr:
allowMultiplePerNode: true
count: 1
mon:
allowMultiplePerNode: true
count: 1
monitoring:
enabled: false
network:
multiClusterService: {}
provider: host
priorityClassNames:
all: system-node-critical
mgr: system-cluster-critical
security:
keyRotation:
enabled: false
kms: {}
storage:
flappingRestartIntervalHours: 0
store: {}
useAllDevices: true
useAllNodes: true
status:
ceph:
capacity:
bytesAvailable: 53632000000
bytesTotal: 53687091200
bytesUsed: 55091200
lastUpdated: "2024-04-07T20:02:34Z"
fsid: 0d014e73-153a-4ee1-ae59-c9f8b568b910
health: HEALTH_OK
lastChecked: "2024-04-07T20:02:34Z"
versions:
mgr:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
mon:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
osd:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
overall:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 4
rbd-mirror:
ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable): 1
conditions:
- lastHeartbeatTime: "2024-04-07T19:13:50Z"
lastTransitionTime: "2024-04-07T19:13:50Z"
message: Processing OSD 0 on node "dr1"
reason: ClusterProgressing
status: "True"
type: Progressing
- lastHeartbeatTime: "2024-04-07T20:02:35Z"
lastTransitionTime: "2024-04-07T19:13:53Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
message: Cluster created successfully
phase: Ready
state: Created
storage:
osd:
storeType:
bluestore: 1
version:
image: quay.io/ceph/ceph:v18
version: 18.2.2-0
$ kubectl wait cephcluster my-cluster --for=condition=Ready -n rook-ceph --context dr1 --timeout 10s
cephcluster.ceph.rook.io/my-cluster condition met
The only difference is that the working cluster also reports a "Progressing" condition; not sure why it is not reported for the broken cluster.
Ceph works fine on both clusters otherwise.
@travisn Maybe this is related to rook?
@nirs So you are waiting for the condition "Ready", and the kubectl wait command is timing out even though the condition is as expected? As long as the condition is what you expect, it sounds like this is not a Rook issue.
Yes, the condition looks valid, but kubectl wait times out:
- lastHeartbeatTime: "2024-04-07T19:49:56Z"
lastTransitionTime: "2024-04-07T19:12:20Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
This does not seem like a Rook issue. But it times out on the cluster that does not report the "Progressing" condition, which is reported on the other cluster. Both clusters should have the same state, so something is going on with Rook.
I can confirm the same issue with a Kafka CRD instance and kubectl v1.29.4.
kubectl get Kafka test
status:
conditions:
- lastTransitionTime: "2024-03-24T15:06:43.925578714Z"
status: "True"
type: Ready
kubectl wait kafkas test --for=condition=Ready --timeout=1m
error: timed out waiting for the condition on kafkas/test
The jsonpath workaround succeeds:
kubectl wait kafkas test --for=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True --timeout=5m
I hit the same issue with Ceph as well (two clusters with the same conditions); on one, "kubectl wait --for=condition=Ready" works, on the other the same command times out.
conditions:
- lastHeartbeatTime: "2024-04-26T22:01:05Z"
lastTransitionTime: "2023-12-15T11:14:11Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
@jnt2007: Thanks for the workaround --for=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True, it helped (note that you just need a recent enough kubectl).
Looking at the kubectl wait code, it ignores the condition if status.observedGeneration does not match the object's metadata.generation: https://github.com/kubernetes/kubectl/blob/5ff591adc68e5016618d45332422916af4489dc2/pkg/cmd/wait/wait.go#L583
Which looks right to me. Looking at the resources I posted here: https://github.com/kubernetes/kubectl/issues/1414#issuecomment-2041587821
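For reference, here is a simplified sketch of that check. It is not the exact kubectl source (the real code also honors a per-condition observedGeneration field and more edge cases), but it shows the behavior hit here: a matching condition is ignored whenever status.observedGeneration lags metadata.generation.

package wait

import (
	"strings"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// conditionMet reports whether the named condition has the wanted status,
// but treats the condition as unmet while status.observedGeneration is
// behind metadata.generation, i.e. while the status is stale.
func conditionMet(obj *unstructured.Unstructured, name, want string) (bool, error) {
	conditions, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil || !found {
		return false, err
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		condType, _, _ := unstructured.NestedString(cond, "type")
		if !strings.EqualFold(condType, name) {
			continue
		}
		// The step that matters for this issue: stale status means
		// the condition is not trusted, so wait keeps waiting.
		generation, hasGen, _ := unstructured.NestedInt64(obj.Object, "metadata", "generation")
		observed, hasObs, _ := unstructured.NestedInt64(obj.Object, "status", "observedGeneration")
		if hasGen && hasObs && observed < generation {
			return false, nil
		}
		status, _, _ := unstructured.NestedString(cond, "status")
		return strings.EqualFold(status, want), nil
	}
	return false, nil
}

With the dr2 resource above (generation: 3, status.observedGeneration: 2) this returns false even though the Ready condition is "True", which is exactly the observed timeout.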
The resource with the issue:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr2 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
...
generation: 3
...
status:
...
conditions:
- lastHeartbeatTime: "2024-04-07T19:49:56Z"
lastTransitionTime: "2024-04-07T19:12:20Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
...
observedGeneration: 2
The resource without the issue:
$ kubectl get cephcluster my-cluster -n rook-ceph --context dr1 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
generation: 3
...
status:
conditions:
- lastHeartbeatTime: "2024-04-07T19:13:50Z"
lastTransitionTime: "2024-04-07T19:13:50Z"
message: Processing OSD 0 on node "dr1"
reason: ClusterProgressing
status: "True"
type: Progressing
- lastHeartbeatTime: "2024-04-07T20:02:35Z"
lastTransitionTime: "2024-04-07T19:13:53Z"
message: Cluster created successfully
reason: ClusterCreated
status: "True"
type: Ready
The second resource does not have observedGeneration, so the condition is respected.
So at least in my case kubectl wait is doing the right thing, and the workaround of waiting for jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True may be incorrect: it waits on stale status.
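If a jsonpath-based wait is still needed, here is a sketch of a variant that at least refuses to act on stale status. It assumes the controller eventually updates status.observedGeneration; if it never does (as on dr2 above), the first wait times out too, which matches kubectl's own behavior. There is also a small race if the generation changes between the two commands:

$ GEN=$(kubectl get cephcluster my-cluster -n rook-ceph --context dr2 -o jsonpath='{.metadata.generation}')
$ kubectl wait cephcluster my-cluster -n rook-ceph --context dr2 --for=jsonpath='{.status.observedGeneration}'="$GEN" --timeout=5m
$ kubectl wait cephcluster my-cluster -n rook-ceph --context dr2 --for=jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True --timeout=5m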
/remove-lifecycle rotten
/triage accepted
Something does seem wonky. We need to get a minimal and reliable way to reproduce this so we can step through a debugger.
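One possible starting point for such a repro, based on the observedGeneration finding above. Everything here is hypothetical (a made-up Widget CRD, not from this thread), and it needs kubectl >= v1.24 for patch --subresource=status:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Widget
    plural: widgets
    singular: widget
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              replicas:
                type: integer
          status:
            type: object
            properties:
              observedGeneration:
                type: integer
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
EOF
$ kubectl wait crd/widgets.example.com --for condition=Established
$ kubectl apply -f - <<'EOF'
apiVersion: example.com/v1
kind: Widget
metadata:
  name: demo
spec:
  replicas: 1
EOF
# Mark Ready=True while status.observedGeneration matches generation 1.
$ kubectl patch widget demo --subresource=status --type=merge \
    -p '{"status":{"observedGeneration":1,"conditions":[{"type":"Ready","status":"True"}]}}'
# Bump the spec: metadata.generation becomes 2, the status stays at 1.
$ kubectl patch widget demo --type=merge -p '{"spec":{"replicas":2}}'
# This should now time out, reproducing the reports above.
$ kubectl wait widget demo --for=condition=Ready --timeout=10s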
jsonpath='{.status.conditions[?(@.type=="Ready")].status}'=True
Just as a note to future viewers of this issue, this workaround looks to see if this object has ever been ready, not whether it is currently ready.
What happened: I have a CRD with conditions. The status of the conditions:
When I wait for the condition, I expect it to return successfully, but instead I get a timed-out error.
What you expected to happen:
Both services report the same status, but kubectl wait only works for one of them. I captured the kubectl output at -v=10 debug level, but saw no issues there; the condition responses look the same for the object that times out and for the other object:
OK:
Failed:
The failing one also contains severity, but I'm not sure if it's related. The field comes from https://github.com/knative/pkg/blob/main/apis/condition_types.go#L67-L70
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: These are KServe CRDs, but I'm not sure if it's relevant.
Environment:
- Kubernetes client and server versions (use kubectl version):
- OS (e.g: cat /etc/os-release): COS