stefanprodan / timoni

Timoni is a package manager for Kubernetes, powered by CUE and inspired by Helm.
https://timoni.sh
Apache License 2.0

timoni bundle apply is consistently timing out #421

Open philipsd6 opened 2 months ago

philipsd6 commented 2 months ago

For example:

╰─❯ SOMEAPP_CLAIM="claim-GcXZo1u7sf793zpSuYnQ" timoni bundle apply -f someapp.cue --runtime-from-env --force --overwrite-ownership
3:18PM INF b:someapp > applying 1 instance(s)
3:18PM INF b:someapp > i:someapp > applying module timoni.sh/someapp version 0.0.6
3:18PM INF b:someapp > i:someapp > upgrading someapp in namespace someapp
3:18PM INF b:someapp > i:someapp > ConfigMap/someapp/someapp-21ca602d configured
3:18PM INF b:someapp > i:someapp > Service/someapp/someapp configured
3:18PM INF b:someapp > i:someapp > Service/someapp/someapp-udp configured
3:18PM INF b:someapp > i:someapp > Deployment/someapp/someapp configured
3:18PM INF b:someapp > i:someapp > PersistentVolume/someapp-config configured
3:18PM INF b:someapp > i:someapp > PersistentVolume/someapp-media configured
3:18PM INF b:someapp > i:someapp > PersistentVolumeClaim/someapp/config configured
3:18PM INF b:someapp > i:someapp > PersistentVolumeClaim/someapp/media configured
3:18PM INF b:someapp > i:someapp > Ingress/someapp/someapp configured

Even though everything gets reconciled and is working OK... it eventually times out and I get this:

3:23PM ERR [timeout waiting for: [PersistentVolume/someapp/someapp-media status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, Service/someapp/someapp status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, Service/someapp/someapp-udp status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, PersistentVolumeClaim/someapp/config status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, Ingress/someapp/someapp status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, PersistentVolume/someapp/someapp-config status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, ConfigMap/someapp/someapp-21ca602d status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, PersistentVolumeClaim/someapp/media status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline, Deployment/someapp/someapp status: 'Unknown': client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline], storing instance failed: client rate limiter Wait returned an error: context deadline exceeded]

This also happens with timoni apply, not just when applying as a bundle.

stefanprodan commented 2 months ago

I guess the PV is stuck. Can you check your cluster and see if that's the case? Describing the applied objects with kubectl should tell you.
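For example, something along these lines (a rough sketch; resource names taken from the apply log above, adjust to your instance):

╰─❯ kubectl describe pv someapp-config
╰─❯ kubectl describe pv someapp-media
╰─❯ kubectl -n someapp describe pvc config
╰─❯ kubectl -n someapp describe pvc media
╰─❯ kubectl -n someapp describe deployment someapp

The Events section at the bottom of each describe output usually shows why a resource is stuck.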

philipsd6 commented 2 months ago

Everything I deploy with Timoni includes PVs, because I need to point them at a hostPath, so I haven't tested anything without PVs yet. But:

╰─❯ SOMEAPP_CLAIM="claim-GcXZo1u7sf793zpSuYnQ" timoni bundle apply -f someapp.cue --runtime-from-env
7:27PM INF b:someapp > applying 1 instance(s)
7:27PM INF b:someapp > i:someapp > applying module timoni.sh/someapp version 0.0.6
7:27PM INF b:someapp > i:someapp > upgrading someapp in namespace someapp
7:27PM INF b:someapp > i:someapp > ConfigMap/someapp/someapp-21ca602d unchanged
7:27PM INF b:someapp > i:someapp > Service/someapp/someapp unchanged
7:27PM INF b:someapp > i:someapp > Service/someapp/someapp-udp unchanged
7:27PM INF b:someapp > i:someapp > Deployment/someapp/someapp unchanged
7:27PM INF b:someapp > i:someapp > PersistentVolume/someapp-config unchanged
7:27PM INF b:someapp > i:someapp > PersistentVolume/someapp-media unchanged
7:27PM INF b:someapp > i:someapp > PersistentVolumeClaim/someapp/config unchanged
7:27PM INF b:someapp > i:someapp > PersistentVolumeClaim/someapp/media unchanged
7:27PM INF b:someapp > i:someapp > Ingress/someapp/someapp unchanged

Since it's already installed and running fine, running it again shows everything as unchanged. But it still sits on "waiting for 9 resource(s) to become ready..." and then times out as above. Meanwhile, my PVCs report status Bound with no events, and my PVs report the same.

philipsd6 commented 2 months ago

And interestingly, timoni status -n someapp someapp shows a much older installation:

╰─❯ timoni status -n someapp someapp
7:33PM INF i:someapp > last applied 2024-08-23T20:22:42Z
7:33PM INF i:someapp > module oci://ghcr.io/philipsd6/timoni-someapp:0.0.2
7:33PM INF i:someapp > digest sha256:b8a1f598b8fb0e5f4fba39d8c690d04b8fb23d17a25c9cf3a3da04e8febd7b7c
7:33PM INF i:someapp > container image someapp/someapp:beta
7:33PM INF i:someapp > ConfigMap/someapp/someapp-21ca602d Current - Resource is always ready
7:33PM INF i:someapp > Service/someapp/someapp Current - Service is ready
7:33PM INF i:someapp > Deployment/someapp/someapp Current - Deployment is available. Replicas: 1
7:33PM INF i:someapp > PersistentVolume/someapp-config Current - Resource is current
7:33PM INF i:someapp > PersistentVolume/someapp-media Current - Resource is current
7:33PM INF i:someapp > PersistentVolumeClaim/someapp/config Current - PVC is Bound
7:33PM INF i:someapp > PersistentVolumeClaim/someapp/media Current - PVC is Bound
7:33PM INF i:someapp > Ingress/someapp/someapp Current - Resource is current

Deleting the timoni.someapp secret doesn't resolve the problem, but it gets recreated on the next timoni apply with the latest info.
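(For reference, assuming the instance state secret named above lives in the instance namespace, the check and delete were roughly:)

╰─❯ kubectl -n someapp get secret timoni.someapp
╰─❯ kubectl -n someapp delete secret timoni.someapp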

stefanprodan commented 2 months ago

Run the apply of 0.0.6 with --wait=false; this should update the instance state accordingly.
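For example, the same bundle command as above with waiting disabled (a sketch based on the earlier invocation):

╰─❯ SOMEAPP_CLAIM="claim-GcXZo1u7sf793zpSuYnQ" timoni bundle apply -f someapp.cue --runtime-from-env --wait=false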

philipsd6 commented 2 months ago

Well yes, that does let it work without timing out, but it's a bit like:

Patient: "Doctor, my arm really hurts when I bend it this way, do you have any advice?" Doctor: "Don't bend it that way."

stefanprodan commented 2 months ago

The timeout comes from an applied resource that doesn't become ready; you need to describe each applied resource and see which one is stuck. Once you figure out which one fails, we can look into how to improve the health check code to report it.
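As a rough sketch (namespace and names assumed from the earlier logs), each applied resource can be checked one by one:

╰─❯ kubectl -n someapp get events --sort-by=.lastTimestamp
╰─❯ kubectl -n someapp describe deployment someapp
╰─❯ kubectl -n someapp wait --for=condition=Available deployment/someapp --timeout=60s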

philipsd6 commented 2 months ago

It seems the resource type causing the issue is my PersistentVolumes. When I apply a bundle that uses PVs, I eventually get this:

9:13AM ERR [timeout waiting for: [Deployment/nodered/nodered status: 'InProgress', PersistentVolume/nodered/nodered-nodered status: 'Unknown'], storing instance failed: client rate limiter Wait returned an error: context deadline exceeded]

But as I said, everything is in the correct state, including the PVs -- they are properly bound to the PVCs:

╰─❯ kubectl get pv nodered-nodered -o json | gron | grep -i status
json.status = {};
json.status.lastPhaseTransitionTime = "2024-08-27T21:46:45Z";
json.status.phase = "Bound";
╰─❯ kubectl -n nodered get pvc nodered -o json | gron | grep status
json.status = {};
json.status.accessModes = [];
json.status.accessModes[0] = "ReadWriteOnce";
json.status.capacity = {};
json.status.capacity.storage = "1Gi";
json.status.phase = "Bound";
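
(The same phase check works without gron, e.g. via jsonpath; both report Bound per the output above:)

╰─❯ kubectl get pv nodered-nodered -o jsonpath='{.status.phase}'
╰─❯ kubectl -n nodered get pvc nodered -o jsonpath='{.status.phase}'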