siderolabs / terraform-provider-talos

Mozilla Public License 2.0

`talos_cluster_health` takes 5s to pass #145

Closed: zargony closed this issue 7 months ago

zargony commented 9 months ago

I noticed (while experimenting with #143) that even when the cluster is healthy, the `talos_cluster_health` check takes about 5s to pass. That seems like an unusually long delay during `terraform apply`. Is this intended behavior?

data.talos_cluster_health.cluster: Reading...
data.talos_cluster_health.cluster: Read complete after 5s [id=cluster_health]
frezbo commented 9 months ago

Yes, it is. It runs a multitude of checks, so these check times are expected.

smira commented 9 months ago

There's one check we could probably improve on the Talos side: specifically, the check that the node has finished its boot sequence. It's certainly not optimal.

zargony commented 8 months ago

Thanks for clarifying. I get that there's a multitude of checks needed to ensure that the cluster is up and running, and it makes a lot of sense to wait for everything to settle when the cluster is created. With a running cluster, however, this adds a 5s delay every time Terraform refreshes its state (i.e. every time you run plan or apply), which is a little tedious since basically all k8s resources depend on this health check. For me, `terraform refresh` went up from 1.3 seconds to 6.2 seconds. It's probably no big deal for CD workflows, but it's somewhat annoying for local development (it felt a lot quicker before `talos_cluster_health` was introduced). Maybe it would be worth introducing some kind of quick check option? For example, `talosctl dashboard` comes up very quickly and shows whether the system is healthy or not.

smira commented 7 months ago

Fixed in the latest Talos 1.6.x and in Talos 1.7+.