Open pneigel-ca opened 1 year ago
The same code, when deployed to another environment, produces a similar but different error:
Error: Creating Catalog V2: Timeout getting Catalog V2 Client at cluster ID c-lc6wd: Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [lost connection to cluster: failed to find Session for client stv-cluster-c-lc6wd] from [https://<server_url>/k8s/clusters/c-lc6wd/v1]
I'm experiencing the same problem with all _v2 resources, including storage, which randomly fails along with catalog and apps. I'm unable to get a clean full cluster creation unless I plan / apply multiple times, progressing each time until it succeeds.
cut and paste of terraform apply logs:
Tue, 16 Apr 2024 03:42:27 GMT rancher2_cluster_sync.wait_cluster_ready: Still creating... [2m30s elapsed]
Tue, 16 Apr 2024 03:42:37 GMT rancher2_cluster_sync.wait_cluster_ready: Still creating... [2m40s elapsed]
Error: [ERROR] waiting for cluster ID (c-t6kzx) downloading catalogs: [ERROR] getting catalog V2 list at cluster ID (c-t6kzx): Timeout getting catalog V2 list at cluster ID c-t6kzx: Unknown schema type [catalog.cattle.io.clusterrepo]
Tue, 16 Apr 2024 03:54:21 GMT │
Tue, 16 Apr 2024 03:54:21 GMT │ with rancher2_cluster_sync.wait_cluster_ready,
Tue, 16 Apr 2024 03:54:21 GMT │ on main.tf line 403, in resource "rancher2_cluster_sync" "wait_cluster_ready":
Tue, 16 Apr 2024 03:54:21 GMT │ 403: resource "rancher2_cluster_sync" "wait_cluster_ready" {
Tue, 16 Apr 2024 03:54:21 GMT │
Tue, 16 Apr 2024 03:54:21 GMT ╵
this is the resource definition:
resource "rancher2_cluster_sync" "wait_cluster_ready" {
  cluster_id    = module.rancher2_import.cluster_id
  wait_catalogs = true
}
🤷🏻
I still experience this issue in 2024 with the latest provider and a fresh cluster on Rancher 2.8.3.
I'm experimenting with state_confirm values in the rancher2_cluster_sync resource to see if it's just a matter of waiting longer. Rather hacky solution though. Also with a bigger timeout in the rancher2 provider.
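For anyone who wants to try the same thing, here is a rough sketch of the two knobs mentioned above: state_confirm on rancher2_cluster_sync and timeout on the rancher2 provider. The variable names and the values (3 confirmations, 300s) are assumptions for illustration, not recommendations, and nothing in this thread confirms that they fix the error.

```hcl
# Sketch only: variable names and numeric values are assumptions.
variable "rancher_api_url" {
  type = string
}

variable "rancher_token" {
  type      = string
  sensitive = true
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_token

  # Retry window for Rancher connectivity and resource operations.
  # Raised above the provider default here as an experiment.
  timeout = "300s"
}

resource "rancher2_cluster_sync" "wait_cluster_ready" {
  cluster_id    = module.rancher2_import.cluster_id
  wait_catalogs = true

  # Require the active state to be observed several times in a row
  # before the resource is considered created.
  state_confirm = 3
}
```

Both settings only make Terraform wait longer or check more often, so at best they narrow the window in which the downstream cluster session is not yet available.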
Same here. Rancher 2.8.3, I always have to re-apply my plan a second time, then it goes through.
Error: Creating secret V2: Timeout getting Catalog V2 Client at cluster ID c-w9szc: Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [lost connection to cluster: failed to find Session for client stv-cluster-c-w9szc] from
I do use the rancher2_cluster_sync resource as well, with state_confirm. The reason I do this is to allow cluster repositories to sync before we deploy apps. Sometimes when a new helm version is available, it's not immediately added when the repo is (for example, Longhorn); you have to manually click refresh or wait until the cattle-cluster-agent triggers a refresh for the catalog/repo.
Before that, I would get an error saying there was no such version for the helm chart I was installing.
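The ordering described above (sync first, then repo, then app) could look roughly like the sketch below. The resource names, the namespace, and the chart version variable are assumptions; only the general pattern of routing cluster_id through the rancher2_cluster_sync resource comes from this thread. Other reports here suggest this ordering alone does not reliably prevent the 500 error.

```hcl
# Sketch only: names, namespace and the version variable are assumptions.
variable "longhorn_chart_version" {
  type = string
}

resource "rancher2_catalog_v2" "longhorn" {
  # Referencing the sync resource makes Terraform wait for it first.
  cluster_id = rancher2_cluster_sync.wait_cluster_ready.cluster_id
  name       = "longhorn"
  url        = "https://charts.longhorn.io"
}

resource "rancher2_app_v2" "longhorn" {
  cluster_id = rancher2_cluster_sync.wait_cluster_ready.cluster_id
  name       = "longhorn"
  namespace  = "longhorn-system"

  # Referencing the catalog resource orders the app after the repo.
  repo_name     = rancher2_catalog_v2.longhorn.name
  chart_name    = "longhorn"
  chart_version = var.longhorn_chart_version
}
```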
Rancher Server Setup
Information about the Cluster
User Information
Provider Information
Describe the bug
When provisioning a new downstream cluster with terraform and automation, the cluster is created but resources in the downstream cluster encounter an error. Re-applying the same terraform after a short period of time works without issue.
To Reproduce
Create a fresh, moderately sized cluster (~20min to create) and deploy resources to the cluster's namespaces.
Actual Result
Creating resources fails with an odd error, but can be reapplied without issue or modification.
Expected Result
Cluster sync should understand when the cluster is ready for resource provisioning.
Other information
A similar issue was "resolved" here; newer reports of it are almost identical, so I opened a new issue: https://github.com/rancher/terraform-provider-rancher2/issues/662
We use rancher2_cluster_sync resources to ensure the cluster is up and available. Re-applying the same code with no changes works without issue.
Fail:
Rerun:
We are experiencing the problem with both rancher2_catalog_v2 and rancher2_app_v2 resources.