Open varunthakur2480 opened 1 year ago
@varunthakur2480 help me to understand how we can help here. I don't see any specific google provider resource. Can you provide details and be specific?
Sorry, I forgot to mention some more details.
The `data "google_client_config" "default" {}` data source is responsible for fetching the cluster data along with the temporary OAuth token, which is then used to run Terraform operations. After the cluster certificates are rotated using gcloud command, terraform plan shows that the data rendered by google_client_config data resource pulls correct API endpoint of the cluster but the O-auth token does not get refreshed and hence everything starts to fail.
We had to rebuild the cluster to fix it.
@varunthakur2480 I noticed that you said "but the O-auth token does not get refreshed". Can you share the debug log? I want to see how `google_client_config` is called.
I don't have the debug log now, as the cluster was rebuilt. I will try to recreate the issue in dev and share it next week.
Are you running plan with or without refresh? If refresh is disabled I would expect this to happen
FWIW `terraform plan` can unexpectedly preserve values when we'd expect them to change, and `terraform refresh` will change them. I've never figured out the exact mechanics.
Terraform refresh is not supported for remote backends, and it is also worth mentioning that it is deprecated in the latest versions: https://developer.hashicorp.com/terraform/cli/commands/refresh
```
Error: error starting operation:
The "remote" backend does not support the "OperationTypeRefresh" operation.
```
@varunthakur2480 waiting for your debug log and steps that I can use to repro the issue
I'm fairly certain that the google_client_config data source pulls the token from your local authentication source of the Terraform provider. What authentication method are you using for Terraform, and are you updating it after you cycle the GKE cluster certificate?
I have debug logs available now. Is there a way to share them privately, as they might contain some classified info?
@varunthakur2480 https://gist.github.com/ is the place you can use. It is public, and you need to redact any secrets you don't want to share.
Did you get a chance to look at the logs?
I am keeping a broken cluster around so that I can provide more information if required, so I am wondering if you would get time this week to look at the debug logs?
@varunthakur2480 your terraform version (v0.14.10) is pretty old. Is it possible to try with the latest version?
I do see the line below in your log. I notice `module.default.module.default.data.google_client_config.current`. The data source appears to be nested two modules deep (module inside module). I'm not sure what impact that level of nesting has. Are you able to try putting `data.google_client_config.current` at the root level to see if you can repro?
2023/03/20 07:31:46 [WARN] Provider "registry.terraform.io/hashicorp/google" produced an unexpected new value for module.default.module.default.data.google_client_config.current.
- .access_token: inconsistent values for sensitive attribute
I have attached another debug log for a smaller component, let me know if that helps: run-ExyaP5VkmrSMykwQ-plan-log.txt
Upgrading to the latest Terraform is not possible due to Sentinel limitations.
@trodge besides the terraform version, what else can you think of that could cause the issue?
The issue is with the google provider, so I'm not sure how a Terraform version upgrade can fix it. We are already on a relatively new version of the provider.
> After the cluster certificates are rotated using gcloud command, terraform plan shows that the data rendered by google_client_config data resource pulls correct API endpoint of the cluster but the O-auth token does not get refreshed and hence everything starts to fail
I'm also a bit unclear with what the issue is, but I wanted to explore the idea of the access token not being refreshed.
From your config I can see that you configure the provider with `access_token = data.xxxxx.gke_cluster_viewer.data["token"]`. This means that when the provider is configured in the early stages of a plan/apply step, this code is hit (see that it's in a block handling the scenario where the user configures the provider with an access token). I'll return to this info in the next paragraph.
When `data "google_client_config" "default" {}` uses the provider's client to get an access token, it uses the token source set within the provider. That means it uses the token source created in the code I linked above, which looks like:
```go
return googleoauth.Credentials{
	TokenSource: StaticTokenSource{oauth2.StaticTokenSource(token)},
}, nil
```
The token source made there, `oauth2.StaticTokenSource`, returns the same token without refreshing it. The oauth2 documentation for this function (https://pkg.go.dev/golang.org/x/oauth2#StaticTokenSource) says "Because the provided token t is never refreshed, StaticTokenSource is only useful for tokens that never expire."
So it sounds like you expected the token to be refreshed, but this method doesn't allow the token to be refreshed and instead returns the same token, assuming that it doesn't expire. Could you please confirm whether the access token your configuration takes from Vault (`data.xxxxx.gke_cluster_viewer.data["token"]`) expires or not?
Additionally: I quickly checked what token source is used when the provider is configured with `credentials` instead of `access_token`, and I see it's `oauth2.reuseTokenSource`, which looks like it means that tokens returned by `google_client_config` when the provider is configured with `credentials` will refresh?
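To make the contrast between the two token sources concrete, here is a minimal sketch. It is not the provider's or the oauth2 package's code: `Token`, `staticSource`, `reuseSource`, and `demo` are hypothetical stand-ins written with the standard library only, and the real `reuseTokenSource` handles expiry with more nuance. It only illustrates why a static source keeps returning the same value while a caching source fetches a new token once the cached one has expired.

```go
package main

import (
	"fmt"
	"time"
)

// Token is a simplified, hypothetical stand-in for oauth2.Token.
type Token struct {
	AccessToken string
	Expiry      time.Time
}

// TokenSource mirrors the shape of the oauth2.TokenSource interface.
type TokenSource interface {
	Token() (*Token, error)
}

// staticSource always hands back the token it was built with,
// regardless of whether that token has expired.
type staticSource struct{ t *Token }

func (s staticSource) Token() (*Token, error) { return s.t, nil }

// reuseSource caches a token and calls fetch for a new one
// once the cached token has expired.
type reuseSource struct {
	cached *Token
	fetch  func() (*Token, error)
	now    func() time.Time
}

func (r *reuseSource) Token() (*Token, error) {
	if r.cached != nil && r.now().Before(r.cached.Expiry) {
		return r.cached, nil
	}
	t, err := r.fetch()
	if err != nil {
		return nil, err
	}
	r.cached = t
	return t, nil
}

// demo advances a fake clock by two hours and reports what each source
// returns: the static token afterwards, and the caching source's token
// before and after the jump.
func demo() (string, string, string) {
	clock := time.Date(2023, 4, 3, 4, 28, 24, 0, time.UTC)
	now := func() time.Time { return clock }
	issued := 0
	fetch := func() (*Token, error) {
		issued++
		return &Token{
			AccessToken: fmt.Sprintf("token-%d", issued),
			Expiry:      now().Add(time.Hour),
		}, nil
	}

	var static TokenSource = staticSource{t: &Token{AccessToken: "vault-token", Expiry: clock.Add(time.Hour)}}
	var reuse TokenSource = &reuseSource{fetch: fetch, now: now}

	before, _ := reuse.Token()
	clock = clock.Add(2 * time.Hour) // both tokens are now past their expiry
	s, _ := static.Token()
	after, _ := reuse.Token()
	return s.AccessToken, before.AccessToken, after.AccessToken
}

func main() {
	s, before, after := demo()
	fmt.Println("static after expiry:", s)
	fmt.Println("reuse before/after expiry:", before, after)
}
```

In this model the static source still returns `vault-token` after the clock jump, whereas the caching source moves from `token-1` to `token-2`.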
If you can't change the method of how you configure the google provider in your Terraform project then (if I'm not completely wrong) I think you may need to open a feature request? I'll ask internally
Thanks for the detailed response. For clarification I am adding some more context, but it seems that the code's assumption that the token remains static is incorrect.
So the problem I am facing is that Kubernetes/GKE good practices expect that we regenerate the GKE cluster certificates manually every few months/years. Basically, when you rotate the certificates as described at https://cloud.google.com/kubernetes-engine/docs/how-to/credential-rotation, you need to reauthenticate to the cluster, and the old credentials will not work. This works fine for normal authentication, but it fails for us because, as you pointed out, the tokens are expected to remain static. I guess this needs to change. In the meantime I will explore whether we can use oauth2.reuseTokenSource.
I checked the Vault token bit, and unfortunately it needs to be access_token and can't be of type credentials, in order to prevent leakage of keys. Also, the Vault token is set to expire every 2 hours.
Thanks for checking!
So to summarise so far, you're configuring the `google` provider with an access token coming from Vault. That same access token is then being retrieved by `data "google_client_config" "default" {}`. Due to how that data source works, the token is not refreshed at any point.
A possible concern is that unrefreshed tokens expire, but I don't think your issue is the access token expiring. The plan you've shared here starts at 2023-04-03T04:28:24 and the request that fails is at 2023-04-03T04:28:30. Tokens last 1 hour by default and your error occurs very quickly.
Is the output from `data "google_client_config" "default" {}` used to configure the kubernetes provider? Could you please post some details about how the google_client_config data source is used?
In the plan you shared here I see the failing request is a GET /api/v1/namespaces/prv1-e2-prv-exampletf/configmaps/my-config done by the kubernetes provider. The auth is a bearer token that, after I looked at its contents, appears to be a kubernetes service account token (versus a GCP service account token):
```json
{
  "iss": "kubernetes/serviceaccount",
  "kubernetes.io/serviceaccount/namespace": "value changed",
  "kubernetes.io/serviceaccount/secret.name": "value changed",
  "kubernetes.io/serviceaccount/service-account.name": "value changed",
  "kubernetes.io/serviceaccount/service-account.uid": "value changed",
  "sub": "value changed"
}
```
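For anyone who wants to repeat that inspection, a bearer token's claims can be read by base64url-decoding the middle segment of the JWT. The sketch below is one way to do that with the Go standard library. `decodeClaims` and `sampleToken` are hypothetical helper names, the sample token is made up, and nothing here verifies the signature, so it is only suitable for looking at a token, not for trusting it.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// decodeClaims returns the claims found in the payload segment of a JWT.
// It does NOT verify the signature.
func decodeClaims(jwt string) (map[string]interface{}, error) {
	parts := strings.Split(jwt, ".")
	if len(parts) != 3 {
		return nil, errors.New("expected three dot-separated JWT segments")
	}
	// JWT segments are base64url-encoded, normally without padding;
	// strip any padding so RawURLEncoding accepts both forms.
	segment := strings.TrimRight(parts[1], "=")
	payload, err := base64.RawURLEncoding.DecodeString(segment)
	if err != nil {
		return nil, fmt.Errorf("decoding payload: %w", err)
	}
	var claims map[string]interface{}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("parsing payload: %w", err)
	}
	return claims, nil
}

// sampleToken builds an unsigned, made-up token purely for demonstration.
func sampleToken() string {
	enc := base64.RawURLEncoding.EncodeToString
	header := enc([]byte(`{"alg":"none"}`))
	payload := enc([]byte(`{"iss":"kubernetes/serviceaccount","sub":"example"}`))
	return header + "." + payload + ".sig"
}

func main() {
	claims, err := decodeClaims(sampleToken())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// An issuer of "kubernetes/serviceaccount" points at a Kubernetes
	// service account token rather than a GCP access token.
	fmt.Println("issuer:", claims["iss"])
}
```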
It's starting to feel like something is incorrect with tokens made in your cluster perhaps? Though I admit it's been a while since I've done k8s work.
I was reading this blog post here for some ideas... could you inspect the Secret that the kubernetes service account token comes from (e.g. `terraform-token-2kdzg`) and see if the `data.ca.crt` value is correct for the new certificates?
If that doesn't help in any way I can ask someone from the kubernetes provider team if they have any ideas.
Also, is the access token supplied from Vault set up with the scope `https://www.googleapis.com/auth/userinfo.email`?
There is also a similar issue here, though I am not sure if it has any relation to ours: https://github.com/hashicorp/terraform/issues/27741
Based on your comment I forced recreation of the Vault token, but I am still getting the same issue.
Terraform Version
Terraform version: v0.14.11
Kubernetes Provider version: 2.15.0
Kubernetes version: 1.23
google-beta/4.15.0
google/4.15.0
Affected Resource(s)
Terraform Configuration Files
Debug Output
:31:15]
[2023-02-27 08:31:15] Error: Invalid configuration for API client
[2023-02-27 08:31:15]
[2023-02-27 08:31:15] on ../../../modules/flux-setup/flux.tf line 37, in resource "kubernetes_manifest" "application_kustomize":
[2023-02-27 08:31:15] 37: resource "kubernetes_manifest" "application_kustomize" {
[2023-02-27 08:31:15]
[2023-02-27 08:31:15] Get "https://10.124.239.189/apis": Service Unavailable
[2023-02-27 08:31:15]
Panic Output
`data "google_client_config" "default" {}` seems to generate an OAuth token on the fly, but after the cluster cert rotation the OAuth token expired and the data source was not able to regenerate it. I even tried deleting the data source from tfstate to force it to refresh, but that did not help.
Expected Behavior
Plan should have succeeded
Actual Behavior
[2023-02-27 08:31:15] Get "https://10.124.239.189/apis": Service Unavailable
Steps to Reproduce
1. Cycle the cluster certificates using the gcloud command
2. Run terraform plan