hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

Unable to use kubernetes provider with fixed limited permissions - see here: https://github.com/hashicorp/terraform-provider-azurerm/pull/21229 #2072

Open slzmruepp opened 1 year ago

slzmruepp commented 1 year ago

Terraform Version, Provider Version and Kubernetes Version

Terraform version: 1.4.4
Kubernetes provider version: 2.19.0
Kubernetes version: 1.24.9
Azurerm provider: 3.51.0

Affected Resource(s)

Terraform Configuration Files

# Configure the Kubernetes provider
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)
}
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.51.0"
    }
    azuread = {
      source  = "hashicorp/azuread"
      version = ">= 2.36.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.19.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.9.0"
    }
  }

  required_version = ">= 0.14.9"
  backend "azurerm" {
  }
}

data "azurerm_kubernetes_cluster" "aks_provider_config" {
  name                = var.env_config[var.ENV][ "aks_cluster_name" ]
  resource_group_name = var.env_config[var.ENV][ "aks_rg_name" ]
}

data "kubernetes_namespace_v1" "proj_ns" {
  metadata {
    name = local.proj_name
  }
}

Debug Output

Planning failed. Terraform encountered an error while generating this plan.
╷
│ Error: Unauthorized
│ 
│   with data.kubernetes_namespace_v1.proj_ns,
│   on var-proj.tf line 37, in data "kubernetes_namespace_v1" "proj_ns":
│   37: data "kubernetes_namespace_v1" "proj_ns" {
│ 
╵

Steps to Reproduce

See here: https://github.com/hashicorp/terraform-provider-azurerm/issues/21183

  1. Create AKS Cluster with Azure AD auth with RBAC and local accounts enabled
  2. Create a service principal
  3. Assign the principal the Azure Kubernetes Service Cluster User Role so it can fetch the limited-permission kubeconfig
  4. Assign the principal the Azure Kubernetes Service RBAC Admin Role scoped to a specific namespace (a sketch of these two role assignments follows the list)
  5. Authenticate terraform with the specific service principal and configure the k8s provider
  6. Try to fetch the data.kubernetes_namespace_v1
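
A hedged sketch of steps 3 and 4 in Terraform (resource names, var.ci_principal_object_id, and the namespace scope are illustrative assumptions, not taken from the reporter's configuration):

resource "azurerm_role_assignment" "cluster_user" {
  # Lets the principal call listClusterUserCredential and fetch the limited kubeconfig
  scope                = azurerm_kubernetes_cluster.aks.id
  role_definition_name = "Azure Kubernetes Service Cluster User Role"
  principal_id         = var.ci_principal_object_id
}

resource "azurerm_role_assignment" "namespace_admin" {
  # Namespace-scoped data-plane role: the scope is the cluster id plus the namespace path
  scope                = "${azurerm_kubernetes_cluster.aks.id}/namespaces/proj-ns"
  role_definition_name = "Azure Kubernetes Service RBAC Admin"
  principal_id         = var.ci_principal_object_id
}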

Expected Behavior

Kubernetes data sources should be readable and resources should be created, within the limited permissions granted in the specific namespace.

Actual Behavior

Terraform fails with an "Unauthorized" error.

Important Factoids

I did some testing, and the outcome is that fetching the limited-permission kubeconfig now works:

2023-04-11T13:21:45.761Z [DEBUG] provider.terraform-provider-azurerm_v3.51.0_x5: AzureRM Request: 
POST /subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.ContainerService/managedClusters/XXX/listClusterUserCredential?api-version=2023-02-02-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-linux) go-autorest/v14.2.1 hashicorp/go-azure-sdk/managedclusters/2023-02-02-preview HashiCorp Terraform/1.4.4 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev VSTS_2c406b0a-3caf-4961-98e2-e310b237dd52_build_241_0 pid-222c6c49-1b0a-5959-a213-6608f9eb8820
Content-Length: 0
Content-Type: application/json; charset=utf-8
X-Ms-Correlation-Request-Id: 16ef209b-5c71-1f06-9efe-412a949223cd
Accept-Encoding: gzip: timestamp=2023-04-11T13:21:45.761Z
2023-04-11T13:21:45.949Z [DEBUG] provider.terraform-provider-azurerm_v3.51.0_x5: AzureRM Response for https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.ContainerService/managedClusters/XXX/listClusterUserCredential?api-version=2023-02-02-preview: 
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json
Date: Tue, 11 Apr 2023 13:21:44 GMT
Expires: -1
Pragma: no-cache
Server: nginx
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: 16ef209b-5c71-1f06-9efe-412a949223cd
X-Ms-Ratelimit-Remaining-Subscription-Writes: 1198
X-Ms-Request-Id: 09f91280-9e21-4b60-bb28-1871c7e4a1d2
X-Ms-Routing-Request-Id: WESTEUROPE:20230411T132145Z:c88433b2-eb1d-4cf0-a40d-9f4c3d36dbd7

{
  "kubeconfigs": [
   {
    "name": "clusterUser",
    "value": "XXX"
   }
  ]
 }: timestamp=2023-04-11T13:21:45.949Z

The base64-decoded kubeconfig (the value) looks correct:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: XXX
    server: https://XXX.azmk8s.io:443
  name: XXX
contexts:
- context:
    cluster: XXX
    user: clusterUser_XXX
  name: XXX
current-context: XXX
kind: Config
preferences: {}
users:
- name: clusterUser_XXX
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --environment
      - AzurePublicCloud
      - --server-id
      - XXX
      - --client-id
      - XXX
      - --tenant-id
      - XXX
      - --login
      - devicecode
      command: kubelogin
      env: null
      provideClusterInfo: false

To debug, I tried the same service principal sequence with kubectl. I use the following sequence:

az login --service-principal -u XXX -p XXX --tenant XXX

(This command fetches the identical kubeconfig as the terraform sequence.)

az aks get-credentials --name XXX --resource-group XXX --overwrite-existing

However, when I try to use `kubectl get all -n proj_ns` directly, I get the following:

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXX to authenticate.

It only works after I use kubelogin:

kubelogin convert-kubeconfig -l azurecli

After the kubelogin convert, the kubeconfig under .kube/config looks like this:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: XXX
    server: https://XXX.azmk8s.io:443
  name: XXX
contexts:
- context:
    cluster: XXX
    user: clusterUser_XXX
  name: XXX
current-context: XXX
kind: Config
preferences: {}
users:
- name: clusterUser_XXX
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --login
      - azurecli
      - --server-id
      -  XXX
      command: kubelogin
      env: null
      provideClusterInfo: false

So I don't know what runs behind the Terraform curtain, but I suspect the kubelogin step is not accounted for; hence the "Unauthorized" response, because the provider never gets the token from the azurecli context.

References

Community Note

slzmruepp commented 1 year ago

Tagging @browley86 here, I think this time it's a kubernetes-provider issue.

slzmruepp commented 1 year ago

I guess this also applies to the helm provider. Linking the related issue: https://github.com/hashicorp/terraform-provider-helm/issues/1114

browley86 commented 1 year ago

Hey @slzmruepp, I actually think there is a way to get this working, but the setup has to be done at the provider level. There is a post about this where the user uses the exec plugin that the k8s provider exposes (the general shape is sketched just below). I haven't had time to give it a go myself, but that was my plan initially after getting the kubeconfig going using the new API endpoint that was just released. I'll try and get this going in the next few days.
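
For reference, the general shape of such an exec-based provider configuration is sketched here; this is a hedged sketch only (the var.* names are placeholders, not anyone's actual setup), and the concrete kubelogin blocks follow in the comments below.

provider "kubernetes" {
  # Placeholders: in the real configs below, these values come from the
  # azurerm_kubernetes_cluster data source.
  host                   = var.cluster_host
  cluster_ca_certificate = base64decode(var.cluster_ca_b64)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args        = ["get-token", "--login", "spn", "--server-id", var.aks_aad_server_id]
  }
}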

browley86 commented 1 year ago

Ok, so I got it to work; there is good news and bad news. The bad news is that, given the permissions of the Service Principal, it cannot read the required --server-id field from the Kubernetes Enterprise App named "Azure Kubernetes Service AAD Server". There is an app registration in Azure, outlined in the blog post, whose Application Id you need to use with kubelogin:

data "azuread_service_principal" "aks" {
  display_name = "Azure Kubernetes Service AAD Server"
}

Which is fed into the kubelogin:

      "--server-id",
     data.azuread_service_principal.aks.application_id,

My Service Principal, out of the box, doesn't have the rights to look up that user. Instead, because I had the server id in my own .kube/config file, I took a shortcut and extended the HashiCorp Vault secret to include the app registration Application Id, which got it working. Here is my provider block:

provider "kubernetes" {
  host = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  cluster_ca_certificate = base64decode(
    data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate,
  )

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "/usr/local/bin/kubelogin"
    args = [
      "get-token",
      "--login",
      "spn",
      "--environment",
      "AzurePublicCloud",
      "--tenant-id",
      data.vault_generic_secret.service_principal.data["tenantId"],
      "--server-id",
      data.vault_generic_secret.service_principal.data["azure_k8s_service_app_id"],
      "--client-id",
      data.vault_generic_secret.service_principal.data["clientId"],
      "--client-secret",
      data.vault_generic_secret.service_principal.data["clientSecret"]
    ]
  }
}

There may be another way to get the app id from the initial data source but I did not see an option on an admittedly quick scan. The vault workaround is hacky but good enough for me in the short term. Hope that helps.

slzmruepp commented 1 year ago

Thank you, but I consider this a workaround. This is not what the documentation suggests. Is there an effort from HashiCorp to solve this in the provider code?

Also, when the data source in Terraform fetches the kubeconfig from the listClusterUserCredential endpoint, it gets the kubeconfig back as a base64-encoded string. If you look at my post above where I show the decoded kubeconfig, it contains the server-id. Is there a way to use the azcli token to authenticate to the cluster in the background?

But I would argue this is a bug. First, the documentation implies that it works out of the box like this. See here under provider setup: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/getting-started

Second, there is no documentation about kubelogin use. I am talking about a pure CI/CD approach: we never run Terraform locally, only in pipelines, so the pipeline agents have kubelogin available. Isn't it enough to rewrite the kubeconfig in the background to use azcli, which is basically what kubelogin does? When you run Terraform in a pipeline, an azcli token must exist somehow anyway, right?

From:

users:
- name: clusterUser_XXX
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --environment
      - AzurePublicCloud
      - --server-id
      - XXX
      - --client-id
      - XXX
      - --tenant-id
      - XXX
      - --login
      - devicecode
      command: kubelogin
      env: null
      provideClusterInfo: false

to:

users:
- name: clusterUser_XXX
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - get-token
      - --login
      - azurecli
      - --server-id
      -  XXX
      command: kubelogin
      env: null
      provideClusterInfo: false

Thanks

browley86 commented 1 year ago

@slzmruepp - so uh, I did it. I didn't like it. But yeah:

yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"]

That will get it without my hack of hard-coding it in a different backend. Some context: I had to plan to a file, then decode the file to actually see where in the JSON the value showed up. Aside from my references to my vault variable, the only other place it showed up is in the kube_config_raw. I then spent the better part of an hour hacking around to get the above. This is definitely not ideal. It feels like the provider should export this in some way and, unless I missed something, it looks like it only shows up in the kube_config_raw, which then forces end-users to parse it. I don't know that I would call this a "bug" on the provider side, as kubelogin is the thing forcing this. That said, it would be a very nice feature to have this as an available export for the azurerm_kubernetes_cluster because, otherwise, people have to do the above hack.

Edit for completeness, here is the final provider block:

provider "kubernetes" {
  host = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  cluster_ca_certificate = base64decode(
    data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate,
  )

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "/usr/local/bin/kubelogin"
    args = [
      "get-token",
      "--login",
      "spn",
      "--environment",
      "AzurePublicCloud",
      "--tenant-id",
      data.vault_generic_secret.service_principal.data["tenantId"],
      "--server-id",
      yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"],
      "--client-id",
      data.vault_generic_secret.service_principal.data["clientId"],
      "--client-secret",
      data.vault_generic_secret.service_principal.data["clientSecret"]
    ]
  }
}

slzmruepp commented 1 year ago

Haha, yes, this is quite a bit. The problem with Azure DevOps Pipelines and the Terraform tasks is that there is no access to the SP secret; that is all handled in the background. Maybe one approach would be to fetch the kubeconfig and put it on the filesystem, run kubelogin convert-kubeconfig -l azurecli, and in the provider block just refer to this kubeconfig file. But how can such things run before provider init?

@browley86 thanks for the effort, but I don't think this is a feasible approach, because fetching the service principal keys from a key vault would have been possible all along. But we don't even know the secrets ourselves, because we create them programmatically and create the service connections in Azure DevOps directly. So I would suggest there should be a feasible solution from the provider itself, which is why I filed this issue. The azurecli token is there, Terraform authenticates with it and it works. So the same service connection (SP) should be able to use the token to authenticate to the k8s plane.

We have kubelogin on our Azure agents, but how to configure it properly is a mess. Also, the documentation of the provider is still wrong, because it just doesn't work...

I tried this but still get Unauthorized when trying to read a data "kubernetes_namespace_v1" object:

# Configure the Kubernetes provider
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login",
      "azurecli",
      "--server-id",
      "Manually extracted for testing"
    ]
  }
}

browley86 commented 1 year ago

I dont think this is a feasible approach because to fetch the service principal keys from a key vault would have been possible all along. But we dont even know the secrets ourself because we create them programmatically and create directly the service connections in azure devops.

In fairness to those tracking the issue, this is a different problem though: the password for the service principal is exposed via the service principal password resource, which can then be referenced later in the run, or, if another team creates the SP via Terraform in a different repo, they would need to put the password in some backend like HashiCorp Vault or Azure Key Vault so that your configuration could pick it up later (it looks like the data lookups for SPs don't expose the password).

Re: the idea of making the file, it might work using local_file with kube_config_raw and then throwing a depends_on for the provider. That said, that feels like it is once again getting out of scope for the actual issue: the azurerm_kubernetes_cluster resource provides no "nice way" of getting the server-id for kubelogin, and one should be added as an enhancement so people don't have to hack around it. Kubelogin is not going away anytime soon, so it would be very helpful.

Edit: just noticed there is local_sensitive_file which is way more appropriate for the kubeconfig, so some quick pseudo-code:

resource "local_sensitive_file" "kubeconfig" {
  # kube_config_raw is already a YAML string, so it can be written out as-is
  content  = data.azurerm_kubernetes_cluster.aks.kube_config_raw
  filename = var.kubeconfig_filepath
}

The issue there, though, is that now there's a file with sensitive content lying around that would need to be cleaned up at the end of every run.

slzmruepp commented 1 year ago

So the catch-22 when I try the following is that the plan step always fails because the kubeconfig does not exist yet, so our pipelines fail.

resource "local_sensitive_file" "kubeconfig" {
  content  = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config_raw
  filename = "./kubeconfig"

  provisioner "local-exec" {
    command = "kubelogin convert-kubeconfig --login azurecli --kubeconfig ./kubeconfig"
  }
}

provider "kubernetes" {
  config_path = local_sensitive_file.kubeconfig.filename
}

Is there someone from Team Hashicorp watching this? Any Solutions? Thanks!

slzmruepp commented 1 year ago

Ok, so now I have an acceptable solution. First, the Azure Kubernetes Service AAD Server Enterprise Application ID is the same for each cluster in the same tenant; it is even the same across subscriptions. So if you run several AKS clusters per environment, the Application ID == --server-id is the SAME. We now hardcode the id as a variable, and I can say this works:

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login",
      "azurecli",
      "--server-id",
      var.env_config[var.ENV][ "server_id" ]
    ]
  }
}

So the main issue with my former approach was that I was under the impression that if I don't remove these lines:

  username               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
  password               = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)

from the provider config, it would still use the kubelogin token. BUT IT DOESN'T. IT TRIES TO AUTH WITH THE CERT AND KEY. That's why my first approach failed...

So if you run your Terraform in an azcli-token environment, this is the way to go...
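
For completeness, a minimal sketch of how the env_config variable referenced above might be declared (the map structure and the dev key are assumptions; the GUID is the AAD server application id that cveld and KimiaJM-visma confirm later in this thread):

variable "ENV" {
  type = string
}

variable "env_config" {
  type = map(map(string))
  default = {
    dev = {
      aks_cluster_name = "my-dev-cluster"                       # placeholder
      aks_rg_name      = "my-dev-rg"                            # placeholder
      server_id        = "6dae42f8-4368-4678-94ff-3960e28e3630" # AKS AAD Server app id
    }
  }
}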

slzmruepp commented 1 year ago

Still, I think this should be a built-in feature of the provider, supporting this behavior without hacking around with the exec plugin.

sheneska commented 1 year ago

(quoting @browley86's earlier comment and final provider block in full)

Hi @slzmruepp, according to our documentation this is the correct way to configure the provider for auth plugins.

browley86 commented 1 year ago

@sheneska - could you please provide a link to that documentation?

mruepp commented 1 year ago

@sheneska Obviously this is not how it works when you use azurecli context login, which a lot of TF tasks in pipelines do. So either way, this is not part of the documentation so far; at least I did not find it. This feels pretty hacky, and only some sources on the internet document at least the SPN way of configuring the exec plugin of the provider. So if this is the "official way", I would certainly expect it to be documented. See here:
https://github.com/hashicorp/terraform-provider-kubernetes/tree/main/_examples/aks
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/getting-started#provider-setup

All the documentation points to an approach that certainly does NOT work for limited permissions with AKS and kubelogin.

sheneska commented 1 year ago

Hi @browley86, we document the use of exec plugins; however, we are not able to document every configuration argument for every plugin that is available. Please refer to the documentation for the specific plugins on how to actually configure them.

browley86 commented 1 year ago

@sheneska, so, I totally get it: the exec statement is a sort of catch-all for kubernetes plugins, and documenting every use case is impossible. That said, in the case of the Azure AKS service, they seem to have standardized, at least for the time being, on kubelogin, so documenting that use case would probably be worthwhile. My ask here is less a bug and more a feature request: it would be extremely nice/convenient for the data object to expose the server ID for the kubelogin plugin. So, for example, instead of using:

  "--server-id",
      yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"],

It would be way nicer to use

  "--server-id",
     data.azurerm_kubernetes_cluster.aks.server_id

Considering the terraform provider gets this as part of its response in the code, it would be nice to be able to expose it as an additional output so the end-user(s) can leverage it without having to hack the kube_config_raw portion of the data lookup as I did above. Hopefully that makes sense, but if not please let me know.

philippbussche commented 6 months ago

@browley86 would it make it easier to use something like

"--tenant-id",
data.azurerm_client_config.current.tenant_id,

and

"--client-id",
data.azurerm_client_config.current.client_id,

to look up those two from the current context, rather than having to also fetch them from your vault / key_vault?

It does not help with getting the server_id however ;)
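
A minimal sketch of that suggestion applied to the earlier spn-based exec block (assumptions: var.sp_client_secret still has to be supplied from somewhere such as a key vault, and the server id is hard-coded to the value discussed later in this thread):

data "azurerm_client_config" "current" {}

provider "kubernetes" {
  host = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  cluster_ca_certificate = base64decode(
    data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate,
  )

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login", "spn",
      "--environment", "AzurePublicCloud",
      "--tenant-id", data.azurerm_client_config.current.tenant_id,
      "--client-id", data.azurerm_client_config.current.client_id,
      "--client-secret", var.sp_client_secret,               # assumption: supplied separately
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # AKS AAD Server app id
    ]
  }
}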

philippbussche commented 6 months ago

(quoting @browley86's feature request above about exposing the server ID)

Btw, it looks like the yamldecode is not working anymore, or it is not working because of our AKS API setup here (private API server with VNet integration):

╷
│ Error: Invalid index
│ 
│   on .terraform/modules/aks/versions.tf line 34, in provider "kubernetes":
│   34:       yamldecode(azurerm_kubernetes_cluster.kubernetes_cluster.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"],
│ 
│ The given key does not identify an element in this collection value.
╵

brow86 commented 6 months ago

(re-quoting the feature request above and @philippbussche's "Invalid index" error)

So my above account got swallowed but the email made its way to my personal account. Anywho, I had the wrong key above, it should be:

  server_id              = yamldecode(azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["exec"]["args"][4]
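
A possibly more robust variant of that lookup (an assumption, not verified against every kubeconfig shape): search the exec args for "--server-id" and take the element after it, instead of relying on a fixed index.

locals {
  kubelogin_args = yamldecode(azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["exec"]["args"]
  # index() returns the position of "--server-id"; the server id itself is the next element
  aks_server_id  = local.kubelogin_args[index(local.kubelogin_args, "--server-id") + 1]
}
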
cveld commented 5 months ago

I am not sure why this extended server_id is required in your context?

I got the following code working:

# data "azuread_service_principal" "aks" {
#   display_name = "Azure Kubernetes Service AAD Server"
# }

provider "kubernetes" {
  host = module.aks.cluster.kube_config.0.host
  cluster_ca_certificate = base64decode(
    module.aks.cluster.kube_config[0].cluster_ca_certificate,
  )
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login",
      "azurecli",
      "--server-id",
      "6dae42f8-4368-4678-94ff-3960e28e3630" # data.azuread_service_principal.aks.client_id
    ]
  }
}
resource "kubernetes_namespace" "default" {
  metadata {
    name = "helloworld"
  }
}

I have local accounts disabled and I am using my user account with the "Azure Kubernetes Service RBAC Cluster Admin" role assigned. I would expect that any azure cli authenticated context would work here?

brow86 commented 4 months ago

(quoting @cveld's comment above, with the commented-out azuread_service_principal lookup)

Sorry for the long delay, but I just wanted to close the loop here: this, above, is the best answer. In short, Microsoft creates an Enterprise Application called "Azure Kubernetes Service AAD Server" and the Application ID of that Enterprise app is the `server_id`. A quick aside, this blew my mind 🤯. Anyway, instead of using the `kube_config_raw` path returned by the cluster build, it is way easier to just use a data lookup:

data "azuread_service_principal" "aks" {
  display_name = "Azure Kubernetes Service AAD Server"
}

provider "kubectl" {
  host = module.aks.cluster.kube_config.0.host
  cluster_ca_certificate = base64decode(
    module.aks.cluster.kube_config[0].cluster_ca_certificate,
  )
  load_config_file       = false

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--login",
      "msi",
      "--client-id",
      <CLIENT_ID of managed identity>,
      "--server-id",
      data.azuread_service_principal.aks.client_id
    ]
  }
}

A few notes though: this is all to work around the fact that some people consider client_ids and object_ids to be sensitive. If this is the case, the data lookup with a sensitive wrapper will work here. That said, the bigger potential issue is the separation of concerns: by using the data lookup, the SP/managed identity will need access to read AD, which means a whole other provider setup (azuread vs azurerm) and, in my case, giving the managed identity permissions to do AD look-ups. In a limited environment and/or under very strict permissions, this may not be available.

TL;DR: If you are limited to the azurerm provider, use the kube_config_raw path from the AKS output; otherwise, get azuread working with the SP/managed identity and use the data lookup.

KimiaJM-visma commented 3 months ago

Hi everyone!

I found this issue while looking for solutions, and thanks to this last response I was able to understand it, and it worked in my case. So I'm sharing it here in case it helps others.

In our case, we provided all the necessary roles to the Service Principal (to be cluster-admin in Kubernetes), but it was still giving 401 Unauthorized, and this was because I was trying to use the wrong --server-id. In the kubelogin documentation they explain that this is the application used by the server side and that the access token for accessing AKS clusters needs to be issued for this app. As others were commenting earlier, this is the application id of the Microsoft-managed enterprise application named "Azure Kubernetes Service AAD Server". When I used this specific fixed GUID for the --server-id option, it started working as expected!

It also blew my mind, as it's not clearly documented anywhere in Azure's documentation... but at least it's in kubelogin's documentation. In case it helps, that was the missing piece in our case.

This is the magic --server-id we were missing:

  6dae42f8-4368-4678-94ff-3960e28e3630

I hope it helps others with the same problem.