oracle-terraform-modules / terraform-oci-oke

The Terraform OKE Module Installer for Oracle Cloud Infrastructure provides a Terraform module that provisions the necessary resources for Oracle Container Engine.
https://oracle-terraform-modules.github.io/terraform-oci-oke/
Universal Permissive License v1.0

5.x: v5.0.0-beta.3 , operator host, cannot `kubectl get pods`, instance_principal problem? #776

Closed rodrigc closed 1 year ago

rodrigc commented 1 year ago

Terraform Version and Provider Version

terraform -v
2023-07-19T12:59:56.124-0700 [INFO]  Terraform version: 1.4.4
2023-07-19T12:59:56.124-0700 [DEBUG] using github.com/hashicorp/go-tfe v1.18.0
2023-07-19T12:59:56.124-0700 [DEBUG] using github.com/hashicorp/hcl/v2 v2.16.2
2023-07-19T12:59:56.124-0700 [DEBUG] using github.com/hashicorp/terraform-config-inspect v0.0.0-20210209133302-4fd17a0faac2
2023-07-19T12:59:56.124-0700 [DEBUG] using github.com/hashicorp/terraform-svchost v0.1.0
2023-07-19T12:59:56.124-0700 [DEBUG] using github.com/zclconf/go-cty v1.12.1
2023-07-19T12:59:56.124-0700 [INFO]  Go runtime version: go1.19.6
2023-07-19T12:59:56.124-0700 [INFO]  CLI args: []string{"/usr/local/Cellar/tfenv/3.0.0/versions/1.4.4/terraform", "-v"}
2023-07-19T12:59:56.125-0700 [DEBUG] Attempting to open CLI config file: /Users/crodrigues/.terraformrc
2023-07-19T12:59:56.125-0700 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2023-07-19T12:59:56.127-0700 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2023-07-19T12:59:56.127-0700 [DEBUG] ignoring non-existing provider search directory /Users/crodrigues/.terraform.d/plugins
2023-07-19T12:59:56.127-0700 [DEBUG] ignoring non-existing provider search directory /Users/crodrigues/Library/Application Support/io.terraform/plugins
2023-07-19T12:59:56.127-0700 [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2023-07-19T12:59:56.129-0700 [INFO]  CLI command args: []string{"version", "-v"}
Terraform v1.4.4
on darwin_amd64
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.9.0
+ provider registry.terraform.io/hashicorp/http v3.2.1
+ provider registry.terraform.io/hashicorp/null v3.2.1
+ provider registry.terraform.io/hashicorp/random v3.4.3
+ provider registry.terraform.io/hashicorp/time v0.9.1
+ provider registry.terraform.io/oracle/oci v4.115.0

Your version of Terraform is out of date! The latest version
is 1.5.3. You can update by downloading from https://www.terraform.io/downloads.html

Terraform Configuration Files

  module "oke_prereqs" {
    source                   = "../../modules/oke_prereqs"
    compartment_id           = local.tenancy_id
    name                     = local.name
    region                   = var.region
    config_file_profile      = var.config_file_profile
    home_config_file_profile = var.home_config_file_profile
  }

  module "oke" {
    source  = "oracle-terraform-modules/oke/oci"
    version = "5.0.0-beta.3"

    cluster_name            = local.name
    control_plane_is_public = false
    cluster_type            = "enhanced"

    compartment_id         = module.oke_prereqs.oke_compartment_id
    network_compartment_id = module.oke_prereqs.oke_compartment_id
    worker_compartment_id  = module.oke_prereqs.oke_compartment_id
    tenancy_id             = local.tenancy_id

    ssh_private_key_path = var.ssh_private_key_path
    ssh_public_key_path  = var.ssh_public_key_path

    home_region = local.home_region
    region      = var.region

    // node pool config
    worker_pools            = var.node_pools
    worker_image_id         = "none"
    worker_image_type       = "platform"
    worker_image_os         = "Oracle Linux"
    worker_image_os_version = "8.7"

    create_operator = true
    create_bastion  = true

    providers = {
      oci.home = oci.home
    }
  }

Expected Behavior

Log into bastion to get access to operator:

ssh -i ./my_key.pem -o ProxyCommand='ssh -i ./my_key.pem -W %h:%p opc@XX.XX.XX.XX' -L 6443:XX.XX.XX.XX:6443 opc@XX.XX.XX.XX
kubectl get pods

kubectl should be able to access the pods in the Kubernetes cluster.

Actual Behavior

ssh -i ./my_key.pem -o ProxyCommand='ssh -i ./my_key.pem -W %h:%p opc@XX.XX.XX.XX' -L 6443:XX.XX.XX.XX:6443 opc@XX.XX.XX.XX
kubectl get pods
error: You must be logged in to the server (Unauthorized)

Steps to Reproduce

  1. Log into bastion
  2. Run any command with kubectl
  3. Command will fail with:
     error: You must be logged in to the server (Unauthorized)

rodrigc commented 1 year ago

@devoncrouse @hyder @Djelibeybi In older versions of this module, there was a variable enable_operator_instance_principal.

That variable is now gone.

So it looks like the operator host on the cluster I set up does not have instance_principal configured, which is why I cannot run any kubectl commands against the cluster.

This doc: https://oracle-terraform-modules.github.io/terraform-oci-oke/guide/operator_identity.html mentions that instance_principal is disabled by default, but it does not say how to enable it.

How do I enable instance_principal in the 5.x branch?

rodrigc commented 1 year ago

This doc: https://oracle-terraform-modules.github.io/terraform-oci-oke/guide/operator_identity.html says:

"You can also turn on and off the feature at any time without impact on the operator or the cluster."

How can I turn the instance_principal feature on or off? It is not clear to me from the docs how to do this.

According to this doc: Enabling Instance Principal Authorization for Terraform, I can set instance_principal by doing this in the provider:

provider "oci" {
  auth   = "InstancePrincipal"
  region = var.region
}

Is that the way?

hyder commented 1 year ago

You now need to use these 2:

create_iam_resources       = true
create_iam_operator_policy = "always"
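
As a rough sketch, this is how those two inputs might sit in the module block from the original report. It assumes every other input (compartments, SSH keys, regions, worker pools) stays exactly as posted there; only the last two assignments are new.

```hcl
module "oke" {
  source  = "oracle-terraform-modules/oke/oci"
  version = "5.0.0-beta.3"

  # Remaining inputs are unchanged from the configuration in the report
  # and are omitted here for brevity.

  create_bastion  = true
  create_operator = true

  # The two inputs named in this reply.
  create_iam_resources       = true
  create_iam_operator_policy = "always"

  providers = {
    oci.home = oci.home
  }
}
```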

Djelibeybi commented 1 year ago

Note: please stop tagging folks in your comments. GitHub will notify the right people automatically.

devoncrouse commented 1 year ago

Hi @rodrigc - as Ali mentioned above, the two inputs create_iam_resources and create_iam_operator_policy control the creation of the dynamic group and policy resources that grant the operator instance access to manage the associated cluster.

We also have newer versions of the 5.x pre-release published now that address many issues you may run into while evaluating - please have a look at using the latest when you get a chance.

rodrigc commented 1 year ago

Thanks for the clarification.

At the bare minimum, it is possible to just specify:

create_iam_resources = true

This works because the default value of create_iam_operator_policy is "auto", and this logic in module-iam.tf:

  create_iam_operator_policy = anytrue([
    var.create_iam_operator_policy == "always",
    var.create_iam_operator_policy == "auto" && local.operator_enabled
  ])

will toggle this to true if create_operator = true.
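
The interaction between these inputs can be sketched as a small stand-alone configuration. This is an illustration only, not the module's source: the variable defaults, the way operator_enabled is derived from create_operator, and the final output that gates the policy on create_iam_resources are all assumptions based on the discussion in this issue.

```hcl
# Hypothetical stand-alone reproduction of the toggle logic quoted above.
variable "create_iam_resources" {
  type    = bool
  default = false # assumed default, per the discussion in this issue
}

variable "create_iam_operator_policy" {
  type    = string
  default = "auto"
}

variable "create_operator" {
  type    = bool
  default = true
}

locals {
  # Assumption: the real module may derive this from more than one input.
  operator_enabled = var.create_operator

  # Same expression as the snippet from module-iam.tf quoted above.
  create_iam_operator_policy = anytrue([
    var.create_iam_operator_policy == "always",
    var.create_iam_operator_policy == "auto" && local.operator_enabled,
  ])
}

# Assumption: the policy is only created when IAM resource creation
# is also switched on.
output "operator_policy_would_be_created" {
  value = var.create_iam_resources && local.create_iam_operator_policy
}
```

With the defaults shown, the output evaluates to false because create_iam_resources is false. Setting create_iam_resources to true while leaving create_iam_operator_policy at "auto" makes it true, which matches the "bare minimum" configuration described above.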

Quite involved to figure this out!

devoncrouse commented 1 year ago

Yes, the default "auto" is intended to work like you've noted for the policy. Many users lack the ability to create identity resources in their environments, so the overall creation input, create_iam_resources, defaults to false.

rodrigc commented 1 year ago

Based on the explanation in this issue, I took a whack at clarifying some of the docs: https://github.com/oracle-terraform-modules/terraform-oci-oke/pull/777

rodrigc commented 1 year ago

I'll think about how to clarify the docs for operator and bastion.

To me, it is really weird that you can specify:

create_bastion = true
create_operator = true

and then have an operator host where kubectl does not work.

Some clarifying text might help with understanding.

rodrigc commented 1 year ago

Closing this issue, since an explanation of how to correctly configure this with 5.x was provided.

devoncrouse commented 1 year ago

I agree with your point above @rodrigc and appreciate the feedback - we'll think about improving this as well.