oracle-terraform-modules / terraform-oci-oke

The Terraform OKE Module Installer for Oracle Cloud Infrastructure provides a Terraform module that provisions the necessary resources for Oracle Container Engine.
https://oracle-terraform-modules.github.io/terraform-oci-oke/
Universal Permissive License v1.0

5.x: Implement worker node taints #754

Open devoncrouse opened 1 year ago

devoncrouse commented 1 year ago

https://blogs.oracle.com/cloud-infrastructure/post/auto-tainting-oke-nodes-startup
https://oracle-terraform-modules.github.io/terraform-oci-oke/guide/workers_cloudinit.html
https://github.com/oracle-terraform-modules/terraform-oci-oke/blob/5.x/docs/src/guide/workers.md

tb137 commented 1 year ago

Hi @devoncrouse, can you tell me whether taints are working now? I see some implementation was done: https://github.com/oracle-terraform-modules/terraform-oci-oke/blob/c177189b4d44e0b1a41bae6e475b7b65b1e3c922/modules/workers/virtualnodepools.tf#L46:L53

However, the configuration is ignored when I try to use it with the following example:

worker_pools = {
  n2-pool = {
    shape = "VM.Standard.E4.Flex",
    ....
    taints = {
      dedicated = {
        value  = "observability",
        effect = "NoExecute"
      }
    }
  }
}

Am I missing something?

thpham commented 1 year ago

Hello, I'm really interested in having this feature, but I fear that the current oci-terraform-provider doesn't support it, given the following issue: https://github.com/oracle/terraform-provider-oci/issues/1504

Also, the Terraform oci_containerengine_node_pool resource doesn't have taints.

Is there any ETA for having it available in the provider, so that we don't have to resort to cloud-init hacks?

ajhindle commented 3 weeks ago

Hi @thpham, I wish taints were easy to do in OKE too! I've tried the cloud-init hack (as discussed by Oracle) but had no luck. Do you have a working solution for it?

My last attempt:

...

worker_pool = {
   overhead = {
    size             = 1,
    shape            = "VM.Standard.E4.Flex",
    ocpus            = 1,
    memory           = 4,
    boot_volume_size = 50,
    os               = "Oracle Linux",
    os_version       = "8",
    cloud_init = [
      {
        content = <<-EOT
        #!/bin/bash
        curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh
        bash /var/run/oke-init.sh --kubelet-extra-args "--node-labels=oke.oraclecloud.com/pool.name=overhead --register-with-taints=notfm-app=true:NoSchedule"
        touch /var/log/oke.done
        EOT
      },
    ]
  },

...

rayeswong commented 2 weeks ago

Hi @ajhindle, have you set worker_disable_default_cloud_init to true in order to use a custom cloud-init script?

The following works for me:

worker_disable_default_cloud_init = true

worker_pools = {
  np1 = {
    shape              = "VM.Standard.E4.Flex",
    ocpus              = 1,
    memory             = 4,
    size               = 1,
    boot_volume_size   = 50,
    kubernetes_version = "v1.29.1",
    cloud_init = [
      {
        content = <<-EOT
        runcmd:
        - 'curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh'
        - 'bash -x /var/run/oke-init.sh --kubelet-extra-args "--register-with-taints=notfm-app=true:NoSchedule"'
        - 'touch /var/log/oke.done'
        EOT
        content_type = "text/cloud-config",
      }
    ]
  }
}
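
For pools that need more than one taint, kubelet's --register-with-taints flag takes comma-separated key=value:effect entries. A minimal sketch, assuming a hypothetical local named taints shaped like the map tb137 tried earlier in this thread (it is not a module input), that flattens such a map into that string:

```hcl
locals {
  # Hypothetical local, shaped like the taints map posted earlier in this thread.
  # It is NOT read by the module; it only feeds the string built below.
  taints = {
    dedicated = {
      value  = "observability"
      effect = "NoExecute"
    }
  }

  # kubelet expects comma-separated key=value:effect entries.
  # For the map above this evaluates to "dedicated=observability:NoExecute".
  register_with_taints = join(",", [
    for key, t in local.taints : "${key}=${t.value}:${t.effect}"
  ])
}
```

The result could then be interpolated into the runcmd line as --register-with-taints=${local.register_with_taints} in place of the hard-coded notfm-app=true:NoSchedule. This assumes the values are set in a module block, since locals cannot be referenced from a .tfvars file.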

zestrells commented 1 week ago

@rayeswong The solution above worked well for me. Please note that new worker pools need to be created for this to take effect. Additionally, if you check a worker pool's cloud-init in the console, you'll see that it is base64-encoded. Hope this information is helpful!

More details can be found here.

ajhindle commented 6 days ago

Thanks @rayeswong - worker_disable_default_cloud_init = true worked.

With this flag set to true, I found that a second node pool without any taints or special kubelet arguments still needs the boilerplate cloud-init declared explicitly, like this; otherwise node pool 2 fails.

  nodepool2 = {
    size       = 1,
    shape      = "VM.Standard.E4.Flex",
    ocpus      = 2,
    memory     = 32,
    os         = "Oracle Linux",
    os_version = "8",
    cloud_init = [
      {
        content = <<-EOT
        runcmd:
        - 'curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh'
        - 'bash -x /var/run/oke-init.sh'
        - 'touch /var/log/oke.done'
        EOT
        content_type = "text/cloud-config",
      },
    ]
  }
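
Since every pool has to carry the same bootstrap commands once worker_disable_default_cloud_init is true, the boilerplate could be generated rather than copied. A minimal sketch, assuming the pools are declared in a module block (locals cannot be used in a .tfvars file). The local names oke_fetch_init, kubelet_extra_args and pool_cloud_init are hypothetical, and the commands are the ones posted above:

```hcl
locals {
  # Fetches and decodes the OKE init script; copied from the snippets in this thread.
  oke_fetch_init = "curl --fail -H \"Authorization: Bearer Oracle\" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh"

  # Hypothetical map: pool name => extra kubelet arguments ("" for none).
  kubelet_extra_args = {
    np1       = "--register-with-taints=notfm-app=true:NoSchedule"
    nodepool2 = ""
  }

  # One cloud_init list per pool, in the shape used by the pools in this thread.
  pool_cloud_init = {
    for name, args in local.kubelet_extra_args : name => [
      {
        content_type = "text/cloud-config"
        # yamlencode handles the quoting of the embedded double quotes.
        content = yamlencode({
          runcmd = [
            local.oke_fetch_init,
            args == "" ? "bash -x /var/run/oke-init.sh" : "bash -x /var/run/oke-init.sh --kubelet-extra-args \"${args}\"",
            "touch /var/log/oke.done",
          ]
        })
      }
    ]
  }
}
```

Each pool would then set cloud_init = local.pool_cloud_init["np1"] or cloud_init = local.pool_cloud_init["nodepool2"], so a pool without taints still gets the plain bootstrap it needs.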