Description of issue:
Pods deployed to a worker node different from the node where the kube-dns pod is running are unable to resolve kubernetes.default. This is true even if the worker nodes are in the same availability domain.
Steps to reproduce:
1. Run terraform apply (with a terraform.tfvars similar to the Input Variables shown below).
2. Deploy a busybox pod in the cluster and scale it to 2 replicas, so there is a pod on each worker node (a command sketch follows this list).
3. Note which node the kube-dns pod is deployed on.
4. Go inside the busybox pod on the same node - here "nslookup kubernetes.default" works.
5. Go inside the busybox pod on the other worker node - here "nslookup kubernetes.default" does not work.
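For reference, a minimal sketch of the commands behind steps 2-5 (the deployment name busybox and the plain busybox image are assumptions; substitute the actual pod names reported by kubectl get pod -o wide):
kubectl run busybox --image=busybox -- sleep 3600
kubectl scale deployment busybox --replicas=2
kubectl get pod -o wide
kubectl exec -it <busybox-pod-name> nslookup kubernetes.default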
Below is a log of kubectl commands which illustrate this:
[opc@k8s-master-ad3-0 ~]$ kubectl -n kube-system get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-apiserver-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kube-controller-manager-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kube-dns-596797cd48-lghdb 3/3 Running 0 5m 10.99.78.2 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
kube-proxy-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kube-proxy-k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com 1/1 Running 0 3m 10.0.42.3 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
kube-proxy-k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com 1/1 Running 0 3m 10.0.42.2 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
kube-scheduler-k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com 1/1 Running 0 4m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
kubernetes-dashboard-796487df76-d8q7f 1/1 Running 0 5m 10.99.69.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
oci-cloud-controller-manager-sdqv5 1/1 Running 0 5m 10.0.32.2 k8s-master-ad3-0.k8smasterad3.k8sbmcs.oraclevcn.com
oci-volume-provisioner-66f47d7fcf-ks6pk 1/1 Running 0 5m 10.99.17.2 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
Terraform Version
[pdos@ol7 terraform-kubernetes-installer]$ terraform -v
Terraform v0.11.3
OCI Provider Version
[pdos@ol7 terraform-kubernetes-installer]$ ls -l terraform.d/plugins/linux_amd64/terraform-provider-oci_v2.1.4
-rwxr-xr-x. 1 pdos pdos 28835846 Apr 10 09:33 terraform.d/plugins/linux_amd64/terraform-provider-oci_v2.1.4
Terraform Installer for Kubernetes Version
v1.3.0
Input Variables
[pdos@ol7 terraform-kubernetes-installer]$ cat terraform.tfvars
# OCI authentication
region = "us-ashburn-1"
tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaa4jaw55rds22u6yaiy5fxt5qxjr2ja4l5fzkv4hci7kwmexv3hpqq"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaakvhehb5u7nrupwuunhefoedpbegvbnysvz5pdfluxt5wxl5aquwa"
fingerprint = "15:fd:5a:0f:7b:f7:c8:d0:82:f5:20:f8:97:07:42:02"
private_key_path = "/home/pdos/.oci/oci_api_key.pem"
user_ocid = "ocid1.user.oc1..aaaaaaaai3a6zzhjw23wncjhk5ogvjmk4x22zsws6xn4ydmzzlxoo6rthxya"
tenancy_ocid = "ocid1.tenancy.oc1..aaaaaaaa763cu5f3m7qpzwnvr2shs3o26ftrn7fkgz55cpzgxmglgtui3v7q"
compartment_ocid = "ocid1.compartment.oc1..aaaaaaaaidy3jl7bdmiwfryo6myhdnujcuug5zxzoclsz7vpfzw4bggng7iq"
fingerprint = "ed:51:83:3b:d2:04:f4:af:9d:7b:17:96:dd:8a:99:bc"
private_key_path = "/tmp/oci_api_key.pem"
user_ocid = "ocid1.user.oc1..aaaaaaaa5fy2l5aki6z2bzff5yrrmlahiif44vzodeetygxmpulq3mbnckya"
# CCM user
cloud_controller_user_ocid = "ocid1.tenancy.oc1..aaaaaaaa763cu5f3m7qpzwnvr2shs3o26ftrn7fkgz55cpzgxmglgtui3v7q"
cloud_controller_user_fingerprint = "ed:51:83:3b:d2:04:f4:af:9d:7b:17:96:dd:8a:99:bc"
cloud_controller_user_private_key_path = "/tmp/oci_api_key.pem"
etcdShape = "VM.Standard1.1"
k8sMasterShape = "VM.Standard1.1"
k8sWorkerShape = "VM.Standard2.1"
etcdAd1Count = "0"
etcdAd2Count = "0"
etcdAd3Count = "1"
k8sMasterAd1Count = "0"
k8sMasterAd2Count = "0"
k8sMasterAd3Count = "1"
k8sWorkerAd1Count = "0"
k8sWorkerAd2Count = "1"
k8sWorkerAd3Count = "1"
etcdLBShape = "400Mbps"
k8sMasterLBShape = "400Mbps"
etcd_ssh_ingress = "10.0.0.0/16"
etcd_ssh_ingress = "0.0.0.0/0"
etcd_cluster_ingress = "10.0.0.0/16"
master_ssh_ingress = "0.0.0.0/0"
worker_ssh_ingress = "0.0.0.0/0"
master_https_ingress = "0.0.0.0/0"
worker_nodeport_ingress = "0.0.0.0/0"
worker_nodeport_ingress = "10.0.0.0/16"
control_plane_subnet_access = "public"
k8s_master_lb_access = "public"
natInstanceShape = "VM.Standard1.2"
nat_instance_ad1_enabled = "true"
nat_instance_ad2_enabled = "false"
nat_instance_ad3_enabled = "true"
nat_ssh_ingress = "0.0.0.0/0"
public_subnet_http_ingress = "0.0.0.0/0"
public_subnet_https_ingress = "0.0.0.0/0"
# worker_iscsi_volume_create is a bool, not a string
worker_iscsi_volume_create = true
worker_iscsi_volume_size = 100
etcd_iscsi_volume_create = true
etcd_iscsi_volume_size = 50
The kubectl log continues below: busybox is scaled to 2 replicas, and nslookup kubernetes.default is run from the pod on each worker node.
[opc@k8s-master-ad3-0 ~]$ kubectl scale deployment busybox --replicas=2
deployment "busybox" scaled
[opc@k8s-master-ad3-0 ~]$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
busybox-56b5f5cd9d-6brvj 1/1 Running 0 6s 10.99.78.4 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
busybox-56b5f5cd9d-lvz4z 1/1 Running 1 13m 10.99.17.5 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
nginx-7cbc4b4d9c-7z772 1/1 Running 0 15m 10.99.17.3 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
nginx-7cbc4b4d9c-c6lrj 1/1 Running 0 15m 10.99.78.3 k8s-worker-ad3-0.k8sworkerad3.k8sbmcs.oraclevcn.com
nginx-7cbc4b4d9c-k2kjr 1/1 Running 0 15m 10.99.17.4 k8s-worker-ad3-1.k8sworkerad3.k8sbmcs.oraclevcn.com
[opc@k8s-master-ad3-0 ~]$ kubectl exec -it busybox-56b5f5cd9d-6brvj nslookup kubernetes.default
Server: 10.21.21.21
Address 1: 10.21.21.21 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default
Address 1: 10.21.0.1 kubernetes.default.svc.cluster.local
[opc@k8s-master-ad3-0 ~]$ kubectl exec -it busybox-56b5f5cd9d-lvz4z nslookup kubernetes.default
Server: 10.21.21.21
Address 1: 10.21.21.21

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
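To narrow down whether the failure is in the kube-proxy/service-VIP path or in cross-node pod networking itself, a possible follow-up check (a sketch only; the pod name and the kube-dns pod IP 10.99.78.2 are taken from the output above) is to query the kube-dns pod IP directly from the failing pod:
kubectl exec -it busybox-56b5f5cd9d-lvz4z nslookup kubernetes.default 10.99.78.2
If this direct query succeeds while the lookup via the service IP 10.21.21.21 fails, pod-to-pod traffic between the workers is fine and the problem is likely in the service rules programmed by kube-proxy on k8s-worker-ad3-1; if it also fails, cross-node pod-to-pod traffic itself is broken.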