vmware-archive / photon-controller

Photon Controller

Enable setting Kubernetes SERVICE_CLUSTER_IP_RANGE for install? #112

Closed: tactical-drone closed this issue 7 years ago

tactical-drone commented 7 years ago

I am having huge issues with my kubernetes cluster. It was working, but after a power failure and restart I just cannot access the kubernetes dashboard and I cannot figure out why.

There are many issues with a Photon restart; maybe I will go into them later. But after manually starting Photon again and checking all the logs for status OK, I still cannot access the dashboard. I can access everything up to api/v1/namespaces/kube-system/services/kubernetes-dashboard, but not api/v1/namespaces/kube-system/services/kubernetes-dashboard/proxy; that just hangs. kubectl proxy also just hangs if you try to access the dashboard. It used to work, but not after the restart.

Inspecting everything, all I can think of is that because our local network is a 10.0.0.0/24 network and Kubernetes uses the same range internally for service ClusterIPs, the iptables rules clash.

How can I set the SERVICE_CLUSTER_IP_RANGE value?
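For context: in stock Kubernetes that range comes from the kube-apiserver flag --service-cluster-ip-range. A quick way to check what this cluster is actually using (a sketch; how the apiserver is launched on a Photon-provisioned master is an assumption):

# On a master node, pull the flag out of the running apiserver's command line:
ps aux | grep '[k]ube-apiserver' | tr ' ' '\n' | grep -- '--service-cluster-ip-range'

# Changing it (e.g. to 10.96.0.0/16 so it no longer overlaps the 10.0.0.0/24 LAN)
# means editing whatever starts the apiserver (static pod manifest or systemd unit)
# and restarting it; existing Services keep their old ClusterIPs until recreated.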

My https://10.0.7.9:6443/api/v1/namespaces/kube-system/services/kubernetes-dashboard service:

{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "kubernetes-dashboard",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/services/kubernetes-dashboard",
    "uid": "d830320e-2fde-11e7-abb1-000c29717226",
    "resourceVersion": "1194219",
    "creationTimestamp": "2017-05-03T08:59:41Z",
    "labels": {
      "addonmanager.kubernetes.io/mode": "Reconcile",
      "k8s-app": "kubernetes-dashboard",
      "kubernetes.io/cluster-service": "true"
    },
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"metadata\":{\"annotations\":{},\"labels\":{\"addonmanager.kubernetes.io/mode\":\"Reconcile\",\"k8s-app\":\"kubernetes-dashboard\",\"kubernetes.io/cluster-service\":\"true\"},\"name\":\"kubernetes-dashboard\",\"namespace\":\"kube-system\"},\"spec\":{\"ports\":[{\"port\":80,\"targetPort\":9090}],\"selector\":{\"k8s-app\":\"kubernetes-dashboard\"}}}\n"
    }
  },
  "spec": {
    "ports": [
      {
        "protocol": "TCP",
        "port": 80,
        "targetPort": 9090
      }
    ],
    "selector": {
      "k8s-app": "kubernetes-dashboard"
    },
    "clusterIP": "10.0.0.32",
    "type": "ClusterIP",
    "sessionAffinity": "None"
  },
  "status": {
    "loadBalancer": {}
  }
}
mwest44 commented 7 years ago

I suspect the issue is that the haproxy in your load balancer did not restart. I found this bug earlier this week. The net of it is that the /run directory is mounted on tmpfs, so some of the config from the initial create was lost on restart.

You should be able to run the /root/haproxy/run-haproxy.sh script; it will recreate the /run/haproxy directory and restart the haproxy service.
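Roughly (the verification steps after the script are my assumption of the obvious checks, not part of the script itself):

/root/haproxy/run-haproxy.sh    # recreates /run/haproxy and restarts haproxy
ls /run/haproxy                 # the directory should exist again
ps aux | grep '[h]aproxy'       # confirm the haproxy process came back up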

tactical-drone commented 7 years ago

@mwest44 Yes, that was one of the minor issues I fixed while restarting Photon. I could handle that one.

I also needed to bounce vmware-stsd.service on a Lightwave node because it had crashed randomly. Photon Controller should also not give up after many of these:

pc-1 run.sh[2107]: Lightwave REST server not reachable (attempt 1/50), will try again.

That vmware-stsd service should have Restart=always set, which could prevent some bricked clusters. Anyway, enough about Photon.
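A minimal sketch of the systemd drop-in I mean (the drop-in path follows the usual systemd convention; the RestartSec value is my own choice):

mkdir -p /etc/systemd/system/vmware-stsd.service.d
cat > /etc/systemd/system/vmware-stsd.service.d/restart.conf <<'EOF'
[Service]
Restart=always
RestartSec=10
EOF
systemctl daemon-reload
systemctl restart vmware-stsd.service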

Clicking the link in the attached screenshot (image omitted), i.e. https://10.0.7.9:6443/api/v1/namespaces/kube-system/endpoints/kubernetes-dashboard/proxy, yields:

Error: 'dial tcp 10.2.49.2:9090: getsockopt: connection timed out'
Trying to reach: 'http://10.2.49.2:9090/'

https://10.0.7.9:6443/api/v1/namespaces/kube-system/endpoints/kubernetes-dashboard yields:

{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kubernetes-dashboard",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kubernetes-dashboard",
    "uid": "2ece2121-30c5-11e7-a7fc-000c2906d4f7",
    "resourceVersion": "1192712",
    "creationTimestamp": "2017-05-04T12:28:31Z",
    "labels": {
      "addonmanager.kubernetes.io/mode": "Reconcile",
      "k8s-app": "kubernetes-dashboard",
      "kubernetes.io/cluster-service": "true"
    }
  },
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.2.49.2",
          "nodeName": "10.0.7.86",
          "targetRef": {
            "kind": "Pod",
            "namespace": "kube-system",
            "name": "kubernetes-dashboard-2917854236-26bl7",
            "uid": "05f0535c-30c3-11e7-a7fc-000c2906d4f7",
            "resourceVersion": "1190993"
          }
        }
      ],
      "ports": [
        {
          "port": 9090,
          "protocol": "TCP"
        }
      ]
    }
  ]
}

The issue is just strange. In the end it comes down to the fact that I don't know how Kubernetes does its magic. I followed the issue to the worker that hosts the kubernetes-dashboard pod. Its log says:

Using HTTP port: 9090
Creating API server client for https://10.0.0.1:443
Successful initial request to the apiserver, version: v1.6.0
Creating in-cluster Heapster client
Using service account token for csrf signing

The kube-proxy on that node has logs:

Flag --resource-container has been deprecated, This feature will be removed in a later release.
I0504 10:17:08.764544       1 iptables.go:175] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
I0504 10:17:08.784043       1 server.go:182] Running in resource-only container "\"\""
I0504 10:17:08.787055       1 server.go:225] Using iptables Proxier.
W0504 10:17:08.835877       1 server.go:469] Failed to retrieve node info: nodes "worker-4cafe0a2-2c2a-4f61-9380-146858efe08f" not found
W0504 10:17:08.835971       1 proxier.go:304] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
W0504 10:17:08.835985       1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
I0504 10:17:08.836008       1 server.go:249] Tearing down userspace rules.
I0504 10:17:08.836486       1 healthcheck.go:119] Initializing kube-proxy health checker
I0504 10:17:08.963737       1 proxier.go:490] Adding new service "default/kubernetes:https" at 10.0.0.1:443/TCP
I0504 10:17:08.963910       1 proxier.go:490] Adding new service "kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I0504 10:17:08.963950       1 proxier.go:490] Adding new service "kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP
I0504 10:17:08.963982       1 proxier.go:490] Adding new service "kube-system/kubernetes-dashboard:" at 10.0.0.32:80/TCP
I0504 10:17:08.964094       1 proxier.go:767] Not syncing iptables until Services and Endpoints have been received from master
I0504 10:17:08.964125       1 proxier.go:566] Received first Endpoints update
I0504 10:17:08.964146       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
I0504 10:17:08.965383       1 conntrack.go:66] Setting conntrack hashsize to 131072
I0504 10:17:08.965762       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0504 10:17:08.965799       1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0504 10:17:23.624401       1 proxier.go:742] Deleting connection tracking state for service IP 10.0.0.10, endpoint IP 10.2.25.2
NAME                                    READY     STATUS    RESTARTS   AGE
k8s-master-10.0.7.8                     4/4       Running   28         3h
k8s-master-10.0.7.9                     4/4       Running   31         3h
k8s-proxy-v1-09g7q                      1/1       Running   1          3h
k8s-proxy-v1-cb53w                      1/1       Running   0          3h
k8s-proxy-v1-gppcb                      1/1       Running   0          3h
k8s-proxy-v1-k95h7                      1/1       Running   1          3h
k8s-proxy-v1-t690z                      1/1       Running   1          3h
k8s-proxy-v1-w1wws                      1/1       Running   0          2h
kube-addon-manager-10.0.7.8             1/1       Running   6          3h
kube-addon-manager-10.0.7.9             1/1       Running   7          3h
kube-dns-806549836-fbv97                3/3       Running   3          3h
kubernetes-dashboard-2917854236-26bl7   1/1       Running   0          1h

So everything looks right; it just hangs. The problem, as I see it, is that the master node must somehow forward traffic coming into https://10.0.7.9:6443 to the correct worker node, and it does so with some strange firewall rules:

-A KUBE-SEP-UNWUY2K5U773KLT4 -s 10.2.49.2/32 -m comment --comment "kube-system/kubernetes-dashboard:" -j KUBE-MARK-MASQ
-A KUBE-SEP-UNWUY2K5U773KLT4 -p tcp -m comment --comment "kube-system/kubernetes-dashboard:" -m tcp -j DNAT --to-destination 10.2.49.2:9090

Those rules are found on the master node, but what I don't understand is what firewall rules for 10.2.49.2:9090 are doing there. How could the master ever reach that IP, which only lives on the worker node's docker0 interface? Totally bizarre!
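For what it's worth, this is how I have been poking at it (standard iptables inspection; the chain name is taken from the rules above):

# On the master: follow the service IP through kube-proxy's NAT chains.
iptables -t nat -L KUBE-SERVICES -n | grep dashboard
iptables -t nat -L KUBE-SEP-UNWUY2K5U773KLT4 -n
# The DNAT rewrites 10.0.0.32:80 to 10.2.49.2:9090; after that the packet still needs a
# route from the master to the worker's pod subnet. If that route disappeared in the
# reboot, the connection would hang exactly like this (my guess, not confirmed).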

mwest44 commented 7 years ago

The UI issue is one that we are trying to work through. The issue is that Kubernetes dashboard does not support OIDC redirects for Auth. So that link won’t work. You have two options for accessing the dashboard:

1) Connect to the MasterIP:6443/ui (if using NSX this will be the floating IP assigned to the master; you can find it with the photon cluster show command) and use basic auth, username: admin, password: admin. We will disable basic auth once we have a resolution to the OIDC issue, so don't expect it to work forever.

2) If you are running the kubectl CLI and the browser on the same laptop (no jumpbox), then once you have your context set you can issue the kubectl proxy command. Then you can access the dashboard on localhost:8001 (example below).

Neither is a great solution, but they work for initial testing.
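Option 2 in practice looks like this (8001 is kubectl proxy's default port; /ui is the same shortcut as in option 1):

kubectl proxy
# then, in a browser on the same machine:
#   http://localhost:8001/ui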


tactical-drone commented 7 years ago

@mwest44 I was using kubectl proxy like the documentation describes, but after the restart it started hanging when I went to /ui.

Then, after much scratching around, I found the admin:admin password and thought I could access the dashboard from Photon's management UI link, but same issue: it just hangs.

It has something to do with the master's ability to route incoming traffic on https://10.0.7.9:6443/ui to the dashboard worker node docker container. That part is not working.

AlainRoy commented 7 years ago

Hi,

I'm sorry this problem hit you. I haven't seen it before, so I don't know what it is. But we can look at a few things.

If you run kubectl get pods --all-namespaces, does it show the kubernetes-dashboard pod? (The name will have a suffix. Mine is "kubernetes-dashboard-2917854236-p62r7", yours will be different.) Can you provide the complete output?

If you run kubectl get services --all-namespaces, does it show the kubernetes-dashboard service? Can you provide the complete output?

When you had the power failure, what lost power? All of Photon Controller, including all the Kubernetes VMs? Or just a subset? I'll try to reproduce your problem locally.

tactical-drone commented 7 years ago

Hi @AlainRoy. We basically had new servers, but our UPS only arrived yesterday, so last week the 4 ESXi hosts I play on lost power. I could just reinstall, but I want to pretend this is production and that I must get it working again, to learn.

I don't think you will be able to recreate my issues. I had to bug-fix and hack-and-slash many things for my productionisation tests, and I am not sure if I broke something.

One of the major things I did was hack the worker nodes. Kubernetes provisions worker nodes with the same flavor as the master node, and that obviously won't do. So I shut the workers down and edited them manually, one by one, making them 16 vCPU / 64 GB RAM. When two workers land on the same ESXi host, I delete one after making the other fat; the system then detects that it must deploy a new worker somewhere else (it takes about 15 minutes to react, though, which is pretty slow). After a while of doing this you end up with 4 ESXi hosts, each with a pretty fat worker. But I am not sure whether this broke other things. To get those workers registered again you have to, for example, kubectl delete node and restart the worker so that it registers before the 15 minute re-provision timeout (commands below).
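The node-replacement part of that, as commands (the node name is just the one from my kube-proxy log above; powering the VM off and on happens outside kubectl):

kubectl delete node worker-4cafe0a2-2c2a-4f61-9380-146858efe08f
# power the resized worker VM back on, then watch for it to re-register
# before the ~15 minute re-provision timeout:
kubectl get nodes -w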

If you run kubectl get pods --all-namespaces, does it show the kubernetes-dashboard pod?

kubectl get all
NAME                                       READY     STATUS    RESTARTS   AGE
po/k8s-master-10.0.7.8                     4/4       Running   28         4h
po/k8s-master-10.0.7.9                     4/4       Running   31         4h
po/k8s-proxy-v1-09g7q                      1/1       Running   1          4h
po/k8s-proxy-v1-cb53w                      1/1       Running   0          4h
po/k8s-proxy-v1-gppcb                      1/1       Running   0          4h
po/k8s-proxy-v1-k95h7                      1/1       Running   1          4h
po/k8s-proxy-v1-t690z                      1/1       Running   1          4h
po/k8s-proxy-v1-w1wws                      1/1       Running   0          2h
po/kube-addon-manager-10.0.7.8             1/1       Running   6          4h
po/kube-addon-manager-10.0.7.9             1/1       Running   7          4h
po/kube-dns-806549836-fbv97                3/3       Running   3          4h
po/kubernetes-dashboard-2917854236-26bl7   1/1       Running   0          1h

NAME                       CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
svc/kube-dns               10.0.0.10    <none>        53/UDP,53/TCP   12d
svc/kubernetes-dashboard   10.0.0.32    <none>        80/TCP          1d

NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/kube-dns               1         1         1            1           12d
deploy/kubernetes-dashboard   1         1         1            1           12d

NAME                                 DESIRED   CURRENT   READY     AGE
rs/kube-dns-806549836                1         1         1         1d
rs/kubernetes-dashboard-2917854236   1         1         1         6h
noob@photon-dev:~$ kk get svc
NAME                   CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns               10.0.0.10    <none>        53/UDP,53/TCP   12d
kubernetes-dashboard   10.0.0.32    <none>        80/TCP          1d
noob@photon-dev:~$ kk get ep
NAME                      ENDPOINTS                   AGE
kube-controller-manager   <none>                      23h
kube-dns                  10.2.25.2:53,10.2.25.2:53   12d
kube-scheduler            <none>                      23h
kubernetes-dashboard      10.2.49.2:9090              1h
mwest44 commented 7 years ago

Note that you can manually kick off that node re-creation with the photon cluster trigger-maintenance command, so you don't have to wait the 15 minutes.
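For example (the cluster ID placeholder is my assumption about the command's argument):

photon cluster trigger-maintenance <cluster-id>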

tactical-drone commented 7 years ago

@mwest44 Awesome, I wondered what that was for. Thanks.

tactical-drone commented 7 years ago

@AlainRoy Actually what happened is that the entire cluster got shut down; I then got it working again, but the dashboard would intermittently hang and my worker nodes were all messed up.

After some investigation I realized that the worker nodes obtain their IPs via DHCP. When the workers register themselves with Kubernetes, these IPs become entrenched somewhere, which is not good because an IP like that might change. It did change, and it caused some strange issues. After fixing my DHCP server with static leases matching what those workers already had, I simply rebooted the cluster again. That is when the dashboard went into a complete jam, although the pods started working.
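The static-lease part, as a sketch (this assumes an ISC dhcpd server, which may not match our actual DHCP setup; the MAC is a placeholder):

cat >> /etc/dhcp/dhcpd.conf <<'EOF'
host k8s-worker-86 {
  hardware ethernet 00:0c:29:xx:xx:xx;   # the worker VM's NIC
  fixed-address 10.0.7.86;               # the address Kubernetes already has on record
}
EOF
# then restart the DHCP service (unit name varies by distro)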

Let me trash all worker nodes and see what happens.

tactical-drone commented 7 years ago

@AlainRoy OK, that fixed it. My old workers (which I try to keep alive forever because it's an effort to get them up to size) were jammed. I will continue to play with standard workers instead.

AlainRoy commented 7 years ago

I'm glad you found the issue! I'll look into what we can do to be robust to changes in the DHCP-assigned IP addresses.

You can specify distinct flavors for the Kubernetes masters and workers. It's an option in the CLI, and I'm pretty sure it's in the UI as well:

% photon-1.2 service create --help | grep flavor
   --vm_flavor value, -v value         VM flavor name for master and worker
   --master-vm-flavor value, -m value  Override master VM flavor
   --worker-vm-flavor value, -W value  Override worker VM flavor
   --disk_flavor value, -d value       Disk flavor name
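So, for example (flavor names here are placeholders, and the other required arguments are omitted):

photon-1.2 service create <other required arguments> \
    --master-vm-flavor <master-flavor> \
    --worker-vm-flavor <large-worker-flavor> \
    --disk_flavor <disk-flavor>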
tactical-drone commented 7 years ago

I don't understand those settings. They assign two flavors: one is used for the master nodes and the other for the etcd and worker nodes. It would have been nice if we could also specify a separate worker flavor. It just feels so limiting, when I would want every pod to be able to cherry-pick any flavor (like you can do with drives)?

Regardless, I can probably just make the worker flavor large and downscale the etcd nodes, I don't know.

AlainRoy commented 7 years ago

I agree that having separate flavors for etcd and worker nodes is a good idea. We tried to keep it simple, but it also made it a little limiting. We'll consider this change in the future.

I'm not sure what you mean by pods cherry picking flavors: a Kubernetes pod isn't aware of Photon Controller flavors. Are you saying that you want different size worker nodes that are somehow labeled, so pods can land on appropriately sized nodes?

tactical-drone commented 7 years ago

Are you saying that you want different size worker nodes that are somehow labeled, so pods can land on appropriately sized nodes?

Yes. That is exactly what I want.

tactical-drone commented 7 years ago

@AlainRoy I found a way to change the cluster IP range after installation. Everything works now. Before, my pod containers had issues with DNS, which was clashing with our company DNS.

Being able to set SERVICE_CLUSTER_IP_RANGE at install time would be awesome.

AlainRoy commented 7 years ago

OK, we can consider adding that option in the future. Thanks for the suggestion.