vmware-archive / photon-controller

Photon Controller

kubernetes: photon service create resource allocation questions. #106

Open tactical-drone opened 7 years ago

tactical-drone commented 7 years ago

Photon Platform Release 1.2.0

Hi.

Take a look at my final cluster VM layout, which has a configuration of 3 x etcd, 2 x master, 4 x worker and is installed on a tenant that has access to 4 ESXi hosts. Not all of my ESXi hosts are used, and those that are used are not used in a redundant fashion:

[screenshot: cluster VM layout showing the etcd, master, and worker VMs; all three etcd VMs are placed on ESXi host 10.0.0.122]

Why are the 3 etcd VMs deployed to the same ESXi host, 10.0.0.122? That makes no sense to me. What exactly is the idea here? To get the cluster up and running and then move VMs around so that it makes more sense in terms of an ESXi host failing? I am a bit noobish when it comes to Kubernetes, but I thought that photon service create should do this work, since only it knows how the Photon control plane has been deployed on ESXi and which hosts are available where. I have not been able to determine how ESXi hosts are chosen when spawning new photon-controller and Kubernetes nodes.

It is also not clear how to deploy an HA photon-controller setup (using photon-setup) with all of this. It would be awesome if we could discuss how to make this entire system highly available: first how to make the Photon control plane highly available, and then how to make the Kubernetes that runs on top of Photon highly available. The example should be based on the minimum setup that gives production failover protection against, say, the loss of one entire ESXi host. So 3~5 ESXi host servers, with at least 3 etcd nodes and at least 2 masters, all deployed to different bare metal servers?
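For reference, my 3-etcd sizing above follows the usual etcd quorum rule: a cluster of n members needs floor(n/2)+1 members alive to keep a quorum, so it tolerates floor((n-1)/2) member failures. A quick sanity check in plain Python (nothing Photon-specific here):

```python
# etcd quorum arithmetic: an n-member cluster needs a majority
# (n // 2 + 1) alive, so it tolerates (n - 1) // 2 member failures.
for n in (1, 2, 3, 4, 5):
    quorum = n // 2 + 1
    tolerated = (n - 1) // 2
    print(f"{n} members: quorum={quorum}, tolerates {tolerated} failure(s)")
```

So 3 etcd VMs on 3 different ESXi hosts survive one host failure; surviving two host failures would need 5 etcd members spread across 5 hosts.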

mwest44 commented 7 years ago

The Photon Controller scheduler randomly places VMs onto hosts based on a placement score that is received from a randomly chosen set of hosts. So as long as a host meets any affinity specs in the VM create command and continues to have resource availability, VMs can be placed there. We are not attempting to smooth placement across hosts. In terms of K8 VMs, we do not currently have a way to set anti-affinity rules in the scheduler, so these worker nodes can be bunched onto a few hosts, limiting availability. We have identified this as an enhancement and it is in the product plan, but it is not there yet.
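For illustration only, here is a minimal sketch of that sampling-based placement; this is not the actual scheduler code, and the Host fields and score function are invented:

```python
import random

class Host:
    """Hypothetical host record; free capacity drives the toy score."""
    def __init__(self, name, free_cpu, free_mem_gb):
        self.name, self.free_cpu, self.free_mem_gb = name, free_cpu, free_mem_gb

    def placement_score(self):
        # Toy scoring: more free resources -> higher score.
        return self.free_cpu + self.free_mem_gb

def place_vm(hosts, sample_size=2):
    # Score only a random subset of hosts; the best responder wins.
    sampled = random.sample(hosts, min(sample_size, len(hosts)))
    return max(sampled, key=Host.placement_score)

hosts = [Host("10.0.0.121", 8, 32), Host("10.0.0.122", 16, 64),
         Host("10.0.0.123", 4, 16), Host("10.0.0.124", 16, 64)]

# Each placement samples independently and there is no anti-affinity,
# so repeated placements can easily land on the same strong host.
print([place_vm(hosts).name for _ in range(3)])
```

Because every request is scored independently and nothing penalizes co-location, all three etcd VMs can land on the same host, exactly as in the screenshot above.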

In terms of the control plane itself, I don't understand your point about HA, unless you are talking about vCenter-style HA functionality. Auto-restart of control plane VMs in that model is not part of Photon Platform (though we will recreate failed K8 cluster nodes). The controller VMs are specifically placed on hosts by the admin who does the install; the YAML file defines the hosts. The control plane survives the outage of individual controller nodes. Our internal LB is a single point of failure; however, it is not a requirement. You can define an external LB if you want to eliminate that issue. Did I miss your point?

tactical-drone commented 7 years ago

Hi @mwest44.

Thanks for clearing up how K8 interacts with the photon-controller provider to select ESXi hosts for pods. I hope you get that feature in soon.

I don't think you missed my point, but let me provide some questions to make my point about photon-controller HA clear:

  1. What happens to your Photon control plane if an ESXi host dies that contains your:
     1.1 Lightwave?
     1.2 photon-controller?
     1.3 load balancer?
  2. How does it help photon-controller HA to deploy more than one:
     2.1 Lightwave controller? (Does it even make sense to have more than one? Do they share info? Does it make sense to have more than one at a single site? Can they work together behind a load balancer? I presume not.)
     2.2 photon-controller? (Does it make sense to have more than one? Can they work together if placed behind a load balancer? I presume so.)
     2.3 load balancer? (With an external load balancer then covering the Photon control plane load balancers, like you mentioned.)

Therefore, it is not clear to me how the Photon control plane protects itself against host failure. Can you maybe elaborate more? My disaster recovery due diligence is lacking because it is unclear to me how this entire thing fits together.

tactical-drone commented 7 years ago

@mwest44 For example, I have 4 ESXi hosts. What would be the minimum number of Lightwave instances, controllers and load balancers I would need to survive two ESXi host failures?

At the moment my guess goes something like this:

ESXI-1:

ESXI-2:

ESXI-3:

ESXI-4:

External:

Does this setup make sense at all? Like those 3 Lightwaves: I don't think they can work together at all? You wouldn't be able to place all 3 behind a load balancer and use that as your Lightwave IP, would you? Is the documentation just not clear on this, or is it and I have not read it?

tactical-drone commented 7 years ago

@mwest44 If you look at #105, the configuration that comes with the installer suggests that you could have an HA Lightwave setup by spreading the Lightwave IPs across the DNS configs of the other components.

It would have been nicer if there were a piece of documentation explaining how to make Lightwave HA by spreading its IPs across the DNS configs, with an example of what such a config might look like, instead of just a random config file somewhere inside an installer that hints this might somehow work.
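To be concrete about what I mean, a sketch like this would help; this fragment is hypothetical, with invented field names and placeholder IPs rather than the actual installer schema from #105. The idea is simply that every component lists all Lightwave instances as DNS/auth endpoints:

```yaml
# Hypothetical fragment (invented keys, placeholder IPs): each component
# points at ALL Lightwave instances so lookups can fail over when one
# Lightwave host dies.
lightwave:
  instances:
    - 10.0.0.20   # Lightwave on ESXI-1
    - 10.0.0.21   # Lightwave on ESXI-2
photon-controller:
  dns:
    - 10.0.0.20
    - 10.0.0.21
load-balancer:
  dns:
    - 10.0.0.20
    - 10.0.0.21
```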

snambakam commented 7 years ago

@pompomJuice

Lightwave is a multi-master replicated identity service. Photon Platform should fail over to the next available instance of Lightwave. This feature is not available in PhotonPlatform-1.2.

It is recommended to run at least two (2) Lightwave instances per site (subnets providing high-bandwidth, low-latency network connections).

Photon Platform nodes are also multi-master replicated. If one of these nodes goes down, the other nodes assume ownership of the ESXi hosts that were being managed by the failed node. All resource management is divided amongst the Photon Platform nodes in the cluster.
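A minimal sketch of that ownership hand-off, assuming a simple round-robin division (the node names and the round-robin policy are my illustration, not Photon's actual algorithm):

```python
from itertools import cycle

def assign_hosts(nodes, hosts):
    """Divide ESXi host ownership round-robin across control-plane nodes."""
    owners = {n: [] for n in nodes}
    for node, host in zip(cycle(nodes), hosts):
        owners[node].append(host)
    return owners

hosts = ["10.0.0.121", "10.0.0.122", "10.0.0.123", "10.0.0.124"]
nodes = ["pc-1", "pc-2", "pc-3"]
print(assign_hosts(nodes, hosts))  # initial division of hosts

# pc-2 fails: the survivors re-divide all hosts, so the hosts pc-2
# was managing get picked up automatically.
print(assign_hosts([n for n in nodes if n != "pc-2"], hosts))
```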

mwest44 commented 7 years ago

@pompomJuice Agreed. Some of this doc is a work in process.