vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

Milestone: HA support #744

Open hickeng opened 8 years ago

hickeng commented 8 years ago

Add vSphere HA support, both to container VMs and the applianceVM.

mdubya66 commented 7 years ago

This should work today. The ask was for this in 1.0. Making it high.

corrieb commented 7 years ago

We need to get a little more clarity here about what we mean by "HA support" in terms of what we're targeting for 1.1. HA support is not one thing; rather, it covers a few different integration points and capabilities captured by this Epic. Given that the Epic is currently high priority, I want to tease out whether that should cover everything in the Epic or just one piece of it.

So let's lay this out. HA support means the following:

1a) When you power off a host, an endpoint VM is restarted on another host.
1b) When you power off a host, a containerVM is restarted on another host.
2) When a containerVM or endpoint VM fails (ambiguous - see below), it is restarted.
3) When a user types --restart=<true/false> into docker run, they expect to toggle HA.

In considering these options, it's important to understand what vSphere is able to detect and how granular these settings can be. For example, you have to enable HA at the cluster level, but you can then disable it for particular VMs. As such, it would potentially make sense for it to be part of vic-machine to enable a highly available appliance by toggling HA on the cluster, since VIC machine has to run with admin privileges. This wouldn't be appropriate for a VIC user to be able to do via --restart=true.
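For concreteness, here's a minimal sketch of what such a vic-machine step could look like, using govmomi (the vSphere SDK the project is built on): enable HA at the cluster level, then opt a single VM out via a per-VM override. The vCenter URL, cluster path, and VM name are placeholders, and this is illustrative only, not what vic-machine actually does:

```go
// Sketch only: enable HA on a cluster, then exclude one VM from HA restart
// via a per-VM override. All names and credentials are placeholders.
package main

import (
	"context"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/vim25/types"
)

func main() {
	ctx := context.Background()

	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk") // placeholder
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	finder := find.NewFinder(c.Client, true)
	dc, err := finder.DefaultDatacenter(ctx)
	if err != nil {
		panic(err)
	}
	finder.SetDatacenter(dc)

	cluster, err := finder.ClusterComputeResource(ctx, "cluster1") // placeholder path
	if err != nil {
		panic(err)
	}
	vm, err := finder.VirtualMachine(ctx, "some-container-vm") // placeholder name
	if err != nil {
		panic(err)
	}

	enabled := true
	spec := &types.ClusterConfigSpecEx{
		// Turn HA on for the whole cluster...
		DasConfig: &types.ClusterDasConfigInfo{Enabled: &enabled},
		// ...but opt this particular VM out of HA restart.
		DasVmConfigSpec: []types.ClusterDasVmConfigSpec{{
			ArrayUpdateSpec: types.ArrayUpdateSpec{Operation: types.ArrayUpdateOperationAdd},
			Info: &types.ClusterDasVmConfigInfo{
				Key: vm.Reference(),
				DasSettings: &types.ClusterDasVmSettings{
					RestartPriority: string(types.ClusterDasVmSettingsRestartPriorityDisabled),
				},
			},
		}},
	}

	task, err := cluster.Reconfigure(ctx, spec, true)
	if err != nil {
		panic(err)
	}
	if err := task.Wait(ctx); err != nil {
		panic(err)
	}
}
```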

The item that's currently high priority, #3845, addresses (1a). At this point, it's unclear whether it also addresses (1b), or even whether (1b) is a clear requirement. This needs to be clarified.

The heartbeat investigation done by @dougm referenced in this epic (#406) would correspond to (2) here. vSphere has the capability to monitor the heartbeat of a workload or guest tools and will restart the VM after a period of time if it doesn't get the heartbeat. However, the question of what the "heartbeat" of a container workload or endpoint VM might be is a little ill-defined, and in addition to that, the question of what VIC itself should do vs. what it might expect vSphere to do is an important one.
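For reference, the vSphere feature in question is VM Monitoring, configured through the same cluster spec. A sketch, assuming an already-resolved cluster object; the interval and threshold values are arbitrary examples, not recommendations:

```go
// Sketch: enable VM Monitoring (VMware Tools heartbeats) on a cluster, so
// vSphere resets a VM whose heartbeat stops. Assumes a resolved cluster
// object; the thresholds are arbitrary example values.
package ha

import (
	"context"

	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

func enableVMMonitoring(ctx context.Context, cluster *object.ClusterComputeResource) error {
	enabled := true
	spec := &types.ClusterConfigSpecEx{
		DasConfig: &types.ClusterDasConfigInfo{
			Enabled:      &enabled,
			VmMonitoring: string(types.ClusterDasConfigInfoVmMonitoringStateVmMonitoringOnly),
			DefaultVmSettings: &types.ClusterDasVmSettings{
				VmToolsMonitoringSettings: &types.ClusterVmToolsMonitoringSettings{
					Enabled:          &enabled,
					FailureInterval:  30,   // seconds without a heartbeat before reset
					MinUpTime:        120,  // grace period after power-on
					MaxFailures:      3,    // give up after this many resets...
					MaxFailureWindow: 3600, // ...within this window (seconds)
				},
			},
		},
	}
	task, err := cluster.Reconfigure(ctx, spec, true)
	if err != nil {
		return err
	}
	return task.Wait(ctx)
}
```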

That then leads us to (3). What should --restart mean? Do we want to implement it, and if we do, how much does it need to hook into vSphere HA?

corrieb commented 7 years ago

So, here's a summary list of what I think needs to be decided here from a product perspective:

- Do we expect a vSphere admin to enable HA on a cluster that VIC is provisioned to, or should it be something they can set using VIC machine?
- Do we believe that if a host with containerVMs running is powered off, those containerVMs should be restarted elsewhere if HA is enabled on the cluster? In other words, should restart be the default for the containers themselves in an HA cluster?
- What does HA for the VIC endpoint VM mean, outside of the host failure case? What kinds of failures should cause it to be restarted?
- Do we care about detecting containerVM failure and restart outside of the host failure case? If so, should that be something configurable using --restart?
- Do we want to support detection of storage and network connectivity issues with HA? It's certainly possible to turn that feature on today. Is it meaningful for either the endpoint or the containers?

I think if we have clarity on those product questions, then we should be able to get better clarity on what we're delivering and testing for.

mdubya66 commented 7 years ago

Good questions. IIRC we agreed that heartbeat was not required.

corrieb commented 7 years ago

@mdubya66 Depends on the requirements. Doug concluded it's not required for the host power-off case, but it may well be required to satisfy some of the others.

mreferre commented 7 years ago

Do we believe that if a host with containerVMs running is powered off, those containerVMs should be restarted elsewhere if HA is enabled on the cluster? In other words, should restart be the default for the containers themselves in an HA cluster?

This concept is central to everything we have socialized re VIC so far. When you deploy Docker images on VIC they turn into full-fledged VMs and inherit all of the benefits of VMs and their operational best practices (infrastructure resilience being a big chunk of it). We could debate whether or not a specific containerVM actually needs to be restarted (is it running solo? or is it one of many VMs that form a single service? was it instantiated manually or is it just a small gear in a larger self-healing stack?). However, to repeat myself, the VIC message has always been "Docker images turn into VMs and you can manage them traditionally". Which means they are HA protected.

cgtexmex commented 7 years ago

Do we expect a vSphere admin to enable HA on a cluster that VIC is provisioned to, or should it be something they can set using VIC machine?

I'd expect they'd set it on the cluster -- the way they configure HA today -- and VIC machine is not involved. This is similar to question 2 and I believe warrants the same answer -- it's central to the socialized concept of VIC....

karthik-narayan commented 7 years ago

I think we have clarity here, but let me add my $0.02. Like others have mentioned, we tell our customers that they can manage the vSphere Integrated Containers workload like a VM. This would mean that we allow them to enable HA for these objects.

Powering off a host, host failure, VM failure are valid use cases for HA to kick in. The restart option in docker run requires a little more exploration.

Do we expect a vSphere admin to enable HA on a cluster that VIC is provisioned to, or should it be something they can set using VIC machine?

This should be a vSphere operation, not a vic-machine option.

Do we believe that if a host with containerVMs running is powered off, those containerVMs should be restarted elsewhere if HA is enabled on the cluster? In other words, should restart be the default for the containers themselves in an HA cluster?

Yes. Customers have the option of a) disabling HA on the cluster, b) picking a cluster that does not have HA enabled or c) disabling HA for the VM or the vApp (if this is possible).

What does HA for the VIC endpoint VM mean, outside of the host failure case? What kinds of failures should cause it to be restarted?

Do we care about detecting containerVM failure and restart outside of the host failure case? If so, should that be something configurable using --restart?

Do we want to support detection of storage and network connectivity issues with HA? It's certainly possible to turn that feature on today. Is it meaningful for either the endpoint or the containers?

For the last three questions, I'd like to understand the failures better and can get us some relevant customer info to help make a decision.

mhagen-vmware commented 7 years ago

Copy from email thread:

I am struggling to follow all the different threads on this issue at this point - so hopefully this is the "main" thread?

The testing perspective - I have started with 1a as pointed out by Ben, and I intended to add 1b as soon as I could get 1a actually working. Those are really the only 2 things I considered "HA". The heartbeat question for me seems to be tip-toeing into FT territory. I don't know how VMware specifically defines the line between HA and FT, but taking proactive restarting/reassigning steps on VMs seems to me to be outside the scope of HA.

And I agree with Matt that 1a is the very bare minimum that we need to support ASAP, otherwise the devs are out of luck and at the mercy of the VI admin being available to restart the appliance on host failure. Whereas without 1b a host failure takes down the container they were working on, but they can at least still create a new container if we have 1a.

mdubya66 commented 7 years ago

Should we focus on 1a and opportunistically 1b for the 1.1 release? In parallel @karthik-narayan can work to get his questions answered for the future deliverables.

mreferre commented 7 years ago

If we stick to the VIC (marketing) message and to the "pets" model (that VIC is supposed to be serving), I'd say 1b has higher priority than 1a.

In my mind, VIC is currently being pitched to let users run docker images with the pets model (hence 1b is more important).

As for FT vs. HA... none of these discussions involve FT (in VMware language). FT is the ability to run two copies of a VM in lockstep on two different hosts, with the objective of zero downtime in the event of a host failure. What's being discussed here is what should trigger HA: a mere host shutdown/failure (which is obvious) or the detection of a failure of a service running inside a VM.

hmahmood commented 7 years ago

It looks like we have some consensus on what happens to VIC artifacts in an HA-enabled cluster when a host goes down:

  1. both container and endpoint VMs can be restarted. This is currently not supported: the assumption has been that the containerVMs' state does not change while the endpoint VM restarts, which simplifies the problem of rebuilding system state in the endpoint VM. Since we want to support container/endpoint VM restart, that assumption no longer holds, which makes the problem harder to solve (distributed snapshot).
  2. container VM restart on VM failure: I think here we just need to follow user input, e.g. docker --restart, with the initial support just being to not restart the VM
  3. endpoint VM restart on VM failure: this should always happen in my opinion, and I don't think it should be hard to pull off

Regarding (1.), I know the customer demand is there, but is there any other alternative that would be acceptable? @mreferre

mreferre commented 7 years ago

@hmahmood not sure I am following this. #1 to me should be the result of a host failure and should be fully supported (ideally, I mean; perhaps it is not supported now). I am not sure I can think of a workaround (primarily because we say that containerVMs and VCHs in general inherit all the characteristics of vSphere, HA included and foremost).

corrieb commented 7 years ago

I think the difficulty here is the fact that HA takes the restart policy and restart order out of our hands to some extent. If a host goes down with both the endpoint VM and containerVMs on it, and those all get restarted concurrently on another host, we need to make damn sure there are no weird synchronization bugs - e.g. the endpoint VM only reports half of the restarted containerVMs. I think that's the hardest problem to solve.

This is what @hmahmood is getting at IMHO. It partly comes down to our event handling in determining if any cached state is invalid.
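A hedged sketch of the kind of event handling that implies: tail vSphere power events for the VCH's VMs and treat each one as a cue to re-validate cached state rather than trusting it. The event plumbing below is govmomi's; the reconcile hook is hypothetical:

```go
// Sketch: tail power-state events for a set of VMs and invalidate any cached
// state when one fires. The event plumbing is govmomi's; reconcile() is a
// hypothetical hook into the portlayer cache.
package ha

import (
	"context"

	"github.com/vmware/govmomi/event"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/types"
)

func watchPowerEvents(ctx context.Context, c *vim25.Client, vms []types.ManagedObjectReference,
	reconcile func(types.ManagedObjectReference)) error {

	m := event.NewManager(c)
	// tail=true keeps the collector open and streams events as they arrive.
	return m.Events(ctx, vms, 10, true, false,
		func(ref types.ManagedObjectReference, events []types.BaseEvent) error {
			for _, e := range events {
				switch e.(type) {
				case *types.VmPoweredOnEvent, *types.VmPoweredOffEvent,
					*types.VmRestartedOnAlternateHostEvent:
					// Any of these may mean HA moved or restarted the VM:
					// re-check its state rather than trusting the cache.
					reconcile(ref)
				}
			}
			return nil
		})
}
```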

mreferre commented 7 years ago

Perhaps leveraging HA priorities may help here. You could set the VCH endpoint VM to be the last one to come up or the first one to come up (depending on what you need to avoid inconsistencies).
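For illustration, a minimal govmomi sketch of that per-VM override; the priority constant (e.g. "high" so the endpoint comes up first, or "low" for last) is the knob being described here:

```go
// Sketch: give the VCH endpoint VM a per-VM HA restart-priority override so
// it comes up before (or after) the containerVMs. Assumes resolved objects.
package ha

import (
	"context"

	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

func setEndpointRestartPriority(ctx context.Context, cluster *object.ClusterComputeResource,
	endpoint *object.VirtualMachine, p types.ClusterDasVmSettingsRestartPriority) error {

	spec := &types.ClusterConfigSpecEx{
		DasVmConfigSpec: []types.ClusterDasVmConfigSpec{{
			ArrayUpdateSpec: types.ArrayUpdateSpec{Operation: types.ArrayUpdateOperationAdd},
			Info: &types.ClusterDasVmConfigInfo{
				Key: endpoint.Reference(),
				DasSettings: &types.ClusterDasVmSettings{
					// e.g. types.ClusterDasVmSettingsRestartPriorityHigh
					RestartPriority: string(p),
				},
			},
		}},
	}
	task, err := cluster.Reconfigure(ctx, spec, true)
	if err != nil {
		return err
	}
	return task.Wait(ctx)
}
```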

hmahmood commented 7 years ago

@corrieb @mdubya66 @mreferre @mhagen-vmware

Update on HA:

  1. add option to vic-machine for HA appliance — need clarification on this; I am not sure what this is for
  2. add docker run --restart handling - should pass through to portlayer create — --restart=always is currently the behavior, both on host and vm (tether) failure; is this enough for 1.1?
  3. heartbeat support in tether — don't think this is needed; see #406
  4. configure vApp VMs to start on powerOn with --restart=always — this is the current behavior, since HA will restart on host failure and we reboot the OS if tether/vic-init fails (i.e. panics)

corrieb commented 7 years ago

(1) @hickeng can clarify. If this is a question of host failure, then I suspect we already cover it with what we've done. If it's WRT some kind of heartbeat on the portlayer/personality services then that should be covered under #406

(2) The interesting problem here is that HA is either enabled on the cluster (in which case you'd need to explicitly set --restart=never to get different behavior), or you want restart=always but HA is not enabled on the cluster, in which case we'd have to implement some behavior in the personality (see the sketch after these points). To answer the question specifically - yes, I think this is fine for 1.1. It's a policy applied by the admin.

(3) I think heartbeat support is still needed. Tether can be running fine, but the application could have hung or died in some way. The health-check stuff should be combined with that.

(4) Not sure what this refers to. vApp powerOn?
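On (2), here's a hedged sketch of the decision matrix described above: the docker restart policy crossed with whether the cluster has HA enabled. All names are illustrative, not the actual personality code:

```go
// Sketch of the policy matrix described in (2). Hypothetical types and
// names; not the actual personality implementation.
package ha

// RestartAction is what the personality decides to do for one containerVM.
type RestartAction int

const (
	LetHARestart   RestartAction = iota // cluster HA covers it; nothing to do
	DisableHAForVM                      // user asked for no restarts on an HA cluster
	RestartInVIC                        // no cluster HA; personality must restart the VM itself
	NoRestart                           // no cluster HA and user doesn't want restarts
)

// planRestart maps `docker run --restart=<policy>` plus the cluster's HA
// state onto an action. policy is the user-supplied value ("always", "no", ...).
func planRestart(policy string, clusterHAEnabled bool) RestartAction {
	wantRestart := policy != "no" && policy != "never"
	switch {
	case clusterHAEnabled && wantRestart:
		return LetHARestart
	case clusterHAEnabled && !wantRestart:
		return DisableHAForVM // per-VM HA override, as sketched earlier
	case !clusterHAEnabled && wantRestart:
		return RestartInVIC // the behavior we'd have to implement ourselves
	default:
		return NoRestart
	}
}
```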

ryandotclair commented 7 years ago

My .02 on this subject, for what it's worth: I think HA should be enabled by default for all containers and the VCH endpoint, even for "CNA"-written apps. Yes, such an app can handle a failure easily, but the end user has an expected state in mind of how many containers they expect to be running (otherwise why'd they start it up to begin with?). I think we should ensure that expected state - it'd be silly to not restart it IMHO.

If we didn't, what's the expected workflow? They'd log in every day to see it failed and restart it manually? Or just leave it failed and start up a new one later?