rancher / catalog-dockerfiles

Dockerfiles for Rancher Catalog containers
Apache License 2.0

Service upgrade and create_index #19

Closed: m4ce closed this issue 8 years ago

m4ce commented 8 years ago

Hi,

I deployed a zookeeper cluster made of 3 nodes.

As I wanted to update the configuration and do a rolling upgrade, I opted for the following command:

rancher-compose -p zookeeper up --upgrade --batch-size 1 --interval 60000 

The problem is that rancher creates a new container which has a different index from the original container we wanted to upgrade.

The same issue applies to any cluster that makes use of node IDs, e.g. kafka.

How can we make sure that we retain the same index when doing an upgrade?

Cheers, Matteo

cloudnautique commented 8 years ago

Ah, this is a tricky one. We know that config updates and things like adding a node require rolling restarts. So we are adding the rolling restart building block for adding/removing a node. It will look similar to what you have there.

Something like rancher-compose -p zookeeper restart --batch-size [--interval], and future enhancements might add healthcheck integration.

In the case of upgrade, though, it's always a new container, and so we increase the index.

We are also going to allow editing user metadata, which might circumvent this a bit: you could change the config there, and confd or scripts could update the config file on a restart. This would require config files that allow for more flexible customization.
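
Roughly, the script side of that could look like the sketch below. It is only an illustration: the metadata endpoints used here (self/service/containers, containers/<name>/create_index, containers/<name>/primary_ip under http://rancher-metadata/latest) and the zookeeper file paths are assumptions rather than confirmed API, and a real setup would more likely use confd templates.

    #!/bin/sh
    # Sketch only: regenerate zoo.cfg from the metadata service before (re)starting ZK.
    # All metadata paths and file locations below are assumptions.
    META=http://rancher-metadata/latest
    PEERS=/tmp/zoo.peers

    : > "$PEERS"
    # The text API is assumed to list peers as "0=zookeeper_zookeeper_1", hence the cut.
    for name in $(curl -s "$META/self/service/containers" | cut -d= -f2); do
        idx=$(curl -s "$META/containers/$name/create_index")
        ip=$(curl -s "$META/containers/$name/primary_ip")
        echo "server.$idx=$ip:2888:3888" >> "$PEERS"
    done

    cat /opt/zookeeper/conf/zoo.cfg.base "$PEERS" > /opt/zookeeper/conf/zoo.cfg
    exec /opt/zookeeper/bin/zkServer.sh start-foreground

A restart of the container would re-run this and pick up the current peers, which covers the dynamic-configuration case; changes to the image or service definition would still need an upgrade.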

Would rolling restarts address this? Would the command primitives be enough? Or does the create_index have to be reused? Right now it's a guarantee that they won't be reused, but I can see the case for why reusing them would be helpful.

m4ce commented 8 years ago

Hi @cloudnautique,

If you take Kafka, for instance, keeping the broker id is very important, if not imperative.

But if I run

 rancher-compose -p zookeeper restart --batch-size [--interval] 

Would this update the containers with any changes I have made in my docker-compose.yml or rancher-compose.yml? To my understanding, that requires an upgrade, right?

Also, imagine upgrading zookeeper. When the new zookeeper containers come up, not only will they have different IDs (as the create_index increases), but their primary IPs will have changed too.

Now, it would be sort of bad if one had to restart each kafka instance in order to update the config with the new zookeeper IPs.

I think what we need here is an upgrade that really takes down the single container and recreates it with the same index and IP address; it's almost like a restart combined with any updates to the compose definition.

Cheers, Matteo

cloudnautique commented 8 years ago

Hi @m4ce,

The restart would allow new dynamic configuration to be loaded, so anything that was pulled from metadata or another service discovery mechanism should work on restart. You are correct: anything that changes in the service definition would require building a new container. The immutability of the container is built into Docker.

I get what you are saying now. Without keeping the same IP and index, ZK would always lose quorum during an upgrade.

We will have to do something more here. ping @ibuildthecloud

Thanks for walking through this! -Bill

m4ce commented 8 years ago

@cloudnautique,

here's an idea.

We could use a persistent volume like /opt/kafka/conf.d where we store the broker_id (taken from the create_index) the first time the container starts up. Any subsequent restart or upgrade would then reuse that broker id.
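
As a rough sketch of what I mean in an entrypoint script (the metadata path for create_index, the volume location, and the Kafka file paths are assumptions for illustration):

    #!/bin/sh
    # Sketch only: pin broker.id on first start, reuse it on every later restart/upgrade.
    # /opt/kafka/conf.d is assumed to be a persistent volume; the metadata path is an assumption.
    ID_FILE=/opt/kafka/conf.d/broker_id

    if [ ! -f "$ID_FILE" ]; then
        curl -s http://rancher-metadata/latest/self/container/create_index > "$ID_FILE"
    fi

    BROKER_ID=$(cat "$ID_FILE")
    sed -i "s/^broker.id=.*/broker.id=$BROKER_ID/" /opt/kafka/config/server.properties

    exec /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties

This only works as long as the volume itself survives the upgrade, of course.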

However, in the case of a zookeeper upgrade, we need to make sure that our kafka instance re-connects to the zookeepers using the rancher.internal DNS names rather than the IP addresses, as the new zookeeper containers will have different IPs.

It seems that rancher DNS doesn't support individual A records for the service instances? Could this be enabled?

e.g. zookeeper_zookeeper_1.rancher.internal

Also, I just tried an upgrade and it seems that the new container gets spawned randomly on any other host. Would it be possible to specify a preference to recreate it on the host where it was running before?

Cheers, Matteo

cloudnautique commented 8 years ago

@m4ce,

If we moved to using a data volume sidekick, we would keep them together and not recreate the data volume. I think this could work for the individual nodes, but the coordination across the cluster becomes unreliable after the initial create. It would be unsafe to add a node, because you couldn't programmatically recreate the initial node's config again (on the new node).

Until we figure out something better, you could use the existing process to bootstrap the cluster, then do as you're suggesting with the data volume, and then use Zookeeper itself as a coordination service for the cluster? :)

We have a label that can be applied to an individual container that allows it to request a specific IP address. I need to talk with the team, but I'm hoping something like that could be applied at the service level so the containers could get the same IP back.

Container-level DNS keeps coming up, but the ephemeral nature of containers makes it somewhat impractical. You can achieve close to the same thing by using metadata.
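
For example, instead of resolving a per-container hostname, a container can ask the metadata service for a named container's IP directly. This is just a sketch, and the path below is an assumption about the metadata API:

    # Sketch only: look up a specific container's IP from metadata instead of DNS.
    curl -s http://rancher-metadata/latest/containers/zookeeper_zookeeper_1/primary_ip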

m4ce commented 8 years ago

@cloudnautique,

so, basically, a sidekick would allow me to always place a container on the host where it was running before the upgrade? My understanding is that a sidekick lets me schedule two or more containers together on the same host, but again the choice of host is random.

Right, one can certainly get the primary IP for the containers out of the metadata service. However, take the kafka and zookeeper scenario again.

When starting the Kafka cluster, part of the bootstrapping is to retrieve the IP addresses of the zookeeper containers and populate the zookeeper.connect property in the Kafka configuration file accordingly. Kafka cluster starts up, all good.
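
That bootstrap step is roughly the sketch below (the metadata paths, the zookeeper service name, and the Kafka file path are assumptions for illustration):

    # Sketch only: build zookeeper.connect from the current zookeeper container IPs.
    META=http://rancher-metadata/latest
    ZK_CONNECT=""
    for name in $(curl -s "$META/services/zookeeper/containers" | cut -d= -f2); do
        ip=$(curl -s "$META/containers/$name/primary_ip")
        ZK_CONNECT="${ZK_CONNECT:+$ZK_CONNECT,}$ip:2181"
    done
    echo "zookeeper.connect=$ZK_CONNECT" >> /opt/kafka/config/server.properties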

Then I do a rolling upgrade of the zookeeper cluster with a reasonable interval and batch size, so that I do not provoke an outage. The new instances will have different IP addresses (at least until we have that fancy label that carries over the previous IP address), which means the kafka brokers will need restarting, as they are still pointing to the decommissioned zookeeper instances.

However, if we had a DNS record for each container, I wouldn't care about the change of IP addresses: I would put the hostnames in the zookeeper.connect property and let Kafka retry connecting to the zookeeper instances over and over.

Having to rely on metadata requires the config file to change and the broker to be restarted.

Let me know if I am missing something.

Thanks, Matteo

cloudnautique commented 8 years ago

Yeah, the sidekick data volume behavior is there so that you can upgrade 'data'-centric services without having to move the data around. It would be painful to upgrade an HDFS or Elasticsearch node if you always had to move the data around, so we wanted an 'in place' upgrade scenario.

I think your initial assessment is correct in that we should allow a way to keep the same IP addresses and create_index when rebuilding containers.

As far as DNS goes, I know Elasticsearch (and a lot of other Java apps) only resolves DNS on startup and never checks it again while the process is running. Not sure if Kafka has the same behavior. For now, metadata is the only place with the authoritative list of containers and IP addresses in a service.

m4ce commented 8 years ago

Hi @cloudnautique,

thanks for the sidekick tip; I tested it and it's great.

Zookeeper, as of version 3.4.7, even has the ability to connect to the other zookeeper instances by re-resolving their hostnames between connection attempts (https://issues.apache.org/jira/browse/ZOOKEEPER-1506). For Kafka I'm not sure; I would have to test.

However, as you say, this depends on how the application was designed and wouldn't apply in all cases. The best approach would indeed be a generic one that retains both the IP address and the create_index.

I will open a ticket under the rancher issues for the feature and link this request.

Thank you for the help so far!

Cheers, Matteo