paz-sh / paz

An open-source, in-house service platform with a PaaS-like workflow, built on Docker, CoreOS, Etcd and Fleet. This repository houses the documentation and installation scripts.
http://paz.sh

Investigate simplifying, changing or replacing the Etcd/HAProxy service discovery layer #33

Open · lukebond opened this issue 9 years ago

lukebond commented 9 years ago

It's currently a bit brittle and difficult to understand / remember how it works.

Some options:

For me Weave would be really helpful, but there are multiple ways we could use it and we should have a discussion with the Weave developers about this. Some options:

squaremo commented 9 years ago

Use WeaveDNS as the service discovery mechanism and ditch HAProxy/Confd altogether

I think you'd want to keep some kind of load balancer for, well, load balancing (and rolling upgrades, rate limiting, and whatever). But this doesn't stop you using WeaveDNS -- you can still give the load balancer the appropriate host name in DNS and things will cheerfully connect to it, and this is still better than "random Docker ports" as you say.

WeaveDNS currently won't help you with updating backends to HAProxy -- or not much.

squaremo commented 9 years ago

WeaveDNS currently won't help you with updating backends

This doesn't detract from weave's usefulness, however.

rosskukulinski commented 9 years ago

@lukebond have you looked at vulcand at all? It's been on my list to experiment with, but haven't yet.

lukebond commented 9 years ago

@rosskukulinski I got as far as seeing that it uses SRV records and was put off. I would need to use ambassadors or something if I didn't want apps running on Paz to know about SRV records.

lukebond commented 9 years ago

@squaremo thanks for sharing your thoughts!

The existing HAProxy does round-robined load balancing for requests coming in from outside the cluster, and it also allows services to communicate internally by service name (i.e. service discovery), again round-robined. It's all controlled by a fairly gnarly Etcd-Confd setup (https://github.com/yldio/paz-haproxy/blob/master/run.sh); it works very well, but the idea of having to change it always frightens me.
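
For reference, the haproxy.cfg that the template renders boils down to something like the following (a simplified sketch; the domain, service and server names are illustrative):

frontend http-in
    bind *:80
    # route by HTTP Host header, e.g. myapp.paz.example.com -> backend myapp
    acl host_myapp hdr(host) -i myapp.paz.example.com
    use_backend myapp if host_myapp

backend myapp
    balance roundrobin
    server myapp-1 10.0.1.21:32768 check
    server myapp-2 10.0.1.22:32771 check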

WeaveDNS can do the latter, and I could point HAProxy backends to the WeaveDNS name as @squaremo suggests, leaving internal routing entirely up to Weave.

We could keep the existing Etcd announcing sidekicks (useful for other things, e.g. monitoring) and reconfigure HAProxy, for external load-balancing only, as services are added and removed. That would simplify things a lot, at the cost of losing the ability to do zero-downtime deployments.

To keep ZDD, either Weave would have to add this feature (buffering/draining connections when addresses are added and removed) or we'd have to keep all the existing Confd stuff, which would mean we haven't simplified much and have added another component.

Thoughts, @tomgco?

lukebond commented 9 years ago

With the release of Weave 0.11 and automatic IP assignment I see almost no reason not to switch to weave and simplify our service discovery and routing.

HAProxy will largely serve as a means of routing external requests by HTTP host header, and the backends can be Weave-assigned IP addresses. The usage of dnsmasq can be replaced by WeaveDNS so that services can speak to each other by service name as before, but via the Weave SDN.
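
The external-routing part of the HAProxy config would then shrink to something like this (a sketch; the service name is illustrative, the weave.local name assumes WeaveDNS's default domain, and note the caveat about hostname resolution raised below):

backend myapp
    balance roundrobin
    # a single WeaveDNS name (or Weave-assigned IP) instead of a list of host:port pairs per instance
    server myapp myapp.weave.local:80 check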

Unfortunately I still see a need for the sidekick announce units.

hyperbolic2346 commented 9 years ago

@lukebond So HAProxy will sit on each host on an exposed port and route incoming requests by hostname via WeaveDNS? The role of WeaveDNS would be to resolve the internal service names like paz-web to an IP, but how would we get the round-robin load balancing from HAProxy then?

hyperbolic2346 commented 9 years ago

@lukebond Hmm, I'm not sure if we can change the haproxy setup we're using even if we incorporate weave. From the haproxy docs:

<address> is the IPv4 or IPv6 address of the server. Alternatively, a
          resolvable hostname is supported, but this name will be resolved
          during start-up.

If the hostname is only resolved during start-up, we'll still have the confd complexity we have now; I'm not sure we gain much whether the entry holds the IP or just the hostname. I'm still a fan of using weave here though, because we could nuke the port numbers at the very least, but I'm not sure this is the silver bullet you wanted.

rcmorano commented 9 years ago

Hi,

I've just discovered your Paz PaaS and I like it a lot! We share a lot of ideas :))

Have you considered using SkyDNS? It could provide a very lightweight (SRV-record-based) distributed DNS / service discovery layer on top of the already existing etcd backend.

The SRV records should be easily translatable into backends for any TCP/HTTP routing agent (e.g. HAProxy, Hitch, ...).
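
For example (a sketch; the skydns domain and service name here are hypothetical), a single SRV lookup already returns the port and target needed for an HAProxy server line:

dig +short SRV paz-web.paz.skydns.local
10 20 32768 dogbert.paz.skydns.local.
10 20 32771 wally.paz.skydns.local.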

lukebond commented 9 years ago

@rcmorano thanks for your interest in Paz! Glad you like it :)

My objection to SkyDNS in the past was its use of SRV records, but I hadn't thought of translating them the way you suggest. I didn't want to impose SRV records on containers running on Paz. I'm curious to know more, as this is new to me :)

lukebond commented 9 years ago

@hyperbolic2346 to answer your first question:

@lukebond So HAProxy will sit on each host on an exposed port and route incoming requests by hostname via WeaveDNS? The role of WeaveDNS would be to resolve the internal service names like paz-web to an IP, but how would we get the round-robin load balancing from HAProxy then?

yes, that's more or less what I'm currently envisaging.

re load balancing, we'd be relying on WeaveDNS' random load balancing, which is arguably acceptable, but won't be as good.

to answer your second question:

If the hostname is only resolved during start-up, we'll still have the confd complexity we have now. I'm not sure if we gain much if it includes the IP or just the hostname. I'm still a fan of using weave here though because we could nuke the port numbers at the least, but I'm not sure if this is the silver bullet you wanted.

Although I was complaining above about the complexity of the HAProxy/Confd setup, I also don't see a way around it at the moment. The best solution I can currently see is something like the following:

I hope that helps to answer your question. I'd be interested in your thoughts!!

rcmorano commented 9 years ago

@lukebond the approach I was thinking about is like the one described in this infoq.com article. skydock has already done some work in this direction, although that project seems to be stalled.

A simple DNS query could be easily parsed and translated into a list of service backends for HAProxy.

In the same manner, the per-paz-service HAProxy instances (I guess you could have more than one instance spread across the whole cluster: load-balance your load balancers) could be discovered via DNS and then published as a simple round-robin DNS record that applications can point to (e.g. for an HAProxy-load-balanced pool of mysql-galera servers).
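
In other words (a sketch; the name is hypothetical), resolving the load-balancer name would hand back the whole pool of HAProxy instances as one round-robin record:

dig +short paz-haproxy.paz.skydns.local
10.0.1.21
10.0.1.22
10.0.1.23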

It seems that marathon is using a similar approach through mesos-dns now.

Anyway, weave seems a really nice weapon if you don't want to mess with network matters!

joaojeronimo commented 9 years ago

Probably relevant to the topic of networking between containers: Docker now has a "networking system".

hyperbolic2346 commented 9 years ago

@lukebond

ditch announce sidekick units and use gliderlabs/registrator (simplification)

I looked into this on my cluster and it does work as well as you'd like, but the registration in etcd is a little more complex. Instead of /services/servicename being a single key holding ip/port JSON data or similar, the key is /services/servicename/hostname:service:port and the ip:port sits inside it as plain text.
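
Roughly speaking (an illustrative sketch, not actual output; the service and host names are made up):

# what I had half expected: one key per service, structured value
/services/paz-web  ->  {"ip": "10.0.1.21", "port": 32768}

# what registrator actually writes: one key per instance, plain ip:port value
/services/paz-web/dogbert:paz-web:80  ->  10.0.1.21:32768
/services/paz-web/wally:paz-web:80    ->  10.0.1.22:32771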

This of course allows for scaling up to multiple machines per service easily, but it makes the confd script a little more complicated. I still think this is the way to go for now. The Docker networking system sounds like it will be incredibly easy to use once it's widely available, and switching to that would probably make the most sense in the long run, unless the plan is still to move away from Docker and use rkt. But then I saw an email from the CoreOS team today about the Open Container Project; it won't matter too much which front-end we use if all the containers are usable on any compliant system.

Anyway, still working towards this on my cluster and I will send a pull request when I get things sorted out.

lukebond commented 9 years ago

@joaojeronimo i'm still in the process of digesting these new announcements, but i'm at the point where i'm able to articulate a few things:

  1. Paz has three core strengths IMHO: (a) simplicity, (b) HAProxy networking "plumbing" so containers can talk to each other, (c) a nice web UI on top of Fleet
  2. the new Docker networking features obviate the need for (b), yet bring increased Docker daemon/platform tie-in
  3. resisting that "tie-in" (as i perceive it, anyway) puts (a), simplicity, at risk, e.g. if I wanted to remain open to supporting rkt
  4. if someday Paz's only unique selling point were (c), for me that would be disappointing
  5. who knows how things will change further in the future vis-à-vis Docker and rkt with the Open Container Project

I'm a bit conflicted.

lukebond commented 9 years ago

@hyperbolic2346 good stuff, thanks for the update.

i don't mind the added complexity of the Confd template because it's already beyond comprehension :)

FWIW, whatever happens re Docker networking etc., i think we'll probably still want HAProxy to do its job for external requests, and therefore the proposed Registrator & Confd changes are still needed.

hyperbolic2346 commented 9 years ago

@lukebond So here's my latest thinking after playing around with registrator. Registrator dumps everything into the same directory, so I've learned that we need metadata in some way. For my personal web-serving stuff I used the key as an indicator of "I should route requests to this machine": I would march through all the keys in webservers, and that would contain everything accessible to the outside.

Once registrator floods everything into the same base key, we need a way to make that decision without using the key location. I've thought of a few ways to do this:

Consul seems to be the most popular option. It looks like registrator is much more feature-rich with Consul, and I'm leaning towards adding another key-value store to my cluster to solve this, so I can continue to use registrator.

In that setup the metadata is an environment variable on the Docker container, which I think is much cleaner than parsing the name. It makes sense for it to live there, since at container creation time I already know whether I want the thing published.
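
Something like this (a sketch: the image name is hypothetical, and SERVICE_NAME / SERVICE_TAGS are the environment variables registrator documents for naming and tagging a container's services):

docker run -d -P \
  -e SERVICE_NAME=paz-web \
  -e SERVICE_TAGS=http,publish \
  my-web-image    # hypothetical image; registrator picks the SERVICE_* vars up at registration time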

lukebond commented 9 years ago

@hyperbolic2346 would it be possible to give a couple of contrived examples of what you mean re "everything in the same directory"?

hyperbolic2346 commented 9 years ago

@lukebond I made registrator dump to its own directory, and here is a snippet from etcdctl ls --recursive:

/registrator/services/dnsmasq-catch-53
/registrator/services/dnsmasq-catch-53/dogbert:paz-dnsmasq:53:udp
/registrator/services/dnsmasq-catch-53/dilbert:paz-dnsmasq:53:udp
/registrator/services/dnsmasq-catch-53/wally:paz-dnsmasq:53:udp
/registrator/services/dnsmasq-catch-53/ratbert:paz-dnsmasq:53:udp
/registrator/services/heapster_influxdb-8086
/registrator/services/heapster_influxdb-8086/catbert:influxdb:8086
/registrator/services/paz-haproxy-80
/registrator/services/paz-haproxy-80/dogbert:paz-haproxy:80
/registrator/services/paz-haproxy-80/wally:paz-haproxy:80
/registrator/services/paz-haproxy-80/ratbert:paz-haproxy:80
/registrator/services/paz-haproxy-80/catbert:paz-haproxy:80
/registrator/services/paz-haproxy-80/dilbert:paz-haproxy:80
/registrator/services/paz-orchestrator-1337
/registrator/services/paz-orchestrator-1337/wally:paz-orchestrator:1337
/registrator/services/cadvisor
/registrator/services/cadvisor/dogbert:cadvisor:8080
/registrator/services/cadvisor/dilbert:cadvisor:8080
/registrator/services/cadvisor/wally:cadvisor:8080
/registrator/services/cadvisor/ratbert:cadvisor:8080
/registrator/services/paz-scheduler
/registrator/services/paz-scheduler/dogbert:paz-scheduler:9002
/registrator/services/paz-service-directory
/registrator/services/paz-service-directory/wally:paz-service-directory:9001
/registrator/services/weave-6783
/registrator/services/paz-orchestrator-9000
/registrator/services/paz-orchestrator-9000/wally:paz-orchestrator:9000
/registrator/services/paz-web
/registrator/services/paz-web/ratbert:paz-web:80
/registrator/services/heapster_grafana
/registrator/services/heapster_grafana/catbert:grafana:8080
/registrator/services/heapster_influxdb-8083
/registrator/services/heapster_influxdb-8083/catbert:influxdb:8083
/registrator/services/paz-haproxy-1936
/registrator/services/paz-haproxy-1936/wally:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/ratbert:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/catbert:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/dilbert:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/dogbert:paz-haproxy:1936
/registrator/services/weavedns

I can't write a confd template that will easily tease out what we need to expose via haproxy without hard-coding the things of interest. That, or metadata of some sort.

Here are the internals:

 etcdctl get /registrator/services/paz-orchestrator-9000/wally:paz-orchestrator:9000
10.0.1.21:32769

If that were JSON with metadata, like the Consul version, I could easily do what we need to do here.
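
Something hypothetical like this, say (registrator does not write this today; the tags/attrs shown are made up):

etcdctl get /registrator/services/paz-orchestrator-9000/wally:paz-orchestrator:9000
{"ip": "10.0.1.21", "port": 32769, "tags": ["http"], "attrs": {"publish": "true"}}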

lukebond commented 9 years ago

thanks. i also realise you gave an example in your earlier message; sorry about that.

i see the issue you're referring to now. how can we write a Confd template that iterates like so: for each service / for each instance of that service, when there is no key in the path that is just the service name?

based on my reading of the Etcd section of the readme, there should be a separate section of the path for the service name. the separation between service name and service id seems to be what we want, if we have enough control over it.
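
for what it's worth, i imagine the nested iteration could look roughly like this in a Confd template (a sketch assuming confd's lsdir/ls/getv template functions and the /registrator/services layout above; the backend naming is illustrative and ignores the question of what to expose):

{{range $service := lsdir "/registrator/services"}}
backend {{$service}}
    balance roundrobin
{{range $instance := ls (printf "/registrator/services/%s" $service)}}
    server {{$instance}} {{getv (printf "/registrator/services/%s/%s" $service $instance)}} check{{end}}
{{end}}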

certain i'm missing something since you're much more familiar with it than me, @hyperbolic2346

hyperbolic2346 commented 9 years ago

Yes, @lukebond. It seems that a service with just a single port exposed will get the name alone: /registrator/services/paz-web, but something with multiple ports ends up with the port appended as well: /registrator/services/paz-haproxy-1936 and /registrator/services/paz-haproxy-80.

If we could get metadata on it as well, we would know whether a given service needs to be exposed via haproxy or not. That's my biggest issue right now: I don't know if I should expose /registrator/services/weavedns, for example. I also have the problem of what to name it, but I was expecting metadata as described in the docs:

type Service struct {
    ID    string               // <hostname>:<container-name>:<internal-port>[:udp if udp]
    Name  string               // <basename(container-image)>[-<internal-port> if >1 published ports]
    Port  int                  // <host-port>
    IP    string               // <host-ip> || <resolve(hostname)> if 0.0.0.0
    Tags  []string             // empty, or includes 'udp' if udp
    Attrs map[string]string    // any remaining service metadata from environment
}

It seems this is only provided to the storage backend, and the etcd backend currently stores just the ip:port.

lukebond commented 9 years ago

@hyperbolic2346 that seems like the best route, doesn't it: getting JSON metadata into Registrator, if Jeff is open to it. alas, I know no Golang.

but i notice the following: https://github.com/gliderlabs/registrator/issues/131

and more importantly: https://github.com/gliderlabs/registrator/pull/76

perhaps we could help this person with the docs and tests for the PR, and rebase it correctly.

hyperbolic2346 commented 9 years ago

@lukebond I'm in the same boat about Golang, but it seems to be something I can avoid for only so long. :)

This might be the best option for us. We could use that PR to push web things into /services/http or something and just march that key for haproxy.
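
The haproxy-facing part of the template could then be as simple as something like this (a sketch, assuming confd's ls/getv functions and that the PR lands with a /services/http-style layout):

backend web
    balance roundrobin
{{range $instance := ls "/services/http"}}
    server {{$instance}} {{getv (printf "/services/http/%s" $instance)}} check{{end}}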

lukebond commented 9 years ago

perhaps @tomgco could help, he knows a little Golang