lukebond opened this issue 9 years ago
Use WeaveDNS as the service discovery mechanism and ditch HAProxy/Confd altogether
I think you'd want to keep some kind of load balancer for, well, load balancing (and rolling upgrades, rate limiting, and whatever). But this doesn't stop you using WeaveDNS -- you can still give the load balancer the appropriate host name in DNS and things will cheerfully connect to it, and this is still better than "random Docker ports" as you say.
WeaveDNS currently won't help you with updating backends to HAProxy -- or not much.
This doesn't detract from weave's usefulness, however.
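For illustration, the "load balancer behind a WeaveDNS name" idea might look roughly like this (a minimal sketch; weave and WeaveDNS are assumed to be launched already, the image, names and resolver wiring are assumptions, and exact flags vary between weave versions):

    # register a load balancer container under a weave.local name
    weave run -h lb.weave.local --name lb haproxy:1.5
    # any other container on the weave network can then reach it by name,
    # assuming its resolver is wired to WeaveDNS (--with-dns on older releases)
    weave run -h app.weave.local --name app busybox sleep 3600
    docker exec app nslookup lb.weave.local   # should resolve to lb's weave-assigned IP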
@lukebond have you looked at vulcand at all? It's been on my list to experiment with, but I haven't yet.
@rosskukulinski I got as far as seeing that it uses SRV records and was put off. I would need to use ambassadors or something if I didn't want apps running on Paz to know about SRV records.
@squaremo thanks for sharing your thoughts!
The existing HAProxy does round-robined load balancing for requests incoming from outside the cluster, and it also allows services to communicate internally by service name (i.e. service discovery), again round-robined. All of this is controlled by a fairly gnarly Etcd-Confd setup (https://github.com/yldio/paz-haproxy/blob/master/run.sh) - it works very well but the idea of having to change it always frightens me.
WeaveDNS can do the latter, and I could point HAProxy backends to the WeaveDNS name as @squaremo suggests, leaving internal routing entirely up to Weave.
We could keep the existing Etcd announcing sidekicks (useful for other things, eg. monitoring) and configure HAProxy for external load-balancing only when new services are added and removed. That would simplify things a lot, at the cost of losing the ability to do zero-downtime deployments.
To keep ZDD either Weave would have to add this feature (to buffer/drain-off connections when addresses are added and removed) or we'd have to keep all the existing Confd stuff, which would mean we haven't simplified much and have added another component.
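To make the internal-routing half of that concrete, here's a rough sketch of what an HAProxy backend could collapse to if it pointed at a WeaveDNS name instead of confd-maintained per-container entries (service name, domain and port are assumptions):

    # append a WeaveDNS-backed backend to the generated config (sketch only)
    cat >> haproxy.cfg <<'EOF'
    backend paz-web
        # a single logical server; WeaveDNS decides which container(s) answer for the name
        server paz-web paz-web.weave.local:80 check
    EOF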
Thoughts, @tomgco?
With the release of Weave 0.11 and automatic IP assignment I see almost no reason not to switch to weave and simplify our service discovery and routing.
HAProxy will largely serve as a means of routing external requests by HTTP host header, and the backends can be Weave-assigned IP addresses. The usage of dnsmasq can be replaced by WeaveDNS so that services can speak to each other by service name as before, but via the Weave SDN.
Unfortunately I still see a need for the sidekick announce units.
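A hedged sketch of the external-routing role described above (hostnames, ports and weave-assigned addresses are all made up for illustration):

    cat > haproxy.cfg <<'EOF'
    frontend http-in
        bind *:80
        # route by HTTP Host header to the right backend
        acl is_myapp hdr(host) -i myapp.paz.example.com
        use_backend myapp if is_myapp

    backend myapp
        # weave-assigned container addresses, still written out by confd for now
        server myapp-1 10.2.1.11:8080 check
        server myapp-2 10.2.1.12:8080 check
    EOF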
@lukebond So HAProxy will sit on each host on an exposed port and route incoming requests by hostname via WeaveDNS? The role of WeaveDNS would be to resolve the internal service names like paz-web to an IP, but how would we get the round-robin load balancing from HAProxy then?
@lukebond Hmm, I'm not sure if we can change the haproxy setup we're using even if we incorporate weave. From the haproxy docs:
<address> is the IPv4 or IPv6 address of the server. Alternatively, a
resolvable hostname is supported, but this name will be resolved
during start-up.
If the hostname is only resolved during start-up, we'll still have the confd complexity we have now. I'm not sure if we gain much if it includes the IP or just the hostname. I'm still a fan of using weave here though because we could nuke the port numbers at the least, but I'm not sure if this is the silver bullet you wanted.
Hi,
I've just discovered your PAZ PaaS and I like it a lot! We share a lot of ideas :))
Have you considered using skydns? It could provide a very lightweight (SRV-record-based) distributed DNS/service discovery layer on top of the already existing 'etcd' backend.
The SRV records should be easily translatable into configuration for any TCP/HTTP routing agent (e.g. HAProxy, hitch...).
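For reference, an SRV lookup of the kind being proposed might look like this (a sketch; the SkyDNS-style names and the resolver address are assumptions, not anything Paz currently publishes):

    dig @172.17.42.1 SRV paz-web.paz.skydns.local +short
    # 10 100 32768 dogbert.paz.skydns.local.
    # 10 100 32769 wally.paz.skydns.local.
    # each answer carries priority, weight, port and target host -- enough information
    # to template one HAProxy "server" line per backend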
@rcmorano thanks for your interest in Paz! Glad you like it :)
My objection to SkyDNS in the past was the use of SRV records, but I didn't think of a way of translating them as you suggest. I didn't want to impose SRV records on containers running on Paz. I'm curious to know more as this is new to me :)
@hyperbolic2346 to answer your first question:
@lukebond So HAProxy will sit on each host on an exposed port and route incoming requests by hostname via WeaveDNS? The role of WeaveDNS would be to resolve the internal service names like paz-web to an IP, but how would we get the round-robin load balancing from HAProxy then?
yes, that's more or less what I'm currently envisaging.
re load balancing, we'd be relying on WeaveDNS' random load balancing, which is arguably acceptable, but won't be as good.
to answer your second question:
If the hostname is only resolved during start-up, we'll still have the confd complexity we have now. I'm not sure if we gain much if it includes the IP or just the hostname. I'm still a fan of using weave here though because we could nuke the port numbers at the least, but I'm not sure if this is the silver bullet you wanted.
Although I was complaining above about the complexity of the HAProxy/Confd setup I also don't see a way around it at the moment. The best solution I can currently see is something like the following:
I hope that helps to answer your question. I'd be interested in your thoughts!!
@lukebond the approach I was thinking about is like the one described in this infoq.com article. skydock has already done some work in this direction, although it seems that project has stalled.
A simple DNS query could be easily parsed and translated into a list of service backends for HAProxy.
In the same manner, the per-paz-service HAProxy instances (I guess you could have more than one instance spread across the whole cluster: load balance your load balancers) could be DNS-discovered and then published as a simple round-robin DNS record that applications could point to (e.g. for an HAProxy-load-balanced mysql-galera pool of servers).
It seems that marathon is using a similar approach through mesos-dns now.
Anyway, weave seems a really nice weapon if you don't want to mess with network matters!
Probably relevant to the topic of networking between containers: docker now has a "networking system".
@lukebond
ditch announce sidekick units and use gliderlabs/registrator (simplification)
I looked into this on my cluster and it does work as you'd like, but the registration in etcd is a little more complex. Instead of /services/servicename being a key containing ip/port JSON data or something, the key is /services/servicename/hostname:service:port and the ip/port is stored inside as plain text.
This of course allows for scaling up to multiple machines per service easily, but it makes the confd script a little more complicated. I still think this is the way to go currently. The docker networking system sounds like it will be incredibly easy to use once it is widely available, and switching to that would probably make the most sense in the long run, unless the plan is still to get away from docker and use rkt. But then I saw an email from the CoreOS team today talking about the open container project. It won't matter too much what front-end we use if all the containers are usable on any compliant system.
Anyway, still working towards this on my cluster and I will send a pull request when I get things sorted out.
@joaojeronimo i'm still in the process of digesting these new announcements; i'm at the point where i'm able to articulate a few things though:
I'm a bit conflicted.
@hyperbolic2346 good stuff, thanks for the update.
i don't mind the added complexity of the Confd template because it's already beyond comprehension :)
FWIW, whatever happens re Docker networking etc., i think we'll probably still want HAProxy to do its job for external requests, and therefore the proposed Registrator & Confd changes are still needed.
@lukebond So here's my latest thinking after playing around with registrator. Registrator dumps everything into the same directory, so I've learned that we need metadata in some form. For my personal web serving stuff I used the key as an indicator of "I should route requests to this machine": I would march through all keys in webservers, and that contained everything accessible from the outside.
Once registrator floods everything into the same base key, we need a way to make that decision without using the key location. I've thought of a few ways to do this:
Consul seems to be the most popular option. It looks like registrator is much more feature-rich with consul, and I am leaning towards adding another key-value store to my cluster to solve this so I can continue to use registrator.
The metadata is an environment variable on the docker container in that setup, and I think that is much cleaner than parsing the name. It makes sense to me for it to live there, since at docker container creation time I know whether I want this thing published.
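A sketch of that env-var approach using registrator's SERVICE_* variables ("public" is a hypothetical tag meaning "expose via haproxy", and the image name is an assumption; note that with the etcd backend registrator currently stores only ip:port, so tags like this only become visible with a backend such as consul or a patched registrator):

    docker run -d --name paz-web -p 80 \
      -e SERVICE_NAME=paz-web \
      -e SERVICE_TAGS=public \
      quay.io/yldio/paz-web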
@hyperbolic2346 would it be possible to give a couple of contrived examples of what you mean re "everything in the same directory"?
@lukebond I made registrator dump to its own directory and here is a snippet from etcdctl ls --recursive:
/registrator/services/dnsmasq-catch-53
/registrator/services/dnsmasq-catch-53/dogbert:paz-dnsmasq:53:udp
/registrator/services/dnsmasq-catch-53/dilbert:paz-dnsmasq:53:udp
/registrator/services/dnsmasq-catch-53/wally:paz-dnsmasq:53:udp
/registrator/services/dnsmasq-catch-53/ratbert:paz-dnsmasq:53:udp
/registrator/services/heapster_influxdb-8086
/registrator/services/heapster_influxdb-8086/catbert:influxdb:8086
/registrator/services/paz-haproxy-80
/registrator/services/paz-haproxy-80/dogbert:paz-haproxy:80
/registrator/services/paz-haproxy-80/wally:paz-haproxy:80
/registrator/services/paz-haproxy-80/ratbert:paz-haproxy:80
/registrator/services/paz-haproxy-80/catbert:paz-haproxy:80
/registrator/services/paz-haproxy-80/dilbert:paz-haproxy:80
/registrator/services/paz-orchestrator-1337
/registrator/services/paz-orchestrator-1337/wally:paz-orchestrator:1337
/registrator/services/cadvisor
/registrator/services/cadvisor/dogbert:cadvisor:8080
/registrator/services/cadvisor/dilbert:cadvisor:8080
/registrator/services/cadvisor/wally:cadvisor:8080
/registrator/services/cadvisor/ratbert:cadvisor:8080
/registrator/services/paz-scheduler
/registrator/services/paz-scheduler/dogbert:paz-scheduler:9002
/registrator/services/paz-service-directory
/registrator/services/paz-service-directory/wally:paz-service-directory:9001
/registrator/services/weave-6783
/registrator/services/paz-orchestrator-9000
/registrator/services/paz-orchestrator-9000/wally:paz-orchestrator:9000
/registrator/services/paz-web
/registrator/services/paz-web/ratbert:paz-web:80
/registrator/services/heapster_grafana
/registrator/services/heapster_grafana/catbert:grafana:8080
/registrator/services/heapster_influxdb-8083
/registrator/services/heapster_influxdb-8083/catbert:influxdb:8083
/registrator/services/paz-haproxy-1936
/registrator/services/paz-haproxy-1936/wally:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/ratbert:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/catbert:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/dilbert:paz-haproxy:1936
/registrator/services/paz-haproxy-1936/dogbert:paz-haproxy:1936
/registrator/services/weavedns
I can't write a confd template that will easily tease out what we need to expose via haproxy without hard-coding the things of interest, or without metadata of some sort.
Here are the internals:
etcdctl get /registrator/services/paz-orchestrator-9000/wally:paz-orchestrator:9000
10.0.1.21:32769
If that was json with metadata like the consul version I could easily do what we need to do here.
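To illustrate why the metadata matters, here's a naive confd haproxy.tmpl sketch that just walks every registrator directory (the template functions are confd's etcd ones; everything else is made up). It "works", but it would happily turn weavedns, cadvisor and friends into HAProxy backends unless we hard-code a filter or have metadata to test:

    cat > haproxy.tmpl <<'EOF'
    {{range $svc := lsdir "/registrator/services"}}
    backend {{$svc}}
    {{range $i, $addr := getvs (printf "/registrator/services/%s/*" $svc)}}
        server {{$svc}}-{{$i}} {{$addr}} check
    {{end}}{{end}}
    EOF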
thanks. i also realise you gave an example in your earlier message; sorry about that.
i see the issue you're referring to now. how can we write a Confd template that iterates like so: foreach service / foreach instance of service, when there is no key in the path that is just the service name?
based on my reading of the Etcd section of the readme, there should be a separate section of the path for the service name. the separation between service name and service id seems to be what we want, if we have enough control over it.
i'm certain i'm missing something since you're much more familiar with it than me, @hyperbolic2346
Yes, @lukebond. It seems that a service with just a single port exposed will get the name alone: /registrator/services/paz-web, but something with multiple ports ends up with a port appended as well: /registrator/services/paz-haproxy-1936 and /registrator/services/paz-haproxy-80.
If we could get metadata on it as well, we would know whether a service needs to be exposed via haproxy or not. That's my biggest issue right now: I don't know if I should expose /registrator/services/weavedns, for example. I also have the problem of what to name it, but I was expecting metadata as described in the docs:
type Service struct {
ID string // <hostname>:<container-name>:<internal-port>[:udp if udp]
Name string // <basename(container-image)>[-<internal-port> if >1 published ports]
Port int // <host-port>
IP string // <host-ip> || <resolve(hostname)> if 0.0.0.0
Tags []string // empty, or includes 'udp' if udp
Attrs map[string]string // any remaining service metadata from environment
}
It seems this is only provided to the storage mechanism and etcd currently just stores the ip:port.
@hyperbolic2346 that seems like the best route, doesn't it: getting JSON metadata into Registrator, if Jeff is open to this. alas I know not Golang.
but i notice the following: https://github.com/gliderlabs/registrator/issues/131
and more importantly: https://github.com/gliderlabs/registrator/pull/76
perhaps we could help this person with the docs and tests for the PR, and rebase it correctly.
@lukebond I'm in the same boat about Golang, but it seems to be something I can avoid for only so long. :)
This might be the best option for us. We could use that PR to push web things into /services/http or something and just march that key for haproxy.
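If web-facing services did land under a single key like /services/http (the key name here is hypothetical, going by that PR discussion), the template could shrink to one walk over that key, something like:

    cat > haproxy.tmpl <<'EOF'
    backend web
    {{range $i, $addr := getvs "/services/http/*"}}
        server web-{{$i}} {{$addr}} check
    {{end}}
    EOF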
perhaps @tomgco could help, he knows a little Golang
It's currently a bit brittle and difficult to understand / remember how it works.
Some options:
For me Weave would be really helpful, but there are multiple ways we could use it and we should have a discussion with the Weave developers about this. Some options: