moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Define schema for mapping service resources into DNS #1242

Open stevvooe opened 8 years ago

stevvooe commented 8 years ago

While there has been discussion in https://github.com/docker/docker/pull/24973 and https://github.com/docker/swarmkit/issues/192, we have yet to adopt a clear schema for mapping service resources into the DNS space.

The following presents a schema for mapping cluster-level FQDNs from various components:

| Resource | Component | Structure | Examples |
|----------|-----------|-----------|----------|
| Cluster | `<cluster>` | `<cluster>` | local, cluster0 |
| Namespace | `<namespace>` | `<namespace>.<cluster>` | production.cluster0, development.local, system |
| Node | `<node>` | `<node>.<cluster>` | node0.local |
| Job | `<job>` | `<job>.<namespace>.<cluster>` | job0.production.cluster0 |
| Slot | `<slot>` | `<slot id>.<job>.<namespace>.<cluster>` | 1.job0.production.cluster0 |
| Task | `<task>` | `<task id>.<slot id>.<job>.<namespace>.<cluster>` | abcdef.1.job0.production.cluster0 |
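A minimal sketch of how the proposed schema composes FQDNs, using the component values from the examples column; the function names are mine for illustration, not part of the proposal:

```python
def task_fqdn(task_id, slot_id, job, namespace, cluster):
    """Compose a task-level FQDN, most-specific label first."""
    return ".".join([task_id, str(slot_id), job, namespace, cluster])

def slot_fqdn(slot_id, job, namespace, cluster):
    """Compose a slot-level FQDN; all tasks occupying a slot share it."""
    return ".".join([str(slot_id), job, namespace, cluster])

print(task_fqdn("abcdef", 1, "job0", "production", "cluster0"))
# abcdef.1.job0.production.cluster0
```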

@mavenugo @mrjana @aluzzardi

stevvooe commented 8 years ago

@aluzzardi This is related to what we discussed today.

@mavenugo Is there any progress on making this happen?

aluzzardi commented 8 years ago

@stevvooe How does this map with #1193 since we've adopted <e.ServiceAnnotations.Name>.<Slot>.<TaskID> as a model?

stevvooe commented 8 years ago

@aluzzardi That is just the naming convention, which goes left to right. DNS goes right to left. Completely compatible.

The only open item is the consistency of active slots. We need a way for tasks to discover all of the hostnames of other tasks, via DNS or otherwise, regardless of DNS-RR or VIP mode. This would allow us to support host-identity-based services, like zk, etcd, nats, etc.
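The compatibility claim above can be sketched directly: reversing the left-to-right task naming convention and appending namespace and cluster labels yields the DNS name. The namespace/cluster defaults here are illustrative assumptions:

```python
def task_name_to_dns(task_name, namespace="production", cluster="cluster0"):
    """Reverse a left-to-right task name (service.slot.taskid) into a
    right-to-left DNS name. The namespace/cluster defaults are
    illustrative, not a real swarm default."""
    service, slot, task_id = task_name.split(".")
    return ".".join([task_id, slot, service, namespace, cluster])

print(task_name_to_dns("job0.1.abcdef"))
# abcdef.1.job0.production.cluster0
```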

mostolog commented 8 years ago

Perhaps I'm confused, but shouldn't DNS SRV records be a perfect match for this use case?

Regarding the ZooKeeper config: as stated in the documentation, the config file should look like:

...
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
...

A ZooKeeper image with a defined entrypoint could execute:

nslookup -querytype=srv _zookeeper._tcp.swarmmode.com

# _service._proto.name.  TTL   class SRV priority weight port target.
_zookeeper._tcp.swarmmode.com.   100 IN    SRV 10       10     2888 zookeeper.1.swarmmode.com
_zookeeper._tcp.swarmmode.com.   100 IN    SRV 10       10     2888 zookeeper.2.swarmmode.com
_zookeeper._tcp.swarmmode.com.   100 IN    SRV 10       10     2888 zookeeper.3.swarmmode.com

And echo the records for each server into the config file.

Each time a service task starts, it should publish SRV entries in the swarm's internal DNS, and they should be removed when the task stops. Of course, I say this without having any real knowledge of Docker's internal DNS (or whatever it uses).

Regards.
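The SRV-to-config idea above can be sketched as pure string processing: turn SRV answers into zoo.cfg server lines. Since an SRV record carries a single port, the ZooKeeper election port is assumed fixed here; the names and ports are taken from the example:

```python
def srv_to_zoo_cfg(answers, election_port=3888):
    """Turn SRV answers (priority, weight, port, target) into zoo.cfg
    server lines. ZooKeeper needs two ports per server but SRV carries
    only one, so the election port is an assumed constant here."""
    lines = []
    # Sort by target name so server indices are stable across lookups.
    for i, (_, _, port, target) in enumerate(sorted(answers, key=lambda a: a[3]), start=1):
        lines.append(f"server.{i}={target.rstrip('.')}:{port}:{election_port}")
    return "\n".join(lines)

answers = [
    (10, 10, 2888, "zookeeper.1.swarmmode.com."),
    (10, 10, 2888, "zookeeper.2.swarmmode.com."),
    (10, 10, 2888, "zookeeper.3.swarmmode.com."),
]
print(srv_to_zoo_cfg(answers))
```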

stevvooe commented 8 years ago

@mostolog That is somewhat the goal here but we first need a plan for mapping these into the SRV records in a consistent manner. At this point, it is fairly ad-hoc, which is very disappointing.

mostolog commented 8 years ago

I have been reading #192 and trying to reply, but in the end I think I probably lack the knowledge needed to discuss this.

I'll only say that, IMHO, it makes much more sense to have abcdef-1.job-0.cluster-0 rather than abcdef.1.job0.production.cluster0.

mostolog commented 8 years ago

@stevvooe Somehow related to this:

docker run -h myhost...

adds an entry in /etc/hosts with: ip container-name. Is there any way to specify a domain or make this entry an FQDN? i.e. ip myhost.domain.com in /etc/hosts.

It seems that doing:

docker run -h myhost.domain.com...

is not very clean, since the hostname should be only myhost.

Is this a missing feature I could request on https://github.com/docker/docker? Am I missing something? Thanks

stevvooe commented 8 years ago

#192 is a little ambitious. I wrote that proposal very early on. We mostly keep it open to consider some of the namespacing concepts, but let's not consider it a hard-and-fast model.

Like I've said before, this isn't about a single fix. There needs to be a concerted effort to manage the DNS name mapping.

vovimayhem commented 8 years ago

Will this allow the routing mesh to route requests to different services running on the same port? Is that even planned for the routing mesh DNS work?

stevvooe commented 8 years ago

@vovimayhem No, this is more about mapping services into DNS. To multiplex services on the same port, you'll need to introduce an L7 load balancer to manage that in your infrastructure.

vovimayhem commented 8 years ago

Thank you for clarifying that! I've been searching for clear info (a confirmation) about this for almost a month now!

mostolog commented 8 years ago

@stevvooe I'm back! :stuck_out_tongue_closed_eyes: Have you considered enabling DNS registration in an external (out-of-docker) DNS server?

It would also be interesting to be able to register under a specific domain tree. A related background question: could a docker node create multiple swarm clusters (each having a domain scope)?

stevvooe commented 8 years ago

@mostolog Let's keep the discussion focused on the proposal at hand. The presented ideas are interesting but they are orthogonal to the goal of creating a clear DNS-based service discovery, which is the topic at hand.

doxxx commented 7 years ago

I like your naming scheme proposal. What I'm wondering is if a container's canonical fully-qualified hostname could be the proposed task DNS name instead of simply the container ID? For example, if some application in the container does the equivalent of gethostname() and then getaddrinfo() it should return abcdef.1.job0.production.cluster0 instead of simply abcdef.

This would allow applications in containers to provide more useful hostnames to other applications, e.g. Apache Spark.

stevvooe commented 7 years ago

@doxxx Currently, docker containers have an expectation that their hostname is the container id (not the task id). This is insufficient, as it is not unique (truncated to only 48 bits of entropy!) and provides no notion of location. I am not sure whether we can change this to something more correct inside the container.

With the way UTS namespaces work, I would expect us to set the hostname to abcdef and the domainname to 1.job0.production.cluster0 (corresponding to the slot). Assembling these would require calling gethostname and getdomainname, yielding the FQDN for the task.
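The hostname/domainname split suggested here can be sketched as plain string handling; the FQDN is taken from the proposal's examples:

```python
def split_fqdn(fqdn):
    """Split a task FQDN into (hostname, domainname) as suggested:
    hostname = task id, domainname = everything after the first dot."""
    hostname, _, domainname = fqdn.partition(".")
    return hostname, domainname

def assemble_fqdn(hostname, domainname):
    """The reverse operation: what a gethostname + getdomainname
    caller inside the container would do to recover the task FQDN."""
    return f"{hostname}.{domainname}" if domainname else hostname

h, d = split_fqdn("abcdef.1.job0.production.cluster0")
print(h, d)  # abcdef 1.job0.production.cluster0
```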

mostolog commented 7 years ago

I don't know if this is closely related, or related enough to be considered...

I'm starting to have some "unfriendly" experiences with containers having too-long hostnames like project-service-swarmnodeidwhichactuallyisquitelong-#slot, created using 1.13 template naming, e.g. {{.Node.ID}}.

Perhaps it would be great to allow "nearest" container resolution using a partial name (not an FQDN). i.e.: a configuration asking for "mysql" in a container under "com.domain.app" should look for "com.domain.mysql", while another running under "com.domain.sub.whatever.app" should look for "com.domain.sub.whatever.mysql".

Is that already designed that way? Does it make sense? It is not related at all? Thanks ;)

stevvooe commented 7 years ago

@mostolog I think that is a reasonable assumption. One example might be having task 1 hit task 2 with just 2, since they are in the same job. We already do this to some degree but there will need to be clever setup in the domain naming approach.

mostolog commented 7 years ago

Thanks!

mostolog commented 7 years ago

Hi again.

Reviewing my notes I just confirmed docker stack deploy --compose-file stack.yml mystack creates services named like:

mystack_fooservice
mystack_barservice

I already asked on the forums whether a dash could be used instead of the underscore, but I was wondering whether this might have an effect on this issue (i.e., defining an FQDN for a host containing "_") or anything else. Just to let you know.

stevvooe commented 7 years ago

> Already asked if dash can be configured instead of underscore on forums, but I was wondering if this won't have any effects on this issue (aka: define a FQDN for host containing "_") or something else. Just to let you know.

This is a HUGE bug.

@dnephin Are you guys going to fix this? Supporting _ in service names is a huge no-no and will break any hope for a reasonable future. I'm surprised these passed validation.
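For context, the underscore concern comes from hostname rules (RFC 952/1123): a hostname label may contain only letters, digits, and hyphens, and may not start or end with a hyphen. A minimal validator, assuming RFC 1123 label syntax:

```python
import re

# RFC 1123 hostname label: 1-63 chars, letters/digits/hyphens only,
# not starting or ending with a hyphen. Underscores are not valid in
# hostnames (they appear only in special labels like _service._proto
# used by SRV records).
LABEL_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def valid_hostname_label(label):
    return bool(LABEL_RE.match(label))

print(valid_hostname_label("redis-test"))    # True
print(valid_hostname_label("mystack_redis")) # False: underscore
```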

dnephin commented 7 years ago

Until we have server side stacks I think it's a mistake to change this. We kept it consistent with Compose knowing that either namespaces or server side stacks would be a major change, and we didn't want to change it twice.

Services in a stack should be referenced by the scoped name (without the underscore) anyway, so the underscore shouldn't be relevant for anything except for the CLI.

stevvooe commented 7 years ago

> Services in a stack should be referenced by the scoped name (without the underscore) anyway, so the underscore shouldn't be relevant for anything except for the CLI.

Phew!

Are there underscores in the actual service names? Because we need to avoid having anything in the resolution path that won't be supportable in the future.

doxxx commented 7 years ago

> Services in a stack should be referenced by the scoped name (without the underscore) anyway, so the underscore shouldn't be relevant for anything except for the CLI.

If a service provides its fully-qualified domain name to another service (e.g. Apache Spark master and workers), that would include the underscore, which has been known to cause problems in URL validation code in Spark.

mostolog commented 7 years ago

> Services in a stack should be referenced by the scoped name (without the underscore) anyway, so the underscore shouldn't be relevant for anything except for the CLI.

Although compose works correctly when using link/depends_on, wouldn't this still be a problem when setting service names in container configuration files? e.g.:

services:
  mysql:
    ...
  php:
    ...
docker stack deploy --compose-file file.yml mystack

mystack_mysql.1...
mystack_php.1...

What name should I set in php's mysql_connect? IIUC, I actually have to use "mystack_mysql".

> Are there underscores in the actual service names? Because we need to avoid having anything in the resolution path that won't be supportable in the future.

Not to me...

stevvooe commented 7 years ago

> Although compose works correctly when using link/depends_on, wouldn't this still be a problem when setting service names in container configuration files?

I don't think links or depends_on are supported in a distributed environment. There are odd scheduling issues that come up when you start introducing these kinds of features.

I tried this out and it looks like we are pushing underscores into service names:

$ docker service ps redis-test_redis
ID            NAME                IMAGE         NODE                DESIRED STATE  CURRENT STATE           ERROR  PORTS
ehi3v814uudq  redis-test_redis.1  redis:latest  docker-XPS-13-9343  Running        Running 2 hours ago            
w2iti40qz180  redis-test_redis.2  redis:latest  docker-XPS-13-9343  Running        Running 58 seconds ago         
vu3bgifqim3r  redis-test_redis.3  redis:latest  docker-XPS-13-9343  Running        Running 58 seconds ago         
i34cxloom9sq  redis-test_redis.4  redis:latest  docker-XPS-13-9343  Running        Running 58 seconds ago         
raqf7lhkp9b0  redis-test_redis.5  redis:latest  docker-XPS-13-9343  Running        Running 58 seconds ago

This might hold this proposal back.

mostolog commented 7 years ago

Screw others! Change the world! Power to the people! DNS schema! ;)

ksachdev1 commented 7 years ago

Will this address the following: I have a docker swarm on which, say, I deploy multiple instances of MongoDB. Doing docker service ps services_mongo will show something along these lines:

service_mongo.1
service_mongo.2
service_mongo.3

My web services need individually pingable names for the mongo service, and the mongo URL is composed by joining all 3 names. My understanding, based on https://github.com/moby/moby/issues/30546, is that pinging an individual instance is not supported on swarm today.

If I had the above solution, I could just set replicas to 3 for the mongo service block in my compose YAML file and know I can access the instances via the names above. Is this possible? Without it, I am creating 3 separate service blocks to achieve the same.

benturner commented 7 years ago

Hi @stevvooe, is there any progress here? It seems like other attempts aimed at simple addressing for {{.Task.Slot}}.{{.Service.Name}} within swarm have been abandoned while waiting for this... Is there any way to resurrect https://github.com/moby/moby/pull/24973 to alleviate the problem in the short term, in a way that would be compatible with your future work here?

m4r10k commented 7 years ago

Yes, this would be really helpful, because it should be possible to resolve {{.Task.Slot}}.{{.Service.Name}} for cluster setups like a Redis cluster join.

It is not very useful currently:

root@redis:/# nslookup 10.0.16.5
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
5.16.0.10.in-addr.arpa  name = echo_app.3.8tppck42ohyz5j1dq49al80r3.echo_net.

Authoritative answers can be found from:

root@redis:/# nslookup echo_app.3.8tppck42ohyz5j1dq49al80r3.echo_net
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
Name:   echo_app.3.8tppck42ohyz5j1dq49al80r3.echo_net
Address: 10.0.16.5

root@redis:/# nslookup echo_app.3.8tppck42ohyz5j1dq49al80r3.
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
Name:   echo_app.3.8tppck42ohyz5j1dq49al80r3
Address: 10.0.16.5

root@redis:/# nslookup echo_app.3.                          
Server:     127.0.0.11
Address:    127.0.0.11#53

** server can't find echo_app.3: NXDOMAIN

deitch commented 7 years ago

(thanks for pointing me here @thaJeztah )

Overall, it is good to get this standardized. I think most of the naming convention here makes sense - essentially following the DNS rule of most-specific-to-least-specific from left-to-right. I see a few issues.

  1. <task>.<slot> feels reversed. Although the how it works docs say that a task is analogous to a slot (implying 1:1), the language here speaks more of the slot name. If we have 3 replicas of nginx, then the grouping is nginx, and we have slots 1, 2, 3 (or 0, 1, 2, if you prefer). So if we want to dot-separate, it would make more sense to be 1.nginx.production... and 2.nginx.production... etc. rather than nginx.0.
  2. Again <task>.<slot>: If having just the single digit feels strange (it does to me, but my bias), then give it a name <task>-<slot> (or <task>_<slot>) and make it an atomic part of the name. FWIW kube does that with statefulsets and it works pretty well (although consistency of restart is important there; more below).
  3. Hostname vs container ID: @stevvooe said that containers have "an expectation that their hostname is the container ID". That may be how it functions, but I don't think many "in the wild" expect it. If part of virtualization is keeping the container mostly unaware of its surroundings, then the container ID is mostly useless as a hostname (at least inside the container) and should be replaced. I always got the feeling that the container ID was just a fallback: "well, there isn't much better and we need lots of unique hostnames with container proliferation, so...". I think <task>-<slot> or similar as the hostname makes a lot of sense. I also think injecting some other vars that would be in the FQDN above (namespace, etc.) might be useful.

I will admit I am cheating; I have done some heavy kube lifting, run into some of these issues, and seen how the availability of things like hostnames and env vars make a big difference.

One more point: if we start having predictable hostnames (or at least resolvable names) for instances, people will come to expect that they are consistent. If I can reach a particular container via nginx.0 (or 0.nginx or nginx-0), then if it dies, and swarm starts a new one, is that nginx-0 too? Or nginx-3? It might not matter for stateless nginx, but it sure does for things like etcd or zookeeper or redis etc.

Finally, any intent to put these into non-swarm mode (compose)? Or is the general thrust that over time non-swarm mode will cease to exist, and a single docker engine is just a single-node swarm?

m4r10k commented 7 years ago

To follow this up I would like to drop docker/libnetwork#1855 in here. At some point we need predictable hostnames, because otherwise setups like the cluster setups mentioned in moby/moby#30546 wouldn't be possible in a dynamic environment. It's not about the container itself, because the container is only an envelope for the running process, e.g. Redis inside. This process is defined by its configuration, and in that configuration you want to use resolvable hostnames, because they stay static in the config (nginx.0, 0.nginx, or whatever.0.you.want.1) while the IP will not.

The same is true for config mounting. You have a share with

/share/0.nginx
/share/1.nginx
....

Then maybe you want to use -v /share:/share, and in the config inside the container you can then use (pseudo config): datadir=/share/$HOSTNAME. This is already possible today via --hostname={{.Service.Name}}-{{.Task.Slot}}, but you will not be able to resolve these hostnames via DNS. Why?

Yes, I know there is a problem with scaling, but you won't scale stateful setups like a three-node cluster. You still want such a service to be deployable via Swarm with zero configuration, and therefore hostnames should also be registered in DNS, as in my PR.
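The per-task data directory idea can be sketched as follows, assuming a predictable hostname built from the --hostname={{.Service.Name}}-{{.Task.Slot}} template mentioned above; the share mount point is illustrative:

```python
def task_hostname(service, slot):
    """Predictable hostname, as produced by the
    {{.Service.Name}}-{{.Task.Slot}} template."""
    return f"{service}-{slot}"

def task_datadir(service, slot, share_root="/share"):
    """Derive datadir=/share/$HOSTNAME for a given task.
    share_root is an illustrative mount point, not a swarm default."""
    return f"{share_root}/{task_hostname(service, slot)}"

print(task_datadir("nginx", 0))  # /share/nginx-0
```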

eyz commented 7 years ago

I have a fork that I'm testing in my local environments with the following changes to Docker Engine 17.06 and 17.07 (some hard-coded logic for now, but with the intention to move to templates and submit applicable PRs):

The shared IPAM server processes Docker IPAM driver calls, as well as handles API calls from a PowerDNS authoritative resolver. I am planning to delegate a subdomain, such as swarm.mycompany.com to my Swarm-aware IPAM-backed PowerDNS server, so our Docker hosts and development workstations can resolve instantiated containers, even if they move between Docker Swarm hosts (as with the above changes, the volume names, hostnames, container names, and IPs remain static to the instantiated slot instance).

Ideally, we would resolve names such as 1.consul.uat.swarm.mycompany.com. However, as I believe was mentioned, this slot-then-service ordering goes against the current standard, which is for the service name to come first, then the task slot ID, then the task ID, such as: consul.1.yp6q0y0f48qz24z90mlt7oq7t

I would love a more integrated solution, however -- especially before I get too deep into using this. Anything Moby can offer to assist in these areas is very much welcomed.

Please note that I have quite a bit of time available to work on items in these areas, as this is an area my company is extremely invested in to manage our next infrastructure design, so if you have thoughts on items in these areas I may be able to provide time: at minimum, for testing, and at most, partial or complete code.

CC @mavenugo, as I've spoken with him about my environment in the past, and he may be interested in more (updated) scope

I also agree that DNS should be arranged such as: [slot id].[service id].[stack].[cluster] -- in order of smallest to largest, as that is how domains are organized

mostolog commented 7 years ago

Hi @stevvooe

We are making a few tests with swarm, stacks and DNS. If our stack is deployed "normally", containers can see each other and they'll have randomized hostnames. However, when we set hostnames, they aren't able to resolve/dig each other. There seem to be a bunch of related issues already open.

It seems this topic has spread into multiple/zillions of issues, making it hard to follow/maintain. What about an [epic] issue to summarize all of them?

IMHO this topic is also taking too long to define, although it doesn't seem to be blocking implementation. Is there anything we could do to push it forward?

Thanks

m4r10k commented 7 years ago

This issue is the overall discussion, the grand solution. Look at docker/libnetwork#1855: there I've made a PR with a possible solution, which I guess is what you need.

mostolog commented 7 years ago

@kleinsasserm Yes, but it has been idle for quite a while, hasn't it? Why hasn't it been approved/merged? Are we missing some roadmap plans? Is it a bad practice we should avoid?

To sum up: we are in the same car and, I don't know why, it seems we aren't moving.

m4r10k commented 7 years ago

Yes, sad but true. Every six days or so I try to refresh my PR; I was on vacation recently. I don't even know why my PR is not getting merged, but I will keep trying. Thank you for your response on my PR!

trapier commented 7 years ago

How could we get a view of DNS records stored within docker?

https://github.com/moby/moby/pull/31710

Something like the following might do the trick. Prints info for containers on attachable networks and the VIP for services using that endpoint mode. Does not print:

docker network inspect --verbose --format '{{range $name, $service := .Services}}{{if eq $name ""}}{{range .Tasks}}{{.Name}} {{.EndpointIP}}{{printf "\n"}}{{end}}{{else}}{{$name}} {{$service.VIP}}{{end}}{{end}}' $NETWORK |column -t

mostolog commented 7 years ago

Thanks @trapier.

As you said, it prints the service's VIP, but it doesn't contain any reference to the containers (ids) that could help me understand why containers with default (random) hostnames are able to resolve each other, while containers with a specific hostname set can't.

@stevvooe Any comments on this issue/PR/behavior?

m4r10k commented 7 years ago

Oh, I just needed some time to find it again, and maybe I didn't explain it clearly enough in my PR, so I debugged it once again. Let's assume the following two swarm stacks, echo and echoa, which once started will create these containers:

CONTAINER ID        IMAGE                   COMMAND                 CREATED             STATUS              PORTS                NAMES
4740fde7c5ac        n0r1skcom/echo:latest   "python3 -u /echo.py"   3 minutes ago       Up 3 minutes        3333/tcp, 3333/udp   echoa_app.2.z8weq0rralpx71to3luhi2oss
e2a90f29b495        n0r1skcom/echo:latest   "python3 -u /echo.py"   3 minutes ago       Up 3 minutes        3333/tcp, 3333/udp   echoa_app.1.7fk8h5sct8png4r7xbu5ox34w
f80aa02479fb        n0r1skcom/echo:latest   "python3 -u /echo.py"   5 minutes ago       Up 5 minutes        3333/tcp, 3333/udp   echo_app.2.yx5j98hhcgk6pnev8hqauq2to
e7c58268f4b4        n0r1skcom/echo:latest   "python3 -u /echo.py"   5 minutes ago       Up 5 minutes        3333/tcp, 3333/udp   echo_app.1.og46rf0bp8tsl1v4z8ultomnt

The stack echo is started with no hostname parameter in the swarm compose file, so the CONTAINER ID is equal to the hostname inside the container. This point is important!

# docker exec echo_app.2.yx5j98hhcgk6pnev8hqauq2to hostname
f80aa02479fb

So, if I do the same for a container which is started with the hostname parameter in the swarm compose file, the result is different!

# docker exec echoa_app.1.7fk8h5sct8png4r7xbu5ox34w hostname
echoa_app1

BUT I am still able to nslookup the CONTAINER ID from inside these containers:

docker exec echoa_app.1.7fk8h5sct8png4r7xbu5ox34w nslookup 4740fde7c5ac
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
Name:   4740fde7c5ac
Address: 10.0.1.3

Do you see it? I can look up the CONTAINER ID of the echoa_app.2 container from inside the echoa_app.1 container without any problems. What I found out is that only the CONTAINER ID is ever used for DNS registration, never the hostname.

What I did in my PR is use sb.config.hostName from the container sandbox object to register the hostname in addition. Normally the code only uses n.ID(). If the hostname is the same as the CONTAINER ID, nothing happens. If they are different, then the hostname will additionally be registered in DNS.

Conclusion: as it still looks to me from the code, only the CONTAINER ID is used for DNS registration, never the hostname (which is what my PR adds). But any other hint is welcome, and sorry if I am wrong.
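The registration logic described here can be sketched as a toy in-memory name table; this is only an illustration of the PR's described behavior, not libnetwork's actual API:

```python
class ToyDNS:
    """Toy name table mimicking the PR's logic: the container ID is
    always registered; the hostname is registered additionally only
    when it differs from the container ID."""
    def __init__(self):
        self.records = {}

    def register(self, container_id, hostname, ip):
        self.records[container_id] = ip
        if hostname != container_id:
            self.records[hostname] = ip

    def lookup(self, name):
        return self.records.get(name)

dns = ToyDNS()
# IDs, hostnames and IPs below are taken from the example output above.
dns.register("4740fde7c5ac", "echoa_app2", "10.0.1.3")
print(dns.lookup("4740fde7c5ac"))  # 10.0.1.3
print(dns.lookup("echoa_app2"))    # 10.0.1.3
```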

mostolog commented 7 years ago

@kleinsasserm Thanks for such a clarifying explanation. Indeed, everything led to that (now scientifically supported) conclusion.

I guess the main concern with registering hostnames is that they can be non-unique and their IPs can change over time... but it still makes a lot of sense to me to merge your PR. Or maybe it's just an "I'm busy with more important stuff at the moment".

Anyway, I think we still have to wait for an answer from @docker-team

m4r10k commented 7 years ago

Thank you for your response, you are welcome! True, we will have to wait. :innocent:

stevvooe commented 7 years ago

@mostolog This issue is about the schema of mapping tasks to DNS in a more sane manner. For discussions regarding the current behavior, open another issue in moby.

m4r10k commented 7 years ago

😄 I have already opened one here moby/moby#34239 and there docker/swarmkit#2325, which is referenced in the PR docker/libnetwork#1855 that I mentioned here. No need to open another one, which would probably get closed as a duplicate; what it really needs is for one of the maintainers to have a look.

mostolog commented 7 years ago

@stevvooe IMHO it's related enough to be discussed here, as we are discussing why node names should be added to the DNS tree.

AKA: if I mess up my containers by giving them the same names, docker could complain about it, or even not work at all, but if they are going to be unique (AKA: using the {{.Task.Slot}} template)... let me do it! https://github.com/docker/swarmkit/issues/1242#issuecomment-272039199 https://github.com/docker/swarmkit/issues/1242#issuecomment-272110280 https://github.com/docker/swarmkit/issues/1242#issuecomment-319943259 (bullet #3)

stevvooe commented 7 years ago

@mostolog @kleinsasserm Sorry but this is not a general hostname discussion thread. This issue is about the schema for mapping names to DNS. If there are other relevant discussions already open, have them there. I know it is annoying, but we need to keep the discussion topical.

The issues you are describing should be discussed on the issues in moby. If you want me to join those discussions, pull me in. I have no clue why your PR isn't merged and I am not familiar with the current behavior. I'll try to help as much as I can, but please be patient.

As far as getting this done, I am not really working in this area any more. If someone wants to take on a more complete proposal, I would be more than willing to support and advise.

m4r10k commented 7 years ago

OK.

nicodmf commented 6 years ago

I agree with the schema proposed in the first message here, and with closing moby/moby#30546. As I said in that issue, I think each component should support multiple names (e.g. service.taskid, cluster.taskid, etc.).

farfeduc commented 5 years ago

When will this feature be available? I would like to use <Slot ID>.<Service> or <Service>.<Slot ID>.

kinghuang commented 5 years ago

I'd also like to echo support for this. It's been 3 years since this issue was opened, and there's still no easy way to configure things like ZooKeeper.

antoineco commented 5 years ago

If I understand this issue correctly, there is still no way to assign predictable DNS names to tasks inside a Swarm. Is that correct?

I was expecting to be able to use the {{.Task.Name}} template and resolve my tasks to mytask.1, mytask.2... but those task names are always suffixed with a random id (mytask.1.p8d7aufb80h8f8dwtfcmsyzy4).