hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Support multiple domains per datacenter #290

Closed. mtougeron closed this issue 4 years ago.

mtougeron commented 10 years ago

It would be extremely helpful for us if consul supported multiple domains per datacenter. This would help us be able to segment the clients connected to the cluster while still considering them part of a single datacenter.

For example, on the server(s) it would support something like { "datacenter": "us-west-1", "domain": ["dev.internal", "qa.internal"] }

If ClientA had { "datacenter": "us-west-1", "domain": "dev.internal", "service": { "name": "foo" } } and ClientB had { "datacenter": "us-west-1", "domain": "qa.internal", "service": { "name": "foo" } }

foo.service.us-west-1.dev.internal would resolve to ClientA
foo.service.us-west-1.qa.internal would resolve to ClientB

Or perhaps support something similar via tags?

Basically, we want to avoid running multiple clusters of Consul servers for each "environment" in a datacenter. We also want to avoid having to add the environment name to the service names. If we did dev.foo.service.us-west-1.internal (using a 'dev' tag), there's a higher chance of the app code setting the wrong environment by mistake.

p.s., this may be similar to https://github.com/hashicorp/consul/issues/208 but it seemed different enough that I opened another ticket.

armon commented 10 years ago

Does this mean a single node exists in the dev and qa clusters at the same time? I am rather confused as to the use case.

carlivar commented 10 years ago

+1. I posted yesterday to the consul mailing list about this.

The drawback of running multiple clusters of Consul servers is that if an agent wants access to multiple clusters, you'd also have to run multiple agents. That could get messy. For example, Bamboo is our build server. It's "production". A QA host might need to access production Bamboo but QA everything else. There are other services that are shared among multiple environments, so a "hard scope" doesn't work for us.

This domain idea is solid. What I'd like to see is another layer between service name and datacenter name.

Instead of:

foobar-mysql.service.chicago.consul

I would love:

foobar-mysql.service.qa.chicago.consul

The environment could be optional, returning results from service providers in the same environment by default (just as the datacenter part works).

I don't want to use tags either, because they break down when tags mean different things. For example, QA and Production might both have MySQL masters and slaves. I can't do:

master.qa.foobar-mysql.service.consul

Because multiple tags aren't supported in DNS queries (maybe that's one way to solve this problem, though). But I think we would like things a little more error-proof on the client side. Currently, if QA and Production both publish the service foobar-mysql and a client uses foobar-mysql.service.consul, they would get results across both environments. That's what we want to avoid.
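One way to make tags workable here, though it only arrived later (prepared query templates, Consul 0.6.4), is an environment-scoped query per environment. A rough sketch, with made-up names, registered by POSTing to the /v1/query endpoint:

```json
{
  "Name": "qa-",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.suffix}",
    "Tags": ["qa"]
  }
}
```

With that template in place, a DNS lookup for qa-foobar-mysql.query.consul resolves the foobar-mysql service filtered to instances tagged qa, without the environment ever appearing in the service name itself.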

carlivar commented 10 years ago

Just realized I should correct one thing above. We probably wouldn't have the "multiple agents need to run" problem for agents that need to access multiple environments. They would use their own datacenter+environment by default and could be explicit when they need something else.

So the datacenter value could work if 1) people are willing to run multiple server clusters (not ideal), or 2) Consul supported multiple datacenter values in a cluster. These options apply if the extra layer between service name and datacenter described above isn't desired.

armon commented 10 years ago

It seems that in general people want the namespacing that data centers provide, but without the need to run multiple clusters per environment (prod, qa, stage). I think domains are redundant with data centers as a way to solve this, since they are both namespacing mechanisms.

Instead, I think what people want is multi-tenancy of environments. I'll think on this, but it is a challenging problem. For now, the simplest approach is to run multiple instances of Consul (1 per environment) on the same physical hardware.
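A rough sketch of what one of those per-environment instances might look like (the names and ports here are made up; each environment gets its own copy with a distinct datacenter, node_name, data_dir, and port block so the instances don't collide on the same host):

```json
{
  "datacenter": "us-west-1-qa",
  "node_name": "server1-qa",
  "server": true,
  "bootstrap_expect": 3,
  "data_dir": "/var/consul/qa",
  "ports": {
    "dns": 8601,
    "http": 8501,
    "serf_lan": 8311,
    "serf_wan": 8312,
    "server": 8310
  }
}
```

Clients in the qa environment would then join and query only the qa instance's ports.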

carlivar commented 10 years ago

the simplest approach is to run multiple instances of Consul (1 per environment) on the same physical hardware.

Agreed. We have 13 environments. We add and remove them fairly frequently, usually due to a team branching and needing their own dev or qa environment. So you can see that this may be simple, but doesn't scale all that well.

mtougeron commented 10 years ago

Does this mean a single node exists in the dev and qa clusters at the same time?

No. It means that there is a node in a dev cluster & another node in a qa cluster. We want the app code to just look for "foo.service.internal" and not have to worry about which cluster it is connected to.

It seems in general people want the namespacing that data centers provide, but without the need to run multiple clusters per environment

Yes

the simplest approach is to run multiple instances of Consul (1 per environment) on the same physical hardware.

This is actually quite difficult to do with Chef in a way that provides a good HA solution. Also, if an instance in AWS goes down, we lose 1/3 of the servers across all environments instead of just the one. :(

armon commented 10 years ago

Got it. A better solution to multi-tenancy seems to be the consensus. I agree wholeheartedly. I'll start thinking about how to support this nicely.

Unfortunately, even with multi-tenancy, losing a server would affect all the environments. No way to solve that if the hardware is shared.

mtougeron commented 10 years ago

I'll start thinking about how to support this nicely.

You rock!

Unfortunately, even with multi-tenancy, losing a server would affect all the environments.

Yup. But at least it would only be for cross-dc queries to the dc that lost a node. :(

niclashoyer commented 10 years ago

Multi-tenancy would be great! I had just planned to run multiple clusters, but if Consul supports this out of the box, it would save a lot of headaches (for us :smile: ).

anthonybishopric commented 9 years ago

This would help us a lot as well. We also have a number of distinct environments / domains within the same datacenter that are firewalled off from each other; setting up many consul server clusters is our current approach but it's not ideal for many reasons.

tdeckers commented 9 years ago

I'm looking at the use case for larger enterprises, where multiple teams contribute to an ecosystem of services. Individual teams within the company are considered 'tenants' and should be able to contribute and manage the services they own without impacting others. One thought is to use hierarchies of Consul, where tenants use a private Consul cluster for project-internal orchestration. This approach would still require a common, multi-tenant, top-level Consul cluster to discover 'published' services from other teams. This all starts with the question: is Consul a good fit for these types of use cases?

armon commented 9 years ago

@tdeckers Typically, you want to treat Consul as a shared platform within a large organization. Teams individually manage and expose their services, but they do it on a shared cluster. Having tons of independent Consul clusters will quickly become a burden to operators, as there are so many more Consul servers to reason about. The goal is to get the ACLs to the point where they are sufficiently advanced to enable even the most complex multi-tenant use cases.

ckauhaus commented 9 years ago

+1

We have a use case here where we would really need some sort of enforceable namespaces for services. A mechanism to restrict a particular agent to managing services only within a defined namespace would be greatly appreciated.

DanyC97 commented 9 years ago

+100 times, much appreciated!

nati commented 9 years ago

Hi, is this still open? Or can this be done with today's ACL support in Consul? If so, I would like to see some example configuration.

slackpad commented 9 years ago

Hi @nati, this is still in the works. ACLs are getting much richer for the upcoming 0.6 release. You can see the upcoming documentation here: https://github.com/hashicorp/consul/blob/master/website/source/docs/internals/acl.html.markdown.

glenngillen commented 9 years ago

@slackpad that sounds great! I was going to try and hack it by creating a namespace/directory for each tenant and then restricting accordingly. But the benefits of preventing leaking via service registration and DNS have convinced me I should hold off.

If there's some way to help with the development let me know. I'd like to try and do more than add another +1 :smile:

scalp42 commented 9 years ago

The ACL updates are welcome, but we lose the ability to discover many environments on demand, for example.

slackpad commented 9 years ago

@scalp42 I'm not sure I follow. The ACLs should be rich enough for you to provide read-only access to services if so desired. Do you have a specific use case in mind that we wouldn't hit?

dvusboy commented 9 years ago

I'm not sure how ACLs will help with DNS-based discovery that doesn't use SRV records (because, say, service ports are already known) but just vanilla name resolution. There is a distinction between ACL management of service registration for different tenants and lookup. What if I want some kind of domain-name convention such that I can support search in /etc/resolv.conf? This is where multiple subdomains within the same Consul cluster come in handy. For example, I have this /etc/resolv.conf:

nameserver 127.0.0.1
search mine.services.consul shared.services.consul

If I'm looking to connect to service foo and it happens not to be running under my domain, the lookup would just fall back onto the shared domain. This lookup part has nothing to do with tenancy. Rather, it's about logical grouping. I don't really need ACLs on the lookup.

ssorathia commented 9 years ago

So, considering that this issue has been open for over a year, am I to assume that, from a service discovery perspective, native support for multiple environments will not be happening? I'm with @dvusboy in that I'm not clear how ACLs help with DNS-based discovery. I guess I could use tags, but I'd have to wrap my head around how we would deal with that given our current environment. Plus, using tags would likely be more prone to errors, as not all of our applications are completely 'environment' aware.

I will likely have to go the multiple consul clusters route to prevent that from happening, which is something I was hoping to avoid.

momania commented 8 years ago

+1 for adding a segregation layer between the DC and the service name, so we can run multiple environments on a single big Consul cluster.

Cinderhaze commented 8 years ago

We are currently using a setup with a similar breakdown, and we were wondering how to address this with Consul.

We have high-level logical groupings of servers that we call domains (infrastructure, development, opsapp, testapp, sandboxapp), and we can create 'subdomains' when we stand up our own stack (developer initials, branch name via CI, etc.).

We currently manage a hosts file with Puppet (a terrible solution), and we can access instances like 'servicename.daw.sandbox.xyz', where .xyz is added to our no-proxy rules.

We have the ability on an instance to reach nexus.development.xyz for the common development Nexus server, but we can access nexus.daw.development.xyz if we are testing changes to our development nodes, or nfs.daw.infrastructure.xyz if we are testing changes to our NFS server, etc.

We may end up constructing our service names like NAS-daw-inf.service.consul if we can't find a reasonable alternative.

TL;DR: +1 for this feature, or for some other way to achieve this without running multiple Consul clusters.

slackpad commented 8 years ago

Linking https://groups.google.com/d/msgid/consul-tool/c7a5ff91-ff4c-4ea7-a4d1-af5d7c5e8a72%40googlegroups.com?utm_medium=email&utm_source=footer here which has some implementation ideas around tags.

rkno82 commented 8 years ago

What is the latest status of this feature? We have the same requirement, which comes down to using one central cluster and creating namespaces, allowing us to ensure that each service can only make changes within its own namespace, while reading others is okay.

slackpad commented 8 years ago

Hi @rkno82, this is still a ways out, though it is on our roadmap. There are some architectural implications to think through, but we know that a lot of folks are interested in this capability.

tdeckers commented 8 years ago

Still on this topic... ACLs are now pretty elaborate. However, in a multi-tenant environment I want to make sure tenants' service names don't overlap. I might have two project teams (two tenants of my infra) that are each creating a 'web' component as part of their service. Both might want to register web.service.consul. I'd be looking for a way to make them register web.tenant1.service.consul and web.tenant2.service.consul. EDIT: I'd put ACLs on *.tenantx.service.consul so that only the respective tenants can update them.
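With the ACL rules available today, the closest approximation is probably a per-tenant token that enforces a flat name prefix (tenant1-web.service.consul rather than the web.tenant1.service.consul layout above). A rough sketch, with tenant1 as a made-up prefix, relying on the fact that service rules are prefix-matched:

```hcl
# Policy for the token handed to tenant1: it may register or modify only
# services whose names start with "tenant1-", and may read everything else.
service "tenant1-" {
  policy = "write"
}
service "" {
  policy = "read"
}
```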

On top of these private, intra-application (or intra-tenant) services, I'll have these tenants publish public services (within our enterprise): useful1.service.consul and useful2.service.consul. Anyone running into this situation? Any ideas how to solve this problem with current Consul? New features needed?

bfgoodrich commented 8 years ago

Is this still on the roadmap? If so, is multi-tenant support still a ways out or is this something coming soon at this point?

ssenaria commented 8 years ago

Just ran into this issue today. Any updates? Does anyone have a workaround? I don't want to have to provision a bunch of Consul servers to separate each environment.

hany commented 7 years ago

Also ran into this recently. Any updates on this issue? It's been over 2 years since this issue was first opened.

gneill794 commented 7 years ago

Hi Consul experts! I am new to Consul and I have run into this very same issue! I don't want to use tags as they are implemented, because of the evil it could cause by accidentally mistaking an unimportant environment for a very important one.

However, I was thinking this could be solved using tags ... if one could specify on a service definition something such as:

discovery:"tags-only"

where in this service discovery mode you could restrict matching to available services having a matching tag; and that's it.
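A fuller sketch of what such a service definition might look like (the discovery key here is purely hypothetical; it is not a real Consul option):

```json
{
  "service": {
    "name": "redis",
    "port": 6379,
    "tags": ["dev"],
    "discovery": "tags-only"
  }
}
```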

e.g. given a redis service ...

redis.service.consul. -> would never provide matches

but

dev.redis.service.consul --> would provide a match.

It's quite possible I am trivializing it... but if this feature were there; it would work for me. I will offer to create a patch and contribute if consul folks think it's a doable thing...

TIA, George

thatsk commented 6 years ago

It looks like, within a datacenter, if we have three domains (dev, qa, prod), the Consul servers can join across domains, which is not correct. Within a datacenter a team may form multiple domains, and the domains should be able to be restricted from each other.

damobrisbane commented 6 years ago

IMO, I don't know whether this should remain on the roadmap at all, because of the reservations already expressed. Isn't the complexity due to a mismatch between a hierarchical namespace (domains) and a flat (single-service) namespace? Whatever solution tries to "address" the issue will just create more complexity and edge cases.

The current scheme, dropping the datacenter part from a query, is already an optimisation around a hierarchical namespace being flattened. Any implementation that includes dev, qa, prod, etc. in the same vein will inevitably be more complex, as I understand it.

A workaround I have found is to follow a convention of only using the fully qualified form when accessing shared services, rather than expecting the shared-service namespace to be flattened. So my dev, test, and qa nodes and services follow this:

[dev] myapplication.node.mydomain
[qa] myapplication.node.mydomain
[prod] myapplication.node.mydomain

[dev, shared repository] repo.node.shared.mydomain
[qa, shared repository] repo.node.shared.mydomain

mtougeron commented 6 years ago

@damobrisbane I'm not opposed to closing this. I think things have progressed enough from 2014 to make this unnecessary.

damobrisbane commented 6 years ago

@mtougeron are you able to expand a little, or provide a link, on how the product has evolved to meet the scenario that prompted this? Does it depend on the Enterprise version, specific design patterns, or anything else when supporting '..multiple domains per datacenter'? Cheers

SailingYYC commented 6 years ago

@mtougeron and @armon, I'm with @damobrisbane.

@mtougeron doesn't provide any additional information on how this problem can be solved with the application as it stands today.

In our use case, we maintain approximately 150 environments, each of which is a near replica of the rest. Maintaining 150 Consul clusters is not feasible. Namespacing by environment provides the resource isolation we'd need.

resolv.conf on each node in an environment (e.g., environment = env100):

search env100.dc1.consul dc1.consul consul company.com
nameserver 192.168.0.1

This also greatly simplifies end-user access as there is consistency in accessing services and servers (all servers and services for an environment are easily determinable by a human and match from environment to environment).

hanshasselberg commented 4 years ago

I am happy to let you know that this issue is solved, now that namespaces have arrived in Consul 1.7.0 Enterprise: https://github.com/hashicorp/consul/blob/master/CHANGELOG.md#170-february-11-2020.
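For anyone mapping the environments discussed above onto that feature: each environment becomes a namespace, and (in Enterprise only) a service definition can register into one. A rough sketch, with qa as a made-up namespace name:

```json
{
  "service": {
    "name": "foo",
    "port": 8080,
    "namespace": "qa"
  }
}
```

Lookups can then be scoped per namespace, roughly foo.service.qa.ns.&lt;datacenter&gt;.dc.consul over DNS (see the Enterprise DNS docs for the exact form), while the servers remain a single shared cluster.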