Closed bacoboy closed 8 years ago
@bacoboy This is an interesting idea! We are working on a new feature for Consul that allows for richer lookup logic when resolving services, and this notion of a linear resolution order could definitely fit in there. That mechanism will be more flexible than the meta-DC and much easier than modifying logic in the existing APIs. The basic idea is to provide APIs for creating custom DNS endpoints with dynamic behavior, for things like this. I'm going to tag this as an enhancement so we keep this use case in mind as we are working on that feature.
Yea, depending on who you talk to, some people favor dumb clients (1 request/1 response). Others like clients to decide by altering the query (or making multiple calls). This clearly falls into the first category as it puts the DC fallback logic into server-side configuration easily managed via chef/puppet/etc...
Even with the new tomography features in 0.6, I think this feature still has its place where specific fallback is desired. However, I'd be interested in your thoughts @armon...
@bacoboy did you have a chance to look at prepared queries? You can use network tomography to select the next best N datacenters, or you can give an explicit list of fallback datacenters, or both.
Yes, but in this case I'm not looking to create a nearness relationship based on network transport speed, but more of a specific fallback chain. It also removes the logic from the client where it asked for 1 thing and how the fallback occurs isn't a concern of the client. Using the query would put more logic in the client than I want for this specific use case (I'd rather manage the fallback relationship out of band from the client using something like chef, etc).
Hi @bacoboy if you set NearestN you can give a specific list of fallback datacenters - you aren't required to use nearness at all.
Also, clients don't typically create their own queries, they are created once and then clients just get the id of the query to execute. Those fields in the link above are for defining a query, but you don't need anything other than the id to execute the query. The client can just look up <id>.query.consul via DNS or make an HTTP request to fetch the results; they won't be exposed to any details of the fallback logic or any other parts of the query. You can alter existing queries on the fly to change the behavior without any changes to your clients, or you could register new queries and give clients the new query id.
@bacoboy Our goal is that prepared queries would be the solution to this, in a more generic way. As @slackpad said, you can use the tomography for a "zero touch" failover configuration, but you can also specify the specific fallback order if you care to.
I agree that the queries allow control from the client, but in cases where I don't want the client to know anything other than the local consul agent (because updating a zillion client configurations would be bad), the crux of this request is to move this fallback logic to the consul agent configuration.
Yes, tomography allows server-side fallback if you want "closeness" to be your fallback mechanism, but that's not what I want. I want a server-side way of saying which way to go if not found. @armon you said:
but you can also specify the specific fallback order if you care to
Are you referring to the client side query again or is there a configuration server-side I'm unaware of to specify fallback? I looked again, but didn't see anything.
Hi @bacoboy I think you might be misunderstanding how prepared queries work. You define the query one time and it's stored on the servers. Clients just execute the query by name so they don't know anything about how the query is defined. It works like this:
1. You register the query with the servers once, listing the fallback datacenters in its Failover section.
2. Clients look up <id>.query.consul using DNS, or they execute the query using https://www.consul.io/docs/agent/http/query.html#execute over HTTP.
The clients don't have any idea what the query does or how it's configured (remember they don't have to post any of the information you gave the servers in the first step, they just use the ID you gave them). If you change the query's setup later using https://www.consul.io/docs/agent/http/query.html#specific then any client that executes again with that ID will get the new configuration; you don't have to update them at all.
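To make the client side concrete, here is a minimal sketch of the only two things a client needs to derive from a query ID: the DNS name and the HTTP execute URL. The agent address, the `consul` domain default, and the example ID are assumptions for illustration, and the `/v1/query/<id>/execute` path is taken from my reading of the linked docs, so check it against your Consul version.

```python
from urllib.parse import quote


def query_dns_name(query_id: str, domain: str = "consul") -> str:
    """DNS name a client resolves to execute a prepared query.

    Assumes the agent serves the default `consul` domain.
    """
    return f"{query_id}.query.{domain}"


def query_execute_url(query_id: str, agent: str = "http://127.0.0.1:8500") -> str:
    """HTTP URL a client requests to execute a prepared query.

    The agent address is an assumed local default; the client passes
    nothing about failover, only the ID.
    """
    return f"{agent}/v1/query/{quote(query_id, safe='')}/execute"


# "my-fallback-query" is a hypothetical ID handed out by whoever registered the query.
dns_name = query_dns_name("my-fallback-query")
http_url = query_execute_url("my-fallback-query")
```

Nothing about the Failover section appears on the client side, which is the point: changing the query definition on the servers changes the behavior without touching these two strings.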
Please let me know if this helps, or if you have any more questions about how these work. If I understand what you are looking for, I think it sounds really close.
You are correct, this is functionally equivalent, but unless you can wildcard the service name (it doesn't appear to be a regex -- and a regex would be expensive I'm sure), if I have 1000 services, I have to inject 1000 nearly identical queries AND have the additional overhead of doing this for new services being added and cleaning up unused queries for decommissioned services.
If you look back at my original suggestion, it is a simple extension of the default behavior, similar to when you don't specify a dc in the query. Prepared queries are fine-grained, service-level fallback rules -- which you can implement for my use case if I want to connect every single dot. My proposal is for datacenter-level fallback rules. Again, looking at the current functionality:
if service request DC.nil?        // DC specified in query?
  service = service.myDC          // If not, append my DC
end
if service.DC found:              // Look it up specifically
  return service.DC               // Got one
else
  return NOT FOUND                // Sorry Charlie!
end
And with the addition of an optional parent configuration property:
if service request DC.nil?                    // DC not specified
  if service.THIS_DC exists                   // check locally
    return service.THIS_DC                    // found locally
  else
    if parent property set?
      return lookup(service.PARENT, WAN)      // ask parent
    else
      return NOT FOUND                        // current logic for compatibility
    end
  end
else
  if service.DC=XXXX matches something:       // asked for something specific
    return service.XXXX                       // return if I know about it
  else
    return NOT FOUND                          // you asked for specific and it doesn't exist
  end
end
Here I set 1 rule, 1 time per DC. I'd still be able to use prepared queries if I need finer control on a per-service level.
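For anyone who wants to poke at the semantics, the pseudocode above can be sketched as runnable Python. This is only an illustration under assumptions: `CATALOG` and `PARENT` are hypothetical in-memory stand-ins (not Consul data structures), the service names are made up, and the parent chain borrows datacenter names mentioned in this thread.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for each datacenter's service catalog.
CATALOG: Dict[str, Set[str]] = {
    "cn-north-1": {"static-content"},
    "ap-southeast-1": {"storefront"},
    "us-east-1": {"inventory-db"},
}

# Hypothetical `parent` setting, one entry per datacenter that has a parent.
PARENT: Dict[str, str] = {
    "cn-north-1": "ap-southeast-1",
    "ap-southeast-1": "us-west-1",
    "us-west-1": "us-east-1",
}


def lookup(service: str, local_dc: str, requested_dc: Optional[str] = None) -> Optional[str]:
    """Return the datacenter that answers for `service`, or None for NOT FOUND."""
    if requested_dc is not None:
        # The client asked for a specific DC: answer from it or fail, no fallback.
        return requested_dc if service in CATALOG.get(requested_dc, set()) else None

    dc: Optional[str] = local_dc
    seen: Set[str] = set()
    # Walk up the parent chain; `seen` stops the walk if the config has a loop.
    while dc is not None and dc not in seen:
        seen.add(dc)
        if service in CATALOG.get(dc, set()):
            return dc
        dc = PARENT.get(dc)  # None when no parent is configured
    return None
```

With no `parent` entry for a datacenter, the loop checks only the local catalog, which matches the current behavior.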
@bacoboy ok I understand the difference for the case where there are many, many services and you have good parity between DCs and it makes sense to fallback queries for any service.
We've got a new extension to prepared queries landing soon that will allow this type of behavior in the form of prepared query templates - https://github.com/hashicorp/consul/pull/1764. You'll be able to define a template prepared query that matches multiple (and potentially all) services within a datacenter and lets you apply prepared query logic to them.
Here's an example query that you could register in cn-north-1 to get the fallback ordering as shown in the diagram above. Note that the Name prefix is empty, so it'll match any service queried:
{
  "Name": "",
  "Template": {
    "Type": "name_prefix_match"
  },
  "Service": {
    "Service": "${name.full}",
    "Failover": {
      "Datacenters": ["ap-southeast-1", "us-west-1", "us-east-1"]
    }
  }
}
Once this was configured in cn-north-1, then looking up *.query.consul would try to resolve locally first and then fall back to the listed datacenters. See the PR for more details.
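Since the same template would be registered once per datacenter with a different failover list, here is a minimal sketch that builds the JSON body from a list of datacenters (for example from chef/puppet data). The helper name `template_query` is hypothetical; the field names are the ones in the example above, and the sketch only builds the payload without sending it anywhere.

```python
import json
from typing import Any, Dict, List


def template_query(failover_dcs: List[str]) -> Dict[str, Any]:
    """Build a catch-all template query body with an explicit failover order."""
    return {
        "Name": "",  # empty prefix: matches any service name
        "Template": {"Type": "name_prefix_match"},
        "Service": {
            # Left as a literal string for Consul to interpolate.
            "Service": "${name.full}",
            "Failover": {"Datacenters": list(failover_dcs)},
        },
    }


# Body for cn-north-1, using the ordering from the example above.
payload = json.dumps(
    template_query(["ap-southeast-1", "us-west-1", "us-east-1"]),
    indent=2,
)
```

This keeps the "1 rule, 1 time per DC" property: the only per-datacenter input is the ordered list.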
Closing this out - prepared query templates allow you to prefix-match service names (down to an empty prefix that matches any service with a single query). This shipped in 0.6.4 so I think we are good here. Please let me know if you have any questions.
Support server-side hierarchical lookups using optional parent DC configuration property
While I know that issue #154 is related and addresses some kind of fallback processing when a service isn't found in the queried datacenter, I'm looking for something different which I don't think has been discussed yet.
In the current implementation, the logic looks something like this:
The proposed implementation in #154 pushes HOW the lookup should fallback into the query issued (meaning, the client would need to know) to something like this:
What I am proposing is managing the fallback logic in the consul configurations server-side by creating a parent relationship between the consul clusters (known via the less chatty WAN) that the client doesn't need to know anything about. It would use the WAN relationship and an optional configuration on the server side called parent. This is the fallback DC to use if there is no match when one is specified. And you keep asking up the chain until something is found (or not). In pseudocode:
The lookup(service.PARENT, WAN) means, do a pass-thru call to the configured parent DC. This seems like it would scale much better and deal with geographic configurations such as this:
In this example, I'm trying to serve customers in London, China, and the US.
I certainly don't want people in London calling "local" services in Asia, I'd run those in eu-west-1. Chances are there are services I can only run centrally (say a service that backs the central inventory DB -- in this example us-east-1). Let's say that I don't have a license to sell things in China so I move those services to Asia, but I serve static content out of China to not hit my customers with China firewall processing.
In this way, I use GSLB load balancing to find "close" entry points to the site, but once in the system, if I can't find what I need "close" I keep going UP until I find what I need. If I can't find it ANYWHERE, THEN return a NOT FOUND.
Clearly it is assumed people don't create loops in their configurations...
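That assumption could also be checked rather than trusted. A minimal sketch, assuming the parent settings for all datacenters are available as a plain mapping (the `parents` dict and the function name are hypothetical): it walks the chain from one datacenter and raises if it ever revisits one.

```python
from typing import Dict, List


def fallback_chain(dc: str, parents: Dict[str, str]) -> List[str]:
    """Return the ordered list of ancestors of `dc`, raising on a parent loop."""
    chain: List[str] = []
    seen = {dc}
    current = parents.get(dc)
    while current is not None:
        if current in seen:
            raise ValueError(f"parent loop detected at {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents.get(current)  # None at the top of the hierarchy
    return chain


# Hypothetical parent settings using datacenter names from this thread.
parents = {
    "cn-north-1": "ap-southeast-1",
    "ap-southeast-1": "us-west-1",
    "us-west-1": "us-east-1",
}
chain = fallback_chain("cn-north-1", parents)
```

A check like this could run in whatever tool renders the configuration, so a loop is rejected before it reaches any agent.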
I believe this could be done quicker than #154 since the API wouldn't change and the semantics are the same if parent isn't configured. It also keeps decision logic off the client since they shouldn't care what the fallback plan should be -- they just want an answer...
Looking back on my notes this is also related to #208 so referencing here for completeness...
Thoughts?