hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

"Leader service" with DNS support #1584

Open omallo opened 8 years ago

omallo commented 8 years ago

A very nice feature of Consul is the load balancing provided by the DNS service when looking up a service by hostname if there are multiple instances of a service. However, if you have multiple instances of a service but at any given time you only want one of them to be serving requests, this seems not to be supported by Consul out of the box. You could certainly implement this functionality using the leader election mechanism outlined here but I think that it would be an interesting option to support this through Consul's service mechanism including DNS support, maybe something like the following:

Such a mechanism would make it possible to get automatic failover for a service.

Would something like this make sense?

slackpad commented 8 years ago

This is closely related to the question here - https://groups.google.com/d/msg/consul-tool/UO_va4iwpuk/TxKiJr8EEgAJ.

I'm not sure we'd want to actually do the leader election, but it would be interesting to add a feature where you could get the node (or set of nodes) holding a specific lock (or semaphore). This might be something we could do using prepared queries.

arodd commented 8 years ago

It would be nice to limit the number of nodes returned in a prepared query such that it only ever returns one result and consistently the same result unless that node goes down.
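
To make the idea concrete, here is a minimal client-side sketch of "one result, and the same result until it goes down". It is not a Consul feature; the `StickyPick` class and its `pick` method are hypothetical names, and the list of healthy addresses is assumed to come from whatever health query you already run.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical client-side helper; not part of Consul or its API. */
final class StickyPick {
    // Address returned by the previous call, or null if none yet.
    private String current;

    /**
     * @param healthy addresses of the instances currently passing health checks
     * @return the previously returned address while it stays healthy; otherwise
     *         the lexicographically smallest healthy address, or empty if none
     */
    Optional<String> pick(List<String> healthy) {
        if (current == null || !healthy.contains(current)) {
            current = healthy.stream().sorted().findFirst().orElse(null);
        }
        return Optional.ofNullable(current);
    }

    public static void main(String[] args) {
        StickyPick picker = new StickyPick();
        System.out.println(picker.pick(List.of("10.0.0.2", "10.0.0.1")));
        // A new instance with a "smaller" address does not displace the current pick.
        System.out.println(picker.pick(List.of("10.0.0.0", "10.0.0.1")));
        // The pick changes only once the current one is no longer healthy.
        System.out.println(picker.pick(List.of("10.0.0.0", "10.0.0.2")));
    }
}
```

The state lives in the client here, so two clients could still settle on different instances; doing this inside Consul would avoid that.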

LordFPL commented 8 years ago

+1

sirajmansour-infotrack commented 7 years ago

Cool request, I have a use case for that too!

keepwatch commented 7 years ago

Yeah, for my simple use cases, I'd love a deterministic DNS result except if health status changes. This would be right up that alley.

olliebrennan commented 6 years ago

Just thought I would chime in here. The ability to limit a prepared query (via DNS) to a set number of nodes would be super useful for my use case too.

p0pr0ck5 commented 4 years ago

I wrote a small tool that provides this sort of mechanism; it watches Consul health checks for a given service and returns one and only one record for that service. If the health index changes and the previously 'active' service is no longer healthy, it relies on a subsequent service registration address to respond to queries. https://github.com/p0pr0ck5/hobson

victor-sudakov commented 4 years ago

Maybe something like the following:

A backup service is configured with a special "priority" or "backup" parameter. If there are other healthy nodes with the priority value higher than that of the service on this node, this node's IP address is excluded from DNS responses for the service.
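
A rough sketch of the filtering rule described above, written as plain Java so the behavior is easy to check. Consul has no such "priority" parameter today; the `PriorityFilter` and `Instance` names, and the convention that a higher number means more preferred, are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch; Consul has no "priority" service parameter. */
final class PriorityFilter {
    /** One registered instance of a service. */
    static final class Instance {
        final String address;
        final int priority;    // assumed convention: higher value = preferred
        final boolean healthy;

        Instance(String address, int priority, boolean healthy) {
            this.address = address;
            this.priority = priority;
            this.healthy = healthy;
        }
    }

    /** Returns the addresses of the healthy instances that share the highest priority. */
    static List<String> answer(List<Instance> instances) {
        int top = instances.stream()
            .filter(i -> i.healthy)
            .mapToInt(i -> i.priority)
            .max()
            .orElse(Integer.MIN_VALUE);
        return instances.stream()
            .filter(i -> i.healthy && i.priority == top)
            .map(i -> i.address)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instance> all = List.of(
            new Instance("198.18.0.1", 10, false),  // primary, currently down
            new Instance("198.18.0.2", 1, true));   // backup
        System.out.println(answer(all));
    }
}
```

With the primary healthy, only the primary is returned; the backup appears in the answer only while no healthy instance outranks it.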

victor-sudakov commented 4 years ago

> I wrote a small tool that provides this sort of mechanism; it watches Consul health checks for a given service and returns one and only one record for that service. If the health index changes and the previously 'active' service is no longer healthy, it relies on a subsequent service registration address to respond to queries. https://github.com/p0pr0ck5/hobson

How do you configure which particular IP address is returned out of several available?

mpilone commented 4 years ago

I did something similar using the Consul API directly and hiding that behind a custom DnsResolver in my code. It supports a "leader preference" of NONE, PREFERRED, or REQUIRED. Obviously it only works with tools that can easily use a custom DNS resolution mechanism so having this built into Consul would be nice.

For the "required" case where you only want the leader and nothing else will do, you could use a tag and then do a DNS lookup on the service and tag name. You'd have to dynamically update the service tags when leader status changes, but that should be straightforward.

  private List<CatalogService> resolveToService(String service) {
    List<CatalogService> services = consul.catalogClient().getService(service)
        .getResponse()
        .stream()
        .filter(s -> datacenter == null || datacenter.equals(s.getDatacenter().orElse(null)))
        .collect(Collectors.toList());

    // NONE: return as is.
    if (leaderPreference == LeaderPreference.NONE) {
      return services;
    }

    // Now we know we need leader information so we look it up.
    String leaderServiceId = resolveLeader(service);

    // PREFERRED: sort leader to the front
    if (leaderPreference == LeaderPreference.PREFERRED) {
      return services.stream()
          .sorted((s1, s2) -> {
            if (s1.getServiceId().equals(leaderServiceId)) {
              return -1;
            }
            else if (s2.getServiceId().equals(leaderServiceId)) {
              return 1;
            }
            else {
              return 0;
            }
          })
          .collect(Collectors.toList());
    }

    // REQUIRED: keep only the leader
    return services.stream()
        .filter(s -> s.getServiceId().equals(leaderServiceId))
        .collect(Collectors.toList());
  }

  private String resolveLeader(String service) {
    String key = String.format("/service/%s/leader", service);

    // Lookup the key.
    Optional<Value> opValue = consul.keyValueClient().getValue(key);
    if (opValue.isEmpty()) {
      return null;
    }

    // Get the session holding the lock.
    Optional<String> opSession = opValue.get().getSession();
    if (opSession.isEmpty()) {
      return null;
    }

    // Get the service ID from the value.
    Optional<String> opValueString = opValue.get().getValueAsString();
    if (opValueString.isEmpty() || opValueString.get().isBlank()) {
      return null;
    }

    return opValueString.get();
  }

mkeeler commented 4 years ago

Consul's DNS can already support at least some of the use cases mentioned in this issue with Tags.

Suppose you have a service definition like this:

{
   "service": {
      "name": "web",
      "port": 1234,
      "address": "198.18.0.1"
   }
}

Then, in the leader instance's service definition, you would also include a tag such as active:

{
   "service": {
      "name": "web",
      "port": 1234,
      "address": "198.18.0.1",
      "tags": ["active"]
   }
}

Then a DNS query for web.service.consul would give you all web services but active.web.service.consul would give you only those instances with the active tag set. Then it becomes up to you to decide which service should have the tag or not.

This doesn't do it automatically and there is certainly the potential for improvement there.
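
For reference, the lookup names follow Consul's DNS pattern `[tag.]<service>.service[.<datacenter>].<domain>`. A small sketch of building and resolving such names from Java follows; the `ConsulDnsName` class is a hypothetical helper, and `resolve` assumes the system resolver forwards the Consul domain to a Consul agent, which is not the case out of the box.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for building Consul service lookup names. */
final class ConsulDnsName {
    /**
     * Builds [tag.]<service>.service[.<datacenter>].<domain>.
     * tag and datacenter may be null to leave that label out.
     */
    static String of(String service, String tag, String datacenter, String domain) {
        StringBuilder name = new StringBuilder();
        if (tag != null) {
            name.append(tag).append('.');
        }
        name.append(service).append(".service.");
        if (datacenter != null) {
            name.append(datacenter).append('.');
        }
        return name.append(domain).toString();
    }

    /** Resolves through the system resolver (assumed to forward to Consul). */
    static InetAddress[] resolve(String name) throws UnknownHostException {
        return InetAddress.getAllByName(name);
    }

    public static void main(String[] args) {
        // All instances of "web", then only those tagged "active".
        System.out.println(of("web", null, null, "consul"));
        System.out.println(of("web", "active", null, "consul"));
    }
}
```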

victor-sudakov commented 4 years ago

> Then a DNS query for web.service.consul would give you all web services but active.web.service.consul would give you only those instances with the active tag set. Then it becomes up to you to decide which service should have the tag or not.
>
> This doesn't do it automatically and there is certainly the potential for improvement there.

Sorry but this is a different matter. Imagine that 198.18.0.1 goes down. Then there will be no active.web.service.consul instance at all. No other backup node will assume its place.

mpilone commented 4 years ago

I believe the idea is that you would manage the "active" tag on the services. For example, when a service is elected leader, it adds that tag to itself. If it goes down, within a few seconds some other service will elect itself as leader to replace the downed service and it will add the tag to itself. So you should always have a tagged "active" service as long as some service has elected itself as leader, however you decide to manage that in your system (usually with a session lock on a /leader key).

Basically tags give you a way to limit DNS results, but it is up to you to manage those tags and either assign them to one service (leader) or a group of services depending on your needs. You may have to dynamically update the tags as the state of the system changes (new leader, service health changes, etc).

victor-sudakov commented 4 years ago

> I believe the idea is that you would manage the "active" tag on the services. For example, when a service is elected leader, it adds that tag to itself.

Certainly, when you have a service smart enough to manipulate tags (Patroni is a good example), you don't need the feature we are discussing here. The feature we are discussing is meant more for dumb services which may not even know about Consul.

Just like in HAProxy or nginx, you can mark a backend or upstream as "backup" without the backend even knowing about it.

mkeeler commented 4 years ago

@victor-sudakov Consul does not currently have anything that does automatic failover, although it does sound like that could be useful. Prepared queries can fail over to service instances in another datacenter, but not to a secondary service within the same DC.

arodd commented 4 years ago

The idea of returning the results based on the index seems like it would be useful for these scenarios. Similar to a prepared query having the near parameter. An age parameter or similar with oldest or newest flags to return results sorted by the oldest or newest instance of a service.
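
As a rough illustration of what such ordering could mean, here is a client-side sketch that sorts instances by the `CreateIndex` value returned with catalog entries. The `AgeOrder` class, the `Age` enum, and the idea that `CreateIndex` is a good enough proxy for instance age are all assumptions; no such prepared query parameter exists.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of an "age" ordering; not a Consul feature. */
final class AgeOrder {
    enum Age { OLDEST, NEWEST }

    /** Minimal view of a catalog entry: its address and its CreateIndex. */
    static final class Instance {
        final String address;
        final long createIndex;

        Instance(String address, long createIndex) {
            this.address = address;
            this.createIndex = createIndex;
        }
    }

    /** Returns addresses sorted by createIndex, ascending for OLDEST, descending for NEWEST. */
    static List<String> order(List<Instance> instances, Age age) {
        Comparator<Instance> byIndex = Comparator.comparingLong(i -> i.createIndex);
        Comparator<Instance> cmp = (age == Age.NEWEST) ? byIndex.reversed() : byIndex;
        return instances.stream()
            .sorted(cmp)
            .map(i -> i.address)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instance> all = List.of(
            new Instance("198.18.0.2", 42),
            new Instance("198.18.0.1", 17));
        System.out.println(order(all, Age.OLDEST));
        System.out.println(order(all, Age.NEWEST));
    }
}
```

Taking only the first element of the OLDEST ordering gives a result that stays the same until that instance is deregistered or filtered out as unhealthy.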

victor-sudakov commented 4 years ago

> The idea of returning the results based on the index seems like it would be useful for these scenarios. Similar to a prepared query having the near parameter. An age parameter or similar with oldest or newest flags to return results sorted by the oldest or newest instance of a service.

There can be different approaches but I think we all agree that some mechanism of filtering and ordering of DNS responses is desirable.

engel75 commented 3 years ago

Would my feature request #9780 work as well?