hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Dynamic tags applied like health checks. #1048

Open fidian opened 9 years ago

fidian commented 9 years ago

In issue #867 I suggested an idea to make tags that depend on the result of scripts, just like health checks.

I run mongo in the cloud with multiple machines all spun up from the same image. On boot they query for mongodb.service.consul and join the cluster. That all works flawlessly. Being a good Ops person, I have a cron job that kills random machines in my infrastructure at random times. It will eventually hit the mongodb master, the system will hiccup, and a slave will be promoted automatically. Life is fantastic.

In comes Legacy Software that must connect directly to the master mongodb instance. I would like to have master.mongodb.service.consul resolve to the one IP of the master in the cluster.

Current solution (runs via cron on all machines; a rough sketch follows the list):

  1. Get my service definition through API
  2. Check the status of the cluster. This determines if we should or should not have a tag.
  3. Determine if the service definition's tag list needs to be updated.
  4. If an update is required, POST data back to the API.
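
A rough sketch of what that cron job could look like (illustrative only; it assumes the local agent HTTP API on 127.0.0.1:8500, curl and jq installed, and a hypothetical mongo-is-master.sh that exits 0 only on the master):

#!/usr/bin/env bash
# Sketch of the cron workaround: keep the "master" tag on the local mongodb
# service in sync by re-registering the service through the agent API.
set -euo pipefail
AGENT=http://127.0.0.1:8500
# 1. Get the current local service definition.
service=$(curl -s "$AGENT/v1/agent/services" | jq '.["mongodb"]')
# 2. Check the cluster status to decide whether the tag should be present.
if /usr/local/bin/mongo-is-master.sh; then want=true; else want=false; fi
have=$(echo "$service" | jq '.Tags // [] | index("master") != null')
# 3./4. Only re-register through the API when the tag list needs to change.
if [ "$want" != "$have" ]; then
  if [ "$want" = true ]; then
    updated=$(echo "$service" | jq '.Tags += ["master"]')
  else
    updated=$(echo "$service" | jq '.Tags -= ["master"]')
  fi
  echo "$updated" \
    | jq '{ID: .ID, Name: .Service, Tags: .Tags, Address: .Address, Port: .Port}' \
    | curl -s -X PUT --data @- "$AGENT/v1/agent/service/register"
fi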

Ideal solution:

  1. Set up my service definition with dynamic tags.
  2. Write a script that returns the status of the cluster, with an exit code of 0 meaning to apply the tag.
  3. Let consul update itself automatically.

Sample JSON (one static tag, one dynamic tag):

{
    "service": {
        "name": "mongodb",
        "tags": [
            "fault-tolerant",
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}

This sort of solution could apply to issues #155 and #867, and possibly others.

ryanuber commented 9 years ago

Interesting idea. I think the work-around you mentioned is a decent way of doing this, but I'm going to leave this open as a thought ticket for now. Thanks!

Kosta-Github commented 9 years ago

@fidian with respect to your statement: "On boot they will query for mongodb.service.consul and join the cluster."

Can you describe this a bit more? I want to set up something similar for a redis cluster. Do you use some handcrafted script (e.g., via consul-template or the REST API) to query mongodb.service.consul and get all registered nodes for that service, or are you relying on the DNS mechanism for that? At least one problem with relying solely on the DNS mechanism is that if the node registers itself (e.g., with registrator) in the consul cluster before it does the DNS lookup for mongodb.service.consul, it might get back its own IP address, which would not be helpful for joining the cluster... :-)

walrusVision commented 9 years ago

This would be useful for services like zookeeper, which dynamically elects a leader node every time a node joins or leaves the cluster, and where the leader can be configured to no longer accept client connections. Having dynamic tags applied via a check would make it possible to query consul for the non-leader nodes, so a client never tries to connect to the leader at all.

fidian commented 9 years ago

@Kosta-Github asked how I manage to auto cluster my mongo instances.

  1. Consul is hooked up through dnsmasq.
  2. Consul is started before mongo.
  3. The health check fails unless mongo reports success and mongo is part of a cluster. This second part is vital: the health check keeps failing until mongo has joined a cluster.
  4. The init script for mongo queries DNS for other members in the cluster. This will only report mongo instances that are already in a replica set.
    • If IPs are found, become a slave and connect to the IP that we found.
    • With no IPs, configure as a master and enable the replica set, which then makes the health check pass.

The only snag is that I must start one instance of mongo initially so it will bootstrap the replica set. Once it is running I am able to add and remove instances to my replica set.
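
A rough sketch of that boot-time logic (illustrative only, not the actual init script; it assumes dig and the mongo shell are available, and it glosses over locating the current primary before calling rs.add):

#!/usr/bin/env bash
# Sketch of the boot-time join logic described above.
set -euo pipefail
SELF_IP=$(hostname -i | awk '{print $1}')
# Only healthy (already-clustered) instances resolve here, because the
# health check fails until a node has joined the replica set.
PEERS=$(dig +short mongodb.service.consul | grep -v "^${SELF_IP}$" || true)
if [ -z "$PEERS" ]; then
  # No members found: bootstrap the replica set on this node.
  mongo --eval 'rs.initiate()'
else
  # Members found: ask an existing member to add this node to the replica set.
  mongo --host "$(echo "$PEERS" | head -n 1)" --eval "rs.add('${SELF_IP}:27017')"
fi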

Kosta-Github commented 9 years ago

@fidian thanks for the explanation; just one more question: what does your dnsmasq config look like? :-)

fidian commented 9 years ago

@Kosta-Github it looks like the following. I'm also happy to answer questions off this issue - feel free to email me directly at fidian@rumkin.com so we don't continue to pollute this thread.

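# forward all *.consul lookups to the local Consul agent's DNS interface on port 8600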
server=/consul./127.0.0.1#8600

igoratencompass commented 9 years ago

+1 for this feature request

eloycoto commented 9 years ago

+1

hugochinchilla commented 9 years ago

+1

xakraz commented 9 years ago

+1

jh409 commented 9 years ago

+1

adbourne commented 9 years ago

+1

memelet commented 8 years ago

This would be very, very nice. There are all kinds of things for which clients need to connect to the master explicitly. A dynamic tag would be so elegant, and so much better than a bunch of additional scripts to tweak tags.

danielbenzvi commented 8 years ago

+1

wyhysj commented 8 years ago

+1

123BLiN commented 8 years ago

+1 - a tag plus script would be very useful for implementing custom DNS response logic

richard-hulm commented 8 years ago

Currently we have to run two 'services' for a similar situation: a "redis" service which includes all nodes in the cluster, and then a "redis-master" service.

This has the unfortunate side effect that most of the redis nodes are always 'failing' the health check because they're not the master.
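
Concretely, that two-service setup looks roughly like this (a sketch with an illustrative file path and check commands; exiting 2 marks the check critical rather than warning, so non-masters drop out of redis-master.service.consul):

cat > /etc/consul.d/redis.json <<'EOF'
{
  "services": [
    {
      "name": "redis",
      "port": 6379,
      "checks": [
        { "script": "redis-cli ping | grep -q PONG || exit 2", "interval": "10s" }
      ]
    },
    {
      "name": "redis-master",
      "port": 6379,
      "checks": [
        { "script": "redis-cli info replication | grep -q role:master || exit 2", "interval": "10s" }
      ]
    }
  ]
}
EOF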

Would definitely appreciate this feature as a way around this

slackpad commented 8 years ago

Consul 0.6 added a "tag override" feature that's useful for implementing schemes like this, though the logic is run outside of Consul, not from Consul itself as suggested here. Here's the issue that brought it in https://github.com/hashicorp/consul/issues/1102.

Here's a bit of the documentation, from https://www.consul.io/docs/agent/services.html:

The enableTagOverride can optionally be specified to disable the anti-entropy feature for this service. If enableTagOverride is set to TRUE then external agents can update this service in the catalog and modify the tags. Subsequent local sync operations by this agent will ignore the updated tags.

This would let an external agent, like a script working with redis-sentinel, apply the tags to the current master via Consul's catalog API.
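
For illustration, a minimal sketch of that scheme (hypothetical node name, address, and service; the service definition on the node sets enableTagOverride, and an external script pushes the tag through the catalog API):

# Local service definition (on the node) opts in to external tag management:
#   { "service": { "name": "redis", "port": 6379, "enableTagOverride": true } }
# An external agent (e.g. a script watching redis-sentinel) then tags the
# current master by re-registering it through the catalog API:
curl -s -X PUT http://127.0.0.1:8500/v1/catalog/register -d '{
  "Node": "redis-node-1",
  "Address": "10.0.0.11",
  "Service": {
    "ID": "redis",
    "Service": "redis",
    "Tags": ["master"],
    "Port": 6379
  }
}'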

mvanderlee commented 8 years ago

+1 Would love to see this instead of the workaround with tag overriding.

jcua commented 8 years ago

+1

PedroAlvarado commented 8 years ago

+1

onnimonni commented 8 years ago

This is a brilliant idea :) - I would also want this for a redis cluster!

nickwales commented 8 years ago

+1 This would give us the ability to determine which application version should receive LB traffic in marathon.

rafaelcapucho commented 8 years ago

+1

tomwganem commented 8 years ago

+1

avdva commented 8 years ago

Hi, I've added support for dynamic tags here, in the dynamic-tags branch. If you are interested in this feature, please build and test it; any critique is appreciated. If everything is ok, I'll make a PR. The syntax for service registration is the following:

{
    "service": {
        "name": "mongodb",
        "tags": ["tag1"],
        "dynamictags": [
            {
                "name": "master",
                "script": "/usr/local/bin/mongo-is-master.sh",
                "interval": "10s"
            }
        ],
        "address": "127.0.0.1",
        "port": 8000,
        "checks": [
            {
                "script": "/usr/local/bin/mongo-health-check.sh",
                "interval": "10s"
            }
        ]
    }
}

Techcadia commented 7 years ago

Was there ever a pull request for this topic? It still looks like something that is needed.

rhamon commented 7 years ago

+1 This is much better than the current enableTagOverride or multiple-service workarounds, IMHO. Please pull this!

avdva commented 7 years ago

I've merged the master branch from hashicorp/consul into my dynamic-tags branch. If you are interested in this feature, please build and test it. We've tested it in our environment and it worked. However, I'd like to receive more feedback before I make a PR. Error reports will be highly appreciated.

rhamon commented 7 years ago

A colleague tried to build it and add it to our internal debian repo but was apparently stuck in dependency hell and gave up.

andremarianiello commented 7 years ago

This feature would be great for my use case. I would really like to see this merged in eventually.

Sieabah commented 7 years ago

+1 Consul DNS, even with the two-service method, takes 15 to 30 minutes to propagate in the UI, API, and DNS.

slackpad commented 7 years ago

@Sieabah that sounds like a function of DNS caching somewhere - you can adjust the TTL values to improve that. The API/UI shouldn't have any delay.

Sieabah commented 7 years ago

@slackpad I have all of the DNS caching set to 0. Querying the API and ignoring DNS takes about the same amount of time to resolve.

I'm sure something is misconfigured, since when I monitor the two boxes they both report "synced service:mongo" and "synced service:primary-mongo". With the current service definition I'm able to get it down to 5 minutes. During that time both services actually say they're the primary (in the UI and API), even though in the logs they switch immediately.

{
  "service": {
    "name": "primary-mongo",
    "tags": ["primary", "mongo"],
    "port": 27017,
    "check": {
      "name": "primary",
      "script": "python ~/consul_check_tags.py $(mongo --eval 'db.isMaster().ismaster' | grep 'true')",
      "interval": "5s",
      "timeout": "1s"
    }
  }
}

I've tried re-registering via the API, reloading the config during the health check, and reloading from the API. I don't know what makes it take 5 minutes to propagate to a cluster of 3 servers and 2 clients, other than the anti-entropy sync only running every minute.

slackpad commented 7 years ago

@Sieabah we have a few issues we are looking into like https://github.com/hashicorp/consul/issues/2970, but it may be worthwhile for you to open a new GH issue so we can try to track down what's happening to you. Better to do it on a different issue than this one.

caquino commented 7 years ago

:+1: this would simplify a lot of the workarounds we did to achieve this functionality for master/slave tags

adamlc commented 6 years ago

Did we get anywhere with this? I'm looking for something similar at the moment, where I have a service with a master/slave type setup.

ramukima commented 5 years ago

Not sure if Prepared Queries can be used to apply such rules. However, dynamic tagging is a good idea. Any plans to get it in?

drawks commented 5 years ago

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

nicholasamorim commented 5 years ago

This still looks like a great idea, but I see no indication of any traction to having it merged. Anyone care to give us an update?

ShimmerGlass commented 5 years ago

This would be useful for us as well. What are your thoughts on how to design this? IMO a simple way would be to add a field like "output_as_tag": true to the check declaration struct. When set to true, the check output (as seen in Output in a /v1/health/service/<service> query, for example) would be captured and set as a tag: either on the node for a node check, or on the service for a service check. If the value changes, the previously set tag would be removed and the new one added. These tags would also be applied to sidecar services to ensure compatibility with Connect.
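
A sketch of the proposed syntax (the output_as_tag field is hypothetical and not implemented; the service name and script path are illustrative):

# Proposed, not-implemented syntax: a check opts in to exporting its output
# as a tag on the enclosing service.
cat > /etc/consul.d/mongodb.json <<'EOF'
{
  "service": {
    "name": "mongodb",
    "port": 27017,
    "checks": [
      {
        "script": "/usr/local/bin/mongo-role.sh",
        "interval": "10s",
        "output_as_tag": true
      }
    ]
  }
}
EOF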

There are a few points to address, though:

pierresouchay commented 5 years ago

@Aestek I agree, I think this kind of thing would be really useful. For now we have lots of services such as:

Having a way to merge those services into one single service and just add a leader tag would be great.

I know several systems where checks for this kind of feature can also be simple HTTP checks, so limiting it to scripts is a bit less interesting. I am not convinced by scraping the output of regular checks to get the new tags, because:

The approach in https://github.com/hashicorp/consul/issues/1048#issuecomment-247585117 looks sensible (I mean, not linked to existing checks), because:

I did not check in detail what has been done in https://github.com/avdva/consul/tree/dynamic-tags, but it sounds to me like the right approach. While it limits the ability to do very dynamic things, it would greatly ease implementation (most notably by avoiding conflicts between several checks).

avdva commented 5 years ago

I'll try to resurrect my branch soon. We'll see if it still works.

pierresouchay commented 4 years ago

@avdva we are really interested in this, tell us when you do so ;)

ShimmerGlass commented 4 years ago

@avdva Did you have time to resurrect your branch? Hope it's not too complicated with all the conflicts there must be since 2016.

RedStalker commented 4 years ago

+1

hanshasselberg commented 4 years ago

This is an interesting idea and I could imagine us adding such a feature. The best way to get it in is to create a PR so that we have something to discuss. That would also make it easier to see the impact.

exFalso commented 4 years ago

+1. I think @ShimmerGlass's suggestion is great: the tag should come from the script itself. This covers the OP's use case but would also solve additional ones. In our case, we have a dynamically generated ID in some of our services (the ID comes from dedicated hardware and must be generated within the service), and it'd be great if we could propagate this ID to consul. A great way to solve this is to have a periodic script return the tag(s) to be applied.

chris93111 commented 4 years ago

+1

EmPRio93 commented 4 years ago

+1