pauldix opened this issue 8 years ago (status: Open)
:+1:
Hello @pauldix, what you suggest is, I think, what I tried to explain here: https://github.com/influxdb/telegraf/issues/193#issuecomment-140688471 Prometheus supports a wide range of discovery mechanisms (Consul included). I'm personally interested in Kubernetes discovery.
@pauldix @rvrignaud see the PR about etcd here: https://github.com/influxdata/telegraf/pull/651
Hi @titilambert, your PR is really useful for updating the Telegraf configuration dynamically, such as changing input and output configurations from time to time. But for service discovery in a system such as AWS, Mesos, or Kubernetes, where things scale dynamically, something like the service discovery features implemented in Prometheus would be really great.
@rvrignaud's explanation is here, and the Prometheus documentation shows the different possibilities supported.
Having this feature would definitely make me move to InfluxDB while keeping the Prometheus instrumentation library.
@chris-zen that's very interesting! I agree with you and would love to see that, but this kind of service discovery is more for scheduled (polling) monitoring systems (like Prometheus), isn't it? I don't know whether a decentralized (pushing) system like Telegraf is suited to this...
What do you think?
Yes, agreed that it is especially important for polling. But Telegraf already supports polling inputs, such as the one for Prometheus. Right now the prometheus input only allows static configuration, but it would be very useful to support service discovery too. My understanding is that Telegraf is quite versatile and allows both pull and push models, but the pull model without service discovery is of little use in such dynamic environments.
Just dropping this here for reference on what I think is a good service discovery model (from prometheus): https://prometheus.io/blog/2015/06/01/advanced-service-discovery/. Same as mentioned above but I think this blog post is a little more approachable than their documentation.
I think that the "file-based" custom service discovery will be easy to implement. Doing DNS-SRV, Consul, etc. will take a bit more work, but it is certainly doable.
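For reference, Prometheus's file-based discovery just watches a directory of target files like the one below; a Telegraf equivalent could reuse the same shape (the addresses and labels here are made up for illustration):

```json
[
  {
    "targets": ["10.0.0.4:9100", "10.0.0.5:9100"],
    "labels": {
      "env": "prod",
      "job": "node"
    }
  }
]
```

Dropping or rewriting such a file in the watched directory is all it takes to add or remove scrape targets, with no agent restart.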
I'm imagining some sort of plugin system for these, where notifications on config changes and additions could be sent down a channel, and whenever Telegraf detects one of these it would apply and reload the configuration.
My preference would be to start with simple file & directory service discovery. This would be an inotify goroutine that sends a reload signal (SIGHUP) to the process when it detects a change in any config file, or when a config file is added to or removed from a config directory.
This could be extended using https://github.com/docker/libkv or something similar: a goroutine that overwrites the on-disk config file(s) when it detects a change (basically a very simple version of confd).
This would solve some of the issues that I have (and that @johnrengelman and @balboah raised) with integrating with a kv-store. In essence, we wouldn't be dependent on a kv-store, and we wouldn't have any confusion over the currently-loaded config, because the config would always also be on-disk.
Curious what others think of this design; I'm biased, but this is my view:
pros:
cons:
I like it. That was one thing that used to be tricky with Redis: you could use commands to alter the running config, but if you then restarted your server without updating the on-disk config, you were hosed.
File writes aren't a big deal. It's not like they're going to be updating the config multiple times a second, a minute, or even an hour.
@pauldix You might be updating your config multiple times per hour or more if you are in a highly dynamic environment, like an AWS Auto Scaling group or a Docker Swarm/Kubernetes/fleetd/LXD container setup. But even then, @sparrc's proposed implementation sounds very good, combining flexibility with resiliency (you aren't depending on your KV store/network always being up). +1
Hi guys, any updates on this monitoring methodology? My company is starting to implement Mesos and Marathon as a scheduler, and we find monitoring services (MySQL, ES, etc.) very difficult with the current Telegraf monitoring architecture. It seems the only way right now is to use Prometheus, as mentioned above, because of its support for dynamic SD monitoring.
@sparrc can you please share the current state of the design?
Thanks
Hi everybody, I'm new to this discussion and would like to add my point of view.
Everybody knows how important it is to give our agents the ability to get their configuration, and to discover configuration changes, from a centralized configuration system.
As I have read in this thread (and others: https://github.com/influxdata/telegraf/pull/651), there are different ways to get remote configuration:
https://github.com/docker/libkv (for etcd or other KV store backends)
https://github.com/spf13/viper (for remote config storage)
Anyway, the most important thing (IMHO) is adding the ability to easily manage changes across all our distributed agents. When there is no available solution, the easiest way is often the best. So yesterday I made a really simple proposal in https://github.com/influxdata/telegraf/issues/1496, which could be coded in a few lines (with the same behaviour if you switch to the https://github.com/spf13/viper library).
Once this simple feature is added, we can continue the discussion on other, more sophisticated ways to get configurations and to integrate with known centralized systems (like etcd and others).
I vote for adding a simple centralized way first, and an integrated solution after. Both will cover the same functionality in different scenarios.
What do you think?
@toni-moreno the simplest way to manage it is via files. Although fetching over HTTP might be simple for your scenario, I can imagine ways in which it can get complicated (just see the httpjson plugin for examples). Like I said, this feature first needs to be coded as a file watcher; then we can develop plugins around changing the on-disk file(s).
There is one commonly used abstraction pattern available; the only thing that would be needed is hot config reloading:
https://github.com/kelseyhightower/confd/ is a single binary which watches one or many kinds of backends and re-renders the configuration file from a template upon detecting changes.
I'm about to implement something for rancher catalogue items. https://github.com/influxdata/influxdata-docker/pull/9 is related.
The pattern is rather simple to manage with sidekicks and shared volumes.
One step further:
@sparrc I think this is almost a no-brainer; only the signalling to the telegraf process would need some extra thought, the rest is taken care of.
The signaling would simply be the file changing on disk; there is no need for confd to directly signal Telegraf, as far as I understand it.
Absolutely right.
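The confd side of that pattern is a template resource like the one below. The paths and KV key are hypothetical; `check_cmd` and `reload_cmd` are standard confd settings, and `telegraf --config ... --test` / SIGHUP are just one plausible pairing:

```toml
# /etc/confd/conf.d/telegraf.toml (hypothetical template resource)
# confd watches the listed keys in its backend (etcd, Consul, ...),
# re-renders the template, and runs reload_cmd only when the rendered
# file actually changes.
[template]
src        = "telegraf.conf.tmpl"          # lives under /etc/confd/templates/
dest       = "/etc/telegraf/telegraf.conf"
keys       = ["/telegraf/config"]
check_cmd  = "telegraf --config {{.src}} --test"  # validate before swapping in
reload_cmd = "pkill -HUP telegraf"
```

With this, Telegraf never talks to the KV store itself: the on-disk file remains the single source of truth, which is exactly the property discussed above.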
@sparrc Hi sparrc, any new updates on this?
Hi guys, very interesting discussion. I totally agree with keeping Telegraf 'separate' from etcd/viper/etc.; however, it somehow needs to track any file changes performed by those apps and be able to apply those changes on the fly.
Does anyone know whether this is going to be the way forward, and how it is going to be implemented?
@3fr61n, yes, the initial implementation will be a file/directory watcher that will be able to dynamically reload the configuration any time that the file(s) change.
I'm not sure about the "how" yet; maybe this: https://github.com/fsnotify/fsnotify
Any updates on this feature? What is the recommended way to do service/dynamic config discovery today? I'm curious what third-party solutions people are using if this is not natively supported.
any update?
This is something I am working on. The current plan is similar to what's described above, but instead of using inotify it will continue to require a signal to trigger config changes. Once this is done we should be able to work on creating more elaborate configuration plugins.
Hi,
In the consul branch we have a proof of concept.
Each time any KV is modified in Consul, all containers are notified; they then render their config templates and reload their processes (not the containers).
We are still in beta testing, because it's a huge change compared with the current infrastructure.
If you want to test it, feel free to use it.
This POC includes dynamic management of input plugin configurations; however, it went a different route than using KV stores and service discovery. I just wanted to share it, since the agent changes might be helpful in part or as a whole for a service discovery approach.
The README on that POC includes a demo write-up of how the "managed input" concept works in practice.
Thanks @itzg, I'll take a look at it. @3fr61n can you link to the code you are referring to?
This would be incredibly useful to have for selfish reasons.
Prometheus kubernetes discovery using annotations is pure gold. I would love to have this in telegraf.
https://github.com/prometheus/prometheus/tree/master/discovery/kubernetes
any update?
I hope to have a pull request up soon for further discussion, it will contain a configuration plugin system in a style similar to the current input/output plugins.
@abraithwaite Can you take a look at the kubernetes_services option we added to the prometheus input and see if it works for your use case? It is only on the master branch, but you can use the nightly builds.
Unfortunately not. The value that prometheus provides with Kubernetes is that you configure metrics collection via the service (with kubernetes annotations) and not through the metrics collection agent.
This enables users to configure everything they need without having to set up something outside the scope of their own services.
I can provide examples if needed, just lemme know.
@abraithwaite can you link me to the Kubernetes documentation for the method you are using?
Haven't seen any official documentation, actually. I just pieced it together from code, examples, and blog posts:
https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
https://coreos.com/blog/prometheus-and-kubernetes-up-and-running.html
https://github.com/prometheus/prometheus/issues/2989
https://github.com/prometheus/prometheus/issues/2009
https://movio.co/en/blog/prometheus-service-discovery-kubernetes/
FWIW, I don't use prometheus with Kubernetes but the concept is extremely valuable and I'd still love to see it here.
I looked at the Telegraf code, though, and I'm certain you'd need to add service discovery as a first-class configuration method.
Just to clarify, the kubernetes_services option allows you to use the Kubernetes DNS cluster add-on to find and scrape Prometheus endpoints without needing to update your Telegraf configuration file when a service is started or stopped.
Right, I understand that. It still requires an explicit dependency between the service and telegraf, instead of an implicit one.
When using annotations, there is no PR a user has to make to update the Telegraf config in order to start having metrics from their service collected.
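For concreteness, this is the de-facto annotation convention that the Prometheus examples above use: the collector discovers the endpoint from the annotations alone, with no change to its own config. The service name and port here are made up:

```yaml
# Hypothetical Service advertising its own metrics endpoint via annotations.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    prometheus.io/scrape: "true"      # opt in to scraping
    prometheus.io/path: "/metrics"    # where the metrics are served
    prometheus.io/port: "9102"        # which port to scrape
spec:
  selector:
    app: my-app
  ports:
    - port: 9102
```

The team that owns my-app ships this with their own deploy; the monitoring agent's configuration never has to mention the service by name.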
I can agree that "Prometheus kubernetes discovery using annotations is pure gold. I would love to have this in telegraf." We use this to have prometheus dynamically find new targets. Would love to move back to telegraf for collection of metrics and uptime if this was supported.
Hi, I'm pretty new to the TICK stack and still getting used to it. We are trying to set it up as the monitoring platform for our organization. One question that keeps coming up is how we manage the configurations: for instance, if we need to monitor one service/process on a server, we have to change the config on that server and restart Telegraf. While researching this I found this page, and I think I'm posting my concern in the right place. Do we have a working model for managing configuration centrally?
@narayanprabhu I use Puppet to ease that kind of pain. It knows all the services that are “ensured” on each server, and that makes it easier to deploy a matching Telegraf config.
@voiprodrigo Yes, Puppet is a good option; unfortunately my organization doesn't use it. They mainly rely on SCCM for Windows deployment and Ansible for Linux. This thread says there is a UI being built into Chronograf to manage agent configs; is that still in progress? Wondering if it is coming anytime soon.
Also, there is something about etcd where one config can be consumed by multiple Telegraf agents. Is that an option that would help my use case? Is it something that works on Windows as well?
@danielnelson any update?
Work is on hold right now (for the first item here), but I'm tempted to break this issue up into several issues:
The InfluxDB 2.0 alpha version has a Telegraf config generation UI, and Telegraf is guided to take its config from InfluxDB, but InfluxDB seems to have no config editing feature yet.
So here's the question: does Telegraf have any plan to synchronize its config from InfluxDB 2.0?
Hello, so reloading the (file) config without a restart has not been implemented yet? That's a pity. @blaggacao didn't you call it "almost a no-brainer"?
I'd like to use telegraf with a sidecar creating the configs...
EDIT: seems like there is a --watch-config flag; I will try that immediately.
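For anyone landing here later, a minimal sketch of using that flag (the accepted values are assumed from the current Telegraf docs; verify with `telegraf --help` on your version):

```shell
# Reload automatically when the config file changes on disk.
# "notify" is assumed to use the OS file-notification API,
# with "poll" as a stat-polling fallback.
telegraf --config /etc/telegraf/telegraf.conf --watch-config notify
```

This pairs naturally with a sidecar or config-management tool that rewrites the file, as discussed earlier in the thread.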
If there's a standard service discovery system to connect to, like Consul, it would be cool to have Telegraf connect to it and automatically start collecting data for services that Telegraf supports.
So when a new MySQL server comes on, Telegraf will automatically start collecting data from it.
Just an idea. Users could also get this by simply making Telegraf part of their deploys when they create new servers.