influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Broken core product promise: missing documentation how to setup more than one input stream #16004

Open wolfgangr opened 3 weeks ago

wolfgangr commented 3 weeks ago

Use Case

e.g. my home automation scenario on a farm:

Expected behavior

After reading the documentation, I can install and configure telegraf to deliver its promised functions

Actual behavior

Additional info

Thank you, friends, for delivering this possibly great product. :bow: But, sorry to say, the imho largely inadequate documentation (really sorry) is liable to spoil all your diligent work. :sweat:
Please allow me some humble feedback to assist you in improving it.

Quick fix up front: :pray: Put a "further reading" link to this blog post on every documentation page that contains anything related to "config*": https://www.influxdata.com/blog/telegraf-best-practices/

Make clear in the documentation

"Somewhere on the internet" I remember seeing a chart with input / processing / aggregation / output streams, but even after searching twice I can't find it in the documentation any more. It would be a good start, though.

I'd consider moving /lib/systemd/system/telegraf.service to /etc/systemd/system, since that's where I'd expect service files that are intended to be fiddled with. Anything in /lib/.. is imho commonly considered "only to be fiddled with by brave developers on a testing environment".

In particular, I found this section completely misleading https://docs.influxdata.com/telegraf/v1/configuration/#telegraf-doesnt-support-partial-configurations

Putting [global_tags] into every config file (as one might infer from "valid configuration") yields the misleading warning "W! Overlapping settings in multiple agent tables are not supported: may cause undefined behavior".

This comes close to "Following the official documentation causes undefined behaviour". Not really what I expect from a celebrated product... :face_with_spiral_eyes:
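
One layout that should avoid this warning, sketched under the assumption that it is triggered by global tables appearing in more than one file: keep `[global_tags]` and `[agent]` in the main file only, and put nothing but plugin tables into the drop-in files. The paths are the usual package defaults; the `site` tag, the drop-in file name and the `INFLUX_TOKEN` variable are hypothetical, and the remaining output settings are borrowed from the config shown later in this thread.

```toml
# --- /etc/telegraf/telegraf.conf: global tables live here, and only here ---
[global_tags]
  site = "farm"                  # hypothetical tag, for illustration only

[agent]
  interval = "10s"               # how often inputs are collected
  flush_interval = "10s"         # how often outputs are flushed

# --- /etc/telegraf/telegraf.d/influx_out.conf: plugin tables only ---
# (hypothetical file name; no [agent] or [global_tags] in drop-in files)
[[outputs.influxdb_v2]]
  urls = ["${INFLUX_URL}"]
  token = "${INFLUX_TOKEN}"      # assumed environment variable
  organization = "${INFLUX_ORG}"
  bucket = "lost_and_found"
```

With this split, each drop-in file contributes plugins and never touches the agent settings.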

Searching the internet for this message, or for "telegraf multiple config files", yields loads of other victims crying about the same problem. Most of them are only partially answered, or not at all. Some get a gracious "send me your configs and I, the Master, will save you". Well, fine, for one person.

Only after hours of searching, dozens of config changes and restarts, buckets and API tokens deleted and recreated, and close to trashing your great piece of work and reverting to good old Perl daemons, did I luckily encounter said "best practice" article.

So why not teach telegraf users how to fish instead of handing them a piece of fish?

srebhan commented 3 weeks ago

@wolfgangr thanks for your honest feedback! We are well aware that we are weak on the documentation side. Any help there is appreciated as developers are often not able to take a (novice) user's view on things.

This being said, I'm currently discussing with our docs team on how we can auto-generate the website documentation from what we have on GitHub to avoid misalignment and make the repository the single source of truth. This will probably take a while and the documentation structure might change a few times (split markdown files, add missing information in plugins, separate out developers docs etc.). However we will try to preserve the content.

So if you want to contribute and don't mind that we are moving stuff around, please feel more than welcome to improve what we have!

wolfgangr commented 2 weeks ago

This sounds like an invitation, eh? :thinking:

> developers are often not able to take a (novice) user's view on things.

I can sympathize. Ages ago, when I was engaged in IT training, I received the best marks when, as a trainer, I was just one lesson ahead of my clients :see_no_evil:

So I've learned that I'm not the best trainer either. While I can force myself to take a novice's view, I have no intrinsic disposition to do so, as I've seen in "born" trainers. I'd rather jump up the learning curve and quickly leave others behind.

So for the moment, I'll simply trace my learning here as an example of where I found the core hurdles, and the solutions. Maybe it can serve as a starting point / collection bin for some kind of "description of the big picture", which I feel is missing.

wolfgangr commented 2 weeks ago

The next thing I encountered with "eureka" feelings, after said "best practice" blog post, was the genuine source documentation:

https://github.com/influxdata/telegraf/tree/release-1.32/docs
in particular
https://github.com/influxdata/telegraf/blob/release-1.32/docs/CONFIGURATION.md

There is also an ASCII template for the "big picture" I've mentioned:
https://github.com/influxdata/telegraf/blob/release-1.32/docs/AGGREGATORS_AND_PROCESSORS.md
Some idea of routing even glimmers through the text.

Up to that finding, I (mistakenly?) considered https://docs.influxdata.com/telegraf/v1/
as the authoritative source.

So maybe just pointing to the genuine / authoritative GitHub docs in the footer of the autogenerated (?) ("bee and flower"-type?) manuals might ease the jump up the learning ladder?

wolfgangr commented 2 weeks ago

The latest "eureka" out of the "config syntax quagmire", which I found this morning:

https://github.com/influxdata/telegraf/blob/release-1.32/docs/TOML.md
https://toml.io/en/v1.0.0

And to get a live feeling while playing with configs, I installed a command-line TOML parser: apt-get install yq

With that I can, e.g., print a logical (JSON?) representation of my config:

{
  "outputs": {
    "influxdb_v2": [
      {
        "urls": [
          "${INFLUX_URL}"
        ],
        "token": "foo-bar-very-secret-asdf==",
        "organization": "${INFLUX_ORG}",
        "log_level": "debug",
        "bucket": "lost_and_found",
        "bucket_tag": "target_bucket",
        "exclude_bucket_tag": false
      }
    ]
  }
}

I can also strip it down to some kind of minimalistic (maybe even canonical?) TOML.
So I can keep all the comments in my configs and easily zoom out to the forest without dropping all the trees.

$ tomlq  -t . generic_influx2_out.conf
[outputs]
[[outputs.influxdb_v2]]
urls = [ "${INFLUX_URL}",]
token = "foo-bar-very-secret-asdf=="
organization = "${INFLUX_ORG}"
log_level = "debug"
bucket = "lost_and_found"
bucket_tag = "target_bucket"
exclude_bucket_tag = false

wolfgangr commented 2 weeks ago

The next thing I'd look for is a comprehensive guide to routing.
Is there any such thing around already?
If not, what would be the proper place and format to create one?
Could I expect assistance with answering questions and proofreading?
@srebhan, does this sound like a "maybe yes"?

The (obvious?) picture in mind when opting for the telegraf/influx/grafana chain was

At the top (where data comes from), in the middle (storage = Influx) and at the bottom (Grafana charting, Influx UI and Flux for casual queries), the picture takes shape.

The opaque section is how the variety of sources captured by telegraf finds its way through filters and aggregators to Influx buckets.

Can I use the namedrop and tag routing mentioned here
https://www.influxdata.com/blog/telegraf-best-practices/
throughout the whole chain?
In particular in filters and aggregators, too?
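
To my understanding of docs/CONFIGURATION.md, the metric filters (`namepass`, `namedrop`, `tagpass`, `tagdrop`) can be set on inputs, processors, aggregators and outputs alike. A minimal sketch of what that could look like along the whole chain follows; the tag value `system`, the renamed measurement `host_cpu` and the `INFLUX_TOKEN` variable are hypothetical, and the plugin choice is only illustrative.

```toml
[[inputs.cpu]]
  [inputs.cpu.tags]
    target_bucket = "system"     # hypothetical routing tag set at the source

[[processors.rename]]
  namepass = ["cpu"]             # only metrics named "cpu" enter this processor
  [[processors.rename.replace]]
    measurement = "cpu"
    dest = "host_cpu"            # hypothetical new measurement name

[[aggregators.basicstats]]
  period = "60s"
  drop_original = false          # keep the raw metrics as well
  stats = ["mean"]
  # Filter tables go last within a plugin section (TOML table ordering).
  [aggregators.basicstats.tagpass]
    target_bucket = ["system"]   # aggregate only metrics carrying this tag value

[[outputs.influxdb_v2]]
  urls = ["${INFLUX_URL}"]
  token = "${INFLUX_TOKEN}"      # assumed environment variable
  organization = "${INFLUX_ORG}"
  bucket = "lost_and_found"      # fallback when the routing tag is absent
  bucket_tag = "target_bucket"   # route by the tag set on the input
  exclude_bucket_tag = false
```

Metrics that do not pass a processor's or aggregator's filter are, as far as I can tell, handed on unchanged rather than dropped, which is worth verifying against the docs before relying on it.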

wolfgangr commented 2 weeks ago

Is this

cat *.conf | tomlq .
cat *.conf | tomlq . -t

what the clause

"effective configuration is the union of all the files"

boils down to?

So to understand how config pieces are merged, we search the web for "merge TOML", OK?
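
What plain TOML rules imply for the `cat` approach, as a sketch: arrays of tables (`[[...]]`) may repeat and simply accumulate, while an ordinary table (`[...]`) may be defined only once per TOML document. The file boundaries below are marked by comments and the file names and plugin choice are hypothetical. Whether telegraf merges files exactly like a concatenation is an assumption here, not something the quoted clause confirms.

```toml
# --- a.conf ---
[agent]
  interval = "10s"

[[inputs.cpu]]          # first element of the inputs.cpu array

# --- b.conf ---
[[inputs.cpu]]          # a second, independent inputs.cpu instance
  percpu = false

[[inputs.mem]]

# A second [agent] header at this point would make the concatenated text
# invalid TOML, because a table may not be defined twice:
# [agent]
#   flush_interval = "30s"
```

So `cat *.conf | tomlq .` is a usable preview only as long as no ordinary table is repeated across the files.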

As far as I could figure out now

srebhan commented 2 weeks ago

@wolfgangr sorry for the late reply, but even Telegraf maintainers need some rest on weekends. ;-)

My initial post was meant as an invite. ;-) So yes, I would love to see a "Getting started" type of guide in our docs, including the schematic interplay of plugins, a description of the differences between "listener"/"consumer" type event-driven plugins and "normal" plugins that query on a fixed interval, routing, filtering etc., with some pointers to additional documentation.

As several of your posts have piled up, could you collect your questions and post them here or in the #telegraf channel on our Slack?