influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Update readme for ping input to clarify supported platforms/`ping` versions #5665

Closed victorhooi closed 5 years ago

victorhooi commented 5 years ago

The current ping plugin appears to depend on iputils-ping, per the README.

However, this package is Linux-specific and is not available on other platforms (e.g. FreeBSD, Windows, etc.).

Is there a specific reason the Linux version of ping is required? Could there be a mode that also works on, say, FreeBSD?

(My specific use case is a pfSense box with Telegraf installed to provide latency stats.)

sawo1337 commented 5 years ago

I use it on Windows (a Windows Server 2016 box) without issues.

glinton commented 5 years ago

@victorhooi What pfSense version are you using? From what I can see in pfSense 2.4.4, ping has the same options and output that Telegraf expects. Did you try it and it failed?

glinton commented 5 years ago

I just confirmed the ping input plugin works on pfSense 2.4.4. Maybe we need to update the README.

victorhooi commented 5 years ago

Oh - that's fantastic news!

Sorry, yes, I was going off the Telegraf README which seemed to suggest I needed the Linux-only version of ping. (I was actually curious why this was).

Are you able to share your pfSense telegraf config? I can test it on one of my instances today.

glinton commented 5 years ago

I just ran the ping plugin on it, so it's purely a POC config, nothing useful whatsoever:

[agent]
  interval="1s"
  flush_interval="1s"
  omit_hostname=true

[[inputs.ping]]
  urls = ["someurl.com"]
  count = 3

[[outputs.file]]
  files = ["stdout"]

victorhooi commented 5 years ago

I added the following as a custom directive for Telegraf in pfSense:

[[inputs.ping]]
  urls = ["example.org"]
  count = 3

(My Telegraf package is already configured to output to InfluxDB, and I can confirm that works. For the agent config, I believe the pfSense Telegraf package already defaults to an interval of 1.0 second, and I think ping should still work with the default flush_interval and with hostnames included?)

However, when I check InfluxDB, the ping plugin only seems to return a single field (result_code):

> select * FROM ping
name: ping
time                host                   result_code url
----                ----                   ----------- ---
1554363432000000000 ang-router.localdomain 0           example.org
1554363440000000000 ang-router.localdomain 0           example.org
1554363450000000000 ang-router.localdomain 0           example.org
1554363460000000000 ang-router.localdomain 0           example.org
1554363470000000000 ang-router.localdomain 0           example.org
1554363480000000000 ang-router.localdomain 0           example.org
1554363490000000000 ang-router.localdomain 0           example.org
1554363500000000000 ang-router.localdomain 0           example.org
1554363510000000000 ang-router.localdomain 0           example.org
1554363520000000000 ang-router.localdomain 0           example.org
1554363530000000000 ang-router.localdomain 0           example.org
1554363540000000000 ang-router.localdomain 0           example.org
1554363550000000000 ang-router.localdomain 0           example.org
1554363560000000000 ang-router.localdomain 0           example.org
1554363570000000000 ang-router.localdomain 0           example.org
1554363580000000000 ang-router.localdomain 0           example.org

Why is it not writing the other fields? (e.g. packets_transmitted, packets_received, percent_packets_loss etc.)

Super confused...

victorhooi commented 5 years ago

Looking at https://github.com/influxdata/telegraf/issues/4613 - could it be that the output format is different somehow?

victorhooi commented 5 years ago

I saw this post, and there was a suggestion to try the following command line:

ping -c 8 -n -s 16 -i 1.0 -W 1.0 8.8.8.8

I ran this on an Ubuntu host:

victorhooi@unifi-monitoring:~$ ping -c 8 -n -s 16 -i 1.0 -W 1.0 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 16(44) bytes of data.
24 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=0.965 ms
24 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=0.567 ms
24 bytes from 8.8.8.8: icmp_seq=3 ttl=59 time=0.575 ms
24 bytes from 8.8.8.8: icmp_seq=4 ttl=59 time=0.527 ms
24 bytes from 8.8.8.8: icmp_seq=5 ttl=59 time=0.659 ms
24 bytes from 8.8.8.8: icmp_seq=6 ttl=59 time=0.789 ms
24 bytes from 8.8.8.8: icmp_seq=7 ttl=59 time=0.677 ms
24 bytes from 8.8.8.8: icmp_seq=8 ttl=59 time=0.743 ms

--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 142ms
rtt min/avg/max/mdev = 0.527/0.687/0.965/0.138 ms

I then ran this on pfSense:

[2.4.4-RELEASE][admin@ang-router.localdomain]/root: ping -c 8 -n -s 16 -i 1.0 -W 1.0 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 16 data bytes

--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 packets received, 0.0% packet loss, 8 packets out of wait time
round-trip min/avg/max/stddev = 11.254/23.166/41.044/10.736 ms

Does that tell us anything useful?

Also, it might be nice to document in the ping plugin README the default command line it runs under the hood, if that helps with diagnosis.
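The two summaries differ in small but parser-breaking ways: iputils prints `8 received` and `0%`, while FreeBSD prints `8 packets received` and `0.0%`, and the round-trip line is labelled `rtt .../mdev` versus `round-trip .../stddev`. A parser written for only one form may miss fields on the other. The Go sketch below uses a hypothetical `parseSummary` helper (not Telegraf's actual parser) to show one way of accepting both forms.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pingStats holds the values pulled out of a ping summary.
type pingStats struct {
	Transmitted int
	Received    int
	LossPercent float64
	AvgMs       float64
}

var (
	// Accepts "8 received" (iputils) and "8 packets received" (FreeBSD),
	// and both "0%" and "0.0%" loss.
	countsRe = regexp.MustCompile(`(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss`)
	// Accepts "rtt min/avg/max/mdev" (iputils) and
	// "round-trip min/avg/max/stddev" (FreeBSD); captures the average.
	rttRe = regexp.MustCompile(`(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = [\d.]+/([\d.]+)/`)
)

// parseSummary is a hypothetical helper, not Telegraf's parser.
func parseSummary(out string) (pingStats, error) {
	var s pingStats
	m := countsRe.FindStringSubmatch(out)
	if m == nil {
		return s, fmt.Errorf("no packet summary line found")
	}
	var err error
	if s.Transmitted, err = strconv.Atoi(m[1]); err != nil {
		return s, err
	}
	if s.Received, err = strconv.Atoi(m[2]); err != nil {
		return s, err
	}
	if s.LossPercent, err = strconv.ParseFloat(m[3], 64); err != nil {
		return s, err
	}
	// The round-trip line may be absent (e.g. when no replies arrive),
	// so it is treated as optional.
	if r := rttRe.FindStringSubmatch(out); r != nil {
		if s.AvgMs, err = strconv.ParseFloat(r[1], 64); err != nil {
			return s, err
		}
	}
	return s, nil
}

func main() {
	linux := "8 packets transmitted, 8 received, 0% packet loss, time 142ms\n" +
		"rtt min/avg/max/mdev = 0.527/0.687/0.965/0.138 ms"
	freebsd := "8 packets transmitted, 8 packets received, 0.0% packet loss, 8 packets out of wait time\n" +
		"round-trip min/avg/max/stddev = 11.254/23.166/41.044/10.736 ms"
	for _, out := range []string{linux, freebsd} {
		s, err := parseSummary(out)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", s)
	}
}
```

The optional `packets ` group and the `[\d.]+` loss pattern are the only concessions needed for the packet counts; everything else in the line is shared by both implementations.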

glinton commented 5 years ago

How did you install Telegraf on your pfSense box? The version in the built-in package manager is 1.6.3. I'm going to assume that's the version you have, as 1.10.x works fine. Until that package gets updated, you can (and should) manually install a newer version; it's as simple as dropping the newer binary over the top of the old one.

victorhooi commented 5 years ago

On FreeBSD, "-W" sets the timeout in milliseconds.

On Linux, "-W" sets the timeout in seconds.

https://unix.stackexchange.com/questions/63651/what-is-the-difference-between-ping-w-and-ping-w

Also, I realised you can run telegraf with --test on the pfSense box to help debug some issues:

/usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf --test

Anyhow, I tried to set the ping timeout in my telegraf.conf:

[[inputs.ping]]
  urls = ["example.org"]
  count = 3
  timeout = 1000.0

but I get an error:

* Plugin: inputs.ping, Collection 1
2019-04-04T17:35:07Z E! Error in plugin [inputs.ping]: host example.org: ping: illegal option -- w
usage: ping [-AaDdfnoQqRrv] [-c count] [-G sweepmaxsize] [-g sweepminsize]
            [-h sweepincrsize] [-i wait] [-l preload] [-M mask | time] [-m ttl]
            [-P policy] [-p pattern] [-S src_addr] [-s packetsize] [-t timeout]
            [-W waittime] [-z tos] host
       ping [-AaDdfLnoQqRrv] [-c count] [-I iface] [-i wait] [-l preload]
            [-M mask | time] [-m ttl] [-P policy] [-p pattern] [-S src_addr]
            [-s packetsize] [-T ttl] [-t timeout] [-W waittime]
            [-z tos] mcast-group, exit status 64
> ping,host=ang-router.localdomain,url=example.org result_code=0i 1554399307000000000

I thought timeout was meant to use -W (uppercase), not -w (lowercase)?

Sorry for the number of comments; I'm trying to include all the steps and information as I troubleshoot this myself, in case it helps.

victorhooi commented 5 years ago

Ah yes - I am using the inbuilt pfSense package, which is indeed 1.6.3. This is running on a Netgate XG-7100 (amd64).

I downloaded the latest 1.10.2 binary from here using curl.

I had to stop the Telegraf service on pfSense, as FreeBSD complained about /usr/local/bin/telegraf being busy:

[2.4.4-RELEASE][admin@ang-router.localdomain]/usr/local/bin: cp /tmp/telegraf/usr/bin/telegraf .
cp: ./telegraf: Text file busy

I then restarted the service, and can confirm it all works now!

Was it just a bug with the ping plugin in the older Telegraf?

Also - I noticed there's no arm package for FreeBSD in the releases. Is that intentional? (This is for devices like the Netgate SG-3100, which I think is arm64. There's this PR for pfSense to build the package, but if Influx also provided a binary, I could drop it in as a replacement, as I did above.)

glinton commented 5 years ago

> no arm package for FreeBSD on releases. Is that intentional?

I'm not certain, but it could be intentional, due to lack of demand.

danielnelson commented 5 years ago

You may want to consider using the packages from FreeBSD ports https://www.freshports.org/net-mgmt/telegraf/

victorhooi commented 5 years ago

I believe pfSense pulls from FreeBSD ports - but they're usually delayed by a few months (or longer, in some cases, I believe).

Using the drop-in replacement binary, as @glinton suggested above, worked well on the Netgate XG-7100 (x64) based hardware.

Netgate also makes several ARM-based devices. It would be super useful if arm64 binaries were available as well, to use while we wait for pfSense to update their packages.

glinton commented 5 years ago

You have to configure pfSense to pull from the ports. If configuration isn't for you, you can manually add the package using:

pkg add http://pkg0.cyb.freebsd.org/FreeBSD:11:amd64/latest/All/telegraf-1.10.1.txz

danielnelson commented 5 years ago

I updated the documentation to be less confusing. 90593a07b87b06e03b85c6a2a5879ae957006ad1

@girgen Maybe we could add arm support to the build.py file. I looked at this patch but I must be missing something, it doesn't seem like this is enough to support tgz and the right ARM flags. Is there an additional patch?

glinton commented 5 years ago

Closed in https://github.com/influxdata/telegraf/commit/90593a07b87b06e03b85c6a2a5879ae957006ad1

girgen commented 5 years ago

> @girgen Maybe we could add arm support to the build.py file. I looked at this patch but I must be missing something, it doesn't seem like this is enough to support tgz and the right ARM flags. Is there an additional patch?

Yes, you also need to copy the files according to https://svnweb.freebsd.org/ports/head/net-mgmt/telegraf/Makefile?r1=485905&r2=490433

cp src/github.com/shirou/gopsutil/disk/disk_freebsd_386.go  \
     src/github.com/shirou/gopsutil/disk/disk_freebsd_arm.go
cp src/github.com/shirou/gopsutil/cpu/cpu_freebsd_386.go  \
     src/github.com/shirou/gopsutil/cpu/cpu_freebsd_arm.go
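The copies work because Go selects source files by name: a file ending in `_GOOS_GOARCH.go` (or just `_GOOS.go` / `_GOARCH.go`) is compiled only for that target, so `disk_freebsd_386.go` is ignored on freebsd/arm and the copied `disk_freebsd_arm.go` is picked up instead. The Go sketch below is a simplified, hypothetical re-implementation of that file-name rule (`matchesTarget` is not a Go toolchain API, it ignores `//go:build` lines, and its OS/arch lists are small illustrative subsets); the file names in `main` are illustrative too.

```go
package main

import (
	"fmt"
	"strings"
)

// Small illustrative subsets of Go's real GOOS/GOARCH lists.
var knownOS = map[string]bool{"linux": true, "freebsd": true, "windows": true, "darwin": true}
var knownArch = map[string]bool{"386": true, "amd64": true, "arm": true, "arm64": true}

// matchesTarget reports whether a file would be compiled for goos/goarch
// under a simplified version of Go's file-name build constraint rule.
func matchesTarget(file, goos, goarch string) bool {
	name := strings.TrimSuffix(file, ".go")
	name = strings.TrimSuffix(name, "_test")
	parts := strings.Split(name, "_")
	n := len(parts)
	// name_GOOS_GOARCH: both must match.
	if n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]] {
		return parts[n-2] == goos && parts[n-1] == goarch
	}
	// name_GOOS or name_GOARCH: the single suffix must match.
	if n >= 2 {
		last := parts[n-1]
		if knownOS[last] {
			return last == goos
		}
		if knownArch[last] {
			return last == goarch
		}
	}
	// No recognised suffix: the file is not constrained by its name.
	return true
}

func main() {
	files := []string{"disk_freebsd.go", "disk_freebsd_386.go", "disk_freebsd_amd64.go", "disk_freebsd_arm.go", "disk_linux.go"}
	for _, f := range files {
		fmt.Printf("%-24s built for freebsd/arm: %v\n", f, matchesTarget(f, "freebsd", "arm"))
	}
}
```

Under this rule, if a package defines some functions only in per-arch FreeBSD files, a freebsd/arm build has no definition for them until an `_arm` variant exists.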
danielnelson commented 5 years ago

I'm confused about the patch to build.py specifically. If I make the same changes and run it like this:

./scripts/build.py --package --platform=freebsd --arch=all
...
[ERROR] build: Invalid ARM architecture specified: armv6
[ERROR] build: Please specify either 'armel', 'armhf', or 'arm64'.

Looking at your Makefile now, I think you may not be using build.py at all, so perhaps this patch isn't needed. Also, in Telegraf 1.10 and later those two files should already be included in gopsutil.
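The build.py error above rejects `armv6` because only three ARM package names are accepted. The Go sketch below mirrors that kind of check with a hypothetical `armTarget` helper; it is not build.py's code, and the GOARM values (5 for armel, 6 for armhf) are assumptions about how such a script maps package architectures to Go build settings.

```go
package main

import "fmt"

// goTarget holds the Go environment settings for one package architecture.
type goTarget struct {
	GOARCH string
	GOARM  string // empty when not applicable (GOARM only affects GOARCH=arm)
}

// armTarget is a hypothetical sketch: it accepts only "armel", "armhf" and
// "arm64", so a FreeBSD-style name such as "armv6" is rejected.
func armTarget(arch string) (goTarget, error) {
	switch arch {
	case "armel":
		return goTarget{GOARCH: "arm", GOARM: "5"}, nil // assumed GOARM value
	case "armhf":
		return goTarget{GOARCH: "arm", GOARM: "6"}, nil // assumed GOARM value
	case "arm64":
		return goTarget{GOARCH: "arm64"}, nil
	default:
		return goTarget{}, fmt.Errorf("invalid ARM architecture specified: %s (expected armel, armhf, or arm64)", arch)
	}
}

func main() {
	for _, a := range []string{"armel", "armhf", "arm64", "armv6"} {
		t, err := armTarget(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> GOARCH=%s GOARM=%q\n", a, t.GOARCH, t.GOARM)
	}
}
```

Supporting a FreeBSD `armv6` target in a script like this would mean adding a case for it (for example GOARCH=arm, GOARM=6); that mapping is an assumption, not something settled in this thread.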

danielnelson commented 5 years ago

@victorhooi Could you create a new feature request issue for FreeBSD arm package?

girgen commented 5 years ago

The two "additional" files created by the cp commands are necessary. I can't say off the top of my head whether the build.py patch really makes a difference.

victorhooi commented 5 years ago

Done - created FR https://github.com/influxdata/telegraf/issues/5714 to add ARM binaries for FreeBSD