influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Freebsd 14 telegraf inputs.wireguard is silent. #14397

Open bunnyevans opened 10 months ago

bunnyevans commented 10 months ago

Relevant telegraf.conf

# 3------1---------2---------3---------4---------5---------6---------7---------8
# 2023.12.07 test
# 3------1---------2---------3---------4---------5---------6---------7---------8
#
[[inputs.wireguard]]
  # does this work?
  devices = ["wg0", "wg1"]

Logs from Telegraf

2023-12-07T05:01:28Z I! Starting Telegraf 1.21.4

System info

Telegraf 1.21.4, freebsd, FreeBSD 14.0-RELEASE

Docker

No response

Steps to reproduce

  1. create simple config based on: https://github.com/influxdata/telegraf/tree/release-1.21/plugins/inputs/wireguard

  2. The above config is in telegraf.conf.TEST here.

  3. /usr/local/bin/telegraf --config=/usr/local/etc/telegraf.conf --config-directory=/usr/local/etc/telegraf.conf.TEST --test --debug

  4. ...

Expected behavior

results of some sort as per the documentation.

Actual behavior

silence

Additional info

Both wg0 and wg1 exist, but even removing the "devices =" line produces only silence.

powersj commented 10 months ago

Hi,

There are some troubleshooting steps in the plugin readme, specifically:

1) making sure that telegraf has the right capabilities
2) ensuring wg show shows the devices you think are there

Can you provide the output of the above command?

There does appear to be a scenario where, if the device enumeration in the library we use returns no devices, there would be no errors and, as a result, no output. That may be happening here.
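
To make that silent case concrete, here is a minimal sketch. It is not Telegraf's plugin code: `enumerate` is a hypothetical stand-in for the library call that lists devices, and `gatherNames` and `errNoDevices` are names invented for illustration. It only shows how an empty, error-free result can be turned into something visible.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoDevices is a hypothetical sentinel so callers can tell "nothing found"
// apart from a successful gather.
var errNoDevices = errors.New("no WireGuard devices found")

// gatherNames calls enumerate (a stand-in for the library's device listing)
// and turns an empty, error-free result into an explicit error instead of
// silently returning nothing.
func gatherNames(enumerate func() ([]string, error)) ([]string, error) {
	names, err := enumerate()
	if err != nil {
		return nil, fmt.Errorf("enumerating devices: %w", err)
	}
	if len(names) == 0 {
		return nil, errNoDevices
	}
	return names, nil
}

func main() {
	// Simulate a platform where the library sees no devices and reports no error.
	empty := func() ([]string, error) { return nil, nil }
	if _, err := gatherNames(empty); err != nil {
		fmt.Println("warning:", err)
	}
}
```

Without the length check, the caller would loop over zero devices and emit nothing, which matches the "silence" reported above.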

bwcarp commented 10 months ago

I decided to try reproducing this, and I got a message after installing wireguard that it was only compatible with FreeBSD 12 and about to be removed. It recommended wireguard-go which I believe works quite differently. Which of these are you running on FreeBSD 14?

Edit: I see, you get bad error messages trying either because it's native to the kernel now and all you need is wireguard-tools. Removing foot from mouth and testing now that I have a wireguard connection.

bwcarp commented 10 months ago

Turning on debug, I get this message:

2023-12-21T02:57:30Z W! [inputs.wireguard] No Wireguard device found with name wg0

The setcap instruction isn't relevant for FreeBSD. I decided to make sure the telegraf user could look into this.

# sudo -u telegraf ifconfig -v wg0
wg0: flags=10080c1<UP,RUNNING,NOARP,MULTICAST,LOWER_UP> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.8.0.5 netmask 0xffffffff
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
# sudo -u telegraf wg show
interface: wg0
  public key: [redacted]
  listening port: 22979

peer: [redacted]
  endpoint: [redacted]
  allowed ips: 0.0.0.0/0
  latest handshake: 38 seconds ago
  transfer: 1.86 MiB received, 68.71 KiB sent
  persistent keepalive: every 30 seconds

So at this point, I started suspecting the wgctrl library you're using as it hasn't been updated in a year. I wrote a quick program that just prints out client.Device("wg0") and it actually worked fine.

# sudo -u telegraf ./wireguard-freebsd-test 
&{wg0 FreeBSD kernel AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= [redacted] 22979 0 [{[redacted] AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= [redacted] 30s 2023-12-20 21:57:47.0403048 -0500 EST 23418581 1048772 [{0.0.0.0 00000000}] 1}]}

So this does seem to be a problem in Telegraf. Maybe the output on 14 isn't what it expects?

telegraf-1.29.1
Name           : telegraf
Version        : 1.29.1
Installed on   : Wed Dec 20 20:39:34 2023 EST
Origin         : net-mgmt/telegraf
Architecture   : FreeBSD:14:amd64

I dropped in on the issues board because I'm looking for a way to kill time on Christmas. Maybe I'll look into this.

bwcarp commented 10 months ago

The version of wgctrl that Telegraf is using is before they added FreeBSD support, so this never worked. It needs the version bumped and some labels tweaked. I'll start testing.

bwcarp commented 10 months ago

Sorry for over-commenting in the thread in the middle of the night, I got inspired.

I have this working now, however, it will only build with wgctrl's os_freebsd.go module if CGO_ENABLED is set to 1, which is explicitly set to 0 in the Telegraf Makefile. I'm assuming that's there on purpose because of a known problem?

powersj commented 10 months ago

> if CGO_ENABLED is set to 1, which is explicitly set to 0 in the Telegraf Makefile. I'm assuming that's there on purpose because of a known problem?

This is really unfortunate that the library started using cgo. We do not support adding cgo dependencies or code in Telegraf. Telegraf produces a static binary and our binaries are cross built, adding cgo code would prevent that or limit that ability.

> The version of wgctrl that Telegraf is using is before they added FreeBSD support, so this never worked.

Is there a version between what we currently use, before they started CGO that we can update to?

bwcarp commented 10 months ago

The component that monitors the FreeBSD kernel implementation of wireguard is specifically what requires access to C libraries. If it worked on FreeBSD in the past, it seems Telegraf and the old 2021 wgctrl supported userspace implementations, so wireguard-go probably reported metrics just fine. As it is, it looks like upgrading wgctrl for FreeBSD kernel support isn't a supported change. That's a bummer.

To offer the person originally asking an alternative, I think https://github.com/MindFlavor/prometheus_wireguard_exporter just parses the output of the wg command. I'd have to try it; I use it on Linux and it's fine. Could have telegraf scrape/ship to influx.

bwcarp commented 10 months ago

Thinking about it, I'll offer an idea, and you can advise if it's a good one or not.

It could be made optional in the config to gather metrics using the commands from wireguard-tools as opposed to the wgctrl library, parsing the output like Telegraf does with nvme/smartctl/some other things.
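
A rough sketch of what that opt-in switch might look like, written as a standalone program and not as plugin code. The `WgPath` option, the `collect` function, and the `/usr/local/bin/wg` path are all assumptions made for illustration; the library branch is left as a placeholder.

```go
package main

import (
	"fmt"
	"os/exec"
)

// config mirrors a hypothetical plugin option: when WgPath is non-empty the
// plugin would shell out to the wg binary instead of using the Go library.
type config struct {
	WgPath string
}

// runner abstracts command execution so the selection logic can be exercised
// without a real wg binary present.
type runner func(name string, args ...string) ([]byte, error)

// execRunner runs the real command and returns its standard output.
func execRunner(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).Output()
}

// collect reports which backend was chosen and the raw data it produced.
// The "library" branch is only a placeholder in this sketch.
func collect(cfg config, run runner) (string, []byte, error) {
	if cfg.WgPath == "" {
		return "library", nil, nil
	}
	out, err := run(cfg.WgPath, "show", "all", "dump")
	if err != nil {
		return "cli", nil, fmt.Errorf("running %s: %w", cfg.WgPath, err)
	}
	return "cli", out, nil
}

func main() {
	// The path below is an assumed install location, not a verified default.
	backend, out, err := collect(config{WgPath: "/usr/local/bin/wg"}, execRunner)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("backend=%s bytes=%d\n", backend, len(out))
}
```

Passing the runner in as a parameter keeps the default behaviour untouched when the option is unset and makes the CLI path testable with a fake.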

powersj commented 10 months ago

Does the CLI have a JSON or other format option? We strongly dislike parsing CLI output due to changes, whitespace, etc., but if there is a parseable output, we are much more likely to add support for that.

bwcarp commented 10 months ago

Looking into it, no. In their contrib folder, there is a json command that can be built separately, but all it does is what I was thinking about doing anyway (parses the other binary's format): https://github.com/WireGuard/wireguard-tools/blob/master/contrib/json/wg-json

Format hasn't changed as long as I've been using it, but I understand your concern. Too bad it's not a native feature. In the case of Linux/userspace though, in the long run that wgctrl dependency might still need to be dropped or forked due to future golang changes or other discovered issues.

interface: wg0
  public key: [redacted]
  private key: (hidden)
  listening port: 51280

peer: [redacted]=
  endpoint: [redacted]
  allowed ips: 10.8.0.5/32
  latest handshake: 9 hours, 54 minutes, 7 seconds ago
  transfer: 85.00 MiB received, 2.11 GiB sent

peer: [redacted]=
  endpoint: [redacted]
  allowed ips: 10.8.0.7/32, abcd:abcd:abcd::7/128
  latest handshake: 11 hours, 18 minutes, 45 seconds ago
  transfer: 248.01 MiB received, 11.02 GiB sent

powersj commented 10 months ago

I am not opposed to an opt-in parsing option. I think the fact that the library we were using will not be able to be updated without work can further justify this as well.

Is this something you are interested in contributing?

What I would look for is:

bwcarp commented 10 months ago

Yeah, was dropping in because I've been using InfluxDB for years on hobby projects and felt like giving back. Do you have a doc or wiki somewhere for general requirements for contributing to Telegraf?

powersj commented 10 months ago

Sweet! We have some guidelines here:

https://github.com/influxdata/telegraf/tree/master/docs/developers

If you have more specific questions, you can ask them here (although I'm about to disappear for the rest of the year) or in our community slack. In general I would say @srebhan and I prefer to see a PR, even if you are not sure about it and we can work with you to resolve any issues!

bwcarp commented 10 months ago

Thanks, I'll get a draft PR up in a week or so after I've verified it runs right on Linux and FreeBSD for a few days.

Also, apparently there is an argument to have it print tab separated values with a new peer on every line, just had to read the man page more clearly, so we shouldn't have to worry about formatting changes.

bunnyevans commented 10 months ago

Dumb question here, not being even remotely understanding of the gubbins of telegraf.

wireguard is now a first-class citizen in freebsd 14 and as a result it shows up in ifconfig like this:

wg0: flags=10080c1<UP,RUNNING,NOARP,MULTICAST,LOWER_UP> metric 0 mtu 1280
        options=80000<LINKSTATE>
        inet 172.25.248.35 netmask 0xffff0000
        groups: wg
        nd6 options=101<PERFORMNUD,NO_DAD>
wg1: flags=10080c1<UP,RUNNING,NOARP,MULTICAST,LOWER_UP> metric 0 mtu 1280
        options=80000<LINKSTATE>
        inet 10.20.30.64 netmask 0xffffff00
        groups: wg
        nd6 options=101<PERFORMNUD,NO_DAD>

and

bunny@turbinia:~ % netstat -i -I wg1
Name    Mtu Network        Address        Ipkts Ierrs Idrop    Opkts Oerrs  Coll
wg1    1280 <Link#6>       wg1          4497637     0     0  2552503 25927     0
wg1       - 10.20.30.0/24  10.20.30.64  4450794     -     -  2471502     -     -

Perhaps this might be an easier route? Don't forget to shoot me down if I am completely wrong!

bwcarp commented 10 months ago

Apologies, this thread became kind of a brain dump of my troubleshooting. I'll break it down a little better.

So the reason Telegraf isn't able to monitor Wireguard with the Go library it ships with is that, although the newer version of that library supports FreeBSD's native kernel implementation, it needs access to certain C libraries to study it. This is ultimately a breaking change in the case of Telegraf, so that's how we wound up at plan B.

ifconfig/netstat don't provide very much of the information we're looking for, but if you install wireguard-tools and run wg show all dump [leave off the "dump" for something easier to read but harder to script], you'll see most of that information. To reach parity with the labels the plugin currently outputs, I just have to gather a couple of OS details and that should be everything. I'm hoping to work on this next week.
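
For anyone curious what scripting against the dump looks like, here is a minimal sketch of a parser for `wg show all dump`. It assumes the tab-separated layout from the wg man page: with `all`, interface lines carry 5 fields and peer lines carry 9 (interface, public key, preshared key, endpoint, allowed IPs, latest handshake, rx bytes, tx bytes, keepalive). The `peer` struct, the `parseDump` name, and the sample data are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// peer holds the counters taken from one peer line of the dump.
type peer struct {
	Device        string
	PublicKey     string
	LastHandshake int64 // Unix seconds; 0 means no handshake yet
	RxBytes       int64
	TxBytes       int64
}

// parseDump reads tab-separated dump output. Lines that do not have exactly
// 9 fields (interface lines, blank lines) are skipped.
func parseDump(out string) ([]peer, error) {
	var peers []peer
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Split(sc.Text(), "\t")
		if len(f) != 9 {
			continue
		}
		// Fields 5..7 are latest handshake, received bytes, sent bytes.
		var nums [3]int64
		for i := range nums {
			n, err := strconv.ParseInt(f[5+i], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("peer %s field %d: %w", f[1], 5+i, err)
			}
			nums[i] = n
		}
		peers = append(peers, peer{
			Device:        f[0],
			PublicKey:     f[1],
			LastHandshake: nums[0],
			RxBytes:       nums[1],
			TxBytes:       nums[2],
		})
	}
	return peers, sc.Err()
}

func main() {
	// Placeholder sample in the dump layout; keys and addresses are not real.
	sample := "wg0\tPRIV\tPUB\t51280\toff\n" +
		"wg0\tPEERKEY\t(none)\t192.0.2.1:51820\t10.8.0.5/32\t1703127467\t1950351\t70359\t30\n"
	peers, err := parseDump(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range peers {
		fmt.Printf("%s peer=%s rx=%d tx=%d\n", p.Device, p.PublicKey, p.RxBytes, p.TxBytes)
	}
}
```

Because the counters are plain integers in this format, there is no need to interpret strings like "1.86 MiB" from the human-readable output.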

bwcarp commented 10 months ago

Hoping for an easier way than duplicating all these functions, or wrapping them in if statements for the case where wg_path is specified in the config, I decided to take a detour to see if I could eliminate the C dependency in the wgctrl library instead.

Something to note, this cgo dependency only affects builds with GOOS set to freebsd or openbsd. Comment in the code about it:

// Package wgh is an auto-generated package which contains constants and
// types used to access WireGuard information using ioctl calls.

Now it seems they are using sys/unix for most of their functions, but they're initializing things per architecture. The FreeBSD client lives here: https://github.com/WireGuard/wgctrl-go/blob/master/internal/wgfreebsd/client_freebsd.go

I'm now wondering if I should spend some time with FreeBSD's docs around ioctl and see if I can remove these C dependencies and just use Go's sys/unix, but I imagine they probably did it this way because of a limitation with that, and I'd subsequently need to do it for OpenBSD as well. If wgctrl can be changed, all that it would take to fix the wireguard plugin in Telegraf would be to update the go.mod and the deviceTypeNames map to support the additional operating systems in the new version (this is how I got it working on a FreeBSD vm, but I had to enable CGO).
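
To illustrate the map change mentioned above, here is a small sketch. It uses a local stand-in enum so it builds without the wgctrl dependency; the constant names and the BSD label strings are illustrative guesses, not values taken from Telegraf or wgctrl.

```go
package main

import "fmt"

// deviceType is a local stand-in for the library's device type enum so this
// sketch compiles on its own.
type deviceType int

const (
	unknown deviceType = iota
	linuxKernel
	openBSDKernel
	freeBSDKernel
	userspace
)

// deviceTypeNames maps each type to the tag value reported with the metrics.
// The BSD strings are assumptions chosen to match the existing naming style.
var deviceTypeNames = map[deviceType]string{
	unknown:       "unknown",
	linuxKernel:   "linux_kernel",
	openBSDKernel: "openbsd_kernel",
	freeBSDKernel: "freebsd_kernel",
	userspace:     "userspace",
}

// typeName falls back to "unknown" for any type missing from the map, so a
// type introduced by a newer library version never yields an empty tag.
func typeName(t deviceType) string {
	if n, ok := deviceTypeNames[t]; ok {
		return n
	}
	return deviceTypeNames[unknown]
}

func main() {
	fmt.Println(typeName(freeBSDKernel))
	fmt.Println(typeName(deviceType(99)))
}
```

The fallback lookup is a design choice for this sketch: it keeps the tag populated even if the map lags behind the library.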

This will delay me a bit, but if it can be done this way that's ultimately better I think.

powersj commented 7 months ago

I am going to mark this upstream, as the client library we are currently using appears to require cgo for freebsd support. It does look like openbsd does not require cgo, so it may point to how this could get resolved.

If anyone does end up either updating the upstream library or wants to create a freebsd specific config option, feel free to.