munin-monitoring / munin

Main repository for munin master / node / plugins
http://munin-monitoring.org
Other
1.98k stars 471 forks source link

munin-update problem reading from simple plugin #1499

Open jonmcd11 opened 1 year ago

jonmcd11 commented 1 year ago

munin-update gets no data from plugin when running from munin-cron, works fine with command line test. master node and node hosting the plugin are the same

munin-update.log:

2022/11/26 14:35:01 [INFO]: Starting munin-update 2022/11/26 14:35:01 [INFO] starting work in 378829 for pihole/192.168.1.2:4949. 2022/11/26 14:35:01 [INFO] starting work in 378828 for localhost.localdomain/127.0.0.1:4949. 2022/11/26 14:35:01 [INFO] node localhost.localdomain advertised itself as zbucket instead. 2022/11/26 14:35:07 [INFO]: Munin-update finished for node pihole;pihole (5.88 sec) 2022/11/26 14:35:15 [WARNING] 2 lines had errors while 0 lines were correct (100.00%) in data from 'fetch speedtest' on localhost.localdomain/127.0.0.1:4949 2022/11/26 14:35:15 [INFO]: Munin-update finished for node localdomain;localhost.localdomain (13.65 sec) 2022/11/26 14:35:15 [INFO] Reaping Munin::Master::UpdateWorker<pihole;pihole>. Exit value/signal: 0/0 2022/11/26 14:35:15 [INFO] Reaping Munin::Master::UpdateWorker<localdomain;localhost.localdomain>. Exit value/signal: 0/0 2022/11/26 14:35:15 [INFO]: Munin-update finished (13.87 sec)

command line run: munin@zbucket:/home/jon$ /usr/share/munin/munin-update --debug --host localhost.localdomain --service speedtest

2022/11/26 14:44:39 [DEBUG] Creating new lock file /var/run/munin/munin-update.lock
2022/11/26 14:44:39 [DEBUG] Creating lock : /var/run/munin/munin-update.lock succeeded
2022/11/26 14:44:39 [INFO]: Starting munin-update
2022/11/26 14:44:39 [DEBUG] Starting work
2022/11/26 14:44:39 [DEBUG] Active workers: 0/16
2022/11/26 14:44:39 [DEBUG] Active workers: 1/16
2022/11/26 14:44:39 [DEBUG] Starting Munin::Master::UpdateWorker<localdomain;localhost.localdomain>
2022/11/26 14:44:39 [DEBUG] Creating new lock file /var/run/munin/munin-localdomain-localhost.localdomain.lock
2022/11/26 14:44:39 [DEBUG] Creating lock : /var/run/munin/munin-localdomain-localhost.localdomain.lock succeeded
2022/11/26 14:44:39 [DEBUG] Reading state for localdomain-localhost.localdomain in /var/lib/munin/state-localdomain-localhost.localdomain.storable
2022/11/26 14:44:39 [INFO] starting work in 382314 for localhost.localdomain/127.0.0.1:4949.
2022/11/26 14:44:39 [INFO] node localhost.localdomain advertised itself as zbucket instead.
2022/11/26 14:44:39 TLS set to "disabled".
2022/11/26 14:44:39 [DEBUG] Negotiating capabilities
2022/11/26 14:44:39 [DEBUG] Writing to socket: "cap multigraph dirtyconfig
".
2022/11/26 14:44:39 [DEBUG] Node says /cap multigraph dirtyconfig/
2022/11/26 14:44:39 [DEBUG] Writing to socket: "list zbucket
".
2022/11/26 14:44:39 [DEBUG] for my speedtest (sensors_volt sensors_temp cpuspeed irqstats hddtemp_smartctl ntp_kernel_pll_off open_inodes if_err_enp5s0 apache_volume netstat nfsd meminfo df interrupts apache_processes load smart_sda if_enp5s0 smart_sdc proc_pri lpstat df_inode diskstats forks ntp_offset apache_accesses processes nfsd4 ntp_kernel_pll_freq http_loadtime entropy smart_sdd samba uptime ntp_kernel_err open_files smart_sde memory swap munin_stats smart_sdb threads users fw_packets vmstat speedtest cpu)
2022/11/26 14:44:39 [DEBUG] Fetching service configuration for 'speedtest'
2022/11/26 14:44:39 [DEBUG] Writing to socket: "config speedtest
".
2022/11/26 14:44:39 [DEBUG] Reading from socket: "graph_category network\ngraph_title Speedtest\ngraph_args --base 1000 -l 0\ngraph_vlabel DL / UL\ngraph_scale no\ndown.label DL\ndown.type GAUGE\ndown.draw LINE1\nup.label UL\nup.type GAUGE\nup.draw LINE1\ngraph_info Graph of Internet Connection Speed".
2022/11/26 14:44:39 [DEBUG] config: 0.048098 sec for 'speedtest' on localhost.localdomain/127.0.0.1/4949
2022/11/26 14:44:39 [DEBUG] Now parsing config output from plugin speedtest on localhost.localdomain
2022/11/26 14:44:39 [DEBUG] update_rate 0 for speedtest on localhost.localdomain/127.0.0.1:4949
2022/11/26 14:44:39 [DEBUG] No service data for speedtest, fetching it
2022/11/26 14:44:39 [DEBUG] Writing to socket: "fetch speedtest
".
2022/11/26 14:44:39 [DEBUG] data: 0.054442 sec for 'speedtest' on localhost.localdomain/127.0.0.1/4949
2022/11/26 14:44:39 [DEBUG] Now parsing fetch output from plugin speedtest on localhost.localdomain/127.0.0.1:4949

2022/11/26 14:44:39 [FETCH from speedtest] down.value 45.09 2022/11/26 14:44:39 [FETCH from speedtest] Storing 45.09 in down 2022/11/26 14:44:39 [FETCH from speedtest] up.value 11.35 2022/11/26 14:44:39 [FETCH from speedtest] Storing 11.35 in up

2022/11/26 14:44:39 [DEBUG] rrd filename: /var/lib/munin/localdomain/localhost.localdomain-speedtest-up-g.rrd
2022/11/26 14:44:39 [DEBUG] Updating /var/lib/munin/localdomain/localhost.localdomain-speedtest-up-g.rrd with 1669495479:11.35
2022/11/26 14:44:39 [DEBUG] asking for a rrd of size : normal
2022/11/26 14:44:39 [DEBUG] rrd filename: /var/lib/munin/localdomain/localhost.localdomain-speedtest-down-g.rrd
2022/11/26 14:44:39 [DEBUG] Updating /var/lib/munin/localdomain/localhost.localdomain-speedtest-down-g.rrd with 1669495479:45.09
2022/11/26 14:44:39 [DEBUG] set_spoolfetch_timestamp(1669495479)
2022/11/26 14:44:39 [DEBUG] Writing to socket: "quit
".
2022/11/26 14:44:39 [DEBUG] quit: 0.000299 sec on localhost.localdomain/127.0.0.1/4949
2022/11/26 14:44:39 [DEBUG] Everything went smoothly.
2022/11/26 14:44:39 [DEBUG] Writing state for localdomain-localhost.localdomain in /var/lib/munin/state-localdomain-localhost.localdomain.storable
2022/11/26 14:44:39 [DEBUG] about to write '/var/lib/munin/state-localdomain-localhost.localdomain.storable'
2022/11/26 14:44:39 [DEBUG] Exit status 0 for worker Munin::Master::UpdateWorker<localdomain;localhost.localdomain>
2022/11/26 14:44:39 [INFO]: Munin-update finished for node localdomain;localhost.localdomain (0.17 sec)
2022/11/26 14:44:39 [INFO] Reaping Munin::Master::UpdateWorker<localdomain;localhost.localdomain>.  Exit value/signal: 0/0
2022/11/26 14:44:39 [DEBUG] Work done
2022/11/26 14:44:39 [DEBUG] Creating new lock file /var/run/munin/munin-datafile.lock
2022/11/26 14:44:39 [DEBUG] Creating lock : /var/run/munin/munin-datafile.lock succeeded
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_category" = "network"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_title" = "Speedtest"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_args" = "--base 1000 -l 0"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_vlabel" = "DL / UL"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_scale" = "no"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_info" = "Graph of Internet Connection Speed"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.graph_order" = "down up"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.up.draw" = "LINE1"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.up.type" = "GAUGE"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.up.graph_data_size" = "normal"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.up.label" = "UL"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.up.update_rate" = "300"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.down.graph_data_size" = "normal"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.down.draw" = "LINE1"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.down.type" = "GAUGE"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.down.label" = "DL"
2022/11/26 14:44:39 [DEBUG] munin_set_var_path: Setting var "localdomain;localhost.localdomain:speedtest.down.update_rate" = "300"
2022/11/26 14:44:39 [DEBUG] Writing state to /var/lib/munin/datafile.storable
2022/11/26 14:44:39 [DEBUG] about to write '/var/lib/munin/datafile.storable'
2022/11/26 14:44:39 [INFO]: Munin-update finished (0.19 sec)

Plugin speedtest:


case $1 in
    config)
    echo "graph_category network"
    echo "graph_title Speedtest"
    echo "graph_args --base 1000 -l 0"
    echo "graph_vlabel DL / UL"
    echo "graph_scale no"
    echo "down.label DL"
    echo "down.type GAUGE"
    echo "down.draw LINE1"
    echo "up.label UL"
    echo "up.type GAUGE"
    echo "up.draw LINE1"
    echo "graph_info Graph of Internet Connection Speed"
    exit 0;;
    esac

OUTP=`cat /tmp/speedtest.out`
DOWNLOAD=`echo "$OUTP" | grep Download | sed 's/[a-zA-Z:]* \([0-9]*\.[0-9]*\) [a-zA-Z/]*/\1/'`
UPLOAD=`echo "$OUTP" | grep Upload | sed 's/[a-zA-Z:]* \([0-9]*\.[0-9]*\) [a-zA-Z/]*/\1/'`

echo "down.value $DOWNLOAD"
echo "up.value $UPLOAD"

The node is running speedtest-cli from a cron job, every 5 minutes:

/usr/local/bin/speedtest-cli --simple > /tmp/speedtest.out

To Reproduce Steps to reproduce the behavior:

  1. install speedtest-cli using pip, easy_install, from packages, or from github
  2. set up plugin on munin node
  3. restart munin processes
  4. monitor munin_update.log on master node
  5. See error
  6. run munin-update from command line as user munin /usr/share/munin/munin-update --debug --host localhost.localdomain --service speedtest

Expected behavior no errors for speedtest in munin-update log in either case

Desktop (please complete the following information):

yunal commented 1 year ago

I have to say that the most likely cause of this is some issue with the plugin.

I've played with your plugin-fragment and tested if speedtest is locale sensitive. The plugin-fragment is simple enough. Speedtest is not locale sensitive so it appears it will never print the speed as "99,9" but rather always as "99.0" which is what munin needs.

The only way for the part of the plugin-fragment you've given us (which is not the whole plugin, it does not handle config) is if the /tmp/speedtest.out file is empty or unreadable. So I have to ask: Did the file exist and was it readable at the time the log with the error message was written?

Also, please have a look at the munin-node log for the times when munin-update said there were errors in the plugin output.

EDIT: oops, I'm sorry. Somehow I only registered the 6 last lines of the plugin, and not in fact the whole plugin which you had provided. It's still simple enough that I think the only way it can fail is if the /tmp/speedtest.out file does not exist when it's executed.

jonmcd11 commented 1 year ago

The /tmp/speedtest.out file has content and is being updated via the cron job as expected. I get it the whole set up is sparse

I previously had a problem with the file not being readable, returning other errors. I changed the systemd configuration for munin to set PrivateTmp to false so the file is readable. I figured this out with the --ignore-systemd-properties flag on the munin-run command.

What I can't figure out or find any reference for is why a plugin update that works when running munin-update in debug mode from the command line as user munin will not work under munin-update running normally under munin-cron.

yunal commented 1 year ago

Hi, please have a look in the munin-node log (on the node where the plugin is executed) for the times munin-update fails.

jonmcd11 commented 1 year ago

munin-update log:

2022/11/30 15:10:01 [INFO]: Starting munin-update
2022/11/30 15:10:01 [INFO] starting work in 4081 for pihole/192.168.1.2:4949.
2022/11/30 15:10:01 [INFO] starting work in 4080 for localhost.localdomain/127.0 .0.1:4949.
2022/11/30 15:10:01 [INFO] node localhost.localdomain advertised itself as zbucket instead.
2022/11/30 15:10:06 [WARNING] 2 lines had errors while 0 lines were correct (100.00%) in data from 'fetch speedtest' on localhost.localdomain/127.0.0.1:4949
2022/11/30 15:10:07 [INFO]: Munin-update finished for node pihole;pihole (6.00 sec)
2022/11/30 15:10:17 [INFO]: Munin-update finished for node localdomain;localhost.localdomain (15.40 sec)
2022/11/30 15:10:17 [INFO] Reaping Munin::Master::UpdateWorker<pihole;pihole>. Exit value/signal: 0/0
2022/11/30 15:10:17 [INFO] Reaping Munin::Master::UpdateWorker<localdomain;localhost.localdomain>.  Exit value/signal: 0/0
2022/11/30 15:10:17 [INFO]: Munin-update finished (15.61 sec)
2022/11/30 15:15:01 [INFO]: Starting munin-update
2022/11/30 15:15:01 [INFO] starting work in 5294 for pihole/192.168.1.2:4949.
2022/11/30 15:15:01 [INFO] starting work in 5295 for localhost.localdomain/127.0.0.1:4949.
2022/11/30 15:15:01 [INFO] node localhost.localdomain advertised itself as zbucket instead.
2022/11/30 15:15:05 [WARNING] 2 lines had errors while 0 lines were correct (100.00%) in data from 'fetch speedtest' on localhost.localdomain/127.0.0.1:4949
2022/11/30 15:15:07 [INFO]: Munin-update finished for node pihole;pihole (5.92 sec)
2022/11/30 15:15:14 [INFO]: Munin-update finished for node localdomain;localhost.localdomain (13.17 sec)
2022/11/30 15:15:14 [INFO] Reaping Munin::Master::UpdateWorker<pihole;pihole>.  Exit value/signal: 0/0
2022/11/30 15:15:14 [INFO] Reaping Munin::Master::UpdateWorker<localdomain;localhost.localdomain>.  Exit value/signal: 0/0
2022/11/30 15:15:14 [INFO]: Munin-update finished (13.37 sec)

munin-node.log from same time period:

Resolved [*]:4949 to [::]:4949, IPv6
Not including resolved host [0.0.0.0] IPv4 because it will be handled by [::] IPv6
Binding to TCP port 4949 on host :: with IPv6
Setting gid to "0 0"
2022/11/30-15:05:13 [2645] sd_notify: looking for NOTIFY_SOCKET environment variable
2022/11/30-15:05:13 [2645] sd_notify: preparing connection to '/run/systemd/notify'
2022/11/30-15:05:13 [2645] sd_notify: connected to socket '/run/systemd/notify'
2022/11/30-15:05:13 [2645] sd_notify: sending READY signal to '/run/systemd/notify'
2022/11/30-15:10:01 CONNECT TCP Peer: "[::ffff:127.0.0.1]:37148" Local: "[::ffff:127.0.0.1]:4949"
2022/11/30-15:15:01 CONNECT TCP Peer: "[::ffff:127.0.0.1]:57236" Local: "[::ffff:127.0.0.1]:4949"
yunal commented 1 year ago

It's been a while. Sorry. There are no error messages on the node.

Can you do 'telnet localhost 4949 | tee mn.log ' and then inside telnet type 'fetch speedtest' and after that 'quit' and post the output captured in mn.log as a file attachment here? (file attachment to be able to see any special characters that might or might not be in the output).