mikaku / Monitorix

Monitorix is a free, open source, lightweight system monitoring tool.
https://www.monitorix.org
GNU General Public License v2.0

"System load average and usage" Graphs Empty #91

Closed mgruben closed 9 years ago

mgruben commented 9 years ago

Yesterday, the three graphs in the "System load average and usage" section (system.rrd) started reporting only nan values, while other graphs appear to have been unaffected. (xref this thread)
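One way to confirm that an RRD file really holds only NaN values is to dump its recent rows with `rrdtool fetch` and count them. The helper below is a minimal sketch, not part of Monitorix: the function name `count_nan_rows` is hypothetical, and it assumes the usual `rrdtool fetch` row layout of `timestamp: value value ...`.

```shell
# Count rows of `rrdtool fetch` output (read from stdin) whose values are all NaN.
# Data rows are assumed to look like "1431700000: 1.8e-01 nan ...".
count_nan_rows() {
    awk '
        /^[0-9]+:/ {
            total++
            all_nan = 1
            # A value counts as NaN if it is printed as "nan" or "-nan".
            for (i = 2; i <= NF; i++)
                if ($i !~ /nan$/) all_nan = 0
            if (all_nan) nan_rows++
        }
        END { printf "%d of %d rows are all NaN\n", nan_rows, total }
    '
}

# Usage (assumes rrdtool is installed and the file is readable):
# rrdtool fetch /var/lib/monitorix/system.rrd AVERAGE --start -1h | count_nan_rows
```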

The only thing I can think of that would have interfered with Monitorix' recording abilities is upgrading a variety of packages on my host system (below).

I'm happy to provide additional information; I'm just not sure what would be helpful.

[2015-05-15 21:27] [PACMAN] Running 'pacman -Syu'
[2015-05-15 21:27] [PACMAN] synchronizing package lists
[2015-05-15 21:28] [PACMAN] starting full system upgrade
[2015-05-15 21:41] [ALPM] transaction started
[2015-05-15 21:41] [ALPM] upgraded pcre (8.36-2 -> 8.37-1)
[2015-05-15 21:41] [ALPM] upgraded gdk-pixbuf2 (2.31.3-1 -> 2.31.4-1)
[2015-05-15 21:41] [ALPM] upgraded gtk-update-icon-cache (2.24.27-1 -> 3.16.3-2)
[2015-05-15 21:41] [ALPM] upgraded libdrm (2.4.60-2 -> 2.4.61-1)
[2015-05-15 21:41] [ALPM] upgraded libedit (20141030_3.1-1 -> 20150325_3.1-1)
[2015-05-15 21:41] [ALPM] upgraded cracklib (2.9.1-1 -> 2.9.4-1)
[2015-05-15 21:41] [ALPM] upgraded libtirpc (0.2.5-1 -> 0.3.0-1)
[2015-05-15 21:41] [ALPM] upgraded mesa (10.5.4-1 -> 10.5.5-1)
[2015-05-15 21:41] [ALPM] upgraded mesa-libgl (10.5.4-1 -> 10.5.5-1)
[2015-05-15 21:41] [ALPM] upgraded librsvg (1:2.40.9-1 -> 1:2.40.9-2)
[2015-05-15 21:41] [ALPM] upgraded adwaita-icon-theme (3.16.0-1 -> 3.16.2.1-1)
[2015-05-15 21:41] [ALPM] upgraded android-tools (5.1.0_r3-1 -> 5.1.1_r2-1)
[2015-05-15 21:41] [ALPM] upgraded dhcpcd (6.8.1-1 -> 6.8.2-1)
[2015-05-15 21:41] [ALPM] upgraded gawk (4.1.1-1 -> 4.1.2-1)
[2015-05-15 21:41] [ALPM] upgraded lcms2 (2.6-1 -> 2.7-1)
[2015-05-15 21:41] [ALPM] upgraded sqlite (3.8.9-1 -> 3.8.10.1-1)
[2015-05-15 21:41] [ALPM] upgraded libtasn1 (4.4-1 -> 4.5-1)
[2015-05-15 21:41] [ALPM] upgraded gnutls (3.4.0-1 -> 3.4.1-1)
[2015-05-15 21:41] [ALPM] upgraded gtk3 (3.16.2-1 -> 3.16.3-2)
[2015-05-15 21:41] [ALPM] upgraded gcr (3.15.92-1 -> 3.16.0-1)
[2015-05-15 21:41] [ALPM] upgraded git (2.3.7-1 -> 2.4.1-1)
[2015-05-15 21:41] [ALPM] upgraded npth (1.1-1 -> 1.2-1)
[2015-05-15 21:41] [ALPM] upgraded libksba (1.3.2-1 -> 1.3.3-1)
[2015-05-15 21:41] [ALPM] upgraded libassuan (2.1.3-1 -> 2.2.0-1)
[2015-05-15 21:41] [ALPM] upgraded pinentry (0.9.0-1 -> 0.9.1-1)
[2015-05-15 21:41] [ALPM] upgraded gnupg (2.1.3-2 -> 2.1.3-3)
[2015-05-15 21:41] [ALPM] upgraded gpgme (1.5.3-1 -> 1.5.4-1)
[2015-05-15 21:41] [ALPM] upgraded gtk2 (2.24.27-1 -> 2.24.28-1)
[2015-05-15 21:41] [ALPM] installed c-client (2007f-5)
[2015-05-15 21:41] [ALPM] upgraded imap (2007f-4 -> 2007f-5)
[2015-05-15 21:41] [ALPM] upgraded lib32-gnutls (3.4.0-1 -> 3.4.0-2)
[2015-05-15 21:41] [ALPM] upgraded lib32-libcups (2.0.2-1 -> 2.0.2-2)
[2015-05-15 21:41] [ALPM] upgraded lib32-libpciaccess (0.13.3-1 -> 0.13.4-1)
[2015-05-15 21:41] [ALPM] upgraded lib32-libdrm (2.4.60-1 -> 2.4.61-1)
[2015-05-15 21:41] [ALPM] upgraded lib32-mesa (10.5.4-1 -> 10.5.5-1)
[2015-05-15 21:41] [ALPM] upgraded lib32-mesa-libgl (10.5.4-1 -> 10.5.5-1)
[2015-05-15 21:41] [ALPM] upgraded libass (0.12.1-1 -> 0.12.2-1)
[2015-05-15 21:41] [ALPM] upgraded libmariadbclient (10.0.17-2 -> 10.0.18-2)
[2015-05-15 21:41] [ALPM] upgraded libmm-glib (1.4.6-1 -> 1.4.8-1)
[2015-05-15 21:41] [ALPM] upgraded libnm-glib (1.0.0-2 -> 1.0.2-3)
[2015-05-15 21:41] [ALPM] upgraded libpulse (6.0-1 -> 6.0-2)
[2015-05-15 21:41] [ALPM] installed glpk (4.55-1)
[2015-05-15 21:41] [ALPM] installed coin-or-coinutils (2.10.7-1)
[2015-05-15 21:41] [ALPM] installed coin-or-osi (0.107.4-1)
[2015-05-15 21:41] [ALPM] installed intel-tbb (4.3_20150209-1)
[2015-05-15 21:41] [ALPM] installed suitesparse (4.4.4-1)
[2015-05-15 21:41] [ALPM] installed coin-or-clp (1.16.6-1)
[2015-05-15 21:41] [ALPM] installed coin-or-cgl (0.59.4-1)
[2015-05-15 21:41] [ALPM] installed coin-or-cbc (2.9.4-1)
[2015-05-15 21:41] [ALPM] installed coin-or-mp (1.8.1-1)
[2015-05-15 21:41] [ALPM] upgraded libreoffice-fresh (4.4.2-1 -> 4.4.3-1)
[2015-05-15 21:41] [ALPM] upgraded libsecret (0.18-1 -> 0.18.2-1)
[2015-05-15 21:41] [ALPM] warning: directory permissions differ on /var/log/lighttpd/
filesystem: 755  package: 700
[2015-05-15 21:41] [ALPM] warning: directory permissions differ on /var/cache/lighttpd/
filesystem: 755  package: 700
[2015-05-15 21:41] [ALPM] upgraded lighttpd (1.4.35-1 -> 1.4.35-2)
[2015-05-15 21:41] [ALPM] upgraded linux (4.0.1-1 -> 4.0.2-1)
[2015-05-15 21:41] [ALPM-SCRIPTLET] >>> Updating module dependencies. Please wait ...
[2015-05-15 21:41] [ALPM-SCRIPTLET] >>> Generating initial ramdisk, using mkinitcpio.  Please wait...
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'default'
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux.img
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Starting build: 4.0.2-1-ARCH
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [base]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [udev]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [autodetect]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [modconf]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [block]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [filesystems]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [keyboard]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [fsck]
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Generating module dependencies
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Creating gzip-compressed initcpio image: /boot/initramfs-linux.img
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Image generation successful
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Building image from preset: /etc/mkinitcpio.d/linux.preset: 'fallback'
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> -k /boot/vmlinuz-linux -c /etc/mkinitcpio.conf -g /boot/initramfs-linux-fallback.img -S autodetect
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Starting build: 4.0.2-1-ARCH
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [base]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [udev]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [modconf]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [block]
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> WARNING: Possibly missing firmware for module: aic94xx
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> WARNING: Possibly missing firmware for module: wd719x
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [filesystems]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [keyboard]
[2015-05-15 21:41] [ALPM-SCRIPTLET]   -> Running build hook: [fsck]
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Generating module dependencies
[2015-05-15 21:41] [ALPM-SCRIPTLET] ==> Creating gzip-compressed initcpio image: /boot/initramfs-linux-fallback.img
[2015-05-15 21:42] [ALPM-SCRIPTLET] ==> Image generation successful
[2015-05-15 21:42] [ALPM] upgraded linux-headers (4.0.1-1 -> 4.0.2-1)
[2015-05-15 21:42] [ALPM] upgraded man-pages (3.83-1 -> 4.00-1)
[2015-05-15 21:42] [ALPM] upgraded mariadb-clients (10.0.17-2 -> 10.0.18-2)
[2015-05-15 21:42] [ALPM] upgraded mariadb (10.0.17-2 -> 10.0.18-2)
[2015-05-15 21:42] [ALPM] upgraded nm-connection-editor (1.0.0-2 -> 1.0.2-1)
[2015-05-15 21:42] [ALPM] upgraded network-manager-applet (1.0.0-2 -> 1.0.2-1)
[2015-05-15 21:42] [ALPM] warning: /etc/NetworkManager/NetworkManager.conf installed as /etc/NetworkManager/NetworkManager.conf.pacnew
[2015-05-15 21:42] [ALPM] upgraded networkmanager (1.0.0-2 -> 1.0.2-3)
[2015-05-15 21:42] [ALPM] upgraded opencl-mesa (10.5.4-1 -> 10.5.5-1)
[2015-05-15 21:42] [ALPM] warning: /etc/pacman.d/mirrorlist installed as /etc/pacman.d/mirrorlist.pacnew
[2015-05-15 21:42] [ALPM] upgraded pacman-mirrorlist (20150315-1 -> 20150514-1)
[2015-05-15 21:42] [ALPM] upgraded pciutils (3.3.0-1 -> 3.3.1-1)
[2015-05-15 21:42] [ALPM] upgraded php (5.6.8-2 -> 5.6.9-1)
[2015-05-15 21:42] [ALPM] upgraded php-cgi (5.6.8-2 -> 5.6.9-1)
[2015-05-15 21:42] [ALPM] upgraded php-xsl (5.6.8-2 -> 5.6.9-1)
[2015-05-15 21:42] [ALPM] upgraded qt4 (4.8.6-5 -> 4.8.6-6)
[2015-05-15 21:42] [ALPM] upgraded ranger (1.7.0-1 -> 1.7.1-1)
[2015-05-15 21:42] [ALPM] upgraded rrdtool (1.4.9-1 -> 1.5.3-1)
[2015-05-15 21:42] [ALPM] upgraded rsync (3.1.1-2 -> 3.1.1-3)
[2015-05-15 21:42] [ALPM] upgraded seahorse (3.15.92-1 -> 3.16.0-1)
[2015-05-15 21:42] [ALPM] upgraded simple-scan (3.14.2-1 -> 3.16.1.1-1)
[2015-05-15 21:42] [ALPM] upgraded speech-dispatcher (0.8.1-1 -> 0.8.2-1)
[2015-05-15 21:42] [ALPM] upgraded vi (1:050325-4 -> 1:070224-1)
[2015-05-15 21:42] [ALPM] upgraded vlc (2.2.1-2 -> 2.2.1-3)
[2015-05-15 21:42] [ALPM] upgraded vte-common (0.40.0-2 -> 0.40.2-1)
[2015-05-15 21:42] [ALPM] upgraded wine (1.7.42-1 -> 1.7.42-2)
[2015-05-15 21:42] [ALPM] upgraded xdotool (2.20110530.1-3 -> 3.20150503.1-1)
[2015-05-15 21:42] [ALPM] upgraded youtube-dl (2015.04.26-1 -> 2015.05.10-1)
[2015-05-15 21:42] [ALPM] transaction completed
[2015-05-15 21:43] [PACMAN] Running 'pacman --color auto -Sy'
[2015-05-15 21:43] [PACMAN] synchronizing package lists
mikaku commented 9 years ago

hi mgruben,

I'd say that this [2015-05-15 21:42] [ALPM] upgraded rrdtool (1.4.9-1 -> 1.5.3-1) could be the cause of this malfunction. I haven't tested this new RRDtool version yet, so it may have introduced a change that Monitorix does not support right now.

Could you please paste your Monitorix log file from right after the last start, so we can see whether there are any error messages that could give us a clue?

Thanks.

mgruben commented 9 years ago

Your observation suggests that downgrading to 1.4.9-1 would fix the graphs. Indeed, this is the case: after issuing #pacman -U /var/cache/pacman/pkg/rrdtool-1.4.9-1-x86_64.pkg.tar.xz, and #systemctl restart monitorix, the "system" graph is now properly displaying values.

I will post the requested portion of my Monitorix system.rrd file this evening, when I have more time to teach myself proper rrdtool syntax (you made the awesome Monitorix; I'm not also going to make you teach me rrdtool)

mikaku commented 9 years ago

Thanks for your feedback and glad to know that all is working again.

So, if I understand correctly, the only graph affected by the new RRDtool 1.5 branch is system? I mean, were the rest of the graphs working fine with RRDtool 1.5.3?

Just let me know.

mgruben commented 9 years ago

As far as I can tell that is correct. I've also been trying to get the "disk" graphs working (both under rrdtool 1.4 and 1.5) for my new SSD, but I suspect the problem there exists between my keyboard and my chair.


mikaku commented 9 years ago

OK, please don't forget to paste the log file from right after a Monitorix start, just to see if there is an error message from the system graph. Also, check the HTTP log file (either monitorix-httpd.log for the built-in server or the log of your external HTTP server, depending on which one you are using).

Regarding the disk graph, feel free to ask whatever you need, either in a new issue, on IRC, on the mailing list, or by email directly to me.

Regards.

mgruben commented 9 years ago

rrdtool is getting the better of me, so in the interest of time I've hosted copies of the logs and will just link you to them; see http://76.209.20.97:9997/

The relevant times are 2015-05-16, at about 11am (this is when I restarted my machine, apparently thus completing the upgrade to the rrdtool 1.5 branch), and 2015-05-18 at about 7am (when I downgraded to the rrdtool 1.4 branch)

mikaku commented 9 years ago

Thanks for sharing all this information. I think the problem is in this line:

Sun May 17 08:16:00 2015 - ERROR: while updating /var/lib/monitorix/system.rrd: /var/lib/monitorix/system.rrd: Function update_pdp_prep, case DST_GAUGE - Cannot convert '' to float

For some unknown reason, there is a value that RRDtool is unable to convert to float, so the update process fails to save the values in the system.rrd file. That's why you are getting NaN values all the time.
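To see which RRD files are failing to update and how often, the log can be summarised per file. This is only a sketch: the function name `rrd_update_errors` is hypothetical, and it assumes the log lines follow the `ERROR: while updating <file>: ...` layout quoted above.

```shell
# Summarise RRD update failures in a Monitorix log, one line per .rrd file.
# $1: path to the log file (e.g. /var/log/monitorix, the path used in this thread)
rrd_update_errors() {
    # Extract the file name that follows "ERROR: while updating",
    # then count how many times each file appears.
    sed -n 's/.*ERROR: while updating \([^:]*\):.*/\1/p' "$1" | sort | uniq -c
}

# Usage:
# rrd_update_errors /var/log/monitorix
```

Each output line is a count followed by the path of the affected RRD file, which makes it easy to check whether system.rrd is the only one failing.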

The best way to help fix this is to upgrade again to the RRDtool 1.5 branch and start Monitorix with the parameter -d system added, in order to debug the values collected by the system graph. Then send me the new log file.

Besides all this, I saw in your log file that you have enabled NVIDIA statistics but don't have the official NVIDIA drivers installed. Please disable the gpu0 key in the lmsens graph or install the NVIDIA drivers.

Can't exec "nvidia-smi": No such file or directory at /usr/lib/monitorix/Monitorix.pm line 177.
Sun May 17 08:15:00 2015 - Monitorix::get_nvidia_data: ERROR: 'nvidia-smi' command is not installed.
Can't exec "nvidia-smi": No such file or directory at /usr/lib/monitorix/lmsens.pm line 280.

Also, it looks like you have enabled the lighttpd graph but you have not configured the Lighttpd server properly.

Sun May 17 08:15:00 2015 - lighttpd::lighttpd_update: ERROR: Unable to connect to 'http://localhost:8078/server-status?auto'.

All in all, please read the monitorix.conf(5) man page and enable only the graphs that cover your real resources.

Thanks.

mgruben commented 9 years ago

(1) Your help is much appreciated, thank you for taking the time to go through this with me,

(2)(a) In Arch Linux, I modified the monitorix.service file in /etc/systemd/system/multi-user.target.wants to read, in part, ExecStart=/usr/bin/monitorix -c /etc/monitorix/monitorix.conf -p /run/monitorix.pid -d system (adding the -d system to the end, as it was not present before). I then stopped the monitorix service, issued a systemctl daemon-reload, and restarted the monitorix service.

(b) I then issued a pacman -U ~rrdtool1.5~, followed by a systemctl restart monitorix.service. As expected, the system graphs again are not displaying current values.

(c) I've copied system.rrd, /var/log/monitorix, and /var/log/monitorix-httpd to my aforementioned host, http://76.209.20.97:9997/, for your review.

(3) Regarding NVIDIA, I erroneously thought that, since my nvidia value under the <graph_enable> section is set to n, I had disabled calls to NVIDIA statistics. I have now removed the line reading gpu0 = nvidia from the lmsens list (since I actually have a Radeon card and the NVIDIA drivers wouldn't help me anyway).

(4) I (very) occasionally use lighttpd in conjunction with a Project Gutenberg database I'm working on, and haven't disabled the lighttpd entry in the <graph_enable> section, but more out of wishful thinking on my part that I'll actually have time to devote to that project. I have now disabled that entry, both because you've suggested it and because I really don't use lighttpd enough to justify leaving the logging on

mikaku commented 9 years ago

Well, after reading your new log file I think that I have found the cause of the problem:

Tue May 19 07:47:00 2015 - system::system_update: N:0.18:0.22:0.29:224:223:1:0:0:0:0:7918332:135032:4155804:2786868:::0:0:0:0:0
Tue May 19 07:47:00 2015 - ERROR: while updating /var/lib/monitorix/system.rrd: /var/lib/monitorix/system.rrd: Function update_pdp_prep, case DST_GAUGE - Cannot convert '' to float

As you can see, two values are undefined (the empty fields in the ::: run). While this was not a problem for the old stable 1.4 branch of RRDtool, it looks like the new 1.5 branch can't deal with it.
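The empty fields can be located mechanically in an update string like the one logged above. The following is a minimal sketch, not the change made in system.pm: both function names are hypothetical, and substituting `U` (the marker `rrdtool update` accepts for an unknown value) is an assumption about one reasonable way to fill the gaps.

```shell
# Update string copied from the debug log above.
update='N:0.18:0.22:0.29:224:223:1:0:0:0:0:7918332:135032:4155804:2786868:::0:0:0:0:0'

# Print the positions (1-based, not counting the leading "N") of empty values.
find_empty_fields() {
    printf '%s\n' "$1" | awk -F: '{
        out = ""
        for (i = 2; i <= NF; i++)
            if ($i == "") out = out (out == "" ? "" : " ") (i - 1)
        print out
    }'
}

# Replace every empty value with "U" so that each field is well-formed.
fill_empty_fields() {
    printf '%s\n' "$1" | awk -F: -v OFS=: '{
        for (i = 2; i <= NF; i++)
            if ($i == "") $i = "U"
        print
    }'
}

find_empty_fields "$update"
fill_empty_fields "$update"
```

For the string above, the first call reports positions 15 and 16, which matches the two undefined values visible in the log.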

The above commit fixes that problem, so if you download the current system.pm, the new RRDtool 1.5 branch should work fine.

Feel free to overwrite your current system.pm with this new one, and let me know how it works. Thanks.

mgruben commented 9 years ago

I downloaded the 'raw' system.pm from the above commit into ~/system.pm, then # chmod 644 ~/system.pm, # chown root:root system.pm, # mv /usr/lib/monitorix/system.pm{,.bak}, and finally # mv ~/system.pm /usr/lib/monitorix/system.pm

I see actual, non-NaN values in the three "System load average and usage" graphs, so on its face this commit has fixed my issue with rrdtool 1.5.3-1.

I have also copied /var/lib/monitorix/system.rrd, /var/log/monitorix, and /var/log/monitorix-httpd into my host for your review.

As far as I can tell, this issue has been solved.

mikaku commented 9 years ago

Perfect, thanks for your feedback. Best regards.