munin-monitoring / munin

Main repository for munin master / node / plugins
http://munin-monitoring.org

RRD files not created, with Unable to create, Invalid step: must be >= 1 #1086

Closed robotpiotr closed 5 years ago

robotpiotr commented 5 years ago

Describe the bug
Hi all! I am having trouble getting munin to work on CentOS with munin-nodes on Ubuntu 18.04 LTS. We monitor a dozen servers running different OS versions (CentOS 6, CentOS 5, Ubuntu 16.04, Ubuntu 18.04). We use munin from the system repository and, in the case of CentOS, from the EPEL repository (currently 2.0.40).

I ran into a problem with data from the munin-nodes on three servers running Ubuntu 18.04. They return what looks like correct data when debugging through 'nc', but the graphs are broken (png files missing) because no RRD files exist for those hosts. The process fails with the message "Invalid step: must be >= 1". Lines from munin-update.log are shown below.

To Reproduce
I see the same behaviour on all 3 hosts running munin-node on Ubuntu 18.04. Monitoring works well for Ubuntu 16.04.5 LTS with munin-node v2.0.25-2ubuntu0.16.04.3.

Expected behavior
RRD files and png graphs should be created.

Logs from munin server
2018/09/20 07:50:15 [INFO] creating rrd-file for memory->mapped: '/var/lib/munin//-memory-mapped-g.rrd'
2018/09/20 07:50:15 [ERROR] Unable to create '/var/lib/munin/*/***-memory-mapped-g.rrd': Invalid step: must be >= 1
2018/09/20 07:50:15 [ERROR] In RRD: Error updating /var/lib/munin//-memory-mapped-g.rrd: opening '/var/lib/munin//***-memory-mapped-g.rrd': No such file or directory

The logs are the same for all other plugins; no RRD file is created for those hosts.

Output from nc
Output from nc for a working host, run remotely from the munin server:
nc 10.**.72 4949
$ munin node at working host**
list
apc_nis cpu df df_inode entropy forks fw_conntrack fw_forwarded_local fw_packets if_err_eth0 if_err_eth1 if_err_eth2 if_err_eth3 if_eth0 if_eth1 if_eth2 if_eth3 interrupts irqstats load memory netstat ntp_kernel_err ntp_kernel_pll_freq ntp_kernel_pll_off ntp_offset ntp_states open_files open_inodes postfix_mailqueue postfix_mailvolume proc_pri processes swap threads uptime users vmstat
fetch df
_dev_sda3.value 88.5644079188055
_dev_sda1.value 68.6302926671602
_dev_sdb1.value 56.3586025807507
_dev_sdc1.value 43.7771068613263
.
fetch cpu
user.value 31052797061
nice.value 3774512937
system.value 6626055813
idle.value 172635960518
iowait.value 377503078
irq.value 101424
softirq.value 486065553
steal.value 0
guest.value 0

Output from nc for a defunct host, run remotely from the munin server:
nc 147.**.187 4949
$ munin node at broken host
list
acpi cpu df df_inode entropy forks fw_packets http_loadtime if_eno1 if_eno2 if_err_eno1 if_erreno2 interrupts ip irqstats load memory munin_stats netstat open_files open_inodes postfix_mailqueue postfix_mailvolume proc_pri processes smart_nvme0n1 swap threads uptime users vmstat
fetch df
_run.value 0.0772152427062899
_dev_md3.value 19.4214452882412
_dev_shm.value 0
_run_lock.value 0
_sys_fs_cgroup.value 0
_dev_md2.value 17.2861865992328
_dev_nvme0n1p1.value 0.000765948972479453
.
fetch cpu
user.value 37830
nice.value 526
system.value 30900
idle.value 69936532
iowait.value 1620
irq.value 0
softirq.value 748
steal.value 0
guest.value 0
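
To compare the replies from the working and the defunct host field by field, the `fetch` output can be turned into a dictionary. The following is a minimal Python sketch and not part of munin; the helper names `parse_fetch` and `looks_numeric` are made up for illustration. It assumes the reply format visible in the nc output above: one `<field>.value <number>` pair per line, with a line containing a single dot ending the reply.

```python
def parse_fetch(lines):
    """Parse the lines of a munin-node `fetch` reply into {field: value string}."""
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":                      # a lone dot ends the reply
            break
        if not line or line.startswith("#"):
            continue                         # skip blank lines and comments
        name, _, value = line.partition(" ")
        if name.endswith(".value"):
            values[name[: -len(".value")]] = value.strip()
    return values


def looks_numeric(value):
    """Return True if the value string parses as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


# Sample taken from the `fetch cpu` reply of the defunct host above.
reply = ["user.value 37830", "nice.value 526", "system.value 30900", "."]
parsed = parse_fetch(reply)
print(parsed)
print(all(looks_numeric(v) for v in parsed.values()))
```

If every field of the defunct host parses as a number, as it does for the sample above, the values themselves are probably not what is being rejected when the RRD file is created.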

Logs from munin-node
2018/09/20-07:45:03 CONNECT TCP Peer: "[.34]:41620" Local: "[.52]:4949"
2018/09/20-07:50:03 CONNECT TCP Peer: "[.34]:60252" Local: "[.52]:4949"
No other error messages etc.

Versions

Additional context

sumpfralle commented 5 years ago

@kimheino could it be that the user does not have permission to write to /var/lib/munin? Or do you have another idea?

kimheino commented 5 years ago

Just to verify that I understood correctly:

It could be a permission or SELinux problem, but that is unlikely. What does munin-update.log on the server say? My first guess is some broken plugin in Ubuntu 18.

steveschnepp commented 5 years ago

The fact that you redacted the logs doesn't help. Could you replace the hostnames with good.example.com and bad.example.com, and the IPs with 192.168.1.1xx for good nodes and 192.168.1.2xx for bad nodes?

Also, could you send us the output of a config run?
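
A config run can be captured the same way as the nc sessions earlier in the thread, by sending `config <plugin>` to the node on port 4949. The sketch below is one way to do that in Python. The helper names `munin_command` and `config_attributes` are hypothetical, and the code assumes that the node sends one banner line on connect and ends the reply with a lone dot, as the nc output suggests.

```python
import socket


def munin_command(host, command, port=4949, timeout=10.0):
    """Send one `config` or `fetch` command to a munin-node and return the reply lines.

    Not suitable for `list`, whose reply is a single line without a closing dot.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        stream.readline()                    # banner, e.g. "# munin node at <host>"
        stream.write(command + "\n")
        stream.flush()
        lines = []
        for line in stream:
            line = line.rstrip("\r\n")
            if line == ".":                  # a lone dot ends the reply
                break
            lines.append(line)
        return lines


def config_attributes(lines):
    """Split each config line at the first space into {attribute: value}."""
    attrs = {}
    for line in lines:
        key, _, value = line.partition(" ")
        if key:
            attrs[key] = value
    return attrs
```

Calling `config_attributes(munin_command("bad.example.com", "config memory"))` would then show whether the plugin reports attributes such as `update_rate` or `graph_data_size`; the hostname is the placeholder suggested above.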

robotpiotr commented 5 years ago

Hi guys! Thank you for replies!

I would rather rule out a permission problem; I have checked this many times. We do not have SELinux enabled anywhere. In the meantime I also added another CentOS server without a problem. I decided to move the munin server, with all its data and configuration, to a new Ubuntu 18.04 LTS machine, and everything started to work without any problem, including the clients that were not working before.

I'm afraid I do not have the logs anymore, as we cancelled that server two weeks ago. However, because everything started to work with the same config and data dir (to keep historic data) as on the old server, I assume the problem was with the munin version from the CentOS 6 EPEL repository.

Best regards, Piotr

sumpfralle commented 5 years ago

So let us assume that you stumbled upon an issue that is hopefully fixed by now. Thank you for reporting back!

pstef commented 4 years ago

In my case this error was caused by setting graph_data_size without having set update_rate.

sumpfralle commented 4 years ago

> In my case this error was caused by setting graph_data_size without having set update_rate.

@pstef: I tried to reproduce this by setting graph_data_size custom 200, 3 400, 24 540, 288 450 in a random plugin and removing its corresponding rrd files. Afterwards the new rrd file (with a non-default size) was generated automatically. A quick look at the code also does not indicate that the lack of update_rate could influence the creation of an rrd file.

Thus I could imagine that your experience was based on an unrelated effect.

Feel free to open a separate issue if you can reproduce this.
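
For anyone checking their own graph_data_size custom setting, the arithmetic behind the value used in the reproduction attempt can be sketched in Python. This is not munin's code. It assumes the documented custom format, in which the first number is the count of full-resolution rows and each following pair is a multiplier and a row count, and it assumes an update rate of 300 seconds. The function names are made up for illustration.

```python
def parse_custom_data_size(spec):
    """Parse the part after `custom` into (multiplier, rows) pairs."""
    first, *rest = [part.strip() for part in spec.split(",")]
    archives = [(1, int(first))]             # full resolution: one row per update
    for part in rest:
        multiplier, rows = part.split()
        archives.append((int(multiplier), int(rows)))
    return archives


def coverage_seconds(archives, update_rate=300):
    """Time span covered by each archive, assuming one update every `update_rate` seconds."""
    return [multiplier * rows * update_rate for multiplier, rows in archives]


spec = "200, 3 400, 24 540, 288 450"         # value from the reproduction attempt above
archives = parse_custom_data_size(spec)
print(archives)                              # [(1, 200), (3, 400), (24, 540), (288, 450)]
print(coverage_seconds(archives))            # [60000, 360000, 3888000, 38880000]
```

Under these assumptions the coarsest archive spans 38880000 seconds, which is 450 days of one-day averages.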