Open tamarw opened 9 years ago
The script is working fine if running dnf update shows the same thing. It is only a wrapper around these commands. Did you refresh the nagios report on your dashboard?
dnf update does not show the same thing. And I did restart nagios.
I do see the code but can't understand the disconnect.
Oddly enough, it is behaving now. nagios was definitely restarted but it only "caught up" hours later. Not sure if it's me but I'll take this!
"dnf update does not show the same thing." => please show me. Again, it just reads the output of the command so maybe there is an output that the script does not recognize (a unknown category or whatever). Sorry I can't help more, next time it is happening be sure to copy/paste the output of the script + commands. Glad it's working now however ;) I am marking this issue closed for the moment.
.# dnf upgrade
Fedora 22 - x86_64 1.4 MB/s | 41 MB 00:29
Fedora 22 - x86_64 - Updates 6.3 MB/s | 14 MB 00:02
Last metadata expiration check performed 0:00:13 ago on Wed Sep 9 21:57:27 2015.
Dependencies resolved.
Nothing to do.
Complete!
.# dnf update
Last metadata expiration check performed 0:11:03 ago on Wed Sep 9 21:57:27 2015.
Dependencies resolved.
Nothing to do.
Complete!
.# ./check_dnf.rb
OK - DNF shows no updates to do !
Nagios output was different, showing CRITICAL - 1 security, 3 bugfix, 4 enhancement update(s) to do. Will keep you posted on it.
I still keep it closed as there is nothing wrong with the output you posted from the script side. It must be related to the Nagios output delayed or something. Good luck.
Yeah, that's fine! It's strange. I hope it doesn't happen again, but figured I'd put it on your radar.
Just so you know, it's happening again. It has also happened another time a few days ago.
Locally, I ran:
.# dnf update
Last metadata expiration check performed 2:48:26 ago on Mon Sep 21 11:20:07 2015.
Dependencies resolved.
Nothing to do.
Complete!
From what I know of Ruby, you're running dnf updateinfo. I get this result:
.# dnf updateinfo
Last metadata expiration check performed 2:50:28 ago on Mon Sep 21 11:20:07 2015.
But yet my email says: * Nagios *
Notification Type: PROBLEM
Service: dnf Host: localhost Address: 127.0.0.1 State: WARNING
Date/Time: Mon Sept 21 14:08:03 EDT 2015
Additional Info:
WARNING - 1 bugfix update(s) to do.
Can you tell me the exact codes check_dnf runs? I can't imagine that there's a delayed output from Nagios. I don't have it with any other plugin.
Can you run the ruby script and compare to the 'dnf updateinfo' cmd ? Post both please.
This is so strange. Here's my console.
This command was run 45 minutes ago:
.# dnf updateinfo
Last metadata expiration check performed 2:50:28 ago on Mon Sep 21 11:20:07 2015.
On the line immediately below, I ran:
.# ./check_dnf.rb
WARNING - 1 bugfix update(s) to do.
And then this, per your request:
.# dnf updateinfo
Last metadata expiration check performed 0:45:41 ago on Mon Sep 21 14:20:29 2015.
Updates Information Summary: available
    1 Bugfix notice(s)
But it wasn't reporting that when I submitted the above comment. I don't get it...
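For readers following along, here is a minimal Ruby sketch of how a wrapper could turn the `dnf updateinfo` summary into the Nagios messages quoted in this thread. It is not the actual check_dnf.rb source: the function names `parse_updateinfo` and `nagios_status` are hypothetical, and the severity mapping (security means CRITICAL, anything else means WARNING) is only inferred from the messages pasted here.

```ruby
# Hypothetical sketch, NOT the real check_dnf.rb code.
# Parses the summary printed by `dnf updateinfo` and maps it to a Nagios status.

# Categories seen in this thread. Any other category is ignored by this sketch,
# which is the kind of unrecognized output the author mentions.
CATEGORIES = %w[Security Bugfix Enhancement].freeze

# Returns a hash such as {"security" => 3, "bugfix" => 15}.
def parse_updateinfo(output)
  counts = {}
  output.scan(/(\d+)\s+(\w+)\s+notice\(s\)/) do |number, category|
    next unless CATEGORIES.include?(category)
    counts[category.downcase] = number.to_i
  end
  counts
end

# Assumed mapping, inferred from the messages in this thread:
# any security notice is CRITICAL, other notices are WARNING, none is OK.
# The first element is the standard Nagios exit code (0, 1 or 2).
def nagios_status(counts)
  return [0, "OK - DNF shows no updates to do"] if counts.empty?

  summary = counts.map { |category, n| "#{n} #{category}" }.join(", ")
  if counts.key?("security")
    [2, "CRITICAL - #{summary} update(s) to do."]
  else
    [1, "WARNING - #{summary} update(s) to do."]
  end
end
```

The point of the sketch is that the message depends only on the text dnf prints, so two different messages imply that dnf itself printed two different summaries.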
Okay, here's another one.
My nagios console and email from 10 minutes ago (and is still current, nagios says this issue has persisted for 13+ minutes): * Nagios *
Notification Type: PROBLEM
Service: dnf Host: localhost Address: 127.0.0.1 State: CRITICAL
Date/Time: Thu Sept 24 08:01:03 EDT 2015
Additional Info:
CRITICAL - 3 security, 15 bugfix, 2 enhancement update(s) to do.
In the last 5 minutes, I ran the below:
.# ./check_dnf.rb
OK - DNF shows no updates to do
.# dnf updateinfo
Last metadata expiration check performed 2:41:51 ago on Thu Sep 24 05:25:45 2015.
Is this helping at all?
Now that I reran the code, it is synching up. But I don't think it's a nagios sync issue. It's a sync issue, but I'm not sure if the root of it is nagios.
.# dnf updateinfo
Last metadata expiration check performed 1:38:43 ago on Thu Sep 24 11:26:24 2015.
Updates Information Summary: available
    3 Security notice(s)
    15 Bugfix notice(s)
    2 Enhancement notice(s)
Would be super cool if someone else was using the code who can tell me if this is happening to them... :/
Another. I got a nagios warning email from check_dnf just now.
* Nagios *
Notification Type: PROBLEM
Service: dnf Host: localhost Address: 127.0.0.1 State: WARNING
Date/Time: Fri Sept 25 08:19:03 EDT 2015
Additional Info:
WARNING - 6 bugfix, 2 enhancement update(s) to do.
And again, dnf on my system says NOTHING TO DO (inclusive of this script).
.# /usr/lib64/nagios/plugins/check_dnf.rb
OK - DNF shows no updates to do
I don't mean to keep revisiting this issue but do you have any insights why your code reports something on Nagios that the code on the command line is not reporting yet? If I can't get it to work, I guess I will just disable this script. But right now the Nagios display is what will be happening in the "future" and I can imagine that dnf updateinfo will soon show the same results.
(Right now, it's all "nothing to do" again though. I can paste the results but I already did yesterday and I don't want to keep pasting the same thing since you stopped responding :) )
UPDATE: I just typed the above. It took me 3 minutes. Now I typed dnf updateinfo literally 3 minutes later (less, see below) and it reports the 6 bugfix and 2 enhancement updates.
Nagios email pasted above: 13 minutes ago.
3 minutes ago: Last metadata expiration check performed 2:56:26 ago on Fri Sep 25 05:27:49 2015. Dependencies resolved. Nothing to do. Complete!
NOW: Last metadata expiration check performed 0:02:45 ago on Fri Sep 25 08:28:07 2015. Updates Information Summary: available 6 Bugfix notice(s) 2 Enhancement notice(s)
See, 2 minutes and 45 seconds later, it is reporting this issue. What do you think the issue is? Your script? dnf's fickle behavior?
One more note: dnf updateinfo doesn't seem to always "kick in," which is a dnf flaw, not a check_dnf.rb flaw. See below for more on this. Is there a way to force a metadata expiration check since clearly using an old metadata expiration check from 3 hours ago is not going to result in something that nagios just started reporting as an issue 13 minutes ago?
$ dnf update
Last metadata expiration check performed 2:54:38 ago on Fri Sep 25 05:27:49 2015.
Dependencies resolved.
Nothing to do.
Complete!
$ dnf updateinfo
Last metadata expiration check performed 2:54:54 ago on Fri Sep 25 05:27:49 2015.
$ dnf update
Last metadata expiration check performed 2:55:21 ago on Fri Sep 25 05:27:49 2015.
Dependencies resolved.
Nothing to do.
Complete!
$ dnf updateinfo
Last metadata expiration check performed 2:55:28 ago on Fri Sep 25 05:27:49 2015.
$ dnf update
Last metadata expiration check performed 2:55:52 ago on Fri Sep 25 05:27:49 2015.
Dependencies resolved.
Nothing to do.
Complete!
$ /usr/lib64/nagios/plugins/check_dnf.rb
OK - DNF shows no updates to do
$ dnf update
Last metadata expiration check performed 2:56:26 ago on Fri Sep 25 05:27:49 2015.
Dependencies resolved.
Nothing to do.
Weird, eh? Hopefully this update clarifies the issue more.
Ok I am trying to understand the bunch of things you're posting. From what I get, check_dnf.rb seems to be in adequation with what dnf reports, right ? Like you say, it seems to be a sync issue between DNF and the checks done on Nagios. I am not a DNF expert, I just scripted this to get reports, so you will have to consult "man dnf" for more info on forcing metadata. But, if it is happening again, could you check / post :
Nagios 'can't predict the future' but maybe dnf keeps separate metadata for each user... ?
Ha sorry, thanks for reopening.
From what I get, check_dnf.rb seems to be in adequation with what dnf reports, right ?
Yes, but let me be clear that check_dnf.rb's command line output is different than what Nagios reports--until it catches up once the metadata expiration check refreshes.
So to be clear:
1) the nagios output was pasted above today and yesterday. Both checks happened at 08:19 and reported that there were some updates/bugfixes.
2) the dnf updateinfo command shows "nothing to do" until the metadata expiration check refreshes (this is the key point!!!) - so even if I do it MINUTES after the nagios email, it still says "nothing to do"
3) the check_dnf.rb commands I ran today were all done around 08:23. They reported "nothing to do"
In other words: Nagios (web interface + email) had a refreshed metadata expiration that did NOT sync with the command line output. Only a few minutes later when the metadata expiration information caught up did I get the same results that Nagios reported.
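Since the key variable here is how old the metadata is at the moment of each check, a small Ruby helper can pull that age out of dnf's output so the two runs can be compared. This is only a sketch: `metadata_age_seconds` is a hypothetical name, and it assumes the age is always printed as H:MM:SS as in the pasted output (ages of a day or more may be printed differently and are not handled).

```ruby
# Hypothetical helper: extracts the metadata age, in seconds, from the
# "Last metadata expiration check performed H:MM:SS ago" line that dnf prints.
# Returns nil when the line is not present in the output.
def metadata_age_seconds(output)
  match = output.match(/Last metadata expiration check performed (\d+):(\d{2}):(\d{2}) ago/)
  return nil unless match

  hours, minutes, seconds = match.captures.map(&:to_i)
  hours * 3600 + minutes * 60 + seconds
end
```

For the two runs quoted above, "2:56:26 ago" gives 10586 seconds and "0:02:45 ago" gives 165 seconds, so the second run was clearly working from freshly downloaded metadata.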
It seems to me that the output on the command line is different than the Nagios output, and I think that's because I need to use some --force argument (that doesn't exist for dnf) to update the metadata.
Does that make sense? I feel like I have to do some sort of screencast next time, but I may prefer to send that privately! :)
I can't see any reason why the dnf metadata would be different, except that the nagios check runs under a different user than the one you were using when typing your commands in the terminal. Please try using the same user (preferably the nagios user). Good luck !
Cool, yeah, I will give it a shot when it recurs. Good call, thanks.
Hey, question - if the nagios user is /sbin/nologin, how do I test the code on that user? I'm seeing the issue right now again.
It surely ain't. That field is the default shell started when the user logs in. Nologin means no shell, the user can only run the specified command in the SSH connection. On your fedora, there should be some kind of 'nagios' user in /etc/passwd. To launch a cmd under a different user: http://www.cyberciti.biz/open-source/command-line-hacks/linux-run-command-as-different-user/
I know how to do that, but I can't if the account is unavailable, no? (sorry, this is a newbie q...)
.# runuser -l nagios 'dnf update'
This account is currently not available.
Google is your friend ! su nagios -s /bin/sh -c cmd
Oh, I googled, but clearly wasn't using the right search terms. :)
So I'm getting nagios emails for check_dnf again. As root, no updates are needed.
.# /usr/lib64/nagios/plugins/check_dnf.rb
OK - DNF shows no updates to do
As nagios, you are correct, it's reporting different data.
.# su nagios -s /bin/sh -c '/usr/lib64/nagios/plugins/check_dnf.rb'
WARNING - 1 enhancement update(s) to do.
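The root versus nagios comparison above can be scripted so it is easy to repeat the next time the emails start. A rough Ruby sketch, assuming it is run as root and that the plugin lives at the path used in this thread; `command_as` and `plugin_output_as` are hypothetical helper names, not part of check_dnf.rb.

```ruby
require "open3"

# Plugin path as used in this thread; adjust if yours is installed elsewhere.
PLUGIN = "/usr/lib64/nagios/plugins/check_dnf.rb"

# Builds the argv for running a command as a user whose login shell is
# /sbin/nologin, using the `su USER -s /bin/sh -c CMD` form shown above.
def command_as(user, command)
  ["su", user, "-s", "/bin/sh", "-c", command]
end

# Runs the plugin as the given user and returns the first line it prints.
# Assumption: this is started as root, so su does not prompt for a password.
def plugin_output_as(user)
  stdout, _status = Open3.capture2(*command_as(user, PLUGIN))
  stdout.lines.first.to_s.strip
end
```

Calling `plugin_output_as("root")` and `plugin_output_as("nagios")` back to back and printing both results would show whether the two users currently disagree.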
Do you have suggestions on how to synchronize the data? I don't understand why they'd be so different.
Try looking: http://dnf.readthedocs.org/en/latest/command_ref.html An option such as --refresh might come in handy; try also the clean command. Let me know !
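As a sketch of the --refresh suggestion: dnf accepts `--refresh` as a global option that marks the cached metadata as expired before the command runs. Whether check_dnf.rb exposes anything like this is not stated in the thread, so the helper below (`updateinfo_command`, a hypothetical name) only illustrates how a wrapper could build the command line either way.

```ruby
# Hypothetical helper, not taken from check_dnf.rb.
# Builds the argv for the update summary. With refresh: true, dnf's global
# --refresh option is added so cached metadata is treated as expired.
def updateinfo_command(refresh: false)
  command = ["dnf"]
  command << "--refresh" if refresh
  command << "updateinfo"
end
```

Forcing a refresh on every Nagios check means a metadata download each time the check runs, so it may be preferable to try it by hand first (for example as the nagios user) and see whether the two users' results line up afterwards.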
Or maybe I'm doing it wrong.
DNF on Nagios reports: CRITICAL - 1 security, 3 bugfix, 4 enhancement update(s) to do. (I'm pulling via check_dnf.rb -t 1000)
but command line ./check_dnf.rb shows: OK - DNF shows no updates to do
and dnf update / dnf upgrade both say "nothing to do."
Ruby is not my language, and I can't figure out which dnf command is causing this Nagios report. But I'd really like to get it fixed...