ilovepancakes95 / idrac_snmp-grafana

SNMP Based Dashboard to Monitor Dell Hosts via iDRAC
https://grafana.com/grafana/dashboards/12106
Other

Values disappearing #17

Closed. chench00 closed this issue 3 years ago.

chench00 commented 3 years ago

I noticed recently that some values, such as system watts, CPU temp and physical disk status, have started disappearing. The weird thing is that I am monitoring three hosts in total and two of them are showing the issue.

Any ideas? Nothing has changed besides the usual updates to iDRAC, Grafana, Telegraf, etc.

Thanks for your hard work.

ilovepancakes95 commented 3 years ago

On the two systems that don't work anymore, if you run snmpwalk commands against them to see the raw values for system watts, CPU temp, etc., what do you get? Is iDRAC on the exact same version on each system? How about the BIOS version?
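To see the raw values outside of Grafana, the output of the Net-SNMP `snmpwalk` CLI can be turned into an OID → value map. A minimal sketch follows, assuming SNMP v2c, a placeholder community string `public`, and `1.3.6.1.4.1.674.10892.5` as Dell's iDRAC subtree to walk (verify this root and any leaf OIDs against Dell's MIB files for your iDRAC generation). `build_walk_command` and `parse_walk_output` are hypothetical helper names, not part of this repo.

```python
import subprocess
import sys

# Assumption: root of the Dell iDRAC MIB subtree; confirm against Dell's MIB files.
IDRAC_ROOT_OID = "1.3.6.1.4.1.674.10892.5"


def build_walk_command(host, community="public", oid=IDRAC_ROOT_OID):
    """Return an argv list for Net-SNMP snmpwalk (v2c, numeric OIDs via -On)."""
    return ["snmpwalk", "-v2c", "-c", community, "-On", host, oid]


def parse_walk_output(text):
    """Parse 'OID = TYPE: value' lines into {oid: (type, value)}.

    Lines without ' = ' (e.g. continuation lines of wrapped strings) are skipped.
    """
    results = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        oid, rest = line.split(" = ", 1)
        oid = oid.strip().lstrip(".")
        if ": " in rest:
            snmp_type, value = rest.split(": ", 1)
        else:
            # e.g. "No Such Object available on this agent at this OID"
            snmp_type, value = None, rest
        results[oid] = (snmp_type, value.strip().strip('"'))
    return results


def main(argv):
    # Usage: python walk.py <idrac-host> [community]
    host = argv[1]
    community = argv[2] if len(argv) > 2 else "public"
    proc = subprocess.run(build_walk_command(host, community),
                          capture_output=True, text=True, check=True)
    for oid, (snmp_type, value) in sorted(parse_walk_output(proc.stdout).items()):
        print(oid, snmp_type, value)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Running it against a working and a non-working host and comparing the two listings shows whether a value is missing outright or just reported elsewhere.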

chench00 commented 3 years ago

@ilovepancakes95 Apologies for the late reply. I am using a MIB browser, and when performing a walk on the servers that are having issues I don't see any CPU values, system watts, etc. But then again, so many values are displayed that it is hard to sort through them. Do you recommend a better way to run a quick test?

ilovepancakes95 commented 3 years ago

Are you using the MIB files from Dell to see what each OID actually is in the MIB browser? That would be the fastest way to tell what is what. Sometimes even a slight change in any firmware (iDRAC, BIOS, etc.) causes the OID numbers for certain values to change completely. I wonder if this is what is happening to you. I have indeed even seen two systems on EXACT same versions of everything yet the OID numbers differ between them. I have spoken to Dell support about this and they had no idea why that happens.
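One quick way to spot that kind of renumbering is to diff the OIDs that a working and a non-working host report under the same subtree. A minimal sketch, assuming each host's walk has already been loaded into a dict of OID → value (for example from `snmpwalk -On` output); `under_subtree` and `diff_walks` are hypothetical helpers, not something the dashboard provides.

```python
def under_subtree(oid, root):
    """True if `oid` equals `root` or lies beneath it (dotted-decimal OIDs)."""
    return oid == root or oid.startswith(root + ".")


def diff_walks(walk_a, walk_b, root):
    """Compare two {oid: value} walks restricted to one subtree.

    Returns (only_in_a, only_in_b, changed): sorted OID lists present on only
    one host, plus {oid: (value_a, value_b)} for shared OIDs whose values differ.
    """
    a = {o: v for o, v in walk_a.items() if under_subtree(o, root)}
    b = {o: v for o, v in walk_b.items() if under_subtree(o, root)}
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    changed = {o: (a[o], b[o]) for o in sorted(set(a) & set(b)) if a[o] != b[o]}
    return only_in_a, only_in_b, changed
```

If `only_in_b` is non-empty and holds plausible values, the data has most likely moved to a different OID on the second host; if `only_in_b` is empty while `only_in_a` holds the whole subtree, the second host is not reporting that data at all.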

chench00 commented 3 years ago

So I found the correct Dell MIBs, and I cannot pull storage information from two hosts, but the third one works just fine even though it's running the same BIOS and iDRAC version as the other non-working host.

> I have indeed even seen two systems on EXACT same versions of everything yet the OID numbers differ between them. I have spoken to Dell support about this and they had no idea why that happens.

This is probably exactly my issue. I assume no resolution for the time being? I have yet to restart the iDRAC, but I doubt that would solve it.

ilovepancakes95 commented 3 years ago

> I assume no resolution for the time being?

Not from a dashboard standpoint. When a similar issue happened to me, the data didn't completely disappear from MIB browser scans; it was just reported under a different OID than on the other systems. So, if you can't get everything to match between systems, you could create a new SNMP input section in the Telegraf config just for that one system and, in that section, change the OIDs for that IP to the ones that return the right values on it. It's not a clean way to make it work, but it would work, as long as SNMP commands to the iDRAC are working properly.
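In Telegraf terms, that workaround is one extra `[[inputs.snmp]]` block whose `agents` list holds only the odd host and whose `[[inputs.snmp.field]]` entries point at that host's OIDs (the host would also need to be removed from the shared block's `agents` list). A minimal sketch in Python that renders such a block as TOML text, assuming SNMP v2c and a `public` community; `render_snmp_input` is a hypothetical helper, and the host, field name and OID in the example are placeholders to replace with the field names the dashboard already queries and the OIDs the MIB browser shows on that host.

```python
import json


def _toml_str(value):
    # JSON string escaping yields a valid TOML basic string for plain ASCII
    # values such as hostnames, field names and dotted OIDs.
    return json.dumps(value)


def render_snmp_input(agent, fields, community="public", version=2):
    """Render a Telegraf [[inputs.snmp]] block for a single iDRAC.

    `fields` is a list of (name, oid) pairs; keep the field names the
    dashboard expects and substitute this host's OIDs.
    """
    lines = [
        "[[inputs.snmp]]",
        f"  agents = [{_toml_str(agent)}]",
        f"  version = {int(version)}",
        f"  community = {_toml_str(community)}",
    ]
    for name, oid in fields:
        lines += [
            "",
            "  [[inputs.snmp.field]]",
            f"    name = {_toml_str(name)}",
            f"    oid = {_toml_str(oid)}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder host, field name and OID: replace with your own values.
    print(render_snmp_input("udp://192.0.2.10:161",
                            [("system_watts", "1.3.6.1.4.1.674.10892.5.99.1.0")]))
```

Any measurement name or tags set on the existing shared block would need to be copied into the new block as well, so the dashboard queries still match.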

If you are saying the data just doesn't show up anymore under any OID, then that sounds like a problem with that particular iDRAC. Try reloading the firmware onto the iDRAC, or try a complete reset of it.