v-zhuravlev / zbx-smartctl

Templates and scripts for monitoring disks health with Zabbix and smartmontools
https://share.zabbix.com/storage-devices/smartmontools/smart-monitoring-with-smartmontools-lld
GNU General Public License v3.0

Value is not numeric with Zabbix 3.2 server, agent and 3.0 template #66

Closed jtl999 closed 6 years ago

jtl999 commented 6 years ago

I'm running Debian 9.x on my server/NAS. Zabbix server and agent are both 3.2. I'm trying to set up zbx-smartctl to log SMART data from my disks and fire alerts when needed. After I add the template to my server in Zabbix and adjust the interval, I get this error in the Zabbix discovery item:

Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.

I added the proper sudoers entry in a file /etc/sudoers.d/sudoers_zabbix_smartctl and tested running sudo /etc/zabbix/scripts/smartctl-disks-discovery.pl as the zabbix user and it works.

Thanks

v-zhuravlev commented 6 years ago

Very strange error. I wonder where it came from. {#DISKCMD} is only used in item prototype keys and trigger expressions. Those fields can contain anything; there is no special restriction requiring them to be numeric. https://github.com/v-zhuravlev/zbx-smartctl/blob/master/Template_3.0_HDD_SMARTMONTOOLS_2_WITH_LLD.xml

v-zhuravlev commented 6 years ago

Is this about calculated items maybe?

jtl999 commented 6 years ago

It took me a while to remember, and I shouldn't have posted the issue until I had, but I previously had an earlier version of zbx-smartctl installed on this server, from May (commit: d61ceaf8e2e23db9973311942367a1366a880042, I think). A few weeks ago, with no changes to the system, it suddenly stopped collecting data. :/

Wonder what I should look at next.

lukaskaplan commented 6 years ago

I have the same problem.

lukaskaplan commented 6 years ago

I copied the template and removed all trigger and item prototypes, leaving only the item with key uHDD.health.["{#DISKCMD}"]. This works without error. I will now add the other item prototypes back and let you know which one is problematic.

jtl999 commented 6 years ago

I ran the discovery with verbose debug logging enabled on the Zabbix agent, and all the data sent to the Zabbix server seemed to be in order.

I will take a look at the server later.

jtl999 commented 6 years ago

I tried using DebugLevel=5 with the Zabbix server, and the data I got wasn't much more helpful than what I got from the agent.

An excerpt of the data received from the agent

        "data":[

                {
                        "{#DISKMODEL}":"Crucial_CT250MX200SSD1",
                        "{#DISKSN}":"15080ECB81BF",
                        "{#DISKNAME}":"/dev/sda",
                        "{#DISKCMD}":"/dev/sda -d sat",
                        "{#SMART_ENABLED}":"1",
                        "{#DISKTYPE}":"1"
                },
                {
                        "{#DISKMODEL}":"WDC WD30EFRX-68EUZN0",
                        "{#DISKSN}":"WD-WCC4N3JX6X94",
                        "{#DISKNAME}":"/dev/sdb",
                        "{#DISKCMD}":"/dev/sdb -d sat",
                        "{#SMART_ENABLED}":"1",
                        "{#DISKTYPE}":"0"
                },
                {
                        "{#DISKMODEL}":"WDC WD30EFRX-68EUZN0",
                        "{#DISKSN}":"WD-WMC4N0NATWMF",
                        "{#DISKNAME}":"/dev/sdc",
                        "{#DISKCMD}":"/dev/sdc -d sat",
                        "{#SMART_ENABLED}":"1",
                        "{#DISKTYPE}":"0"
                },

An excerpt of the "Value is not numeric" error from the server

 30456:20180902:210008.386 query [txnlev:1] [update items set error='Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
' where itemid=27067]
 30456:20180902:210008.386 query [txnlev:1] [commit;]
 30456:20180902:210008.396 End of lld_process_discovery_rule()

itemid=27067 is the discovery rule for zbx-smartctl

But why does it want #DISKCMD to be numeric?

lukaskaplan commented 6 years ago

I think that in version 3.2, calculated macros can be only numeric... https://support.zabbix.com/browse/ZBX-11700
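
To illustrate the suggestion above, here is a minimal Python sketch that checks which macro values from the discovery data would fail a numeric check. It only mimics the idea of a numeric-only restriction; it is not Zabbix's actual validation code. The helper names `is_numeric` and `non_numeric_macros` are hypothetical, and the rows are a shortened copy of the agent excerpt posted earlier in the thread.

```python
import json

# Shortened copy of the discovery rows from the agent excerpt in this thread.
DISCOVERY = json.loads("""
{"data": [
  {"{#DISKNAME}": "/dev/sda", "{#DISKCMD}": "/dev/sda -d sat",
   "{#SMART_ENABLED}": "1", "{#DISKTYPE}": "1"},
  {"{#DISKNAME}": "/dev/sdb", "{#DISKCMD}": "/dev/sdb -d sat",
   "{#SMART_ENABLED}": "1", "{#DISKTYPE}": "0"}
]}
""")


def is_numeric(value):
    """Return True if the string parses as a number (hypothetical helper)."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def non_numeric_macros(discovery):
    """Map each macro name to the values that would fail a numeric check."""
    failures = {}
    for row in discovery["data"]:
        for macro, value in row.items():
            if not is_numeric(value):
                failures.setdefault(macro, []).append(value)
    return failures


if __name__ == "__main__":
    for macro, values in non_numeric_macros(DISCOVERY).items():
        print(macro, values)
```

Under this assumption, {#DISKCMD} and {#DISKNAME} are flagged while {#SMART_ENABLED} and {#DISKTYPE} pass, which is consistent with the error naming {#DISKCMD} if that macro appears in a calculated item formula.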

v-zhuravlev commented 6 years ago

Interesting. Does it work if you delete the item prototype "{#DISKNAME}: SMART critical errors total"?

jtl999 commented 6 years ago

Getting closer.

I deleted that item prototype and the items are now created. However, Reallocated Sectors Count has an error for my boot SSD: Received value [] is not suitable for value type [Numeric (unsigned)] and data type [Decimal]

Also the drive model and serial number are blank. Same for the WD RED drives in the same server.

The SSD model is Crucial_CT250MX200SSD1

I can include smartctl output if needed.

v-zhuravlev commented 6 years ago

Please post the smartctl output. Also, run the discovery script manually and attach its output.

jtl999 commented 6 years ago

Which arguments should I run smartctl with?

v-zhuravlev commented 6 years ago

try smartctl -A /dev/sda

jtl999 commented 6 years ago

Boot SSD (the above mentioned Crucial):

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-1-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       13914
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       372
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   095   095   000    Old_age   Always       -       167
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       148
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2566
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       394
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   070   049   000    Old_age   Always       -       30 (Min/Max 10/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   095   095   000    Pre-fail  Offline      -       5
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       8525441190
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       266953826
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       1916085146

WD RED 3TB (same as the rest of the disks in this server)

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-1-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   175   021    Pre-fail  Always       -       6150
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       195
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13695
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       194
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       137
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       806
194 Temperature_Celsius     0x0022   118   113   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
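
One possible explanation for the empty Reallocated Sectors Count on the SSD, not confirmed anywhere in this thread, is that the value is looked up by attribute name: in the outputs above, attribute ID 5 is called Reallocate_NAND_Blk_Cnt on the Crucial SSD but Reallocated_Sector_Ct on the WD RED. The Python sketch below compares a lookup by name with a lookup by ID on rows copied from those outputs. The function names `raw_by_id` and `raw_by_name` are hypothetical, and how the template actually extracts the value is an assumption here.

```python
import re

HEADER = ("ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH "
          "TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n")

# Rows copied from the two smartctl outputs above.
SSD_OUTPUT = HEADER + (
    "  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0\n"
    "194 Temperature_Celsius     0x0022   070   049   000    Old_age   Always       -       30 (Min/Max 10/51)\n"
)
HDD_OUTPUT = HEADER + (
    "  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0\n"
    "194 Temperature_Celsius     0x0022   118   113   000    Old_age   Always       -       32\n"
)

# ID, name, seven columns to skip (FLAG .. WHEN_FAILED), then the leading
# integer of RAW_VALUE. The header line does not match because it lacks a
# numeric ID.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(?:\S+\s+){7}(\d+)")


def raw_by_id(output):
    """Return {attribute_id: (attribute_name, raw_value)} for each table row."""
    result = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            result[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return result


def raw_by_name(output, name):
    """Return the raw value for an exact attribute name, or None if absent."""
    for attr_name, raw in raw_by_id(output).values():
        if attr_name == name:
            return raw
    return None


if __name__ == "__main__":
    print(raw_by_name(SSD_OUTPUT, "Reallocated_Sector_Ct"))
    print(raw_by_name(HDD_OUTPUT, "Reallocated_Sector_Ct"))
    print(raw_by_id(SSD_OUTPUT)[5])
```

If the lookup really is name-based, matching on the attribute ID instead would return a value for both drives, since ID 5 is present in both tables.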

lukaskaplan commented 6 years ago

For your information: I have upgraded Zabbix to version 3.4, and it seems to be working now (with the 3.4 template...).

jtl999 commented 6 years ago

Might give that a try later.

jtl999 commented 6 years ago

Upgraded to 3.4, will comment later if the template works properly.

v-zhuravlev commented 6 years ago

any news?

jtl999 commented 6 years ago

Seems to be fine, data collection and triggers work again.