jtl999 closed this issue 6 years ago.
Very strange error. I wonder where it comes from. {#DISKCMD} is only used in item prototype keys and trigger expressions; these fields can contain anything, and there is no requirement for them to be numeric. https://github.com/v-zhuravlev/zbx-smartctl/blob/master/Template_3.0_HDD_SMARTMONTOOLS_2_WITH_LLD.xml
Could this be related to calculated items?
It took me a while to remember, and I shouldn't have posted the issue until I did: I previously had an earlier version of zbx-smartctl from May installed on this server (commit d61ceaf8e2e23db9973311942367a1366a880042, I think), and a few weeks ago, with no changes to the system, it randomly stopped collecting data. :/
I wonder what I should look at next.
I have the same problem.
I copied the template and removed all trigger and item prototypes, leaving only the item with key uHDD.health.["{#DISKCMD}"]. This works without errors. I will add the other item prototypes back one at a time and let you know which one is problematic.
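Instead of re-adding prototypes one at a time, the exported template can be searched for prototypes that reference a given macro. This is a minimal sketch that assumes the Zabbix 3.x XML export layout: item_prototype elements with name, type, key and params children, where type 15 marks a calculated item and params holds its formula. The helper prototypes_using and the inline sample are hypothetical. The sample does not reproduce the real zbx-smartctl template.

```python
import xml.etree.ElementTree as ET

# Assumed: in the XML export, calculated items carry <type>15</type>.
CALCULATED = "15"

# Made-up sample shaped like a Zabbix 3.x template export (NOT the real template).
SAMPLE = """<zabbix_export><templates><template><discovery_rules><discovery_rule>
<item_prototypes>
  <item_prototype>
    <name>{#DISKNAME}: Health</name>
    <type>0</type>
    <key>uHDD.health.["{#DISKCMD}"]</key>
    <params/>
  </item_prototype>
  <item_prototype>
    <name>{#DISKNAME}: Example calculated item</name>
    <type>15</type>
    <key>example.calc["{#DISKCMD}"]</key>
    <params>last(example.a["{#DISKCMD}"])+last(example.b["{#DISKCMD}"])</params>
  </item_prototype>
</item_prototypes>
</discovery_rule></discovery_rules></template></templates></zabbix_export>"""


def prototypes_using(xml_text, macro):
    """Return (name, is_calculated) for each item prototype whose key or formula mentions macro."""
    root = ET.fromstring(xml_text)
    hits = []
    for proto in root.iter("item_prototype"):
        key = proto.findtext("key", default="")
        params = proto.findtext("params", default="")  # formula for calculated items
        if macro in key or macro in params:
            is_calc = proto.findtext("type", default="") == CALCULATED
            hits.append((proto.findtext("name", default=""), is_calc))
    return hits


for name, is_calc in prototypes_using(SAMPLE, "{#DISKCMD}"):
    print(("CALCULATED  " if is_calc else "            ") + name)
```

Run against an exported template, this sketch lists the calculated prototypes that use {#DISKCMD}. Those would be the first candidates to remove.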
I looked at the Zabbix agent's discovery with verbose debug logging enabled, and all the data sent to the Zabbix server seemed in order.
I will take a look at the server later.
I tried DebugLevel=5 on the Zabbix server, but the output wasn't much more helpful than the agent's.
An excerpt of the data received from the agent:
"data":[
  {
    "{#DISKMODEL}":"Crucial_CT250MX200SSD1",
    "{#DISKSN}":"15080ECB81BF",
    "{#DISKNAME}":"/dev/sda",
    "{#DISKCMD}":"/dev/sda -d sat",
    "{#SMART_ENABLED}":"1",
    "{#DISKTYPE}":"1"
  },
  {
    "{#DISKMODEL}":"WDC WD30EFRX-68EUZN0",
    "{#DISKSN}":"WD-WCC4N3JX6X94",
    "{#DISKNAME}":"/dev/sdb",
    "{#DISKCMD}":"/dev/sdb -d sat",
    "{#SMART_ENABLED}":"1",
    "{#DISKTYPE}":"0"
  },
  {
    "{#DISKMODEL}":"WDC WD30EFRX-68EUZN0",
    "{#DISKSN}":"WD-WMC4N0NATWMF",
    "{#DISKNAME}":"/dev/sdc",
    "{#DISKCMD}":"/dev/sdc -d sat",
    "{#SMART_ENABLED}":"1",
    "{#DISKTYPE}":"0"
  },
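Because the agent-side payload looked correct, its structure can also be checked mechanically before looking at the server. This is a minimal sketch. It only verifies that the JSON has a "data" list of objects whose keys look like LLD macros. The accepted macro character set (A-Z, 0-9, underscore, dot) is an assumption, and check_lld is a hypothetical helper.

```python
import json
import re

# Assumed LLD macro syntax: {#NAME} with uppercase letters, digits, '_' or '.'.
MACRO_RE = re.compile(r"^\{#[A-Z0-9_.]+\}$")


def check_lld(text):
    """Return a list of structural problems in a discovery payload (empty list = looks fine)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    rows = doc.get("data") if isinstance(doc, dict) else None
    if not isinstance(rows, list):
        return ['top level must be an object with a "data" list']
    problems = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"data[{i}] is not an object")
            continue
        for key in row:
            if not MACRO_RE.match(key):
                problems.append(f"data[{i}]: bad macro name {key!r}")
    return problems


# One row trimmed from the excerpt above.
payload = '{"data":[{"{#DISKNAME}":"/dev/sda","{#DISKCMD}":"/dev/sda -d sat","{#DISKTYPE}":"1"}]}'
print(check_lld(payload) or "payload looks well-formed")
```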
An excerpt of the "Value is not numeric" error from the server:
30456:20180902:210008.386 query [txnlev:1] [update items set error='Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
Cannot create item: macro "{#DISKCMD}" value is not numeric.
' where itemid=27067]
30456:20180902:210008.386 query [txnlev:1] [commit;]
30456:20180902:210008.396 End of lld_process_discovery_rule()
itemid=27067 is the discovery rule for zbx-smartctl.
But why does it require {#DISKCMD} to be numeric?
I think that in Zabbix 3.2, macros in calculated item formulas can only be numeric... https://support.zabbix.com/browse/ZBX-11700
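If that is the cause, the failure can be reproduced outside Zabbix. The idea is to substitute each discovered row into a calculated-item formula and reject any macro whose value is not a number. The sketch below is a hypothetical emulation of such a rule, not Zabbix server code. The formula is made up, and the numeric test (optional sign, digits, optional fraction) is an assumption.

```python
import re

# Assumed notion of "numeric": optional sign, digits, optional decimal fraction.
NUMERIC = re.compile(r"^[+-]?\d+(\.\d+)?$")
MACRO = re.compile(r"\{#[A-Z0-9_.]+\}")


def expand_formula(formula, row):
    """Substitute LLD macros into a calculated-item formula, refusing non-numeric values.

    Hypothetical emulation of a numeric-only rule; raises ValueError with a message
    shaped like the 'macro ... value is not numeric' error in the server log.
    """
    def substitute(match):
        macro = match.group(0)
        value = row.get(macro)
        if value is None:
            return macro  # macros not in this discovery row are left untouched
        if not NUMERIC.match(value):
            raise ValueError(f'macro "{macro}" value is not numeric')
        return value

    return MACRO.sub(substitute, formula)


row = {"{#DISKCMD}": "/dev/sda -d sat", "{#DISKTYPE}": "1"}
print(expand_formula("{#DISKTYPE}*100", row))  # prints 1*100
try:
    expand_formula('last(example.item["{#DISKCMD}"])', row)
except ValueError as err:
    print(err)  # prints: macro "{#DISKCMD}" value is not numeric
```

Under this rule, any calculated prototype that embeds {#DISKCMD} fails for every disk. That would match the repeated error lines, one per discovered disk.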
Interesting. Does it work if you delete the item prototype "{#DISKNAME}: SMART critical errors total"?
Getting closer.
I deleted that item prototype, and now the items are created. However, "Reallocated Sectors Count" has an error for my boot SSD: Received value [] is not suitable for value type [Numeric (unsigned)] and data type [Decimal]
The drive model and serial number are also blank. The same goes for the WD Red drives in this server.
The SSD model is Crucial_CT250MX200SSD1
I can include smartctl output if needed.
Please attach the smartctl output, and also run the discovery script manually and attach its output.
Which arguments should I run smartctl with?
Try smartctl -A /dev/sda
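{#DISKCMD} holds command-line arguments: a device path plus options such as -d sat. To run the same attribute query by hand for each discovered disk, the value can be split shell-style and passed to smartctl. This is a minimal sketch that assumes smartctl is on PATH and is run with sufficient privileges. smartctl_argv and read_attributes are hypothetical helpers, not part of zbx-smartctl.

```python
import shlex
import subprocess


def smartctl_argv(diskcmd, *options):
    """Build a smartctl argument list from a {#DISKCMD} value such as '/dev/sda -d sat'."""
    return ["smartctl", *options, *shlex.split(diskcmd)]


def read_attributes(diskcmd):
    """Run 'smartctl -A <diskcmd>' and return its text output.

    smartctl's exit status is a bit mask, so a non-zero code is not treated as fatal here.
    """
    result = subprocess.run(smartctl_argv(diskcmd, "-A"), capture_output=True, text=True)
    return result.stdout


print(smartctl_argv("/dev/sda -d sat", "-A"))
# prints ['smartctl', '-A', '/dev/sda', '-d', 'sat']
```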
Boot SSD (the above-mentioned Crucial):
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-1-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       13914
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       372
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   095   095   000    Old_age   Always       -       167
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       148
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       2566
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       394
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   070   049   000    Old_age   Always       -       30 (Min/Max 10/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   095   095   000    Pre-fail  Offline      -       5
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       8525441190
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       266953826
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000    Old_age   Always       -       1916085146
WD Red 3 TB (the same model as the rest of the disks in this server):
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.18-1-pve] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   175   021    Pre-fail  Always       -       6150
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       195
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13695
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       194
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       137
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       806
194 Temperature_Celsius     0x0022   118   113   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
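In the two outputs, attribute 5 is named Reallocated_Sector_Ct on the WD Red but Reallocate_NAND_Blk_Cnt on the Crucial SSD. One possible explanation for the "Received value []" error, not confirmed in this thread, is that the item extracts the value by attribute name. In that case the SSD would yield nothing. The sketch below parses smartctl -A rows and contrasts a lookup by name with a lookup by ID. parse_attributes and raw_by_name are hypothetical helpers, and the template's real extraction method is not shown here.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(.*)$")


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for every row of a 'smartctl -A' table."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)  # header and banner lines do not start with a numeric ID
        if m:
            attrs[int(m.group(1))] = (m.group(2), m.group(3).strip())
    return attrs


def raw_by_name(attrs, name):
    """Name-based lookup: returns '' when the drive uses a different name for the attribute."""
    return next((raw for n, raw in attrs.values() if n == name), "")


crucial = parse_attributes("  5 Reallocate_NAND_Blk_Cnt 0x0033 100 100 000 Pre-fail Always - 0")
wd_red = parse_attributes("  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0")

print(repr(raw_by_name(crucial, "Reallocated_Sector_Ct")))  # prints '' (empty on the SSD)
print(crucial[5], wd_red[5])  # ID 5 is present on both drives
```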
For your information: I upgraded Zabbix to version 3.4, and it seems to be working now (with the 3.4 template).
Might give that a try later.
Upgraded to 3.4, will comment later if the template works properly.
Any news?
Seems to be fine, data collection and triggers work again.
So I'm running Debian 9.x on my server/NAS, with Zabbix server and agent 3.2. I'm trying to set up zbx-smartctl to log SMART data from my disks and fire alerts when needed. One problem I'm having: after I add the template to my server in Zabbix and adjust the interval, I get this error in the Zabbix discovery item.
I added the proper sudoers entry in the file /etc/sudoers.d/sudoers_zabbix_smartctl and tested running sudo /etc/zabbix/scripts/smartctl-disks-discovery.pl as the zabbix user, and it works. Thanks