v-zhuravlev / zbx-smartctl

Templates and scripts for monitoring disks health with Zabbix and smartmontools
https://share.zabbix.com/storage-devices/smartmontools/smart-monitoring-with-smartmontools-lld
GNU General Public License v3.0

smart support not detected #101

Closed. nerijus closed this issue 5 years ago

nerijus commented 5 years ago

SMART support is not detected for this drive:

smartctl -i -H -A -l error -l background -d megaraid,3  /dev/sdb
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-754.14.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE 
Product:              ST4000NM0023    
Revision:             0006
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5008584b023
Serial number:        Z1ZB1QCV0000R642KQ5C
Device type:          disk
Transport protocol:   SAS
Local Time is:        Sat Jun 15 20:39:27 2019 EEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

As the discovery output shows, {#SMART_ENABLED} is reported as 0:

smartctl-disks-discovery.pl.orig /dev/sdb_-d_megaraid,3
{
    "data":[
        {
            "{#DISKMODEL}":"SEAGATE  ST4000NM0023    ",
            "{#DISKSN}":"Z1ZB1QCV0000R642KQ5C",
            "{#DISKNAME}":"/dev/sdb",
            "{#DISKCMD}":"/dev/sdb -d megaraid,3",
            "{#SMART_ENABLED}":"0",
            "{#DISKTYPE}":"2"
        },
...

The following diff helps:

--- smartctl-disks-discovery.pl.orig    2019-06-14 19:08:21.000000000 +0300
+++ smartctl-disks-discovery.pl 2019-06-15 18:33:38.885889715 +0300
@@ -165,7 +165,7 @@
     foreach my $line (@smartctl_output) {
         #foreach my $line ($testline) {
         #print $line;
-        if ( $line =~ /^SMART.+?: +(.+)$/ ) {
+        if ( $line =~ /^Device supports SMART.+? +(.+)$/ ) {

             if ( $1 =~ /Enabled/ ) {
                 $disk->{smart_enabled} = 1;

With the patched script:

smartctl-disks-discovery.pl /dev/sdb_-d_megaraid,3
{
    "data":[
        {
            "{#DISKMODEL}":"SEAGATE  ST4000NM0023    ",
            "{#DISKSN}":"Z1ZB1QCV0000R642KQ5C",
            "{#DISKNAME}":"/dev/sdb",
            "{#DISKCMD}":"/dev/sdb -d megaraid,3",
            "{#SMART_ENABLED}":"1",
            "{#DISKTYPE}":"2"
        },
...
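
The effect of the diff can be sketched in Python (an illustration, not the Perl script itself). The two patterns are copied from the diff above and the SAS lines from the smartctl output in this issue. The `smart_enabled` helper is a hypothetical simplification of the script's loop, and the ATA-style sample line is an assumption about what smartctl prints for ATA drives.

```python
import re

# Patterns copied from the diff; this syntax behaves the same in Python's re.
ORIGINAL = re.compile(r"^SMART.+?: +(.+)$")
PATCHED = re.compile(r"^Device supports SMART.+? +(.+)$")

# Lines taken from the SAS drive output in this issue.
SAS_LINES = [
    "Device supports SMART and is Enabled",
    "Temperature Warning Enabled",
    "SMART Health Status: OK",
]

# Assumed ATA-style line (not taken from this issue).
ATA_LINES = ["SMART support is: Enabled"]


def smart_enabled(lines, pattern):
    """Hypothetical helper: return 1 if a matching line reports 'Enabled'."""
    for line in lines:
        match = pattern.search(line)
        if match and "Enabled" in match.group(1):
            return 1
    return 0


if __name__ == "__main__":
    print("SAS, original:", smart_enabled(SAS_LINES, ORIGINAL))
    print("SAS, patched: ", smart_enabled(SAS_LINES, PATCHED))
    print("ATA, original:", smart_enabled(ATA_LINES, ORIGINAL))
    print("ATA, patched: ", smart_enabled(ATA_LINES, PATCHED))
```

On the SAS output the original pattern only matches "SMART Health Status: OK", whose captured text is "OK", so the flag stays 0. The patched pattern captures "is Enabled". Under the ATA assumption above, the patched pattern on its own no longer matches the ATA-style line, so a pattern covering both forms may be preferable.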
nerijus commented 5 years ago

But even with this diff, the drives do not appear in Zabbix.

v-zhuravlev commented 5 years ago

That is because DISKTYPE is detected as 2 (other).

nerijus commented 5 years ago

How do I fix it?

v-zhuravlev commented 5 years ago

Nothing in the output indicates whether this is an HDD or an SSD. Do you see anything in the info?

As a workaround, you may add this to the LLD filter in the template: OR {#DISKMODEL} = SEAGATE ST4000NM0023.*
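
A rough Python sketch of why such a drive can end up as type 2. It assumes, without confirmation from this thread, that the type is derived from a "Rotation Rate:" line in the smartctl output and that 0 means HDD and 1 means SSD. Only the value 2 (other) is stated above. `classify_disk` is a hypothetical name, not a function from the script.

```python
import re

# 2 (other) comes from this thread; 0 and 1 are assumed values.
HDD, SSD, OTHER = 0, 1, 2


def classify_disk(lines):
    """Hypothetical classifier based on an assumed 'Rotation Rate:' line."""
    for line in lines:
        match = re.search(r"^Rotation Rate: +(.+)$", line)
        if not match:
            continue
        if "Solid State Device" in match.group(1):
            return SSD
        if "rpm" in match.group(1):
            return HDD
    return OTHER


# Identification lines from the smartctl 5.43 output in this issue:
# there is no rotation rate line at all.
ISSUE_LINES = [
    "Vendor:               SEAGATE",
    "Product:              ST4000NM0023",
    "Device type:          disk",
    "Transport protocol:   SAS",
    "Device supports SMART and is Enabled",
]

# Assumed line that other smartctl output may contain (not from this issue).
ASSUMED_LINES = ISSUE_LINES + ["Rotation Rate:        7200 rpm"]

if __name__ == "__main__":
    print(classify_disk(ISSUE_LINES))
    print(classify_disk(ASSUMED_LINES))
```

With no rotation rate line in the output, a classifier of this shape has nothing to go on and falls through to "other", which is why the template filter needs the extra model-based condition.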

v-zhuravlev commented 5 years ago

@nerijus, see #102 for a more universal regex that should work for this disk and for other disks as well.

nerijus commented 5 years ago

Where should I add it in the template?

v-zhuravlev commented 5 years ago

In the Low-level discovery filter.

nerijus commented 5 years ago

Sorry, found it.

nerijus commented 5 years ago

#102 works OK, thank you. I did not find anything in the info that would help identify the drive as an HDD, so I modified the template as you suggested. Would it be possible to add such drives as an exception in smartctl-disks-discovery.pl? The other drive is SEAGATE ST32000445SS. The full info of the drives:

# smartctl -i -H -A -l error -l background -d megaraid,3  /dev/sdb
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-754.14.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE 
Product:              ST4000NM0023    
Revision:             0006
User Capacity:        4.000.787.030.016 bytes [4,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5008584b023
Serial number:        Z1ZB1QCV0000R642KQ5C
Device type:          disk
Transport protocol:   SAS
Local Time is:        Sun Jun 16 11:35:47 2019 EEST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     40 C
Drive Trip Temperature:        60 C
Manufactured in week 23 of year 2016
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  85
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  16426
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 675159426
  Blocks received from initiator = 1503331005
  Blocks read from cache and sent to initiator = 221605044
  Number of read and write commands whose size <= segment size = 14098670
  Number of read and write commands whose size > segment size = 52845
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 22965,70
  number of minutes until next internal SMART test = 15

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3104728929        0         0  3104728929          0       6061,134           0
write:         0        0         0         0          0      55216,422           0
verify: 1664285397        0         0  1664285397          0     124026,920           0

Non-medium error count:     3879

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 22965:42 [1377942 minutes]
    Number of background scans performed: 321,  scan progress: 0,00%
    Number of background medium scans performed: 321
# smartctl -i -H -A -l error -l background -d megaraid,2  /dev/sda
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-754.14.2.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE 
Product:              ST32000445SS    
Revision:             KSF4
User Capacity:        2.000.398.934.016 bytes [2,00 TB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5003429ca17
Serial number:        9WM5M8HG
Device type:          disk
Transport protocol:   SAS
Local Time is:        Sun Jun 16 11:37:28 2019 EEST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     37 C
Drive Trip Temperature:        68 C
Manufactured in week 18 of year 2011
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  30
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  992
Elements in grown defect list: 3
Vendor (Seagate) cache information
  Blocks sent to initiator = 2311214654
  Blocks received from initiator = 3666452158
  Blocks read from cache and sent to initiator = 2178478613
  Number of read and write commands whose size <= segment size = 397758212
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 61405,92
  number of minutes until next internal SMART test = 49

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   1583586311       27         0  1583586338   1583586338     114205,234           0
write:         0        0         0         0          0     344982,581           0
verify: 3467608765      123         0  3467608888   3467608888     412037,860           0

Non-medium error count:      617

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 61405:55 [3684355 minutes]
    Number of background scans performed: 863,  scan progress: 0,00%
    Number of background medium scans performed: 863

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1 5224:55  000000004a363e00  [1,17,1]   Recovered via rewrite in-place
   2 17964:55  0000000045560a00  [1,17,1]   Recovered via rewrite in-place
   3 24807:17  00000000446af000  [1,17,1]   Recovered via rewrite in-place
...
nerijus commented 5 years ago

BTW, there are 2 spaces between SEAGATE and ST... in the model string: "SEAGATE  ST4000NM0023    ". GitHub eats one space.
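
The double space matters when the model string is used in a regex filter. A small Python sketch, assuming the LLD filter regex behaves like Python's `re` for a pattern this simple: the model string is copied from the discovery output above, the single-space pattern is the one as rendered in this thread, and the `FLEXIBLE` pattern and `matches` helper are hypothetical.

```python
import re

# Model string as reported by the discovery script (two spaces after SEAGATE).
MODEL = "SEAGATE  ST4000NM0023    "

# Pattern as rendered in this thread, with a single space.
SINGLE_SPACE = r"SEAGATE ST4000NM0023.*"

# Hypothetical variant that tolerates one or more spaces.
FLEXIBLE = r"SEAGATE +ST4000NM0023.*"


def matches(pattern, value):
    """Return True if the pattern is found anywhere in the value."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    print("single space:", matches(SINGLE_SPACE, MODEL))
    print("flexible:    ", matches(FLEXIBLE, MODEL))
```

Copying the pattern from the rendered page with one space would not match the two-space model string, so typing both spaces or using ` +` in the filter is the safer choice.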