nobody43 / zabbix-smartmontools

Disk SMART monitoring for Linux, FreeBSD and Windows. LLD, trapper.
The Unlicense

Error status? #42

Closed killmasta93 closed 3 years ago

killmasta93 commented 3 years ago

Describe the bug

Hi, I'm currently checking the data and see that all the disks show error code 4 in their status:

ERR_CODE_4

This is the information I get:


smartctl -x /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST91000640SS
Revision:             0002
Compliance:           SPC-4
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c50041ae521b
Serial number:        9XG1MJTN00009241K7QJ
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Nov 22 17:50:49 2020 -05
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        68 C

Manufactured in week 18 of year 2012
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  134
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  135
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 351780005
  Blocks received from initiator = 2789839980
  Blocks read from cache and sent to initiator = 378387119
  Number of read and write commands whose size <= segment size = 69984702
  Number of read and write commands whose size > segment size = 42

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 51959.42
  number of minutes until next internal SMART test = 26

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2483882562        0         0  2483882562          0       8976.204           0
write:         0        0         0         0          0       3648.934           0

Non-medium error count:        0

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 51959:25 [3117565 minutes]
    Number of background scans performed: 377,  scan progress: 0.00%
    Number of background medium scans performed: 377

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1    8:50  00000000006cd99c  [1,17,1]   Recovered via rewrite in-place
   2   10:55  00000000287e1e0c  [1,17,1]   Recovered via rewrite in-place
   3   24:30  000000002fe36400  [1,17,1]   Recovered via rewrite in-place
   4   24:52  000000003ca41600  [1,17,1]   Recovered via rewrite in-place
   5 3053:13  0000000000000a7d  [1,18,7]   Recovered via rewrite in-place
   6 5194:26  00000000258ae919  [1,18,7]   Recovered via rewrite in-place
   7 8170:50  000000000be6faba  [1,18,7]   Recovered via rewrite in-place
   8 11943:20  000000006cf08d76  [1,17,1]   Recovered via rewrite in-place
   9 12914:06  0000000017e30644  [1,17,1]   Recovered via rewrite in-place
  10 13970:35  00000000261cff10  [1,18,7]   Recovered via rewrite in-place
  11 13975:55  0000000027c3b4a8  [1,18,7]   Recovered via rewrite in-place
  12 14949:08  0000000015f2c800  [1,17,1]   Recovered via rewrite in-place
  13 14965:12  000000000d244e5b  [1,17,1]   Recovered via rewrite in-place
  14 15013:03  000000000d5e6400  [1,17,1]   Recovered via rewrite in-place
  15 15232:16  0000000016bfb400  [1,17,1]   Recovered via rewrite in-place
  16 15706:39  000000000e85fa00  [1,17,1]   Recovered via rewrite in-place
  17 17264:29  0000000028cffa15  [1,17,1]   Recovered via rewrite in-place
  18 17283:53  000000000be6fab9  [1,18,7]   Recovered via rewrite in-place
  19 17335:16  000000002a95d5e3  [1,17,1]   Recovered via rewrite in-place
  20 18302:42  0000000070fc90cd  [1,17,1]   Recovered via rewrite in-place
  21 18835:41  0000000016670400  [1,17,1]   Recovered via rewrite in-place
  22 19103:50  0000000016dea000  [1,17,1]   Recovered via rewrite in-place
  23 21621:55  00000000732428a5  [1,17,1]   Recovered via rewrite in-place
  24 24443:10  00000000170a6440  [1,18,7]   Recovered via rewrite in-place
  25 25773:17  000000000be6faba  [1,18,7]   Recovered via rewrite in-place
  26 29539:55  0000000000000a7d  [1,18,7]   Recovered via rewrite in-place
  27 31899:54  0000000004936321  [1,17,1]   Recovered via rewrite in-place
  28 33537:48  000000000be6fabb  [1,18,7]   Recovered via rewrite in-place
  29 33571:15  00000000268da000  [1,17,1]   Recovered via rewrite in-place
  30 35630:44  0000000008931e1c  [1,17,1]   Recovered via rewrite in-place
  31 36962:09  000000001e24405e  [1,17,1]   Recovered via rewrite in-place
  32 38722:58  0000000006a5479e  [1,18,7]   Recovered via rewrite in-place
  33 39932:13  000000001323a008  [1,18,7]   Recovered via rewrite in-place
  34 40127:22  00000000287429bf  [1,18,7]   Recovered via rewrite in-place
  35 42273:45  000000000da64e17  [1,18,7]   Recovered via rewrite in-place
  36 43306:20  0000000000000a7d  [1,18,7]   Recovered via rewrite in-place
  37 43591:06  000000000da64e17  [1,18,7]   Recovered via rewrite in-place
  38 43948:58  0000000035f0a391  [1,17,1]   Recovered via rewrite in-place
  39 45119:54  000000000d9b7400  [1,17,1]   Recovered via rewrite in-place
  40 45363:48  000000000ccaf602  [1,17,1]   Recovered via rewrite in-place
  41 45669:58  00000000667e655f  [1,17,1]   Recovered via rewrite in-place
  42 45671:50  0000000005370c19  [1,17,1]   Recovered via rewrite in-place
  43 46201:56  00000000071fba1e  [1,17,1]   Recovered via rewrite in-place
  44 49123:07  0000000057640932  [1,17,1]   Recovered via rewrite in-place
  45 49137:19  00000000092eb600  [1,17,1]   Recovered via rewrite in-place
  46 49534:49  000000000b2c5600  [1,17,1]   Recovered via rewrite in-place
  47 49706:53  000000002e903a00  [1,17,1]   Recovered via rewrite in-place
  48 49891:33  000000000d198800  [1,17,1]   Recovered via rewrite in-place
  49 50176:44  000000006d8d6bf4  [1,17,1]   Recovered via rewrite in-place
  50 50211:59  0000000016833400  [1,17,1]   Recovered via rewrite in-place
  51 50228:59  000000003c10c200  [1,17,1]   Recovered via rewrite in-place
  52 50326:10  00000000261d9600  [1,17,1]   Recovered via rewrite in-place
  53 50473:20  0000000008379397  [1,17,1]   Recovered via rewrite in-place
  54 50759:46  000000001eee4600  [1,17,1]   Recovered via rewrite in-place
  55 50979:45  0000000036405a00  [1,17,1]   Recovered via rewrite in-place
  56 50979:48  0000000036405a00  [1,17,1]   Recovered via rewrite in-place
  57 51146:09  0000000036918a00  [1,17,1]   Recovered via rewrite in-place
  58 51161:25  0000000036bb2000  [1,17,1]   Recovered via rewrite in-place
  59 51231:12  000000006aa774a8  [1,17,1]   Recovered via rewrite in-place
  60 51784:04  000000003c3f5400  [1,17,1]   Recovered via rewrite in-place
  61 51813:44  000000002e189800  [1,17,1]   Recovered via rewrite in-place
  62 51814:09  000000002e406200  [1,17,1]   Recovered via rewrite in-place

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 4
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: loss of dword synchronization
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c50041ae5219
    attached SAS address = 0x5782bcb052622204
    attached phy identifier = 7
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 10
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 10
     Phy reset problem count: 0
relative target port id = 2
  generation code = 4
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 1.5 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c50041ae521a
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0

Thank you

nobody43 commented 3 years ago

From man smartctl, exit status bit 2 (value 4): "Some SMART or other ATA command to the disk failed".

It's a common disk-related thing. To verify that it's not a bug, run: smartctl -x /dev/sda; echo $?
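
The number printed by `echo $?` is smartctl's exit status, which `man smartctl` documents as a bit mask (bit 0 through bit 7). As a minimal sketch, the mask can be decoded like this; the function and dictionary names are illustrative and are not part of zabbix-smartmontools, and the bit descriptions are shortened from the man page:

```python
# Bit meanings paraphrased from the EXIT STATUS section of `man smartctl`.
# The names below are hypothetical helpers, not the project's own code.
SMARTCTL_EXIT_BITS = {
    0: "Command line did not parse",
    1: "Device open failed, no IDENTIFY DEVICE structure, or device in low-power mode",
    2: "Some SMART or other ATA command to the disk failed, or checksum error",
    3: "SMART status check returned 'DISK FAILING'",
    4: "Prefail Attributes <= threshold were found",
    5: "SMART status is 'DISK OK', but some Attributes were <= threshold in the past",
    6: "The device error log contains records of errors",
    7: "The device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the description of every bit that is set in the exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


if __name__ == "__main__":
    # 0 = no bits set, 4 = bit 2 only, 64 = bit 6 only
    for status in (0, 4, 64):
        print(status, decode_smartctl_status(status))
```

Because the status is a mask, several conditions can be reported at once: a value of 68, for example, has both bit 2 and bit 6 set.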

killmasta93 commented 3 years ago

Thanks for the reply. This is the output I got. So when it says status error, what does that mean?

smartctl -x /dev/sda; echo $?
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST91000640SS
Revision:             0002
Compliance:           SPC-4
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c50041ae521b
Serial number:        9XG1MJTN00009241K7QJ
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Nov 22 18:19:28 2020 -05
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     36 C
Drive Trip Temperature:        68 C

Manufactured in week 18 of year 2012
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  134
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  135
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 360877201
  Blocks received from initiator = 2794696420
  Blocks read from cache and sent to initiator = 378445983
  Number of read and write commands whose size <= segment size = 70025532
  Number of read and write commands whose size > segment size = 42

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 51959.90
  number of minutes until next internal SMART test = 58

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2498404395        0         0  2498404395          0       8980.862           0
write:         0        0         0         0          0       3651.434           0

Non-medium error count:        0

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 51959:54 [3117594 minutes]
    Number of background scans performed: 377,  scan progress: 0.00%
    Number of background medium scans performed: 377

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1    8:50  00000000006cd99c  [1,17,1]   Recovered via rewrite in-place
   2   10:55  00000000287e1e0c  [1,17,1]   Recovered via rewrite in-place
   3   24:30  000000002fe36400  [1,17,1]   Recovered via rewrite in-place
   4   24:52  000000003ca41600  [1,17,1]   Recovered via rewrite in-place
   5 3053:13  0000000000000a7d  [1,18,7]   Recovered via rewrite in-place
   6 5194:26  00000000258ae919  [1,18,7]   Recovered via rewrite in-place
   7 8170:50  000000000be6faba  [1,18,7]   Recovered via rewrite in-place
   8 11943:20  000000006cf08d76  [1,17,1]   Recovered via rewrite in-place
   9 12914:06  0000000017e30644  [1,17,1]   Recovered via rewrite in-place
  10 13970:35  00000000261cff10  [1,18,7]   Recovered via rewrite in-place
  11 13975:55  0000000027c3b4a8  [1,18,7]   Recovered via rewrite in-place
  12 14949:08  0000000015f2c800  [1,17,1]   Recovered via rewrite in-place
  13 14965:12  000000000d244e5b  [1,17,1]   Recovered via rewrite in-place
  14 15013:03  000000000d5e6400  [1,17,1]   Recovered via rewrite in-place
  15 15232:16  0000000016bfb400  [1,17,1]   Recovered via rewrite in-place
  16 15706:39  000000000e85fa00  [1,17,1]   Recovered via rewrite in-place
  17 17264:29  0000000028cffa15  [1,17,1]   Recovered via rewrite in-place
  18 17283:53  000000000be6fab9  [1,18,7]   Recovered via rewrite in-place
  19 17335:16  000000002a95d5e3  [1,17,1]   Recovered via rewrite in-place
  20 18302:42  0000000070fc90cd  [1,17,1]   Recovered via rewrite in-place
  21 18835:41  0000000016670400  [1,17,1]   Recovered via rewrite in-place
  22 19103:50  0000000016dea000  [1,17,1]   Recovered via rewrite in-place
  23 21621:55  00000000732428a5  [1,17,1]   Recovered via rewrite in-place
  24 24443:10  00000000170a6440  [1,18,7]   Recovered via rewrite in-place
  25 25773:17  000000000be6faba  [1,18,7]   Recovered via rewrite in-place
  26 29539:55  0000000000000a7d  [1,18,7]   Recovered via rewrite in-place
  27 31899:54  0000000004936321  [1,17,1]   Recovered via rewrite in-place
  28 33537:48  000000000be6fabb  [1,18,7]   Recovered via rewrite in-place
  29 33571:15  00000000268da000  [1,17,1]   Recovered via rewrite in-place
  30 35630:44  0000000008931e1c  [1,17,1]   Recovered via rewrite in-place
  31 36962:09  000000001e24405e  [1,17,1]   Recovered via rewrite in-place
  32 38722:58  0000000006a5479e  [1,18,7]   Recovered via rewrite in-place
  33 39932:13  000000001323a008  [1,18,7]   Recovered via rewrite in-place
  34 40127:22  00000000287429bf  [1,18,7]   Recovered via rewrite in-place
  35 42273:45  000000000da64e17  [1,18,7]   Recovered via rewrite in-place
  36 43306:20  0000000000000a7d  [1,18,7]   Recovered via rewrite in-place
  37 43591:06  000000000da64e17  [1,18,7]   Recovered via rewrite in-place
  38 43948:58  0000000035f0a391  [1,17,1]   Recovered via rewrite in-place
  39 45119:54  000000000d9b7400  [1,17,1]   Recovered via rewrite in-place
  40 45363:48  000000000ccaf602  [1,17,1]   Recovered via rewrite in-place
  41 45669:58  00000000667e655f  [1,17,1]   Recovered via rewrite in-place
  42 45671:50  0000000005370c19  [1,17,1]   Recovered via rewrite in-place
  43 46201:56  00000000071fba1e  [1,17,1]   Recovered via rewrite in-place
  44 49123:07  0000000057640932  [1,17,1]   Recovered via rewrite in-place
  45 49137:19  00000000092eb600  [1,17,1]   Recovered via rewrite in-place
  46 49534:49  000000000b2c5600  [1,17,1]   Recovered via rewrite in-place
  47 49706:53  000000002e903a00  [1,17,1]   Recovered via rewrite in-place
  48 49891:33  000000000d198800  [1,17,1]   Recovered via rewrite in-place
  49 50176:44  000000006d8d6bf4  [1,17,1]   Recovered via rewrite in-place
  50 50211:59  0000000016833400  [1,17,1]   Recovered via rewrite in-place
  51 50228:59  000000003c10c200  [1,17,1]   Recovered via rewrite in-place
  52 50326:10  00000000261d9600  [1,17,1]   Recovered via rewrite in-place
  53 50473:20  0000000008379397  [1,17,1]   Recovered via rewrite in-place
  54 50759:46  000000001eee4600  [1,17,1]   Recovered via rewrite in-place
  55 50979:45  0000000036405a00  [1,17,1]   Recovered via rewrite in-place
  56 50979:48  0000000036405a00  [1,17,1]   Recovered via rewrite in-place
  57 51146:09  0000000036918a00  [1,17,1]   Recovered via rewrite in-place
  58 51161:25  0000000036bb2000  [1,17,1]   Recovered via rewrite in-place
  59 51231:12  000000006aa774a8  [1,17,1]   Recovered via rewrite in-place
  60 51784:04  000000003c3f5400  [1,17,1]   Recovered via rewrite in-place
  61 51813:44  000000002e189800  [1,17,1]   Recovered via rewrite in-place
  62 51814:09  000000002e406200  [1,17,1]   Recovered via rewrite in-place

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 4
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: loss of dword synchronization
    negotiated logical link rate: phy enabled; 6 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c50041ae5219
    attached SAS address = 0x5782bcb052622204
    attached phy identifier = 7
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 10
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 10
     Phy reset problem count: 0
relative target port id = 2
  generation code = 4
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 1.5 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c50041ae521a
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
    Phy event descriptors:
     Invalid word count: 0
     Running disparity error count: 0
     Loss of dword synchronization count: 0
     Phy reset problem count: 0

0

nobody43 commented 3 years ago

Hm. Wrong disk. Try: smartctl -x /dev/sdb; echo $?

killmasta93 commented 3 years ago

Thanks for the quick reply. This is the output:

smartctl -x /dev/sdb; echo $?
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.34-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST91000640SS
Revision:             0001
Compliance:           SPC-3
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c50033e0be0f
Serial number:        9XG0510S00009129VN7B
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Nov 22 18:23:35 2020 -05
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Elements in grown defect list: 0

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
4

nobody43 commented 3 years ago

The last three lines are your answer. You can acknowledge the problem or disable it for this disk (I recommend using serial-based identification before doing that).
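
To see which disk a report belongs to and which log pages it lacks, the relevant lines can be pulled out of the smartctl text. This is only a sketch: the helper names are hypothetical, the sample is an excerpt of the /dev/sdb output above, and how the project itself implements serial-based identification is not shown in this thread.

```python
import re

# Excerpt of the /dev/sdb output posted above.
SDB_SAMPLE = """\
Serial number:        9XG0510S00009129VN7B
Device type:          disk
SMART Health Status: OK
Elements in grown defect list: 0

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging
"""


def disk_serial(smartctl_output):
    """Return the serial number from the information section, or None."""
    match = re.search(r"^Serial number:\s*(\S+)", smartctl_output,
                      re.MULTILINE | re.IGNORECASE)
    return match.group(1) if match else None


def unsupported_features(smartctl_output):
    """Return every line that says a feature or log is not supported."""
    return [line.strip() for line in smartctl_output.splitlines()
            if "not support" in line.lower()]


if __name__ == "__main__":
    print(disk_serial(SDB_SAMPLE))
    for line in unsupported_features(SDB_SAMPLE):
        print(line)
```

Keying on the serial number rather than on /dev/sdb keeps the result tied to the physical disk even if device names change between reboots.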

killmasta93 commented 3 years ago

Thanks for the reply. So how can I tell when a disk is about to fail? Would I look at "Elements in grown defect list"?

Thank you

nobody43 commented 3 years ago

That's correct. Also watch the Non-medium error count and the disk log (separate trigger).
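
As a rough illustration, both counters can be read from the text that `smartctl -x` prints for SAS disks. The helper below is a hypothetical sketch, not the project's parser, and it assumes the line labels look exactly like the ones in the output above:

```python
import re

# Labels as they appear in the SAS output earlier in this thread.
COUNTER_PATTERNS = {
    "grown_defects": r"^Elements in grown defect list:\s*(\d+)",
    "non_medium_errors": r"^Non-medium error count:\s*(\d+)",
}


def scsi_health_counters(smartctl_output):
    """Return each counter as an int, or None when the disk does not report it."""
    counters = {}
    for name, pattern in COUNTER_PATTERNS.items():
        match = re.search(pattern, smartctl_output, re.MULTILINE)
        counters[name] = int(match.group(1)) if match else None
    return counters


if __name__ == "__main__":
    sda_excerpt = (
        "Elements in grown defect list: 0\n"
        "Non-medium error count:        0\n"
    )
    print(scsi_health_counters(sda_excerpt))
```

A missing counter is returned as None instead of 0, so a disk that does not report a value (as /dev/sdb does for the error counter log) is not mistaken for a healthy reading.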

killmasta93 commented 3 years ago

Thanks for the reply. I do have a few other servers that report a Non-medium error count. I was reading that this may be a false alarm, but I'm not sure. What have you seen in your experience? As for the disk log, I saw this once:

sdb: The device error log contains records of errors

man smartctl:
Bit 6: The device error log contains records of errors.

I checked it with Crystal Disk, but that disk is SATA, not SAS.

nobody43 commented 3 years ago

The device could run for years with errors in the log, or it could fail tomorrow. The most concerning parameter is Reallocated Sectors: if it rises sharply, the device will most likely fail.