v-zhuravlev / zbx-smartctl

Templates and scripts for monitoring disks health with Zabbix and smartmontools
https://share.zabbix.com/storage-devices/smartmontools/smart-monitoring-with-smartmontools-lld
GNU General Public License v3.0

Incorrect wearout on WD Blue SSD #148

Open dannytech opened 4 years ago

dannytech commented 4 years ago

On my WD Blue 2TB drive, attribute 230 (Media_Wearout_Indicator) reports the wearout percentage (percentage used) rather than the available spare. On this relatively new drive the value is 1, meaning 1% worn, whereas all my other drives would report 99% available spare in the same state. This triggers wearout warnings because Zabbix believes the spare is nearly exhausted. I'm including the S.M.A.R.T. data below because I'm not sure where to start on setting up a condition for this (I don't have another drive that's similar enough to tell the differences):

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.60-1-pve] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue and Green SSDs
Device Model:     WDC  WDS200T2B0A-00SM50
Serial Number:    XXXXXXXXXXXX
LU WWN Device Id: X XXXXXX XXXXXXXXX
Firmware Version: 411040WD
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep 29 22:09:26 2020 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       4057
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       36
165 Block_Erase_Count       0x0032   100   100   ---    Old_age   Always       -       24707331
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       1
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       75
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       10
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       968
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       2
174 Unexpected_Power_Loss   0x0032   100   100   ---    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   048   ---    Old_age   Always       -       35 (Min/Max 22/48)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   001   001   ---    Old_age   Always       -       0x003300140033
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       5231
234 NAND_GB_Written_SLC     0x0032   100   100   ---    Old_age   Always       -       8115
241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       7377
242 Host_Reads_GiB          0x0030   253   253   ---    Old_age   Offline      -       13497
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged
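
For anyone looking for a starting point on such a condition, here is a minimal Python sketch. It assumes the reading above is correct, i.e. that on drives in the "WD Blue and Green SSDs" model family the normalized VALUE of attribute 230 is "percent used" and has to be inverted, while other drives report "percent remaining" directly. The function names and the `inverted_families` parameter are hypothetical and are not part of zbx-smartctl or smartmontools.

```python
import re

# Matches one row of the `smartctl -A` attribute table, e.g.
# "230 Media_Wearout_Indicator 0x0032   001   001   ---    Old_age ..."
ATTR_LINE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\S+)\s+"
)


def normalized_value(smartctl_output, attr_id):
    """Return the normalized VALUE column for attr_id, or None if absent."""
    for line in smartctl_output.splitlines():
        m = ATTR_LINE.match(line)
        if m and int(m.group("id")) == attr_id:
            return int(m.group("value"))
    return None


def remaining_life_percent(
    smartctl_output,
    inverted_families=("WD Blue and Green SSDs",),  # assumption, see lead-in
):
    """Estimate remaining life (%) from attribute 230.

    For model families listed in inverted_families the value is treated
    as "percent used" and flipped; otherwise it is returned unchanged.
    """
    value = normalized_value(smartctl_output, 230)
    if value is None:
        return None
    family = re.search(r"^Model Family:\s*(.+)$", smartctl_output, re.MULTILINE)
    if family and family.group(1).strip() in inverted_families:
        return 100 - value
    return value
```

Fed the output above, `remaining_life_percent` would report 99 rather than 1. Keying on the model family string is only one possible choice; matching on the device model or firmware version might turn out to be more reliable.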

Lillecarl commented 3 years ago

This is a SMART issue and should probably be resolved by smartmontools or the drive manufacturer. Disable the wear check and rely on the self-assessment test instead.
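
A rough Python sketch of relying on the self-assessment result, assuming the line format shown in the smartctl output earlier in this issue ("SMART overall-health self-assessment test result: PASSED"). The function name is hypothetical, and other device types may word this line differently, which the sketch does not handle.

```python
import re


def overall_health_passed(smartctl_output):
    """Return True/False for the ATA self-assessment line, None if missing."""
    m = re.search(
        r"^SMART overall-health self-assessment test result:\s*(\S+)",
        smartctl_output,
        re.MULTILINE,
    )
    if m is None:
        return None  # line not found; caller decides how to treat this
    return m.group(1) == "PASSED"
```

Returning `None` when the line is absent keeps "no data" distinct from "failed", so a missing line does not raise a false alarm by itself.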

chrfranke commented 3 years ago

Follow-up from the smartmontools-support mailing list: https://listi.jpberlin.de/pipermail/smartmontools-support/2021-February/000580.html