v-zhuravlev / zbx-smartctl

Templates and scripts for monitoring disks health with Zabbix and smartmontools
https://share.zabbix.com/storage-devices/smartmontools/smart-monitoring-with-smartmontools-lld
GNU General Public License v3.0

Kingston SSD wearout #118

Open demento-ru opened 5 years ago

demento-ru commented 5 years ago

The "SSD wearout (<5% left)" trigger fires on a Kingston SSD:

/dev/sdc: ID 177/202/233 SSD wearout    2019-10-10 13:18:34 0 %
smartctl 7.0 2018-12-30 r4883 [x86_64-w64-mingw32-w10-b18362] (sf-7.0-1)

Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Phison Driven SSDs

Device Model:     KINGSTON SA400S37960G

Serial Number:    50026B76830887D3

LU WWN Device Id: 5 0026b7 6830887d3

Firmware Version: SBFK61F1

User Capacity:    960 197 124 096 bytes [960 GB]

Sector Size:      512 bytes logical/physical

Rotation Rate:    Solid State Device

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ACS-3 T13/2161-D revision 4

SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:    Thu Oct 10 13:18:33 2019 RTZ

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate     0x0032   000   100   000    Old_age   Always       -       0

9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9

12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       7

148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0

149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0

167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0

168 SATA_Phy_Error_Count    0x0012   100   100   000    Old_age   Always       -       0

169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       11

170 Bad_Blk_Ct_Erl/Lat      0x0000   100   100   010    Old_age   Offline      -       0/27

172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

173 MaxAvgErase_Ct          0x0000   100   100   000    Old_age   Offline      -       1

181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       -       0

182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

192 Unsafe_Shutdown_Count   0x0012   100   100   000    Old_age   Always       -       5

194 Temperature_Celsius     0x0022   071   068   000    Old_age   Always       -       29 (Min/Max 23/32)

196 Not_In_Use              0x0032   100   100   000    Old_age   Always       -       0

199 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       0

218 CRC_Error_Count         0x0032   100   100   000    Old_age   Always       -       327

231 SSD_Life_Left           0x0000   000   000   000    Old_age   Offline      -       100

233 Flash_Writes_GiB        0x0032   100   100   000    Old_age   Always       -       115

241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       223

242 Lifetime_Reads_GiB      0x0032   100   100   000    Old_age   Always       -       1

244 Average_Erase_Count     0x0000   100   100   000    Old_age   Offline      -       0

245 Max_Erase_Count         0x0000   100   100   000    Old_age   Offline      -       1

246 Total_Erase_Count       0x0000   100   100   000    Old_age   Offline      -       10336

SMART Error Log Version: 1

No Errors Logged
v-zhuravlev commented 5 years ago

The value 000 from the line

231 SSD_Life_Left 0x0000 000 000 000 Old_age Offline - 100

is caught by the regex:

(?:(?:177 Wear_Leveling_Count|202 Percent_Lifetime_Used|233 Media_Wearout_Indicator|231 SSD_Life_Left) +0x[0-9a-z]+|Available Spare:) +([0-9]+)

The disks I had before (even Kingston ones) store the proper life-left value in the VALUE column, not in RAW_VALUE. So I'm not sure what to do here.
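
To make the capture concrete, here is a minimal sketch that runs the regex quoted above against the attribute 231 rows pasted in this issue. Python's `re` module is only a stand-in for whatever regex engine the template actually uses, and the `captured_value` helper and `rows` dictionary are hypothetical names introduced for illustration.

```python
import re

# The regex quoted in the comment above, split across lines for readability.
WEAROUT_RE = re.compile(
    r"(?:(?:177 Wear_Leveling_Count|202 Percent_Lifetime_Used"
    r"|233 Media_Wearout_Indicator|231 SSD_Life_Left) +0x[0-9a-z]+"
    r"|Available Spare:) +([0-9]+)"
)

# Attribute 231 rows copied from the smartctl dumps in this issue,
# keyed by the firmware version each dump reports.
rows = {
    "SBFK61F1": "231 SSD_Life_Left           0x0000   000   000   000    Old_age   Offline      -       100",
    "SBFK71E0": "231 SSD_Life_Left 0x0013 100 100 000 Pre-fail Always - 98",
    "SBFKB1D1": "231 SSD_Life_Left 0x0000 005 005 000 Old_age Offline - 95",
}


def captured_value(line):
    """Return the number the regex captures (the VALUE column), or None."""
    match = WEAROUT_RE.search(line)
    return int(match.group(1)) if match else None


for firmware, line in rows.items():
    # The capture group sits right after the FLAG field, so it always
    # picks up VALUE and never the RAW_VALUE at the end of the row.
    print(firmware, captured_value(line))
```

The capture group always lands on the VALUE column, so on the SBFK61F1 and SBFKB1D1 rows it returns a small number even though RAW_VALUE reports 100 and 95.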

PlaksinAA commented 4 years ago

Unfortunately, the SMART output has changed on firmware "SBFKB1D1". Compare the two examples below:

Firmware Version: SBFK71E0

smartctl 7.0 2018-12-30 r4883 [x86_64-w64-mingw32-win7-sp1] (sf-7.0-1)

Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family: Phison Driven SSDs

Device Model: KINGSTON SA400S37240G

Serial Number: 50026B777602EF18

LU WWN Device Id: 5 000000 000000000

Firmware Version: SBFK71E0

User Capacity: 240 057 409 536 bytes [240 GB]

Sector Size: 512 bytes logical/physical

Rotation Rate: Solid State Device

Form Factor: 2.5 inches

Device is: In smartctl database [for details use: -P show]

ATA Version is: ACS-4 (minor revision not indicated)

SATA Version is: SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)

Local Time is: Wed Dec 25 16:50:28 2019 RTZ

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0

9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 2897

12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 1515

148 Unknown_Attribute 0x0000 255 255 000 Old_age Offline - 3

149 Unknown_Attribute 0x0000 255 255 000 Old_age Offline - 19

167 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0

168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0

169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 11

170 Bad_Blk_Ct_Erl/Lat 0x0013 100 100 010 Pre-fail Always - 0/8

172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0

173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 42 (Average 17)

181 Program_Fail_Cnt_Total 0x0012 100 100 000 Old_age Always - 0

182 Erase_Fail_Count_Total 0x0000 255 255 000 Old_age Offline - 3

187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 16

192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 21

194 Temperature_Celsius 0x0023 070 062 000 Pre-fail Always - 30 (Min/Max 14/38)

196 Not_In_Use 0x0000 100 100 000 Old_age Offline - 19

199 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0

218 CRC_Error_Count 0x0000 100 100 000 Old_age Offline - 0

231 SSD_Life_Left 0x0013 100 100 000 Pre-fail Always - 98

233 Flash_Writes_GiB 0x0013 100 100 000 Pre-fail Always - 3676

241 Lifetime_Writes_GiB 0x0012 100 100 000 Old_age Always - 2694

242 Lifetime_Reads_GiB 0x0012 100 100 000 Old_age Always - 4467

244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 17

245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 42

246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 208944

SMART Error Log Version: 1

No Errors Logged

Firmware Version: SBFKB1D1

smartctl 7.0 2018-12-30 r4883 [x86_64-w64-mingw32-win7-sp1] (sf-7.0-1)

Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family: Phison Driven SSDs

Device Model: KINGSTON SA400S37240G

Serial Number: 50026B7782CA0461

LU WWN Device Id: 5 0026b7 782ca0461

Firmware Version: SBFKB1D1

User Capacity: 240 057 409 536 bytes [240 GB]

Sector Size: 512 bytes logical/physical

Rotation Rate: Solid State Device

Form Factor: 2.5 inches

Device is: In smartctl database [for details use: -P show]

ATA Version is: ACS-4 (minor revision not indicated)

SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is: Thu Dec 26 10:36:34 2019 RTZ

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x0032 000 100 000 Old_age Always - 0

9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1982

12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 266

148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0

149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0

167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0

168 SATA_Phy_Error_Count 0x0012 100 100 000 Old_age Always - 0

169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 16

170 Bad_Blk_Ct_Erl/Lat 0x0000 100 100 010 Old_age Offline - 0/11

172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0

173 MaxAvgErase_Ct 0x0000 100 100 000 Old_age Offline - 61 (Average 48)

181 Program_Fail_Cnt_Total 0x0032 100 100 000 Old_age Always - 0

182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0

187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0

192 Unsafe_Shutdown_Count 0x0012 100 100 000 Old_age Always - 3

194 Temperature_Celsius 0x0022 073 066 000 Old_age Always - 27 (Min/Max 15/34)

196 Not_In_Use 0x0032 100 100 000 Old_age Always - 0

199 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0

218 CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0

231 SSD_Life_Left 0x0000 005 005 000 Old_age Offline - 95

233 Flash_Writes_GiB 0x0032 100 100 000 Old_age Always - 6714

241 Lifetime_Writes_GiB 0x0032 100 100 000 Old_age Always - 3796

242 Lifetime_Reads_GiB 0x0032 100 100 000 Old_age Always - 7064

244 Average_Erase_Count 0x0000 100 100 000 Old_age Offline - 48

245 Max_Erase_Count 0x0000 100 100 000 Old_age Offline - 61

246 Total_Erase_Count 0x0000 100 100 000 Old_age Offline - 374704

SMART Error Log Version: 1

No Errors Logged

PlaksinAA commented 4 years ago

I solved the problem as follows:

  1. Add an item prototype "{#DISKNAME}: Firmware Version" with key uSSD.fw.["{#DISKCMD}"] and preprocessing regex: Firmware Version: ([\w]+)
  2. Edit the trigger prototypes "{#DISKNAME}: SSD wearout" for 5% and 10%:
     2.1 Problem expression: {Template_HDD_SMARTMONTOOLS_2_WITH_LLD:uSSD["{#DISKCMD}",SSD wearout].last()}<5 and {Template_HDD_SMARTMONTOOLS_2_WITH_LLD:uSSD.fw.["{#DISKCMD}"].iregexp("(SBFKB1D1)")}=0
     2.2 Recovery expression: {Template_HDD_SMARTMONTOOLS_2_WITH_LLD:uSSD.fw.["{#DISKCMD}"].iregexp("(SBFKB1D1)")}=1
  3. Add 2 new triggers for 5% and 10% that apply only to this firmware. Problem expression: {Template_HDD_SMARTMONTOOLS_2_WITH_LLD:uSSD["{#DISKCMD}",SSD wearout].last()}>95 and {Template_HDD_SMARTMONTOOLS_2_WITH_LLD:uSSD.fw.["{#DISKCMD}"].iregexp("(SBFKB1D1)")}=1
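
The logic of these triggers can be sketched outside Zabbix as follows. This is only an illustration under stated assumptions: the function names and the `threshold` parameter are hypothetical, and the `100 - threshold` rule for the 10% variant is my extrapolation from the 5% expressions above, not something the comment spells out.

```python
import re

# Preprocessing regex from step 1, copied from the comment.
FIRMWARE_RE = re.compile(r"Firmware Version: ([\w]+)")

# Firmware pattern used in the trigger expressions; iregexp() in the
# triggers is case-insensitive, hence re.IGNORECASE here.
INVERTED_FIRMWARE_RE = re.compile(r"(SBFKB1D1)", re.IGNORECASE)


def firmware_version(smartctl_output):
    """Extract the firmware string the way the item preprocessing would."""
    match = FIRMWARE_RE.search(smartctl_output)
    return match.group(1) if match else None


def wearout_problem(wearout, firmware, threshold=5):
    """Return True when one of the 'SSD wearout' triggers would fire.

    wearout   -- the 'SSD wearout' item value (VALUE column of the attribute)
    firmware  -- firmware string from firmware_version(), may be None
    threshold -- 5 matches the expressions in the comment; other values
                 are an assumption about how the 10% variant is built
    """
    inverted = bool(firmware and INVERTED_FIRMWARE_RE.search(firmware))
    if inverted:
        # Step 3: on SBFKB1D1 the value appears to count up, so alert
        # when it exceeds 100 - threshold.
        return wearout > 100 - threshold
    # Step 2.1: all other firmware keeps the original "less than" check.
    return wearout < threshold


dump = "Device Model: KINGSTON SA400S37240G\nFirmware Version: SBFKB1D1\n"
print(wearout_problem(5, firmware_version(dump)))  # VALUE 005 from the SBFKB1D1 dump
```

With the values from the dumps in this thread, the SBFKB1D1 disk (VALUE 005) and the SBFK71E0 disk (VALUE 100) no longer raise a problem. The disk from the first post reports firmware SBFK61F1 with VALUE 000, so it would still trigger unless the firmware pattern is extended.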