micha37-martins / S.M.A.R.T-disk-monitoring-for-Prometheus

Prometheus node_exporter text_collector for S.M.A.R.T disk values
Apache License 2.0
99 stars, 27 forks

Add vendor specific command line options #22

Open DenisBY opened 2 months ago

DenisBY commented 2 months ago

Seagate raw data isn't accurate. Therefore -v is required. See https://www.truenas.com/community/threads/seagate-ironwolf-smart-test-raw_read_error_rate-seek_error_rate.68634/post-470741

micha37-martins commented 2 months ago

Hi, and thanks for mentioning this issue. As the source you provide seems rather old, I am not convinced that I should make an exception for Seagate. I'm also not sure whether only old drives are affected or whether this is a general Seagate problem. Feel free to convince me that this is a general problem. Otherwise, this info might help someone understand wrong error rates, so I added it to the README: https://github.com/micha37-martins/S.M.A.R.T-disk-monitoring-for-Prometheus/pull/24/commits/73600ac546418537c2d6ff6fee0472ecabd13300

DenisBY commented 2 months ago

Hi. Thank you for your work. This issue is still valid. I bought two Seagate IronWolf NAS 12TB and smart still shows abnormally high values for "Seek_Error_Rate" and "Raw_Read_Error_Rate":

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   064   044    Pre-fail  Always       -       241341296
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       11
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   074   060   045    Pre-fail  Always       -       25607460

However, if I enter these values at https://s.i.wtf/, it shows 0 errors.
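The "0 errors" result can be reproduced by hand. The sketch below assumes the commonly described Seagate layout for these two attributes: the 48-bit raw value holds an error count in its upper 16 bits and an operation count in its lower 32 bits. That layout is an assumption here, not something confirmed by Seagate in this thread, and `decode_seagate_raw` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Split a 48-bit raw value into two fields.
# Assumption: upper 16 bits = error count, lower 32 bits = operation count.
decode_seagate_raw() {
  local raw=$1
  local errors=$(( (raw >> 32) & 0xFFFF ))
  local operations=$(( raw & 0xFFFFFFFF ))
  printf 'errors=%d operations=%d\n' "${errors}" "${operations}"
}

decode_seagate_raw 241341296   # Raw_Read_Error_Rate from the output above
decode_seagate_raw 25607460    # Seek_Error_Rate from the output above
```

Both raw values are smaller than 2^32, so under this interpretation the error field is 0 and the whole number is the operation count.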

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST12000VN0007-2GS116
Serial Number:    ZJV60FWR
LU WWN Device Id: 5 000c50 0c3046680
Firmware Version: SC60
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Sep 23 09:37:38 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

dynux90 commented 3 weeks ago

I have the same problem. How did you modify the script?

micha37-martins commented 3 weeks ago

@DenisBY @dynux90 I see this is annoying behavior and I created a workaround for you. But I need your support as I do not have a Seagate drive. Please test the "seagate_special" branch.

dynux90 commented 3 weeks ago

@micha37-martins Does this work when we have both HDDs from Seagate and SSDs from other vendors?

micha37-martins commented 3 weeks ago

@dynux90 I cannot test it, but maybe you can verify that it works for your Seagate drive. Then I will try to implement a switch for mixed setups that have Seagate and other vendor(s).

dynux90 commented 3 weeks ago

@micha37-martins The modified version does not work; it does not even retrieve the SMART attributes.

I tried this code in process_device() instead, and it works:

  local info_json
  info_json=$(/usr/sbin/smartctl -i -j -d "${type}" "${disk}")
  parse_smartctl_info_json "${disk}" "${type}" "${info_json}"

  # Read the model name from the info JSON (empty if the field is missing)
  local model_name
  model_name=$(echo "${info_json}" | jq -r '.model_name // empty')

  # Get and parse SMART attributes
  local attributes_json
  if [[ "${model_name}" == ST* ]]; then
    # Seagate: show only bytes 5 and 4 of the raw value for attributes 1 and 7
    attributes_json=$(/usr/sbin/smartctl -A -j -d "${type}" -v 1,raw48:54 -v 7,raw48:54 "${disk}")
  else
    attributes_json=$(/usr/sbin/smartctl -A -j -d "${type}" "${disk}")
  fi

Every Seagate model name should begin with ST, so we can recognize the drives this way. I hope this works for you.
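Because the check above runs per device, the same idea can be isolated into a small helper that can be tried without touching a disk. This is only a sketch: `attribute_args` is a hypothetical function name, the device paths and the second model name are made up for illustration, and the `ST` prefix match is the heuristic proposed in this comment, not an official way to identify Seagate drives. As far as I understand the smartctl man page, the byte order `54` in `raw48:54` selects bytes 5 and 4 of the 48-bit raw value, which are its upper 16 bits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the smartctl arguments for one device, one per line.
# Assumption: Seagate model names start with "ST".
attribute_args() {
  local model_name=$1 type=$2 disk=$3
  local args=(-A -j -d "${type}")
  if [[ "${model_name}" == ST* ]]; then
    # Only attributes 1 and 7 get the vendor-specific raw format
    args+=(-v 1,raw48:54 -v 7,raw48:54)
  fi
  args+=("${disk}")
  printf '%s\n' "${args[@]}"
}

# Mixed setup: one Seagate HDD and one drive from another (made-up) vendor
mapfile -t seagate_args < <(attribute_args "ST12000VN0007-2GS116" sat /dev/sda)
mapfile -t other_args < <(attribute_args "ExampleVendor SSD" sat /dev/sdb)

echo "smartctl ${seagate_args[*]}"
echo "smartctl ${other_args[*]}"
```

Since the arguments are chosen per device, a host with Seagate HDDs and SSDs from other vendors would only pass the `-v` options to the drives whose model name matches.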

micha37-martins commented 2 weeks ago

@dynux90 Thanks for testing and for your suggestion. I would still keep the flag so that the code for this kind of edge case stays separate. Feel free to check the change on the seagate_special branch: https://github.com/micha37-martins/S.M.A.R.T-disk-monitoring-for-Prometheus/blob/5a50a63dc28e1fb63c6c404a97832503f851a1b3/src/smartmon.sh#L316