Open Liamlu28 opened 5 years ago
+1
Are you able to use the smartmon.py script instead?
Hello.
Thank you for your reply. Of course, we tried using smartmon.py.
# ./smartmon.py_NEWEST | grep -v "#"
smartmon_smartctl_version{version="7.2"} 1
smartmon_attr_raw_value{name="raw_read_error_rate",device="/dev/sda",disk="0"} 0
smartmon_attr_raw_value{name="power_on_hours",device="/dev/sda",disk="0"} 30079
smartmon_attr_raw_value{name="power_cycle_count",device="/dev/sda",disk="0"} 6
smartmon_attr_raw_value{name="program_fail_count",device="/dev/sda",disk="0"} 0
smartmon_attr_raw_value{name="reported_uncorrect",device="/dev/sda",disk="0"} 0
smartmon_attr_raw_value{name="temperature_celsius",device="/dev/sda",disk="0"} 38
smartmon_attr_raw_value{name="reallocated_event_count",device="/dev/sda",disk="0"} 0
smartmon_attr_raw_value{name="offline_uncorrectable",device="/dev/sda",disk="0"} 0
smartmon_attr_raw_value{name="udma_crc_error_count",device="/dev/sda",disk="0"} 0
smartmon_attr_raw_value{name="total_lbas_written",device="/dev/sda",disk="0"} 72636214020
smartmon_attr_raw_value{name="raw_read_error_rate",device="/dev/sdb",disk="0"} 2817580
smartmon_attr_raw_value{name="power_on_hours",device="/dev/sdb",disk="0"} 56476
smartmon_attr_raw_value{name="power_cycle_count",device="/dev/sdb",disk="0"} 15
smartmon_attr_raw_value{name="program_fail_count",device="/dev/sdb",disk="0"} 0
smartmon_attr_raw_value{name="reported_uncorrect",device="/dev/sdb",disk="0"} 1851
smartmon_attr_raw_value{name="temperature_celsius",device="/dev/sdb",disk="0"} 35
smartmon_attr_raw_value{name="reallocated_event_count",device="/dev/sdb",disk="0"} 2
smartmon_attr_raw_value{name="offline_uncorrectable",device="/dev/sdb",disk="0"} 0
smartmon_attr_raw_value{name="udma_crc_error_count",device="/dev/sdb",disk="0"} 1
smartmon_attr_raw_value{name="total_lbas_written",device="/dev/sdb",disk="0"} 134617775539
smartmon_attr_threshold{name="raw_read_error_rate",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="power_on_hours",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="power_cycle_count",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="program_fail_count",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="reported_uncorrect",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="temperature_celsius",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="reallocated_event_count",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="offline_uncorrectable",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="udma_crc_error_count",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="total_lbas_written",device="/dev/sda",disk="0"} 0
smartmon_attr_threshold{name="raw_read_error_rate",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="power_on_hours",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="power_cycle_count",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="program_fail_count",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="reported_uncorrect",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="temperature_celsius",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="reallocated_event_count",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="offline_uncorrectable",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="udma_crc_error_count",device="/dev/sdb",disk="0"} 0
smartmon_attr_threshold{name="total_lbas_written",device="/dev/sdb",disk="0"} 0
smartmon_attr_value{name="raw_read_error_rate",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="power_on_hours",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="power_cycle_count",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="program_fail_count",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="reported_uncorrect",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="temperature_celsius",device="/dev/sda",disk="0"} 62
smartmon_attr_value{name="reallocated_event_count",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="offline_uncorrectable",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="udma_crc_error_count",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="total_lbas_written",device="/dev/sda",disk="0"} 100
smartmon_attr_value{name="raw_read_error_rate",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="power_on_hours",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="power_cycle_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="program_fail_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="reported_uncorrect",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="temperature_celsius",device="/dev/sdb",disk="0"} 65
smartmon_attr_value{name="reallocated_event_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="offline_uncorrectable",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="udma_crc_error_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_value{name="total_lbas_written",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="raw_read_error_rate",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="power_on_hours",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="power_cycle_count",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="program_fail_count",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="reported_uncorrect",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="temperature_celsius",device="/dev/sda",disk="0"} 49
smartmon_attr_worst{name="reallocated_event_count",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="offline_uncorrectable",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="udma_crc_error_count",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="total_lbas_written",device="/dev/sda",disk="0"} 100
smartmon_attr_worst{name="raw_read_error_rate",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="power_on_hours",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="power_cycle_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="program_fail_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="reported_uncorrect",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="temperature_celsius",device="/dev/sdb",disk="0"} 49
smartmon_attr_worst{name="reallocated_event_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="offline_uncorrectable",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="udma_crc_error_count",device="/dev/sdb",disk="0"} 100
smartmon_attr_worst{name="total_lbas_written",device="/dev/sdb",disk="0"} 100
smartmon_device_active{device="/dev/sda",disk="0"} 1
smartmon_device_active{device="/dev/sdb",disk="0"} 1
smartmon_device_errors{device="/dev/sda",disk="0"} 0
smartmon_device_errors{device="/dev/sdb",disk="0"} 1851
smartmon_device_info{device="/dev/sda",disk="0",model_family="Crucial/Micron Client SSDs",device_model="Micron_1100_MTFDDAK256TBN",serial_number="17******3",firmware_version="M0MU031"} 1
smartmon_device_info{device="/dev/sdb",disk="0",model_family="Crucial/Micron Client SSDs",device_model="Crucial_CT256MX100SSD1",serial_number="14******5",firmware_version="MU03"} 1
smartmon_device_smart_available{device="/dev/sda",disk="0"} 1
smartmon_device_smart_available{device="/dev/sdb",disk="0"} 1
smartmon_device_smart_enabled{device="/dev/sda",disk="0"} 1
smartmon_device_smart_enabled{device="/dev/sdb",disk="0"} 1
smartmon_device_smart_healthy{device="/dev/sda",disk="0"} 0
smartmon_device_smart_healthy{device="/dev/sdb",disk="0"} 1
smartmon_smartctl_run{device="/dev/sda",disk="0"} 1616051028
smartmon_smartctl_run{device="/dev/sdb",disk="0"} 1616051028
# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-4.19.0-6-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Crucial/Micron Client SSDs
Device Model: Micron_1100_MTFDDAK256TBN
Serial Number: 17*****3
LU WWN Device Id: 5 00a075 115acfd33
Firmware Version: M0MU031
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Thu Mar 18 07:05:02 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.
General SMART Values:
Offline data collection status: (0x06) Offline data collection activity
was aborted by the device with a fatal error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 654) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 4) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 30079
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 6
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 001 001 000 Old_age Always - 1597
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 2
183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
184 Error_Correction_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 062 049 000 Old_age Always - 38 (Min/Max 22/51)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_ECC_Cnt 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Remain 0x0030 000 000 001 Old_age Offline FAILING_NOW 100
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
246 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 72636216364
247 Host_Program_Page_Count 0x0032 100 100 000 Old_age Always - 2308805595
248 FTL_Program_Page_Count 0x0032 100 100 000 Old_age Always - 13846706729
180 Unused_Reserve_NAND_Blk 0x0033 000 000 000 Pre-fail Always - 2056
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Vendor (0xff) Completed without error 00% 29688 -
# 2 Vendor (0xff) Completed without error 00% 29605 -
# 3 Vendor (0xff) Completed without error 00% 28706 -
# 4 Vendor (0xff) Completed without error 00% 27823 -
# 5 Vendor (0xff) Completed without error 00% 26820 -
# 6 Vendor (0xff) Completed without error 00% 25728 -
# 7 Vendor (0xff) Completed without error 00% 24587 -
# 8 Vendor (0xff) Completed without error 00% 23338 -
# 9 Vendor (0xff) Completed without error 00% 22109 -
#10 Vendor (0xff) Completed without error 00% 21130 -
#11 Vendor (0xff) Completed without error 00% 20294 -
#12 Vendor (0xff) Completed without error 00% 19360 -
#13 Vendor (0xff) Completed without error 00% 18370 -
#14 Vendor (0xff) Completed without error 00% 17719 -
#15 Extended offline Completed without error 00% 17592 -
#16 Extended offline Completed without error 00% 17580 -
#17 Vendor (0xff) Completed without error 00% 17543 -
#18 Vendor (0xff) Completed without error 00% 17423 -
#19 Vendor (0xff) Completed without error 00% 17303 -
#20 Extended offline Completed without error 00% 17195 -
#21 Short offline Completed without error 00% 17194 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
You have new mail in /var/mail/root
202 Percent_Lifetime_Remain 0x0030 000 000 001 Old_age Offline FAILING_NOW 100
So, we have another project: https://github.com/micha37-martins/S.M.A.R.T-disk-monitoring-for-Prometheus and my pre-alpha pull request for it: https://github.com/laa88rf/S.M.A.R.T-disk-monitoring-for-Prometheus/blob/master/smartmon.sh
Could you please have a look?
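For anyone who wants to experiment, here is a minimal Python sketch of how a row from the smartctl attribute table, such as the Percent_Lifetime_Remain row quoted above, could be split into fields so that WHEN_FAILED is not lost. This is not the code used in smartmon.py or smartmon.sh; the regular expression, the `parse_attribute_row` helper, and the `failing_now` key are assumptions made for illustration, based only on the column header shown in the smartctl output.

```python
import re

# Column order taken from the header printed by smartctl:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+)$"
)


def parse_attribute_row(line):
    """Return a dict for one attribute row, or None if the line is not a row."""
    match = ATTR_ROW.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Normalized values are zero-padded decimals such as "000" or "062".
    for key in ("id", "value", "worst", "thresh"):
        row[key] = int(row[key])
    # Hypothetical convenience flag; "-" in WHEN_FAILED means no failure.
    row["failing_now"] = row["when_failed"] == "FAILING_NOW"
    return row


row = parse_attribute_row(
    "202 Percent_Lifetime_Remain 0x0030 000 000 001 Old_age Offline FAILING_NOW 100"
)
print(row["name"], row["value"], row["thresh"], row["failing_now"])
```

The attribute name is matched with `\S+`, so names that contain characters such as `-` or `%` are captured whole rather than rejected.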
Some SSDs also report health information via the so-called "ATA Device Statistics". #68 implements support for those in smartmon.py.
Attribute names that contain special characters cannot be parsed. Examples: Drive_Life_Remaining% and SSD_LifeLeft(0.01%)
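One possible workaround, sketched below in Python, is to normalize each attribute name before using it as a label value: every run of characters outside letters and digits is replaced with a single underscore, and the result is lowercased. The `sanitize_attribute_name` function is hypothetical and is not part of smartmon.py; whether the script should rename attributes this way is an open design question.

```python
import re


def sanitize_attribute_name(name):
    """Map a raw SMART attribute name to a lowercase [a-z0-9_] identifier."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop leading/trailing underscores left behind by symbols such as "%".
    return cleaned.strip("_").lower()


for raw_name in ("Drive_Life_Remaining%", "SSD_LifeLeft(0.01%)"):
    print(raw_name, "->", sanitize_attribute_name(raw_name))
```

With this mapping the two names from the report become `drive_life_remaining` and `ssd_lifeleft_0_01`. Distinct raw names can collapse to the same identifier, so a real implementation would need to decide how to handle collisions.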