PARitter closed this issue 5 years ago
Thank you for the report.
1: I think it's better to provide a setting that enables the "-d nvme" call, as different versions of smartmontools behave differently with it:
https://bugs.launchpad.net/ubuntu/+source/smartmontools/+bug/1685332
This, like many other issues in this repo, requires a major rewrite.
2: I'm looking forward to pull requests, but not in this case: this part will be rewritten rather soon. I'm swimming in technical debt right now.
Related: https://github.com/nobodysu/zabbix-smartmontools/issues/15
Part 2 is addressed in https://github.com/nobodysu/zabbix-mini-IPMI/commit/8d839b8f7c2b1ccdc4ba7832ca89e4e0d95ab8e6 (only manually listed nvme drives will work at this time)
@PARitter @rmalenko Any chance you could test it? https://github.com/nobodysu/zabbix-mini-IPMI/tree/refactoring_and_nvme Two scripts and template.
Tested on: Ubuntu 18.10 / smartmontools 6.6 / zabbix-agent 4.05: works great! Windows 10 Pro 1809 / smartmontools 7.0 / zabbix-agent 4.0.0 (x64): works great!
I'm going to try it on Debian Stretch / Arm64 (dietpi) a bit later. I've never run it there before, but it should be an interesting test.
Thank you.
That's great, thanks. Looking forward to arm test results.
Also, can you provide the "-A -i", "-x" and "-a" outputs of an nvme drive (redacting serials ofc)? That would be pretty helpful.
@nobodysu sorry, I haven't tested it. However, I wrote my own Zabbix check just for NVMe disks: https://github.com/rmalenko/zabbix
Per your request...though HTML is messing up the formatting...
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-15-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KXG50PNV2T04 NVMe TOSHIBA 2048GB
Serial Number:                      ---------------
Firmware Version:                   AFDA4103
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                0x00080d
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sat Mar 2 17:49:27 2019 UTC

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    7,830 [4.00 GB]
Data Units Written:                 4,004,405 [2.05 TB]
Host Read Commands:                 707,342
Host Write Commands:                1,766,738
Controller Busy Time:               39
Power Cycles:                       50
Power On Hours:                     989
Unsafe Shutdowns:                   28
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     18
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-15-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KXG50PNV2T04 NVMe TOSHIBA 2048GB
Serial Number:                      ---------------
Firmware Version:                   AFDA4103
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                0x00080d
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sat Mar 2 17:55:32 2019 UTC
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Other
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Other
Maximum Data Transfer Size:         512 Pages
Warning Comp. Temp. Threshold:      78 Celsius
Critical Comp. Temp. Threshold:     82 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        0       0
 1 +     2.40W       -        -    1  1  1  1        0       0
 2 +     1.90W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     1500    1500
 4 -   0.0030W       -        -    4  4  4  4    50000   90000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    7,830 [4.00 GB]
Data Units Written:                 4,004,407 [2.05 TB]
Host Read Commands:                 707,342
Host Write Commands:                1,766,911
Controller Busy Time:               39
Power Cycles:                       50
Power On Hours:                     990
Unsafe Shutdowns:                   28
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     18
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius

Error Information (NVMe Log 0x01, max 128 entries)
No Errors Logged
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-15-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KXG50PNV2T04 NVMe TOSHIBA 2048GB
Serial Number:                      ----------------
Firmware Version:                   AFDA4103
PCI Vendor/Subsystem ID:            0x1179
IEEE OUI Identifier:                0x00080d
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sat Mar 2 17:56:40 2019 UTC
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Other
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Other
Maximum Data Transfer Size:         512 Pages
Warning Comp. Temp. Threshold:      78 Celsius
Critical Comp. Temp. Threshold:     82 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        0       0
 1 +     2.40W       -        -    1  1  1  1        0       0
 2 +     1.90W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     1500    1500
 4 -   0.0030W       -        -    4  4  4  4    50000   90000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    7,830 [4.00 GB]
Data Units Written:                 4,004,407 [2.05 TB]
Host Read Commands:                 707,342
Host Write Commands:                1,766,911
Controller Busy Time:               39
Power Cycles:                       50
Power On Hours:                     990
Unsafe Shutdowns:                   28
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     18
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius

Error Information (NVMe Log 0x01, max 128 entries)
No Errors Logged
Putting it up on the ARM SoC will have to wait until I have a bit more time.
Two problems:
Part 1: smartctl works correctly with most nvme drives, but "smartctl --scan" won't return them, so mini-ipmi does not detect or report them. "smartctl --scan -d nvme" lists them correctly, but does not list any other drives. So enumerating nvme, scsi and ata drives requires two separate calls to "smartctl --scan".
Part 2: if you list nvme drives in diskListManual, mini-ipmi reports "no temp" for the drive, because the smartctl display format for nvme disks is slightly different: "Temperature: 42 Celsius"
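To illustrate Part 2, here is a minimal sketch of how the nvme temperature line could be picked out of smartctl output. The names NVME_TEMP_RE and parse_nvme_temperature are hypothetical and are not taken from the mini-ipmi scripts; the pattern assumes the value is a whole number followed by the word "Celsius", as in the line quoted above.

```python
import re

# Matches the NVMe health-log line, e.g. "Temperature:    42 Celsius".
# Assumption: the value is a whole number followed by the word "Celsius".
# The colon directly after "Temperature" keeps lines such as
# "Temperature Sensor 1: 32 Celsius" from matching.
NVME_TEMP_RE = re.compile(r"^Temperature:\s+(\d+)\s+Celsius", re.MULTILINE)


def parse_nvme_temperature(smartctl_output):
    """Return the drive temperature in Celsius, or None if no line matches."""
    match = NVME_TEMP_RE.search(smartctl_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = "Critical Warning: 0x00\nTemperature: 42 Celsius\nAvailable Spare: 100%\n"
    print(parse_nvme_temperature(sample))
```

Returning None rather than a string like "no temp" leaves it to the caller to decide how a missing value is reported.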
I can do a pull request to fix part 2 (it's simple), but I don't have a generic fix for part 1 yet.
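For Part 1, one possible approach is to run both scans and merge the results. The sketch below is not the mini-ipmi implementation: the function names and the scan_nvme flag are hypothetical, and it assumes each scan line has the usual "<device> -d <type> # comment" shape, for example "/dev/nvme0 -d nvme # /dev/nvme0, NVMe device".

```python
import subprocess


def parse_scan_output(text):
    """Extract (device, type) pairs from "smartctl --scan" style output."""
    drives = []
    for line in text.splitlines():
        # Drop the trailing "# ..." comment, then split into fields.
        fields = line.split("#", 1)[0].split()
        # Keep only lines of the form "<device> -d <type>".
        if len(fields) >= 3 and fields[1] == "-d":
            drives.append((fields[0], fields[2]))
    return drives


def merge_drive_lists(*lists):
    """Concatenate scan results, dropping repeated device paths."""
    seen = set()
    merged = []
    for drives in lists:
        for device, dev_type in drives:
            if device not in seen:
                seen.add(device)
                merged.append((device, dev_type))
    return merged


def scan_all_drives(scan_nvme=True, smartctl="smartctl"):
    """Run the default scan and, if enabled, the NVMe-only scan."""
    commands = [[smartctl, "--scan"]]
    if scan_nvme:
        commands.append([smartctl, "--scan", "-d", "nvme"])
    results = []
    for cmd in commands:
        # capture_output/text need Python 3.7 or newer.
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(parse_scan_output(proc.stdout))
    return merge_drive_lists(*results)
```

Making the second scan optional leaves room for smartmontools versions that handle "-d nvme" differently. Deduplicating by device path covers versions whose default scan already lists nvme drives.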