Steps to reproduce:
1. Run telegraf with the attributes flag set to false
2. Observe that NVMe devices do not record temperature, while non-NVMe devices do, even though non-NVMe devices report this information only in the vendor-specific attributes section
3. Run telegraf with the attributes flag set to true
4. Observe that both NVMe and non-NVMe devices now record temperature
Expected behavior:
smartctl --info --health --attributes --tolerance=verypermissive --nocheck standby --format=brief /dev/nvme0
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZQLB3T8HALS-00007
Serial Number: S438NF0M304843
Firmware Version: EDA5202Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 3,840,755,982,336 [3.84 TB]
Unallocated NVM Capacity: 0
Controller ID: 4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 3,840,755,982,336 [3.84 TB]
Namespace 1 Utilization: 60,272,201,728 [60.2 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Nov 5 17:01:43 2019 PST
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 27 Celsius
^The above line should be recorded with attributes set to false
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 7,994,886 [4.09 TB]
Data Units Written: 333,054 [170 GB]
Host Read Commands: 17,607,817
Host Write Commands: 1,411,082
Controller Busy Time: 44
Power Cycles: 52
Power On Hours: 506
Unsafe Shutdowns: 34
Media and Data Integrity Errors: 0
Error Information Log Entries: 5
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 27 Celsius
Temperature Sensor 2: 31 Celsius
Temperature Sensor 3: 36 Celsius
smartctl --info --health --attributes --tolerance=verypermissive --nocheck standby --format=brief /dev/sdb
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-862.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: SAMSUNG MZ7LM3T8HMLP-00005
Serial Number: S2TYNX0J702931
LU WWN Device Id: 5 002538 c406fe884
Firmware Version: GXT5404Q
User Capacity: 3,840,755,982,336 bytes [3.84 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Nov 5 16:49:04 2019 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Power mode was: IDLE
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct PO--CK 100 100 010 - 0
9 Power_On_Hours -O--CK 098 098 000 - 5758
12 Power_Cycle_Count -O--CK 098 098 000 - 1487
177 Wear_Leveling_Count PO--C- 099 099 005 - 62
179 Used_Rsvd_Blk_Cnt_Tot PO--C- 100 100 010 - 0
180 Unused_Rsvd_Blk_Cnt_Tot PO--C- 100 100 010 - 13078
181 Program_Fail_Cnt_Total -O--CK 100 100 010 - 0
182 Erase_Fail_Count_Total -O--CK 100 100 010 - 0
183 Runtime_Bad_Block PO--C- 100 100 010 - 0
184 End-to-End_Error PO--CK 100 100 097 - 0
187 Uncorrectable_Error_Cnt -O--CK 100 100 000 - 0
190 Airflow_Temperature_Cel -O--CK 073 046 000 - 27
194 Temperature_Celsius -O---K 073 046 000 - 27 (Min/Max 20/54)
^The above line should not be recorded with attributes set to false
Actual behavior:
The opposite. When attributes is false, the non-NVMe temperature is recorded from the attributes section, while the NVMe temperature is not. When attributes is true, all temperatures are recorded.
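To make the expected behavior concrete, here is a minimal Go sketch of the parsing order I have in mind. It is not the code from smart.go: the function parseFields, both regular expressions, and the temp_c key are hypothetical and written only for this illustration. The point is that the NVMe health-log temperature is parsed before the attributes guard, while rows of the vendor attribute table stay behind it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Hypothetical patterns for this sketch; not the expressions used in smart.go.
var (
	// NVMe health log line, e.g. "Temperature: 27 Celsius".
	nvmeTemp = regexp.MustCompile(`^Temperature:\s+(\d+)\s+Celsius`)
	// ATA attribute row in --format=brief layout:
	// ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
	ataRow = regexp.MustCompile(`^\s*(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\S+\s+\S+\s+(\d+)`)
)

// parseFields extracts integer fields from smartctl output.
// Device-level data is always collected; attribute rows are
// collected only when collectAttributes is true.
func parseFields(output string, collectAttributes bool) map[string]int64 {
	fields := map[string]int64{}
	for _, line := range strings.Split(output, "\n") {
		// Device-level temperature: parsed outside the guard.
		if m := nvmeTemp.FindStringSubmatch(line); m != nil {
			if v, err := strconv.ParseInt(m[1], 10, 64); err == nil {
				fields["temp_c"] = v
			}
			continue
		}
		// Everything below comes from the attribute table.
		if !collectAttributes {
			continue
		}
		if m := ataRow.FindStringSubmatch(line); m != nil {
			if v, err := strconv.ParseInt(m[3], 10, 64); err == nil {
				fields[m[2]] = v
			}
		}
	}
	return fields
}

func main() {
	nvme := "Critical Warning: 0x00\nTemperature: 27 Celsius\n"
	sata := "194 Temperature_Celsius -O---K 073 046 000 - 27 (Min/Max 20/54)\n"

	fmt.Println("nvme, attributes=false:", parseFields(nvme, false))
	fmt.Println("sata, attributes=false:", parseFields(sata, false))
	fmt.Println("sata, attributes=true: ", parseFields(sata, true))
}
```

With this ordering the NVMe sample yields a temperature even when attributes is false, and the SATA sample yields nothing until attributes is true, which matches the two annotated lines in the smartctl output above.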
Additional info:
I believe the issue comes from a misplaced if collectAttributes line in the smart.go file. I also believe that smart_test.go should be amended to check not only that all required fields are present, but also that all fields that should be excluded are absent.
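As a rough illustration of the test change suggested above, the following Go sketch checks a collected field set in both directions. The helper checkFields is hypothetical and does not exist in smart_test.go, and the field names health_ok and temp_c are used here only as examples.

```go
package main

import "fmt"

// checkFields returns one message per problem found in got:
// required names that are missing, then excluded names that are present.
// Hypothetical helper for illustration; not part of smart_test.go.
func checkFields(got map[string]interface{}, required, excluded []string) []string {
	var problems []string
	for _, name := range required {
		if _, ok := got[name]; !ok {
			problems = append(problems, "missing required field: "+name)
		}
	}
	for _, name := range excluded {
		if _, ok := got[name]; ok {
			problems = append(problems, "unexpected field present: "+name)
		}
	}
	return problems
}

func main() {
	// Example field set for a SATA device collected with attributes = false,
	// as described under "Actual behavior" (temperature leaks through).
	got := map[string]interface{}{"health_ok": true, "temp_c": int64(27)}

	for _, p := range checkFields(got, []string{"health_ok"}, []string{"temp_c"}) {
		fmt.Println(p)
	}
}
```

A test that only verifies required fields would pass on this field set; adding the excluded list is what would have caught the behavior reported here.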
Relevant telegraf.conf:
System info:
Telegraf version 1.12.4, CentOS 7, smartctl 7.0