truenas / py-SMART

Wrapper for smartctl (smartmontools)
GNU Lesser General Public License v2.1

NVMe devices do not appear in DeviceList #53

Closed by ulmitov 1 year ago

ulmitov commented 1 year ago

Hi,

There is an NVMe device, but for some reason it is not listed by pySMART.

Python output:

>>> import pySMART
>>> z=pySMART.DeviceList()
>>> z
<DeviceList contents:
>

smartctl:

root@ubuntu20-cuda:~# smartctl --scan-open
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
/dev/nvme1 -d nvme # /dev/nvme1, NVMe device

root@ubuntu20-cuda:~# smartctl -d nvme --all /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-99-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: KCM61VUL1T60
Serial Number: 61B0A067T1K8
Firmware Version: 0105
PCI Vendor/Subsystem ID: 0x1e0f
IEEE OUI Identifier: 0x8ce38e
Total NVM Capacity: 1 600 321 314 816 [1,60 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 64
Local Time is: Thu Dec 8 11:32:32 2022 MSK
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x025f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec Other
Optional NVM Commands (0x00ff): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv Timestmp Other
Maximum Data Transfer Size: 8192 Pages
Warning Comp. Temp. Threshold: 73 Celsius
Critical Comp. Temp. Threshold: 82 Celsius

Supported Power States
St  Op  Max     Active  Idle  RL  RT  WL  WT  Ent_Lat  Ex_Lat
0   +   27.50W  25.00W  -     0   0   0   0   500000   500000
1   +   19.80W  18.00W  -     0   0   1   1   500000   500000
2   +   17.60W  16.00W  -     0   0   2   2   500000   500000
3   +   15.40W  14.00W  -     1   1   3   3   500000   500000
4   +   12.10W  11.00W  -     2   2   4   4   500000   500000
5   +   9.90W   9.00W   -     3   3   5   5   500000   500000
6   -   5.00W   -       -     6   6   6   6   500000   500000

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 12%
Percentage Used: 0%
Data Units Read: 81 966 502 [41,9 TB]
Data Units Written: 56 742 372 [29,0 TB]
Host Read Commands: 1 868 020 852
Host Write Commands: 1 054 642 944
Controller Busy Time: 567
Power Cycles: 497
Power On Hours: 1 998
Unsafe Shutdowns: 423
Media and Data Integrity Errors: 0
Error Information Log Entries: 283
Warning Comp. Temperature Time: 284
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 18
Thermal Temp. 2 Transition Count: 4
Thermal Temp. 1 Total Time: 1440
Thermal Temp. 2 Total Time: 15910

Error Information (NVMe Log 0x01, max 256 entries)
Num  ErrCount  SQId  CmdId   Status  PELoc  LBA  NSID  VS
0    283       0     0x000c  0xc004  0x02b  -    0     -

root@ubuntu20-cuda:~# smartctl -d nvme --all /dev/nvme1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-99-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: KCM61VUL1T60
Serial Number: 61B0A066T1K8
Firmware Version: 0105
PCI Vendor/Subsystem ID: 0x1e0f
IEEE OUI Identifier: 0x8ce38e
Total NVM Capacity: 1 600 321 314 816 [1,60 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 64
Local Time is: Thu Dec 8 11:37:18 2022 MSK
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x025f): Security Format Frmw_DL NS_Mngmt Self_Test MI_Snd/Rec Other
Optional NVM Commands (0x00ff): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Resv Timestmp Other
Maximum Data Transfer Size: 8192 Pages
Warning Comp. Temp. Threshold: 73 Celsius
Critical Comp. Temp. Threshold: 82 Celsius

Supported Power States
St  Op  Max     Active  Idle  RL  RT  WL  WT  Ent_Lat  Ex_Lat
0   +   27.50W  25.00W  -     0   0   0   0   500000   500000
1   +   19.80W  18.00W  -     0   0   1   1   500000   500000
2   +   17.60W  16.00W  -     0   0   2   2   500000   500000
3   +   15.40W  14.00W  -     1   1   3   3   500000   500000
4   +   12.10W  11.00W  -     2   2   4   4   500000   500000
5   +   9.90W   9.00W   -     3   3   5   5   500000   500000
6   -   5.00W   -       -     6   6   6   6   500000   500000

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 46 Celsius
Available Spare: 100%
Available Spare Threshold: 12%
Percentage Used: 0%
Data Units Read: 11 442 052 [5,85 TB]
Data Units Written: 36 848 011 [18,8 TB]
Host Read Commands: 48 159 985
Host Write Commands: 145 411 665
Controller Busy Time: 135
Power Cycles: 503
Power On Hours: 2 144
Unsafe Shutdowns: 298
Media and Data Integrity Errors: 0
Error Information Log Entries: 197
Warning Comp. Temperature Time: 61
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 8
Thermal Temp. 2 Transition Count: 3
Thermal Temp. 1 Total Time: 420
Thermal Temp. 2 Total Time: 3380

Error Information (NVMe Log 0x01, max 256 entries)
Num  ErrCount  SQId  CmdId   Status  PELoc  LBA  NSID  VS
0    197       0     0x0008  0xc004  0x02b  -    0     -

root@ubuntu20-cuda:~# sudo lshw -json -class disk
[
  {
    "id" : "namespace",
    "class" : "disk",
    "claimed" : true,
    "handle" : "GUID:68a031c5-563f-4224-8054-0fede5ed8910",
    "description" : "NVMe namespace",
    "physid" : "1",
    "logicalname" : "/dev/nvme0n1",
    "units" : "bytes",
    "size" : 1600321314816,
    "configuration" : {
      "guid" : "68a031c5-563f-4224-8054-0fede5ed8910",
      "logicalsectorsize" : "512",
      "sectorsize" : "4096"
    },
    "capabilities" : {
      "gpt-1.00" : "GUID Partition Table version 1.00",
      "partitioned" : "Partitioned disk",
      "partitioned:gpt" : "GUID partition table"
    }
  },
  {
    "id" : "namespace",
    "class" : "disk",
    "claimed" : true,
    "description" : "NVMe namespace",
    "physid" : "1",
    "logicalname" : "/dev/nvme1n1",
    "units" : "bytes",
    "size" : 1600321314816,
    "configuration" : {
      "logicalsectorsize" : "512",
      "sectorsize" : "4096"
    }
  }
]

ralequi commented 1 year ago

Are you running that Python session with sudo or as root?

Everything in the output seems to be OK, so that's the only thing that comes to mind.
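
For anyone reproducing this, both conditions (privileges, and what smartctl itself reports) can be checked independently of pySMART. The following is only a rough sketch using the standard library: `parse_scan_open` and `running_as_root` are hypothetical helper names, not part of pySMART, and the regular expression assumes the `smartctl --scan-open` line format shown in the report above.

```python
import os
import re

# Matches scan lines such as "/dev/nvme0 -d nvme # /dev/nvme0, NVMe device".
# Assumption: each device line starts with the path, followed by "-d <type>".
SCAN_LINE = re.compile(r"^(?P<path>/dev/\S+)\s+-d\s+(?P<type>\S+)")


def parse_scan_open(output: str) -> list[tuple[str, str]]:
    """Return (device path, device type) pairs from `smartctl --scan-open` output."""
    devices = []
    for line in output.splitlines():
        match = SCAN_LINE.match(line.strip())
        if match:
            devices.append((match.group("path"), match.group("type")))
    return devices


def running_as_root() -> bool:
    """True when the effective user ID is 0 (POSIX systems only)."""
    return os.geteuid() == 0


# Scan output copied from the report above.
SAMPLE = """\
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
/dev/nvme1 -d nvme # /dev/nvme1, NVMe device
"""

if __name__ == "__main__":
    print("running as root:", running_as_root())
    print("devices seen by smartctl:", parse_scan_open(SAMPLE))
```

If smartctl reports devices and the session is privileged, but `pySMART.DeviceList()` is still empty, the problem is more likely inside the library than in the environment.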

ralequi commented 1 year ago

Ok, wait, I think I've found something...

ralequi commented 1 year ago

Ok, bug confirmed

ralequi commented 1 year ago

Hi @ulmitov

There was an issue with checking NVMe sizes. Please check the master/develop branches and confirm that it is fixed.

Thanks for your contribution! Finding+reporting issues makes this project more robust!
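
The actual fix is not shown in this thread. Purely as an illustration of why NVMe size parsing can be fragile: the capacity line in the report groups digits with spaces and uses a decimal comma (`1 600 321 314 816 [1,60 TB]`), which a plain `int()` call would reject. The sketch below is an assumption about one tolerant way to read such a line; `parse_nvme_capacity` is a hypothetical name and is not pySMART's code.

```python
import re


def parse_nvme_capacity(line: str) -> int:
    """Extract the byte count from a smartctl 'Total NVM Capacity' line.

    Digit groups may be separated by spaces, commas, dots or apostrophes
    depending on the locale; the bracketed human-readable part is ignored.
    """
    # Keep only what follows the first colon, then drop the "[...]" suffix.
    _, _, value = line.partition(":")
    raw = value.split("[")[0]
    # Strip every non-digit character (the locale-specific group separators).
    digits = re.sub(r"\D", "", raw)
    if not digits:
        raise ValueError(f"no capacity found in {line!r}")
    return int(digits)


if __name__ == "__main__":
    # Line copied from the smartctl output in the report.
    print(parse_nvme_capacity("Total NVM Capacity: 1 600 321 314 816 [1,60 TB]"))
```

For the line in the report this gives the same byte count that `lshw` shows in its `size` field.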

ulmitov commented 1 year ago

@ralequi the develop version works, thank you