linux-nvme / nvme-cli

NVMe management command line interface.
https://nvmexpress.org
GNU General Public License v2.0

0x2002 Errors from Corsair P3 Plus #2540

Closed: pallaswept closed this issue 1 month ago

pallaswept commented 1 month ago

I was quite concerned to see SMART errors from my brand new SSD this morning:

> sudo nvme error-log /dev/nvme1
Error Log Entries for device:nvme1 entries:16
.................
 Entry[ 0]
.................
error_count     : 13
sqid            : 0
cmdid           : 0xa005
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0x1
parm_err_loc    : 0x4
lba             : 0
nsid            : 0x1
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
csi             : 0
opcode          : 0
cs              : 0
trtype_spec_info: 0
log_page_version: 0
.................
 Entry[ 1]
.................
error_count     : 12
sqid            : 0
cmdid           : 0xa004
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0x1
parm_err_loc    : 0x4
lba             : 0
nsid            : 0x1
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
csi             : 0
opcode          : 0
cs              : 0
trtype_spec_info: 0
log_page_version: 0
.................
 Entry[ 2]
.................
error_count     : 11
sqid            : 0
cmdid           : 0x18
status_field    : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
phase_tag       : 0x1
parm_err_loc    : 0x28
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
csi             : 0
opcode          : 0
cs              : 0
trtype_spec_info: 0
log_page_version: 0
.................
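The status_field and parm_err_loc values above can be unpacked by hand. The sketch below follows the NVMe base specification's layout for the completion Status Field (SC in bits 7:0, SCT in bits 10:8, CRD in bits 12:11, More in bit 13, DNR in bit 14) and for the Parameter Error Location (byte in bits 7:0, bit in bits 10:8). It assumes nvme-cli has already shifted out the phase tag, which the output suggests by printing phase_tag on its own line. The function names are illustrative, not part of nvme-cli.

```python
"""Decode status_field and parm_err_loc values printed by `nvme error-log`.

Assumption: nvme-cli has already removed the phase tag bit, so status_field
holds only the 15-bit Status Field.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    sct: int    # Status Code Type, bits 10:8 (0 = Generic Command Status)
    sc: int     # Status Code, bits 7:0 (generic 0x02 = Invalid Field in Command)
    crd: int    # Command Retry Delay, bits 12:11
    more: bool  # More: extra detail is in the Error Information log page
    dnr: bool   # Do Not Retry


def decode_status(status_field: int) -> Status:
    """Split a 15-bit Status Field into its sub-fields."""
    return Status(
        sct=(status_field >> 8) & 0x7,
        sc=status_field & 0xFF,
        crd=(status_field >> 11) & 0x3,
        more=bool(status_field & 0x2000),
        dnr=bool(status_field & 0x4000),
    )


def decode_parm_err_loc(loc: int) -> tuple[int, int, int]:
    """Return (byte, bit, dword) of the faulty field in the 64-byte command."""
    byte = loc & 0xFF          # bits 7:0: byte offset within the command
    bit = (loc >> 8) & 0x7     # bits 10:8: bit within that byte
    return byte, bit, byte // 4  # each command dword (CDWn) is 4 bytes


if __name__ == "__main__":
    print(decode_status(0x2002))
    for loc in (0x4, 0x28):
        byte, bit, dword = decode_parm_err_loc(loc)
        print(f"parm_err_loc {loc:#x}: byte {byte}, bit {bit}, CDW{dword}")
```

Under these assumptions, 0x2002 is Generic Command Status code 0x02 (Invalid Field in Command) with only the More bit set, i.e. the drive put extra detail into this log page. A parm_err_loc of 0x4 points at CDW1, the Namespace Identifier field, which fits entries 0 and 1 reporting nsid 0x1; 0x28 points at CDW10, the first command-specific dword.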

I've searched here and I understand that these errors are nothing to worry about in themselves, but I also gather that for some devices there is a way to keep these errors from being logged. I saw some discussion of a 'quirks' database; could the same be done for these drives? They're quite popular (Amazon's recommended NVMe drive), so it may be worth it.

Presuming I have interpreted this correctly and nothing is really wrong, my goal is a clean log when the drive is healthy, so that genuine errors are not hidden behind the 'noise' of spurious ones.
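For the goal of a clean log, one stopgap on the reading side is to separate known spurious entries from everything else. This is a minimal sketch under several assumptions: that `nvme error-log -o json` emits a top-level "errors" list whose entries use the same key names as the text output, with status_field already stripped of the phase tag; and that the 'noise' signature is Invalid Field in Command on the admin queue (sqid 0). The helper names and the noise rule are hypothetical; check them against your nvme-cli version and your own log.

```python
import json
import subprocess

# Hypothetical noise rule: Invalid Field in Command (generic SCT 0, SC 0x02)
# on the admin submission queue (sqid 0).
NOISE_SQID = 0
SCT_SC_MASK = 0x07FF   # keep SCT (bits 10:8) and SC (bits 7:0) only
NOISE_SCT_SC = 0x0002  # SCT 0, SC 0x02


def is_known_noise(entry: dict) -> bool:
    """True if an entry matches the hypothetical spurious-error signature."""
    status = entry.get("status_field", 0)
    return entry.get("sqid") == NOISE_SQID and (status & SCT_SC_MASK) == NOISE_SCT_SC


def split_entries(entries: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (noise, other), skipping slots whose error_count is 0 (unused)."""
    used = [e for e in entries if e.get("error_count", 0) != 0]
    noise = [e for e in used if is_known_noise(e)]
    other = [e for e in used if not is_known_noise(e)]
    return noise, other


def read_error_log(dev: str) -> list[dict]:
    """Run nvme-cli and return the assumed "errors" list from its JSON output."""
    result = subprocess.run(
        ["nvme", "error-log", dev, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("errors", [])
```

Usage would be `noise, other = split_entries(read_error_log("/dev/nvme1"))`, run as root like the command above. Masking on SCT and SC ignores the More, DNR and CRD bits, so the rule matches regardless of retry hints. The trade-off is that a genuine Invalid Field in Command error on the admin queue would also be classed as noise, which is why a kernel-side quirk is the cleaner fix.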

Thanks for any advice you could offer.

keithbusch commented 1 month ago

Your experience is unfortunately very common. It's too bad the spec is vague on when errors need to be logged, and I think vendors really should have chosen not to log an error for optional admin queries: there's nothing useful to gain from having these saved in the log. But this is where we are today...

nvme-cli just reports the logs, though. If you want to prevent the commands from being dispatched in the first place, you need to send the quirk information to the kernel mailing list at linux-nvme@lists.infradead.org. The quirks are keyed on the PCI vendor and device ID, so that info would be needed.
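Since the quirks are keyed on PCI IDs, here is a small sketch for collecting them. It assumes the usual Linux sysfs layout, where /sys/class/nvme/<controller>/device links to the PCI device directory holding `vendor` and `device` files containing values such as "0x1234". `lspci -nn` shows the same pair in square brackets. The helper names are hypothetical.

```python
from pathlib import Path


def pci_ids(ctrl: str, sysfs: Path = Path("/sys/class/nvme")) -> tuple[int, int]:
    """Read (vendor, device) for an NVMe controller such as "nvme1".

    Assumes <sysfs>/<ctrl>/device is the PCI device directory.
    """
    dev_dir = sysfs / ctrl / "device"
    vendor = int((dev_dir / "vendor").read_text().strip(), 16)  # e.g. "0x1234\n"
    device = int((dev_dir / "device").read_text().strip(), 16)
    return vendor, device


def format_pci_id(vendor: int, device: int) -> str:
    """Format as the "vvvv:dddd" pair that `lspci -nn` prints in brackets."""
    return f"{vendor:04x}:{device:04x}"
```

For example, `format_pci_id(*pci_ids("nvme1"))` yields the vendor:device string to quote in the email.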

pallaswept commented 1 month ago

Thanks for the advice Keith!

I'll fire an email to that address, with the above message, plus the PCI Vendor and device IDs. I'm not familiar with the list, is there anything else I should do? Happy to help in any way I can.

Will close this for now as I've obviously filed it in the wrong place, my apologies.