linux-nvme / nvme-cli

NVMe management command line interface.
https://nvmexpress.org
GNU General Public License v2.0

Firmware commit success for SK Hynix P31 but firmware doesn't update #2576

Status: open. Opened by robbins 6 days ago.

robbins commented 6 days ago

This is my SSD:

03:00.0 Non-Volatile memory controller [0108]: SK hynix Gold P31/BC711/PC711 NVMe Solid State Drive [1c5c:174a] (prog-if 02 [NVM Express])
    Subsystem: SK hynix Gold P31/BC711/PC711 NVMe Solid State Drive [1c5c:174a]

Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            CY0CN03821020CJ1T    HFM001TD3JX013N                          0x1          1.02  TB /   1.02  TB    512   B +  0 B   41000C20

It appears to have 3 firmware slots, at least the first of which is writable:

sudo nvme id-ctrl /dev/nvme0 -H | grep Firmware
  [9:9] : 0x1   Firmware Activation Notices Supported
  [4:4] : 0x1   Firmware Activate Without Reset Supported
  [3:1] : 0x3   Number of Firmware Slots
  [0:0] : 0 Firmware Slot 1 Read/Write
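For reference, the [4:4], [3:1] and [0:0] lines above come from the FRMW (Firmware Updates) byte of the Identify Controller data; the [9:9] line belongs to OAES (Optional Asynchronous Events Supported) and only matches the grep because its label contains "Firmware". A minimal Python sketch of the FRMW decoding, assuming the raw byte is 0x16 (reconstructed from the printed fields, with bits 7:5 assumed to be zero):

```python
# Sketch: decode the FRMW (Firmware Updates) byte of Identify Controller
# into the fields `nvme id-ctrl -H` prints.
# ASSUMPTION: 0x16 is reconstructed from the printed fields
# (bit 4 = 1, bits 3:1 = 3, bit 0 = 0); bits 7:5 are assumed 0.

def decode_frmw(frmw: int) -> dict:
    return {
        # Bit 0: 1 = firmware slot 1 is read only, 0 = slot 1 is read/write
        "slot1_read_only": bool(frmw & 0x1),
        # Bits 3:1: number of firmware slots the controller supports
        "num_slots": (frmw >> 1) & 0x7,
        # Bit 4: firmware can be activated without a reset
        "activate_without_reset": bool(frmw & 0x10),
    }


print(decode_frmw(0x16))
```

With 3 slots reported, valid slot numbers are 1 to 3; slot 0 in `fw-commit` asks the controller to pick a slot itself.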

It's currently running firmware 41000C20:

sudo nvme fw-log /dev/nvme0n1
Firmware Log for device:nvme0n1
afi  : 0x1
frs1 : 0x3032433030303134 (41000C20)
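The frs1 value is the 8-byte ASCII firmware revision for slot 1, which nvme-cli also prints as a 64-bit little-endian integer. A small Python sketch showing that the two representations agree:

```python
# Sketch: turn the 64-bit FRSn value printed by `nvme fw-log` back into the
# 8-byte ASCII firmware revision string it encodes.

def frs_to_revision(frs: int) -> str:
    # Lay the integer back out as the bytes stored in the log page
    raw = frs.to_bytes(8, byteorder="little")
    # Strip any space or NUL padding defensively
    return raw.decode("ascii").rstrip(" \x00")


print(frs_to_revision(0x3032433030303134))  # 41000C20
```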

I downloaded the "Firmware: Gold P31 1TB" file from https://ssd.skhynix.com/download/, and it appears to contain a newer revision, 41062C20:

strings /home/nixos/Downloads/Gold_P31_1000GB.ebin | grep C20
41062C20
41062C20
41062C20
41062C20
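The `strings | grep` check above can be approximated in Python as a sketch: find runs of at least 4 printable ASCII bytes (GNU `strings`' default minimum) and keep the ones containing a substring. This only scans for text; it is not a parser for the .ebin container, whose layout is not documented here, and the sample bytes are made up for illustration.

```python
import re

# Sketch of `strings FILE | grep NEEDLE`: runs of >= min_len printable ASCII
# bytes (space through '~') that contain `needle`. GNU strings also treats tab
# as printable; this sketch does not.

def find_strings(data: bytes, needle: str, min_len: int = 4) -> list[str]:
    runs = re.findall(rb"[ -~]{%d,}" % min_len, data)
    texts = [r.decode("ascii") for r in runs]
    return [t for t in texts if needle in t]


# Hypothetical sample bytes; "HDR" is dropped because it is shorter than 4
sample = b"\x00\x1541062C20\xff\xfeHDR\x0041062C20\x01"
print(find_strings(sample, "C20"))  # ['41062C20', '41062C20']
```

Against the real file this would be `find_strings(open("/home/nixos/Downloads/Gold_P31_1000GB.ebin", "rb").read(), "C20")`.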

I then downloaded the firmware to the drive:

sudo nvme fw-download --verbose --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1
...
Firmware download success
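For context, `fw-download` sends the image as a series of Firmware Image Download commands (opcode 0x11). Per the NVMe base specification, CDW10 carries NUMD (number of dwords to transfer, 0's based) and CDW11 carries OFST (offset into the image, in dwords). A Python sketch of that split, assuming a 4096-byte chunk size purely for illustration (nvme-cli chooses its own transfer size; see its `--xfer` option) and a hypothetical 10240-byte image:

```python
# Sketch: split an image into Firmware Image Download (opcode 0x11) commands.
# ASSUMPTION: 4096-byte chunks and a 10240-byte image are illustrative only.

FW_DOWNLOAD_OPCODE = 0x11


def download_commands(image_len: int, xfer: int = 4096) -> list[tuple[int, int, int]]:
    if image_len % 4 or xfer % 4:
        raise ValueError("image and transfer sizes must be dword multiples")
    cmds = []
    for offset in range(0, image_len, xfer):
        length = min(xfer, image_len - offset)
        numd = length // 4 - 1   # CDW10: dword count, 0's based
        ofst = offset // 4       # CDW11: offset in dwords
        cmds.append((FW_DOWNLOAD_OPCODE, numd, ofst))
    return cmds


for op, cdw10, cdw11 in download_commands(10240):
    print(f"opcode {op:#04x} cdw10 {cdw10:#010x} cdw11 {cdw11:#010x}")
```

"Firmware download success" only means every chunk was accepted; the image is not validated or placed into a slot until a Firmware Commit.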

However, no fw-commit command actually changes the firmware on any slot:

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=2 --slot=0
Firmware download success
Success committing firmware action:2 slot:0

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=2 --slot=1
Firmware download success
Success committing firmware action:2 slot:1

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=2 --slot=2
Firmware download success
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=3 --slot=0
Firmware download success
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=3 --slot=1
Firmware download success
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=3 --slot=2
Firmware download success
NVMe status: Invalid Firmware Image: The firmware image specified for activation is invalid and not loaded by the controller(0x107)
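The number in parentheses combines the Status Code Type (SCT, bits 10:8) and the Status Code (SC, bits 7:0), with the More and DNR flags above them. A Python sketch decoding it, assuming that layout and covering only a few of the Firmware Commit command-specific (SCT 1) codes from the NVMe base specification:

```python
# Sketch: split the status value nvme-cli prints (e.g. 0x107) into SCT and SC.
# Only a subset of Firmware Commit command-specific codes is listed.

FW_COMMIT_STATUS = {
    0x06: "Invalid Firmware Slot",
    0x07: "Invalid Firmware Image",
    0x0B: "Firmware Activation Requires Conventional Reset",
    0x10: "Firmware Activation Requires NVM Subsystem Reset",
    0x13: "Firmware Activation Prohibited",
}


def decode_status(status: int) -> tuple[int, int, str]:
    sct = (status >> 8) & 0x7   # Status Code Type (1 = command specific)
    sc = status & 0xFF          # Status Code
    if sct == 1:
        name = FW_COMMIT_STATUS.get(sc, "other command-specific status")
    else:
        name = "not a command-specific status"
    return sct, sc, name


print(decode_status(0x107))  # (1, 7, 'Invalid Firmware Image')
```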

[nix-shell:~]$ sudo nvme fw-log /dev/nvme0
Firmware Log for device:nvme0
afi  : 0x11
frs1 : 0x3032433030303134 (41000C20)

[nix-shell:~]$ sudo nvme reset /dev/nvme0

[nix-shell:~]$ sudo nvme fw-log /dev/nvme0
Firmware Log for device:nvme0
afi  : 0x1
frs1 : 0x3032433030303134 (41000C20)
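The afi change from 0x1 to 0x11 and back can be read directly from the Active Firmware Info byte: per the NVMe base specification, bits 2:0 are the slot of the running firmware and bits 6:4 are the slot to activate at the next Controller Level Reset (0 means none pending). A Python sketch:

```python
# Sketch: decode the AFI (Active Firmware Info) byte of the
# Firmware Slot Information log page.

def decode_afi(afi: int) -> dict:
    return {
        "active_slot": afi & 0x7,              # bits 2:0
        "next_reset_slot": (afi >> 4) & 0x7,   # bits 6:4, 0 = none pending
    }


print(decode_afi(0x1))   # {'active_slot': 1, 'next_reset_slot': 0}
print(decode_afi(0x11))  # {'active_slot': 1, 'next_reset_slot': 1}
```

So 0x11 says slot 1 is running and slot 1 is queued for activation at the next reset; after `nvme reset` the controller is again running slot 1, whose revision is still 41000C20.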

I'm not sure why action 3 fails. Is the firmware required to be downloaded before every commit? I ask because if I download it only once, slots 0 and 1 commit successfully with action 3.

Dmesg contains the following:

[  410.363576] nvme nvme0: controller capabilities changed, reset may be required to take effect.
[  474.554374] nvme nvme0: 16/0/0 default/read/poll queues
[  474.558245] nvme nvme0: Ignoring bogus Namespace Identifiers
[  606.030167] nvme nvme0: resetting controller
[  606.132928] nvme nvme0: 16/0/0 default/read/poll queues
[  606.137146] nvme nvme0: Ignoring bogus Namespace Identifiers
[ 1088.156631] nvme nvme0: controller capabilities changed, reset may be required to take effect.
[ 1263.666956] nvme nvme0: resetting controller
[ 1263.755138] nvme nvme0: 16/0/0 default/read/poll queues
[ 1263.759033] nvme nvme0: Ignoring bogus Namespace Identifiers

A verbose commit:

[nix-shell:~]$ sudo nvme fw-download --fw=/home/nixos/Downloads/Gold_P31_1000GB.ebin /dev/nvme0n1 && sudo nvme fw-commit /dev/nvme0 --action=2 --slot=0 -vvv
Firmware download success
opcode       : 10
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00000000
metadata_len : 00000000
addr         : 0
metadata     : 0
cdw10        : 00000010
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 0
latency      : 908931 us
Success committing firmware action:2 slot:0
opcode       : 06
flags        : 00
rsvd1        : 0000
nsid         : 00000000
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00001000
metadata_len : 00000000
addr         : 33623000
metadata     : 0
cdw10        : 00000001
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 0
latency      : 2619 us
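The first dump is the Firmware Commit itself (opcode 0x10). Per the NVMe base specification, its CDW10 holds the Firmware Slot (FS) in bits 2:0 (0 lets the controller choose), the Commit Action (CA) in bits 5:3 and the Boot Partition ID in bit 31. The second dump (opcode 0x06, cdw10 = 1, data_len 0x1000) is an Identify command with CNS 01h, i.e. a 4096-byte Identify Controller read. A Python sketch encoding and decoding the commit CDW10, with the CA meanings paraphrased from the specification:

```python
# Sketch: build and decode Firmware Commit (opcode 0x10) CDW10.

COMMIT_ACTIONS = {
    0: "replace image in slot with downloaded image; do not activate",
    1: "replace image in slot with downloaded image; activate at next Controller Level Reset",
    2: "activate existing image in slot at next Controller Level Reset",
    3: "replace image in slot (if one was downloaded) and activate immediately",
}


def commit_cdw10(action: int, slot: int, bpid: int = 0) -> int:
    if not (0 <= slot <= 7 and 0 <= action <= 7 and bpid in (0, 1)):
        raise ValueError("slot and action are 3-bit fields, bpid is 1 bit")
    return (bpid << 31) | (action << 3) | slot


def decode_commit_cdw10(cdw10: int) -> tuple[int, int, str]:
    slot = cdw10 & 0x7
    action = (cdw10 >> 3) & 0x7
    return action, slot, COMMIT_ACTIONS.get(action, "boot partition or reserved")


print(hex(commit_cdw10(action=2, slot=0)))  # 0x10, the cdw10 in the dump above
print(decode_commit_cdw10(0x10))
```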

igaw commented 1 day ago

Firmware action 3 means "The existing image in the specified Firmware Slot is activated at the next Controller Level Reset." This usually means you need to reboot the machine.

robbins commented 20 hours ago

> firmware action 3 means "The existing image in the specified Firmware Slot is activated at the next Controller Level Reset." This usually means you need to reboot the machine.

Even after rebooting, the firmware version remains unchanged.