raspberrypi / rpi-eeprom

Installation scripts and binaries for the Raspberry Pi 4 and Raspberry Pi 5 bootloader EEPROMs
https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#raspberry-pi-boot-eeprom

CM4: Not booting from SK hynix BC711 NVMe #451

Closed agners closed 1 year ago

agners commented 1 year ago

Describe the bug

I am trying to boot from an SK hynix BC711 NVMe, but the CM4 does not seem to be able to boot from that device. The device is detected successfully in Linux (when booting the system from another medium). The same setup also boots fine from a Samsung 970 EVO Plus.

Steps to reproduce the behaviour

Device(s)

Raspberry Pi CM4 Lite

Bootloader configuration.

$ vcgencmd bootloader_config
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0

# Try SD first (1), followed by, USB PCIe, NVMe PCIe, USB SoC XHCI then network
BOOT_ORDER=0xf25641

# Set to 0 to prevent bootloader updates from USB/Network boot
# For remote units EEPROM hardware write protection should be used.
ENABLE_SELF_UPDATE=1
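
For readers unfamiliar with the BOOT_ORDER encoding: the value is read one hex digit at a time, starting from the least significant digit. The following is a minimal sketch of that decoding, not part of the original report. The mode names in the table are taken from the Raspberry Pi bootloader documentation as I understand it, and `decode_boot_order` is a hypothetical helper name.

```python
from typing import List

# Boot-mode values per hex digit (assumed from the Raspberry Pi
# bootloader documentation; unknown digits are reported as such).
BOOT_MODES = {
    0x0: "SD-CARD-DETECT",
    0x1: "SD-CARD",
    0x2: "NETWORK",
    0x3: "RPIBOOT",
    0x4: "USB-MSD",
    0x5: "BCM-USB-MSD",
    0x6: "NVME",
    0x7: "HTTP",
    0xE: "STOP",
    0xF: "RESTART",
}


def decode_boot_order(value: int) -> List[str]:
    """Return the boot modes in the order they are tried (lowest digit first)."""
    modes = []
    while value:
        nibble = value & 0xF  # take the lowest hex digit
        modes.append(BOOT_MODES.get(nibble, f"UNKNOWN(0x{nibble:x})"))
        value >>= 4  # move on to the next digit
    return modes


# The value from the configuration above.
print(decode_boot_order(0xF25641))
```

Under these assumptions, 0xf25641 puts NVMe third in the sequence, after SD card and USB mass storage, so the boot order itself does not exclude the NVMe drive.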

Updated to latest version using rpiboot

SIG pieeprom.sig 476be6589f29bef71b1218f232475b3a6a17f6d8feab78a865f7de9a45bdf00d 1668674865
Reading EEPROM: 524288
Writing EEPROM
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++......+
Verify BOOT EEPROM
Reading EEPROM: 524288
BOOT-EEPROM: UPDATED

System

No response

Bootloader logs

No response

USB boot

No response

NVMe boot

pi@raspberrypi:~$ sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1        FJB1N694511301Q3Z BC711 NVMe SK hynix 128GB                1         128.04  GB / 128.04  GB    512   B +  0 B   41002131
pi@raspberrypi:~$ sudo nvme id-ctrl -H /dev/nvme0
NVME Identify Controller:
vid       : 0x1c5c
ssvid     : 0x1c5c
sn        :    FJB1N694511301Q3Z
mn        : BC711 NVMe SK hynix 128GB
fr        : 41002131
rab       : 3
ieee      : ace42e
cmic      : 0
  [3:3] : 0     ANA not supported
  [2:2] : 0     PCI
  [1:1] : 0     Single Controller
  [0:0] : 0     Single Port

mdts      : 6
cntlid    : 0x1
ver       : 0x10300
rtd3r     : 0x7a120
rtd3e     : 0x1e8480
oaes      : 0x200
[14:14] : 0     Endurance Group Event Aggregate Log Page Change Notice Not Supported
[13:13] : 0     LBA Status Information Notices Not Supported
[12:12] : 0     Predictable Latency Event Aggregate Log Change Notices Not Supported
[11:11] : 0     Asymmetric Namespace Access Change Notices Not Supported
  [9:9] : 0x1   Firmware Activation Notices Supported
  [8:8] : 0     Namespace Attribute Changed Event Not Supported

ctratt    : 0x10
  [9:9] : 0     UUID List Not Supported
  [7:7] : 0     Namespace Granularity Not Supported
  [5:5] : 0     Predictable Latency Mode Not Supported
  [4:4] : 0x1   Endurance Groups Supported
  [3:3] : 0     Read Recovery Levels Not Supported
  [2:2] : 0     NVM Sets Not Supported
  [1:1] : 0     Non-Operational Power State Permissive Not Supported
  [0:0] : 0     128-bit Host Identifier Not Supported

rrls      : 0
cntrltype : 0
  [7:2] : 0     Reserved
  [1:0] : 0     Controller type not reported
fguid     :
crdt1     : 0
crdt2     : 0
crdt3     : 0
oacs      : 0x17
  [9:9] : 0     Get LBA Status Capability Not Supported
  [8:8] : 0     Doorbell Buffer Config Not Supported
  [7:7] : 0     Virtualization Management Not Supported
  [6:6] : 0     NVMe-MI Send and Receive Not Supported
  [5:5] : 0     Directives Not Supported
  [4:4] : 0x1   Device Self-test Supported
  [3:3] : 0     NS Management and Attachment Not Supported
  [2:2] : 0x1   FW Commit and Download Supported
  [1:1] : 0x1   Format NVM Supported
  [0:0] : 0x1   Security Send and Receive Supported

acl       : 3
aerl      : 7
frmw      : 0x16
  [4:4] : 0x1   Firmware Activate Without Reset Supported
  [3:1] : 0x3   Number of Firmware Slots
  [0:0] : 0     Firmware Slot 1 Read/Write

lpa       : 0x1e
  [4:4] : 0x1   Persistent Event log Supported
  [3:3] : 0x1   Telemetry host/controller initiated log page Supported
  [2:2] : 0x1   Extended data for Get Log Page Supported
  [1:1] : 0x1   Command Effects Log Page Supported
  [0:0] : 0     SMART/Health Log Page per NS Not Supported

elpe      : 255
npss      : 4
avscc     : 0x1
  [0:0] : 0x1   Admin Vendor Specific Commands uses NVMe Format

apsta     : 0x1
  [0:0] : 0x1   Autonomous Power State Transitions Supported

wctemp    : 356
cctemp    : 358
mtfa      : 0
hmpre     : 0
hmmin     : 0
tnvmcap   : 0
unvmcap   : 0
rpmbs     : 0
 [31:24]: 0     Access Size
 [23:16]: 0     Total Size
  [5:3] : 0     Authentication Method
  [2:0] : 0     Number of RPMB Units

edstt     : 8
dsto      : 1
fwug      : 0
kas       : 0
hctma     : 0x1
  [0:0] : 0x1   Host Controlled Thermal Management Supported

mntmt     : 273
mxtmt     : 355
sanicap   : 0x2
  [31:30] : 0   Additional media modification after sanitize operation completes successfully is not defined
  [29:29] : 0   No-Deallocate After Sanitize bit in Sanitize command Supported
    [2:2] : 0   Overwrite Sanitize Operation Not Supported
    [1:1] : 0x1 Block Erase Sanitize Operation Supported
    [0:0] : 0   Crypto Erase Sanitize Operation Not Supported

hmminds   : 0
hmmaxd    : 0
nsetidmax : 0
endgidmax : 1
anatt     : 0
anacap    : 0
  [7:7] : 0     Non-zero group ID Not Supported
  [6:6] : 0     Group ID does not change
  [4:4] : 0     ANA Change state Not Supported
  [3:3] : 0     ANA Persistent Loss state Not Supported
  [2:2] : 0     ANA Inaccessible state Not Supported
  [1:1] : 0     ANA Non-optimized state Not Supported
  [0:0] : 0     ANA Optimized state Not Supported

anagrpmax : 0
nanagrpid : 0
pels      : 1
sqes      : 0x66
  [7:4] : 0x6   Max SQ Entry Size (64)
  [3:0] : 0x6   Min SQ Entry Size (64)

cqes      : 0x44
  [7:4] : 0x4   Max CQ Entry Size (16)
  [3:0] : 0x4   Min CQ Entry Size (16)

maxcmd    : 0
nn        : 1
oncs      : 0x5f
  [7:7] : 0     Verify Not Supported
  [6:6] : 0x1   Timestamp Supported
  [5:5] : 0     Reservations Not Supported
  [4:4] : 0x1   Save and Select Supported
  [3:3] : 0x1   Write Zeroes Supported
  [2:2] : 0x1   Data Set Management Supported
  [1:1] : 0x1   Write Uncorrectable Supported
  [0:0] : 0x1   Compare Supported

fuses     : 0
  [0:0] : 0     Fused Compare and Write Not Supported

fna       : 0
  [2:2] : 0     Crypto Erase Not Supported as part of Secure Erase
  [1:1] : 0     Crypto Erase Applies to Single Namespace(s)
  [0:0] : 0     Format Applies to Single Namespace(s)

vwc       : 0x1
  [2:1] : 0     Support for the NSID field set to FFFFFFFFh is not indicated
  [0:0] : 0x1   Volatile Write Cache Present

awun      : 0
awupf     : 0
nvscc     : 1
  [0:0] : 0x1   NVM Vendor Specific Commands uses NVMe Format

nwpc      : 0
  [2:2] : 0     Permanent Write Protect Not Supported
  [1:1] : 0     Write Protect Until Power Supply Not Supported
  [0:0] : 0     No Write Protect and Write Protect Namespace Not Supported

acwu      : 0
sgls      : 0
 [1:0]  : 0     Scatter-Gather Lists Not Supported

mnan      : 0
subnqn    : nqn.2022-01.com.skhynix:nvme:nvm-subsystem-sn-FJB1N694511301Q3Z
ioccsz    : 0
iorcsz    : 0
icdoff    : 0
ctrattr   : 0
  [0:0] : 0     Dynamic Controller Model

msdbd     : 0
ps    0 : mp:6.3000W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:2.4000W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:1.9000W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0500W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0040W non-operational enlat:1000 exlat:9000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
pi@raspberrypi:~$ sudo nvme list-ns /dev/nvme0
[   0]:0x1
pi@raspberrypi:~$ sudo nvme id-ns -H /dev/nvme0 --namespace-id=1
NVME Identify Namespace 1:
nsze    : 0xee7c2b0
ncap    : 0xee7c2b0
nuse    : 0xee7c2b0
nsfeat  : 0
  [4:4] : 0     NPWG, NPWA, NPDG, NPDA, and NOWS are Not Supported
  [2:2] : 0     Deallocated or Unwritten Logical Block error Not Supported
  [1:1] : 0     Namespace uses AWUN, AWUPF, and ACWU
  [0:0] : 0     Thin Provisioning Not Supported

nlbaf   : 1
flbas   : 0
  [4:4] : 0     Metadata Transferred in Separate Contiguous Buffer
  [3:0] : 0     Current LBA Format Selected

mc      : 0
  [1:1] : 0     Metadata Pointer Not Supported
  [0:0] : 0     Metadata as Part of Extended Data LBA Not Supported

dpc     : 0
  [4:4] : 0     Protection Information Transferred as Last 8 Bytes of Metadata Not Supported
  [3:3] : 0     Protection Information Transferred as First 8 Bytes of Metadata Not Supported
  [2:2] : 0     Protection Information Type 3 Not Supported
  [1:1] : 0     Protection Information Type 2 Not Supported
  [0:0] : 0     Protection Information Type 1 Not Supported

dps     : 0
  [3:3] : 0     Protection Information is Transferred as Last 8 Bytes of Metadata
  [2:0] : 0     Protection Information Disabled

nmic    : 0
  [0:0] : 0     Namespace Multipath Not Capable

rescap  : 0
  [6:6] : 0     Exclusive Access - All Registrants Not Supported
  [5:5] : 0     Write Exclusive - All Registrants Not Supported
  [4:4] : 0     Exclusive Access - Registrants Only Not Supported
  [3:3] : 0     Write Exclusive - Registrants Only Not Supported
  [2:2] : 0     Exclusive Access Not Supported
  [1:1] : 0     Write Exclusive Not Supported
  [0:0] : 0     Persist Through Power Loss Not Supported

fpi     : 0
  [7:7] : 0     Format Progress Indicator Not Supported

dlfeat  : 0
  [4:4] : 0     Guard Field of Deallocated Logical Blocks is set to 0xFFFF
  [3:3] : 0     Deallocate Bit in the Write Zeroes Command is Not Supported
  [2:0] : 0     Bytes Read From a Deallocated Logical Block and its Metadata are Not Reported

nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
noiob   : 0
nvmcap  : 0
nsattr  : 0
nvmsetid: 0
anagrpid: 0
endgid  : 1
nguid   : 00000000000000000000000000000000
eui64   : ffffffffffffffff
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)
LBA Format  1 : Metadata Size: 0   bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
pi@raspberrypi:~$
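
As a quick sanity check on the output above (an illustration added here, not something from the original report), the namespace size reported by `nvme id-ns` can be multiplied by the in-use LBA size to confirm it matches the capacity shown by `nvme list`. The variable names are arbitrary.

```python
# Values copied from the `nvme id-ns` output above.
nsze = 0xEE7C2B0  # namespace size in logical blocks
lba_size = 512    # bytes per block, LBA Format 0 (in use)

capacity_bytes = nsze * lba_size
capacity_gb = capacity_bytes / 1e9  # decimal gigabytes, as nvme-cli reports

print(f"{capacity_bytes} bytes = {capacity_gb:.2f} GB")
```

This works out to 128.04 GB, consistent with the `nvme list` line, so the drive is identified and sized correctly once Linux is running.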

Network (TFTP boot)

No response

agners commented 1 year ago

Tested with the latest beta firmware pieeprom-2022-11-04.bin, no success.

SIG pieeprom.sig 9221b44ba796ba4cd2f60a27990afc66c74d7e2f72b2f7be02e9f8ab13f39481 1668678754
Reading EEPROM: 524288
Writing EEPROM
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************....+
Verify BOOT EEPROM
Reading EEPROM: 524288
BOOT-EEPROM: UPDATED

When enabling UART, I get the following:

...
Trying partition: 0
type: 32 lba: 8192 oem: 'mkfs.fat' volume: ' boot       '
rsc 32 fat-sectors 1020 c-count 130554 c-size 4
root dir cluster 2 sectors 0 entries 0
FAT32 clusters 130554
Trying partition: 0
type: 32 lba: 8192 oem: 'mkfs.fat' volume: ' boot       '
rsc 32 fat-sectors 1020 c-count 130554 c-size 4
root dir cluster 2 sectors 0 entries 0
FAT32 clusters 130554
Read config.txt bytes     2075 hnd 0x163
Read start4.elf bytes  2249280 hnd 0x3cdb
Read fixup4.dat bytes     5399 hnd 0x169
0x00b03140 0x00000000 0x00001fff
MEM GPU: 76 ARM: 948 TOTAL: 1024
Firmware: 102f1e848393c2112206fadffaaf86db04e98326 Aug 26 2022 14:03:16
Starting start4.elf @ 0xfec00200 partition 0
NVME off
+

So it seems start4.elf is actually crashing?

peterharperuk commented 1 year ago

Firmware: 102f1e848393c2112206fadffaaf86db04e98326 Aug 26 2022 14:03:16

Any chance you can try a recent version of start4.elf? Something after "Oct 5 2022". It sounds a bit like https://github.com/RPi-Distro/repo/issues/309

You can get the most recent version here: https://github.com/raspberrypi/firmware/tree/master/boot

If you add enable_uart=1 and uart_2ndstage=1 to config.txt, it'll tell us whether it's getting as far as the firmware.
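
To check a UART log against the "Oct 5 2022" cutoff mentioned above, the build timestamp can be pulled out of the "Firmware:" line. This is a rough sketch only: the regular expression is inferred from the single log line quoted in this thread, `firmware_build_time` is a hypothetical helper, and the cutoff is interpreted as midnight on that date.

```python
import re
from datetime import datetime

# Pattern inferred from the one "Firmware:" line in this thread (assumption).
FIRMWARE_RE = re.compile(
    r"Firmware: ([0-9a-f]{40}) (\w{3} +\d{1,2} \d{4}) (\d{2}:\d{2}:\d{2})"
)

# "Something after Oct 5 2022", taken here as midnight on that day.
CUTOFF = datetime(2022, 10, 5)


def firmware_build_time(log_line):
    """Return the firmware build time from a UART log line, or None if absent."""
    match = FIRMWARE_RE.search(log_line)
    if match is None:
        return None
    date_part = " ".join(match.group(2).split())  # collapse padded day, e.g. "Oct  5"
    # %b parses English month abbreviations in the default C locale.
    return datetime.strptime(f"{date_part} {match.group(3)}", "%b %d %Y %H:%M:%S")


line = "Firmware: 102f1e848393c2112206fadffaaf86db04e98326 Aug 26 2022 14:03:16"
built = firmware_build_time(line)
print(built, "newer than cutoff:", built > CUTOFF)
```

For the log in this issue the build date is Aug 26 2022, which falls before the cutoff.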

agners commented 1 year ago

After realizing that the EEPROM is actually handing off to start4.elf, that is exactly what I started doing :smile:

I manually replaced the files on the NVMe with the ones from the 1.20221104 tag, and it boots! The firmware currently shipped with Raspberry Pi OS seems to be the culprit (release date 2022-09-22 according to RPi Imager; start4.elf seems to be from the 1.20220830 tag).

agners commented 1 year ago

It seems that 1.20221028 already fixed the problem. Looking at the git log, no change between 1.20220830..1.20221028 jumps out as something that would address an NVMe boot issue, but maybe the git log is incomplete?

In any case, booting works with the latest firmware, so this can be closed.

peterharperuk commented 1 year ago

but maybe the git log is incomplete

Yes - my fault. The change was to turn NVMe off before switching from bootloader to firmware and from firmware to kernel. That "NVME off" line is showing the fix working in the bootloader. I failed to mark the change as affecting the firmware, so there's no comment in the git log - apologies for that. Thanks for testing.

agners commented 1 year ago

Ok, I see. I was actually wondering if the "NVME off" log entry could be a problem. Thanks for the insight!