raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

RPi no longer booting after upgrading to latest start.elf/fixup.dat #1445

Closed pbatard closed 3 years ago

pbatard commented 4 years ago

A regression appears to have been introduced with the latest firmware in that we are no longer able to boot binaries such as the Trusted Firmware or the UEFI firmware. Reverting to an older version of the RPi firmware solves this issue.

Steps to replicate

Result

The boot freezes after the following output on the serial console, with the multicoloured screen staying on:

recover4.elf not found (6)
recovery.elf not found (6)
Read start4.elf bytes  2277760 hnd 0x00000089 hash 'ae50b620c077baa5'
Read fixup4.dat bytes     5418 hnd 0x00000088 hash '3f2c561eec60a59c'
0x00c03111 0x00000000 0x0000001f
MEM GPU: 76 ARM: 948 TOTAL: 1024
Starting start4.elf @ 0xfec00200 partition 0

Then, if you replace start4.elf and fixup4.dat with a version published before 2020.07.14, everything works as expected, and you get the expected output:

recover4.elf not found (6)
recovery.elf not found (6)
Read start4.elf bytes  2277248 hnd 0x00000089 hash '8e98b15f075142da'
Read fixup4.dat bytes     5409 hnd 0x00000088 hash 'bdc1f053a4ad68f8'
0x00c03111 0x00000000 0x0000001f
MEM GPU: 76 ARM: 948 TOTAL: 1024
Starting start4.elf @ 0xfec00200 partition 0

NOTICE:  BL31: v2.3():v2.3
NOTICE:  BL31: Built : 10:40:51, Apr 21 2020
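When comparing a failing and a working boot, the "Read ..." lines in the serial output are a handy fingerprint of which start4.elf/fixup4.dat pair was actually loaded. A minimal sketch for pulling size and hash out of a captured log follows; the function name and the assumption that the log has already been captured to a string are illustrative, not part of any Raspberry Pi tooling.

```python
import re

# Matches bootloader lines such as:
#   Read start4.elf bytes  2277760 hnd 0x00000089 hash 'ae50b620c077baa5'
READ_LINE = re.compile(
    r"^Read (?P<name>\S+) bytes\s+(?P<size>\d+) hnd 0x[0-9a-fA-F]+ "
    r"hash '(?P<hash>[0-9a-f]+)'"
)


def firmware_fingerprint(serial_log: str) -> dict:
    """Map each file named in a 'Read ...' line to its (size, hash) pair."""
    found = {}
    for line in serial_log.splitlines():
        match = READ_LINE.match(line.strip())
        if match:
            found[match["name"]] = (int(match["size"]), match["hash"])
    return found


# The two 'Read' lines from the failing boot quoted above.
failing = firmware_fingerprint(
    "Read start4.elf bytes  2277760 hnd 0x00000089 hash 'ae50b620c077baa5'\n"
    "Read fixup4.dat bytes     5418 hnd 0x00000088 hash '3f2c561eec60a59c'\n"
)
print(failing["start4.elf"])
```

Running the same function over the working log makes it easy to confirm that the two boots really used different firmware builds.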

Note that commenting out the device_tree_address=0x1f0000 and device_tree_end=0x200000 settings has no effect, so it's not a device tree setup issue.

We are in the process of pinpointing the exact revision of start4.elf that introduced this issue, but since we didn't expect such a major regression and now have to scramble to fix our UEFI firmware downloads as a result, it might be a while before we can do so...

pbatard commented 4 years ago

We also observe the same issue with the latest start.elf/fixup.dat on the Pi 3, where reverting to an older version "fixes" boot.

pelwell commented 4 years ago

I'll take a look.

pelwell commented 4 years ago

We've identified the problem - a fix will be released shortly.

pbatard commented 4 years ago

I appreciate that, thanks!

pelwell commented 4 years ago

The bug was introduced in https://github.com/raspberrypi/firmware/commit/bd816dbac723e04f8f8b06bb2e16d767cda7692c, and caused your stub to not be loaded, the built-in default being used instead. As a result of the load-time optimisation in that release an external ARM stub would be loaded to address zero, which is fine for the VPU's memory system but our fread() implementation doesn't like it.

popcornmix commented 4 years ago

rpi-update firmware should have a fix for this.

pbatard commented 4 years ago

Many thanks for the quick turnover. Much appreciated!

We confirm that the new version of the firmware fixes the issue.

At this stage I have to ask, assuming that you already have some kind of CI/testing facility for your binaries: Are there any chances you could add the booting of the bl31.bin to the list of tests you run on the Raspberry Pi 4?

Provided you have the means to automate copying the start files to a medium and then booting a Raspberry Pi 4 from it, we would greatly appreciate it if you could add the boot process and boot files we described above as one of the tests you perform, validating that you get the NOTICE: BL31: v2.3():v2.3 message on the serial output (as this is a fixed notice that should always appear if the binary was able to boot).
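As a rough idea of what the validation step could look like, here is a minimal sketch that checks a captured serial log for the BL31 banner. How the log is captured (serial adapter, timeout, power cycling) is left out, and the function name is hypothetical; the version is deliberately not pinned to v2.3 so that a newer TF-A build would still pass.

```python
import re

# The banner TF-A prints once BL31 is running, e.g.:
#   NOTICE:  BL31: v2.3():v2.3
BL31_BANNER = re.compile(r"^NOTICE:\s+BL31: v\S+", re.MULTILINE)


def bl31_started(serial_log: str) -> bool:
    """Return True if the captured serial output contains the BL31 banner."""
    return BL31_BANNER.search(serial_log) is not None


good_log = (
    "Starting start4.elf @ 0xfec00200 partition 0\n"
    "\n"
    "NOTICE:  BL31: v2.3():v2.3\n"
    "NOTICE:  BL31: Built : 10:40:51, Apr 21 2020\n"
)
bad_log = "Starting start4.elf @ 0xfec00200 partition 0\n"

print(bl31_started(good_log), bl31_started(bad_log))
```

A test harness would fail the firmware build whenever this returns False after a reasonable timeout.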

Thanks again.

pbatard commented 4 years ago

I'm afraid that there is more to this issue than the initial problem we reported.

Whereas the commit from August 5th fixed the initial boot problem, where we couldn't even get the UEFI firmware to run, we are still being left with a major regression where GRUB (and thus Linux boot) is completely broken, whereas it worked absolutely fine with the start.elf/fixup.dat prior to bd816dbac723e04f8f8b06bb2e16d767cda7692c.

The problem can be replicated as follows (this requires a few downloads):

  1. Download the 1.30 version of the UEFI firmware from https://github.com/pftf/RPi3/releases/tag/v1.30/.
    Note that this zip archive includes start.elf/fixup.dat from bd816dbac723e04f8f8b06bb2e16d767cda7692c.
  2. Download the debian-10.6.0-arm64-netinst.iso from https://cdimage.debian.org/debian-cd/current/arm64/iso-cd/
  3. Create an MBR/FAT32 formatted media and extract both the content of the ISO and the content of the UEFI zip onto it
  4. Insert the media into a Pi 3, and let it boot to the GRUB prompt (NB: you can press Enter during the UEFI prompt to speed this up)
  5. At the GRUB menu, select Install or Graphical Install.

After a few seconds, you are presented with the Debian installation screen, as expected.

Now, replace start.elf/fixup.dat with a post August 5th version and repeat steps 4 & 5.

This time, booting to the installer either freezes or produces the GRUB error "error: failed to install/update FDT" before forcing a return to the GRUB prompt...

Therefore, another behavioural change has been introduced between July 31st and August 5th, that broke the ability for GRUB ARM64 to run a Linux kernel...

pbatard commented 4 years ago

Just a quick update to mention that the new start.elf is triggering an assert here during our processing of the Device Tree to sort out psci. (Getting this far required fiddling with the baudrate, since the post July 31st start.elf update also moved our UEFI serial baudrate from 115200 to ~180000 bauds, even though, last time I checked, we set it up according to the official documentation.)

So this explains why GRUB/kernel will complain about the Device Tree not being found and fail, since we don't pass something that we haven't been able to process.

I'm still trying to identify exactly what has changed, from our perspective, to trigger that assert. But it may take a while before I can provide more data.

Note that, if you want to get extended serial data that includes asserts, and after making sure that you set the baudrate to 180000, you can replace the release version of RPI_EFI.fd with the DEBUG version from https://ci.appveyor.com/project/pbatard/rpi3/builds/34965252/artifacts.

markmi commented 3 years ago

Another example context that looks to involve some of what is reported here is:

RPi4B 4GiByte FreeBSD (in gradual development) uses an armstub8-gic.bin port and a u-boot port as one way to boot. Such boots work using 542aceb (2020-Jul-17). But attempting to boot via the later firmware versions I tried hangs with the rainbow screen showing, and the end of the serial output looks like:

MESS:00:00:09.122140:0: brfs: File read: /mfs/sd/armstub8-gic.bin
MESS:00:00:09.125171:0: Loading 'armstub8-gic.bin' to 0x0 size 0x1700
MESS:00:00:09.131367:0: brfs: File read: 5888 bytes
MESS:00:00:09.251424:0: brfs: File read: /mfs/sd/u-boot.bin
MESS:00:00:09.253939:0: Loading 'u-boot.bin' to 0x80000 size 0x8b9e8
MESS:00:00:09.260029:0: Device tree loaded to 0x4000 (size 0xbe0c)
MESS:00:00:09.269647:0: uart: Set PL011 baud rate to 103448.300000 Hz
MESS:00:00:09.275976:0: uart: Baud rate change done...
MESS:00:00:09.278038:0: uart: Baud rate change done...
MESS:00:00:09.283427:0: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined

This is still true of 63b1922 (2020-Oct-08) firmware.

armstub8-gic.bin expects to be loaded at address zero and u-boot (as configured/patched) expects to internally reserve/avoid that area and to cause the area to show up as reserved. (armstub8-gic.bin is over a page in size.) With the problematical firmware versions, u-boot does not get far enough to start putting debug messages to the console (when built to do so).
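To check quickly where the firmware placed each file, the "Loading '...' to ... size ..." messages in the serial output can be turned into a small load map. This is only a sketch based on the message format visible in the log above; the helper name is made up for illustration.

```python
import re

# Matches firmware messages such as:
#   Loading 'armstub8-gic.bin' to 0x0 size 0x1700
LOAD_LINE = re.compile(
    r"Load(?:ing|ed) '(?P<name>[^']+)' to (?P<addr>0x[0-9a-fA-F]+) "
    r"size (?P<size>0x[0-9a-fA-F]+)"
)


def load_map(serial_log: str) -> dict:
    """Map each loaded file name to its (load address, size in bytes)."""
    return {
        m["name"]: (int(m["addr"], 16), int(m["size"], 16))
        for m in LOAD_LINE.finditer(serial_log)
    }


log = (
    "MESS:00:00:09.125171:0: Loading 'armstub8-gic.bin' to 0x0 size 0x1700\n"
    "MESS:00:00:09.253939:0: Loading 'u-boot.bin' to 0x80000 size 0x8b9e8\n"
)
loads = load_map(log)
stub_addr, stub_size = loads["armstub8-gic.bin"]
print(hex(stub_addr), stub_size)
```

Here the stub is reported at address zero with a size of 0x1700 (5888) bytes, which matches the "File read: 5888 bytes" line and is indeed larger than a 4096-byte page.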

pelwell commented 3 years ago

Can you upload a build of u-boot.bin somewhere to demonstrate the problem?

pbatard commented 3 years ago

Another thing we are seeing is that the Device Tree seems to be corrupted during the early processing with newer versions of start.elf.

For instance, as per pftf/RPi3#22, we are seeing obvious signs of corruption such as with:

  (...)
  000007B0  2C 62 63 6D 32 38 33 35 2D 74 78 70 00 00 00 00  ,bcm2835-txp....
  000007C0  72 76 65 64 2D 6D 65 6D 6F 72 79 00 00 00 00 03  rved-memory.....
  (...)

whereas it should say reserved-memory.

I'm still in the process of (slowly) investigating this (there's still a possibility that the corruption could come from TF-A and not start.elf, but my understanding is that TF-A does not touch the Device Tree data), but it actually explains the errors we are seeing with Linux boot, since the UEFI firmware gives up on processing a corrupted Device Tree, which means that when GRUB invokes the kernel, no Device Tree is being provided hence the failure to boot.

Can you guys please validate that the Device Tree that's being handed over by the newer versions of start.elf is really valid? Another thing that strikes me is that, whereas the Device Tree is being altered from the one on SD/USB, with elements being added/removed, its size remains the same as the one on disk (or at least the size that's being handed over to the subsequent boot loader), which looks a bit suspicious.

All in all, if I were to take a guess, it would look to me like the post Jul. 31st version of start.elf is corrupting the Device Tree in some circumstances.
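One way to test the size suspicion is to check the flattened Device Tree header against itself. The sketch below reads the standard FDT header (ten big-endian 32-bit fields, magic 0xd00dfeed) and flags a blob whose structure or strings block extends past the declared totalsize. It is not the UEFI firmware's actual parsing code, the function name is hypothetical, and it only detects this one class of inconsistency.

```python
import struct

FDT_MAGIC = 0xD00DFEED
# magic, totalsize, off_dt_struct, off_dt_strings, off_mem_rsvmap,
# version, last_comp_version, boot_cpuid_phys, size_dt_strings, size_dt_struct
FDT_HEADER = struct.Struct(">10I")


def check_fdt(blob: bytes) -> list:
    """Return a list of header inconsistencies (empty if none were found)."""
    if len(blob) < FDT_HEADER.size:
        return ["blob shorter than FDT header"]
    (magic, totalsize, off_struct, off_strings, _off_rsvmap,
     _version, _last_comp, _boot_cpu, size_strings, size_struct) = (
        FDT_HEADER.unpack_from(blob))
    if magic != FDT_MAGIC:
        return [f"bad magic 0x{magic:08x}"]
    problems = []
    if totalsize > len(blob):
        problems.append("totalsize exceeds available bytes")
    if off_struct + size_struct > totalsize:
        problems.append("structure block runs past totalsize")
    if off_strings + size_strings > totalsize:
        problems.append("strings block runs past totalsize")
    return problems
```

If nodes were added without totalsize being updated, a consumer that copies only totalsize bytes would end up with truncated data, which would be consistent with cut-off keywords like the "rved-memory" seen in the dump.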

pelwell commented 3 years ago

Using the settings from the first post and bl31.bin I've managed to recreate a DT failure. I think it's a fairly simple fix. It took longer to debug because the stub doesn't disable kernel loading (offset 0xfc should be 0xffffffff if the firmware isn't expected to load it), and the kernel which was on the card was large enough to overwrite the DTB at 0x1f0000.
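Both conditions described here are easy to check offline. The sketch below tests whether a stub binary carries 0xffffffff at offset 0xfc, and whether a kernel of a given size would run into the DTB window from the first post (device_tree_address=0x1f0000, device_tree_end=0x200000). The function names are illustrative, and the 0x80000 load address and 2 MiB kernel size in the example are assumptions chosen for illustration, not values confirmed for that particular card.

```python
import struct

KERNEL_FIELD_OFFSET = 0xFC   # offset named in the comment above
NO_KERNEL = 0xFFFFFFFF       # value meaning "firmware should not load a kernel"


def stub_disables_kernel(stub: bytes) -> bool:
    """True if the 32-bit word at offset 0xfc is 0xffffffff."""
    if len(stub) < KERNEL_FIELD_OFFSET + 4:
        return False
    # All-ones reads the same in either byte order, so endianness is moot here.
    (value,) = struct.unpack_from("<I", stub, KERNEL_FIELD_OFFSET)
    return value == NO_KERNEL


def overlaps(start: int, size: int, region_start: int, region_end: int) -> bool:
    """True if [start, start + size) intersects [region_start, region_end)."""
    return start < region_end and region_start < start + size


# Assumed example: a 2 MiB kernel loaded at 0x80000 against the DTB window.
print(overlaps(0x80000, 0x200000, 0x1F0000, 0x200000))
```

A stub that leaves the field unset means the firmware still loads whatever kernel image is on the card, which is how an unrelated large kernel can clobber the DTB.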

nullr0ute commented 3 years ago

Seeing this issue on Fedora too, various U-Boot available here: https://pbrobinson.fedorapeople.org/rpi-u-boot/

A workaround was to enable the UART universally. Not sure if it's related but U-Boot crashes on the 8GB model if there isn't a display plugged in. The config.txt we use is:

# Raspberry Pi 2
[pi2]
kernel=rpi2-u-boot.bin

# Raspberry Pi 3
[pi3]
kernel=rpi3-u-boot.bin

# Raspberry Pi 4
[pi4]
kernel=rpi4-u-boot.bin

# Default Fedora configs for all Raspberry Pi Revisions
[all]
# Enable UART
# Only enable UART if you're going to use it as it has speed implications
# Serial console is ttyS0 on RPi3 and ttyAMA0 on all other variants
# u-boot will auto detect serial and pass correct options to kernel if enabled
# Speed details: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=141195
# enable_uart=1

# Put the RPi into 64 bit mode
# arm_64bit=1

# Early boot delay in the hope monitors are initialised enough to provide EDID
bootcode_delay=1

# We need this to be 32Mb to support VCHI services and drivers which use them
# but this isn't used by mainline VC4 driver so reduce to lowest supported value
# You need to set this to at least 80 for using the camera
gpu_mem=32

# Use eXtended firmware by default
start_x=1

# New option to allow the firmware to load upstream dtb
# Will allow things like camera, touchscreen etc to work OOTB
upstream_kernel=1

# HAT and DT overlays. Documentation at Raspberry Pi here:
# https://www.raspberrypi.org/documentation/configuration/device-tree.md
# Each dtoverlay line is an individual HAT/overlay, multiple lines allowed
# The dtoverlay=upstream must be present for Fedora kernels
dtoverlay=upstream
# dtoverlay=rpi-sense

# Allow OS rather than firmware control CEC
mask_gpu_interrupt1=0x100

# Without this sdram runs at 400mhz, instead of 450
# https://github.com/Hexxeh/rpi-firmware/issues/172
audio_pwm_mode=0

# Other options you can adjust for all Raspberry Pi Revisions
# https://www.raspberrypi.org/documentation/configuration/config-txt/README.md
# All options documented at http://elinux.org/RPiconfig
# for more options see http://elinux.org/RPi_config.txt

pelwell commented 3 years ago

There's a test firmware for 4B (start4 and start4x) that should solve the DTB corruption problem here: https://drive.google.com/file/d/1MeFj10YIq6RNXfU5CVGKmDLVLoA2inRU/view?usp=sharing

If the incorrect baudrates are arrived at after querying the core clock using mailbox message 0x30002 (GET_CLOCK_RATE) and the problem started appearing with https://github.com/raspberrypi/firmware/commit/11e3c314bc2b64f7d862bac00ff3d9f42f3c5a50 then that is due to a separate matter under discussion here at the moment.

pelwell commented 3 years ago

By the way, dtoverlay=upstream should be implied by upstream_kernel=1.
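For anyone unsure which lines of a config.txt like the one quoted earlier actually apply on a given board, here is a simplified sketch of how the [pi2]/[pi3]/[pi4]/[all] section filters select settings. It is an approximation for illustration only: the real firmware supports more filter types, and repeated keys such as dtoverlay are collapsed to the last value here. The function name is hypothetical.

```python
def effective_settings(config_text: str, model: str) -> dict:
    """Collect key=value settings that apply to `model` (e.g. "pi4")."""
    active = True          # settings before any [section] apply to all models
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue       # skip blanks and commented-out options
        if line.startswith("[") and line.endswith("]"):
            active = line[1:-1] in ("all", model)
            continue
        if active and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


sample = """\
[pi3]
kernel=rpi3-u-boot.bin
[pi4]
kernel=rpi4-u-boot.bin
[all]
# enable_uart=1
gpu_mem=32
upstream_kernel=1
"""
print(effective_settings(sample, "pi4"))
```

Note that enable_uart=1 is commented out in the quoted file, so it does not take effect on any model until the leading # is removed.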

klaus4 commented 3 years ago

There's a test firmware for 4B (start4 and start4x) that should solve the DTB corruption problem here: https://drive.google.com/file/d/1MeFj10YIq6RNXfU5CVGKmDLVLoA2inRU/view?usp=sharing

If the incorrect baudrates are arrived at after querying the core clock using mailbox message 0x30002 (GET_CLOCK_RATE) and the problem started appearing with 11e3c31 then that is due to a separate matter under discussion here at the moment.

great, thanks a lot!
First successful straight SSD boot of FreeBSD. I notice very slow kernel loading, but after the dtb is loaded the boot speeds up to normal... dmesg incl. uart_2ndstage=1: https://dmesgd.nycbug.org/index.cgi?do=view&id=5711

klaus4 commented 3 years ago

U-Boot 2020.10-rc5 (Oct 05 2020 - 03:08:23 +0000)

DRAM: 7.9 GiB
RPI 4 Model B (0xd03114)
MMC: mmc@7e300000: 1, emmc2@7e340000: 0
Loading Environment from FAT...
In: serial
Out: vidconsole
Err: vidconsole
Net: eth0: ethernet@7d580000
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
starting USB...
Bus xhci_pci: probe failed, error -110
No working controllers found
..

I am not sure if it is desirable to include the issue here, but the same disk that booted successfully on the 4GB model fails at an early boot stage on the 8GB model, while the following patches do not yet seem to be upstreamed: https://patchwork.ozlabs.org/project/linux-pci/patch/20200629161845.6021-4-nsaenzjulienne@suse.de/ https://patchwork.ozlabs.org/project/linux-pci/patch/20200629161845.6021-5-nsaenzjulienne@suse.de/

markmi commented 3 years ago

There's a test firmware for 4B (start4 and start4x) that should solve the DTB corruption problem here: https://drive.google.com/file/d/1MeFj10YIq6RNXfU5CVGKmDLVLoA2inRU/view?usp=sharing

If the incorrect baudrates are arrived at after querying the core clock using mailbox message 0x30002 (GET_CLOCK_RATE) and the problem started appearing with 11e3c31 then that is due to a separate matter under discussion here at the moment.

These notes are for a 4 GiByte RPi4B with the MSD USB eeprom update, avoiding the issue of the status of u-boot 2020.10's handling of the xHCI for the 8 GiByte RPi4B.

My github.com/pftf/RPi4 uefi/ACPI v1.20 USB3 SSD (no microsd card) context booted just fine with this but has been tolerating sufficiently recent firmware versions before this. (This and the below all involve EFI/BOOT/BOOTAA64.EFI so I'll not mention it explicitly after this.)

As for armstub8-gic.bin and u-boot 2020.10:

Booting via an MBR microsd card that has the EFI msdos file system material and a FreeBSD kernel on a ufs file system (where the kernel in turn uses the USB3 SSD for later stages and general operation), it boots all the way just fine.

Booting via USB3-SSD-only context boots all the way just fine (now that I have the right partition type for the msdos file system involved).

Overall: original FreeBSD 4 GiByte RPi4B problem fixed.

pbatard commented 3 years ago

@pelwell, is there any chance you could also provide a test firmware for the Pi 3?

While the original issue (failure to boot into TF-A) was mostly opened against the Pi 4, I've been testing the subsequent issue (Device Tree corruption) on the Pi 3, so, if possible, I'd like to validate that your planned fix addresses what I've seen there. Thanks.

pelwell commented 3 years ago

Of course - the archive (reachable from the same download link) now includes start and startx (and fixups).

pbatard commented 3 years ago

Thanks a lot.

The new start.elf looks good to me.

I can see that the reported Device Tree size now accounts for the alterations carried out (0x73EB bytes vs 0x6FB7 bytes previously) and our Device Tree parsing code is happy to declare the data as valid (with no more truncated keywords in the hex dump) which means that both GRUB and the Linux kernel now boot as expected. Once again, your quick work in identifying the bug and providing a fix is much appreciated!

As to the serial baudrate change, we are indeed using mailbox GET_CLOCK_RATE to set our base serial base clock, so I guess we'll just have to wait for the result of your internal discussions on that. Obviously, this is not as major an issue as the Device Tree one and we don't mind changing our baudrate setup code if needed, as long as the new method of computing the miniUART baudrate is properly documented.
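To illustrate why a change in the reported clock moves the baudrate, here is a small worked example using the mini UART formula from the BCM2835 peripherals documentation, baud = core_clock / (8 * (divisor + 1)). The 250 MHz and 400 MHz figures are assumptions picked because their ratio (1.6) roughly matches the observed shift from 115200 to ~180000; they are not values confirmed in this thread.

```python
def miniuart_divisor(core_clock_hz: int, baud: int) -> int:
    """Baud register value for the requested rate at the given core clock."""
    return round(core_clock_hz / (8 * baud)) - 1


def miniuart_baud(core_clock_hz: int, divisor: int) -> float:
    """Actual baudrate produced by a divisor at the given core clock."""
    return core_clock_hz / (8 * (divisor + 1))


REPORTED_HZ = 250_000_000  # assumed value returned by GET_CLOCK_RATE
ACTUAL_HZ = 400_000_000    # assumed clock the mini UART is really running from

divisor = miniuart_divisor(REPORTED_HZ, 115200)
print(divisor, round(miniuart_baud(ACTUAL_HZ, divisor)))
```

With these assumed clocks the divisor computed for 115200 yields roughly 184500 baud on the wire, close to the ~180000 that had to be set on the terminal side.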

pelwell commented 3 years ago

A patch to change the GET_CLOCK_RATE behaviour has gone into the internal firmware repo in the last few minutes. An archive of the same four firmware variants including the patch is available here: https://drive.google.com/file/d/1Rb_hhJ1t2L9gEhPuxqk8V3Q2IoJVZ7XZ/view?usp=sharing

popcornmix commented 3 years ago

Latest rpi-update firmware has the two potential fixes mentioned here.

pbatard commented 3 years ago

I confirm both the baudrate and Device Tree issues we were seeing on the Pi 3 are fixed with the latest from https://github.com/raspberrypi/firmware/tree/master/boot. Thanks.

pelwell commented 3 years ago

Is there anything outstanding on this issue?

pbatard commented 3 years ago

Not from my perspective. Both the original problem reported when the issue was opened and the subsequent matter we raised 9 days ago have been addressed. In other words, I am no longer aware of any regression that needs fixing, as far as my testing is concerned.

markmi commented 3 years ago

FYI: The following may be related background information on what is enabled vs. what is not yet so, not necessarily as something 1445 should be dealing with.

As far as I can tell from what I've recently learned, the 8 GiByte RPI4B VL805 handling vs. u-boot 2020.10 xHCI/USB3 issue is u-boot waiting on material from github.com/torvalds/linux/commits/master/arch/arm/boot/dts/bcm2711-rpi-4-b.dts (2020-Aug-18 as 258f92d). It has been reported by klaus4 that the updated .dtb file that would result would lead to triggering "RASPBERRYPI_FIRMWARE_RESET_ID_USB: bcm2711_notify_vl805_reset()" in u-boot. (This report was in a FreeBSD context, not here.)

daniframinan commented 3 years ago

Hi, I have the same problem, running an RPi 4 with 4GB. I have updated the EEPROM, though I may be doing it wrong (I flash the card with the imager, then I put it in the Pi, the screen turns green and I turn it off when the screen turns off), but I can't seem to get it to work. I get the following error (pretty much the same as his at the beginning): Screenshot_20201022-182018_2

timg236 commented 3 years ago

Does it boot if you comment out start_x=1 in config.txt ?

Edit: @daniframinan, that looks like a separate issue. Please create a new bug report with full details of the firmware / rpi-update version and any connected hardware.

pbatard commented 3 years ago

Indeed, I seriously doubt that this is the same issue, because the one I reported was about bl31.bin freezing before producing the expected output on the serial console, and it has been properly addressed with a start.elf update (along with the subsequent .dtb corruption issue we picked up).

Unless you are monitoring serial output, using the same config.txt as the one from my first post, using bl31.bin as the armstub, and finding that your serial output freezes before it produces the lines:

NOTICE:  BL31: v2.3():v2.3
NOTICE:  BL31: Built : 10:40:51, Apr 21 2020

then I'm afraid that what you are observing is something that is not related to this specific issue, and you should open a separate report.

pelwell commented 3 years ago

...which is a useful reminder to...

lnn2204 commented 3 years ago

Hi, I have the same problem, running an RPi 4 with 4GB. I have updated the EEPROM, though I may be doing it wrong (I flash the card with the imager, then I put it in the Pi, the screen turns green and I turn it off when the screen turns off), but I can't seem to get it to work. I get the following error (pretty much the same as his at the beginning): Screenshot_20201022-182018_2

Did you resolve it?

zp-00 commented 3 months ago

When I replaced the kernel on the 5.1 image with the 6.1 kernel, I also encountered the problem that the SDCARD_CONTROL_POWER pin was not defined. Is this a problem with U-Boot? Do I need to download the latest image and replace its kernel with 6.1, or do something else?

MESS:00:00:05.901070:0: brfs: File read: 79 bytes
MESS:00:00:07.900671:0: brfs: File read: /mfs/sd/kernel8.img
MESS:00:00:07.903228:0: Loaded 'kernel8.img' to 0x80000 size 0x167f200
MESS:00:00:07.931545:0: Kernel relocated to 0x200000
MESS:00:00:07.933397:0: Device tree loaded to 0x2eff2200 (size 0xdd8f)
MESS:00:00:07.941094:0: uart: Set PL011 baud rate to 103448.300000 Hz
MESS:00:00:07.948720:0: uart: Baud rate change done...
MESS:00:00:07.950742:0: uart: Baud rate change done...
MESS:00:00:07.975566:0: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined

popcornmix commented 3 months ago

Posting on a 4 year old, unrelated, closed issue is not the best option. The forum may be a better place for this question.

zp-00 commented 3 months ago

OK, thank you.