Closed pbatard closed 3 years ago
We also observe the same issue with the latest start.elf/fixup.dat on the Pi 3, where reverting to an older version "fixes" boot.
I'll take a look.
We've identified the problem - a fix will be released shortly.
I appreciate that, thanks!
The bug was introduced in https://github.com/raspberrypi/firmware/commit/bd816dbac723e04f8f8b06bb2e16d767cda7692c, and caused your stub to not be loaded, the built-in default being used instead. As a result of the load-time optimisation in that release an external ARM stub would be loaded to address zero, which is fine for the VPU's memory system but our fread() implementation doesn't like it.
rpi-update firmware should have a fix for this.
Many thanks for the quick turnover. Much appreciated!
We confirm that the new version of the firmware fixes the issue.
At this stage I have to ask, assuming that you already have some kind of CI/testing facility for your binaries: Are there any chances you could add the booting of the bl31.bin to the list of tests you run on the Raspberry Pi 4?
Provided you have the means to automate the copying of start files to a medium, and then boot a Raspberry Pi 4 with it, we would greatly appreciate it if you could add the boot process and boot files we described above as one of the tests you perform, with validation that you do get the NOTICE: BL31: v2.3():v2.3 message on the serial output (as this is a fixed notice that should always appear if the binary was able to boot).
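As a rough illustration of the kind of check being requested, here is a minimal sketch that scans captured serial output for the BL31 banner. How the serial output is captured is left out, and the function name `bl31_booted` is hypothetical; only the banner text comes from this thread.

```python
import re

# Matches the fixed TF-A banner quoted in this thread, for example
# "NOTICE: BL31: v2.3():v2.3". The version text itself is left open.
BL31_BANNER = re.compile(r"NOTICE:\s+BL31:\s+v\S+")


def bl31_booted(serial_log: str) -> bool:
    """Return True if any line of the captured serial output has the BL31 banner."""
    return any(BL31_BANNER.search(line) for line in serial_log.splitlines())
```

A CI job could fail the run whenever `bl31_booted()` returns False for the text captured during a boot attempt.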
Thanks again.
I'm afraid that there is more to this issue than the initial problem we reported.
Whereas the commit from August 5th fixed the initial boot problem, where we couldn't even get the UEFI firmware to run, we are still being left with a major regression where GRUB (and thus Linux boot) is completely broken, whereas it worked absolutely fine with the start.elf/fixup.dat prior to bd816dbac723e04f8f8b06bb2e16d767cda7692c.
The problem can be replicated as follows (this requires a few downloads):
start.elf/fixup.dat from bd816dbac723e04f8f8b06bb2e16d767cda7692c.
debian-10.6.0-arm64-netinst.iso from https://cdimage.debian.org/debian-cd/current/arm64/iso-cd/
Install or Graphical Install.
After a few seconds, you are presented with the Debian installation screen, as expected.
Now, replace start.elf/fixup.dat with a post August 5th version and repeat steps 4 & 5.
This time, booting to the installer either freezes or produces the GRUB error "error: failed to install/update FDT" before forcing a return to the GRUB prompt...
Therefore, another behavioural change has been introduced between July 31st and August 5th, that broke the ability for GRUB ARM64 to run a Linux kernel...
Just a quick update (after having to fiddle with the baudrate, since the post July 31st start.elf update also moved our UEFI serial baudrate, which, last time I checked, we do set up according to the official documentation, from 115200 to ~180000 bauds): the new start.elf is triggering an assert here during our processing of the Device Tree to sort out psci.
So this explains why GRUB/kernel will complain about the Device Tree not being found and fail, since we don't pass something that we haven't been able to process.
I'm still trying to identify exactly what has changed, from our perspective, to trigger that assert. But it may take a while before I can provide more data.
Note that, if you want to get extended serial data that includes asserts, and after making sure that you set the baudrate to 180000, you can replace the release version of RPI_EFI.fd with the DEBUG version from https://ci.appveyor.com/project/pbatard/rpi3/builds/34965252/artifacts.
Another example context that looks to involve some of what is reported here is:
RPi4B 4GiByte FreeBSD (in gradual development) uses an armstub8-gic.bin port and a u-boot port as one way to boot. Such a setup boots using 542aceb (2020-Jul-17). But attempts to boot with the later firmware versions that were tried hang with the rainbow screen showing and the end of the serial output looking like:
MESS:00:00:09.122140:0: brfs: File read: /mfs/sd/armstub8-gic.bin
MESS:00:00:09.125171:0: Loading 'armstub8-gic.bin' to 0x0 size 0x1700
MESS:00:00:09.131367:0: brfs: File read: 5888 bytes
MESS:00:00:09.251424:0: brfs: File read: /mfs/sd/u-boot.bin
MESS:00:00:09.253939:0: Loading 'u-boot.bin' to 0x80000 size 0x8b9e8
MESS:00:00:09.260029:0: Device tree loaded to 0x4000 (size 0xbe0c)
MESS:00:00:09.269647:0: uart: Set PL011 baud rate to 103448.300000 Hz
MESS:00:00:09.275976:0: uart: Baud rate change done...
MESS:00:00:09.278038:0: uart: Baud rate change done...
MESS:00:00:09.283427:0: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
This is still true of 63b1922 (2020-Oct-08) firmware.
armstub8-gic.bin expects to be loaded at address zero and u-boot (as configured/patched) expects to internally reserve/avoid that area and to cause the area to show up as reserved. (armstub8-gic.bin is over a page in size.) With the problematical firmware versions, u-boot does not get far enough to start putting debug messages to the console (when built to do so).
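Since the load addresses matter here, a small helper can pull them out of the firmware's serial messages and flag overlapping regions. This is only a sketch: the regular expressions are inferred from the log lines quoted in this thread rather than from any documented log format, the function names are hypothetical, and logged addresses may not reflect later relocations.

```python
import re

# Patterns inferred from the MESS lines quoted in this thread (an assumption,
# not a documented format).
LOAD_RE = re.compile(r"Load(?:ing|ed) '([^']+)' to (0x[0-9a-fA-F]+) size (0x[0-9a-fA-F]+)")
DTB_RE = re.compile(r"Device tree loaded to (0x[0-9a-fA-F]+) \(size (0x[0-9a-fA-F]+)\)")


def load_regions(log: str):
    """Return (name, start, end) tuples for every load reported in the log."""
    regions = []
    for m in LOAD_RE.finditer(log):
        start, size = int(m.group(2), 16), int(m.group(3), 16)
        regions.append((m.group(1), start, start + size))
    for m in DTB_RE.finditer(log):
        start, size = int(m.group(1), 16), int(m.group(2), 16)
        regions.append(("device tree", start, start + size))
    return regions


def overlaps(regions):
    """Return pairs of region names whose half-open address ranges intersect."""
    found = []
    for i, (name_a, start_a, end_a) in enumerate(regions):
        for name_b, start_b, end_b in regions[i + 1:]:
            if start_a < end_b and start_b < end_a:
                found.append((name_a, name_b))
    return found


SAMPLE = """\
MESS:00:00:09.125171:0: Loading 'armstub8-gic.bin' to 0x0 size 0x1700
MESS:00:00:09.253939:0: Loading 'u-boot.bin' to 0x80000 size 0x8b9e8
MESS:00:00:09.260029:0: Device tree loaded to 0x4000 (size 0xbe0c)
"""

print(overlaps(load_regions(SAMPLE)))
```

For the three loads in this log the ranges are disjoint, so the hang is not explained by a simple overlap between the stub, u-boot and the Device Tree.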
Can you upload a build of u-boot.bin somewhere to demonstrate the problem?
Another thing we are seeing is that the Device Tree seems to be corrupted during the early processing with newer versions of start.elf.
For instance, as per pftf/RPi3#22, we are seeing obvious signs of corruption such as with:
(...)
000007B0 2C 62 63 6D 32 38 33 35 2D 74 78 70 00 00 00 00 ,bcm2835-txp....
000007C0 72 76 65 64 2D 6D 65 6D 6F 72 79 00 00 00 00 03 rved-memory.....
(...)
whereas it should say reserved-memory.
I'm still (slowly) investigating this. There's still a possibility that the corruption could come from TF-A and not start.elf, but my understanding is that TF-A does not touch the Device Tree data. It does, however, explain the errors we are seeing with Linux boot: the UEFI firmware gives up on processing a corrupted Device Tree, which means that when GRUB invokes the kernel, no Device Tree is provided, hence the failure to boot.
Can you guys please validate that the Device Tree that's being handed over by the newer versions of start.elf is really valid? Another thing that strikes me is that, whereas the Device Tree is being altered from the one on SD/USB, with elements being added/removed, its size remains the same as the one on disk (or at least the size that's being handed over to the subsequent boot loader), which looks a bit suspicious.
All in all, if I were to take a guess, it would look to me like the post Jul. 31st version of start.elf is corrupting the Device Tree in some circumstances.
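One way to sanity-check a handed-over Device Tree is to compare the sizes in its header against each other. The sketch below assumes the flattened Device Tree header layout from the Devicetree Specification (ten big-endian 32-bit fields, version 17) and a hypothetical function name; it is not the validation code used by the UEFI firmware.

```python
import struct

FDT_MAGIC = 0xD00DFEED
# magic, totalsize, off_dt_struct, off_dt_strings, off_mem_rsvmap,
# version, last_comp_version, boot_cpuid_phys, size_dt_strings, size_dt_struct
HEADER = struct.Struct(">10I")


def check_fdt_header(blob: bytes) -> list:
    """Return a list of human-readable problems found in the FDT header."""
    if len(blob) < HEADER.size:
        return ["blob is shorter than an FDT header"]
    (magic, totalsize, off_struct, off_strings, _off_rsvmap,
     _version, _last_comp, _boot_cpuid, size_strings, size_struct) = HEADER.unpack_from(blob)
    problems = []
    if magic != FDT_MAGIC:
        problems.append("bad magic 0x%08x" % magic)
    if totalsize > len(blob):
        problems.append("totalsize exceeds the available buffer")
    if off_struct + size_struct > totalsize:
        problems.append("structure block extends past totalsize")
    if off_strings + size_strings > totalsize:
        problems.append("strings block extends past totalsize")
    return problems
```

A tree that grew through added nodes while `totalsize` stayed at its on-disk value would be expected to trip one of the last two checks, which matches the truncated strings seen in the hex dump.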
Using the settings from the first post and bl31.bin I've managed to recreate a DT failure. I think it's a fairly simple fix. It took longer to debug because the stub doesn't disable kernel loading (offset 0xfc should be 0xffffffff if the firmware isn't expected to load it), and the kernel which was on the card was large enough to overwrite the DTB at 0x1f00000.
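For anyone wanting to check their own stub against the point above, a minimal sketch follows. It only tests the 32-bit word at offset 0xfc for the value 0xffffffff, as described in the comment; the function name is hypothetical and nothing else about the stub layout is assumed.

```python
import struct

KERNEL_FIELD_OFFSET = 0xFC    # offset quoted in the comment above
NO_KERNEL_LOAD = 0xFFFFFFFF   # value meaning "firmware should not load a kernel"


def stub_disables_kernel_load(stub: bytes) -> bool:
    """Return True if the stub image marks kernel loading as disabled."""
    if len(stub) < KERNEL_FIELD_OFFSET + 4:
        return False  # too short to contain the field at all
    (value,) = struct.unpack_from("<I", stub, KERNEL_FIELD_OFFSET)
    return value == NO_KERNEL_LOAD
```

Reading the file with `open("bl31.bin", "rb").read()` and passing the bytes in is enough to see which case applies.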
Seeing this issue on Fedora too, various U-Boot available here: https://pbrobinson.fedorapeople.org/rpi-u-boot/
A work around was to enable the UART universally. Not sure if it's related but U-Boot crashes on the 8Gb model if there isn't a display plugged in. The config.txt we use is:
# Raspberry Pi 2
[pi2]
kernel=rpi2-u-boot.bin
# Raspberry Pi 3
[pi3]
kernel=rpi3-u-boot.bin
# Raspberry Pi 4
[pi4]
kernel=rpi4-u-boot.bin
# Default Fedora configs for all Raspberry Pi Revisions
[all]
# Enable UART
# Only enable UART if you're going to use it as it has speed implications
# Serial console is ttyS0 on RPi3 and ttyAMA0 on all other variants
# u-boot will auto detect serial and pass correct options to kernel if enabled
# Speed details: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=141195
# enable_uart=1
# Put the RPi into 64 bit mode
# arm_64bit=1
# Early boot delay in the hope monitors are initialised enough to provide EDID
bootcode_delay=1
# We need this to be 32Mb to support VCHI services and drivers which use them
# but this isn't used by mainline VC4 driver so reduce to lowest supported value
# You need to set this to at least 80 for using the camera
gpu_mem=32
# Use eXtended firmware by default
start_x=1
# New option to allow the firmware to load upstream dtb
# Will allow things like camera, touchscreen etc to work OOTB
upstream_kernel=1
# HAT and DT overlays. Documentation at Raspberry Pi here:
# https://www.raspberrypi.org/documentation/configuration/device-tree.md
# Each dtoverlay line is an individual HAT/overlay, multiple lines allowed
# The dtoverlay=upstream must be present for Fedora kernels
dtoverlay=upstream
# dtoverlay=rpi-sense
# Allow OS rather than firmware control CEC
mask_gpu_interrupt1=0x100
# Without this sdram runs at 400mhz, instead of 450
# https://github.com/Hexxeh/rpi-firmware/issues/172
audio_pwm_mode=0
# Other options you can adjust for all Raspberry Pi Revisions
# https://www.raspberrypi.org/documentation/configuration/config-txt/README.md
# All options documented at http://elinux.org/RPiconfig
# for more options see http://elinux.org/RPi_config.txt
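To see which settings from a config.txt like this one apply on a given board, the conditional sections can be resolved with a few lines of Python. This is a deliberately simplified sketch: it only understands model filters such as [pi4] and [all], keeps the last value for repeated keys (so multiple dtoverlay lines are not modelled), and the function name is hypothetical. The real firmware parser supports more filters.

```python
def effective_settings(config_text: str, model: str) -> dict:
    """Return the key/value settings that apply to `model` (e.g. "pi4")."""
    settings = {}
    active = True  # lines before any [filter] apply to every model
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (including commented-out options)
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].lower()
            active = name in ("all", model)
            continue
        if active and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
[pi3]
kernel=rpi3-u-boot.bin
[pi4]
kernel=rpi4-u-boot.bin
[all]
# enable_uart=1
gpu_mem=32
start_x=1
"""

print(effective_settings(SAMPLE, "pi4"))
```

With this reading, `enable_uart` is absent from the result because the line is commented out, which is worth double-checking when the UART is needed for debugging.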
There's a test firmware for 4B (start4 and start4x) that should solve the DTB corruption problem here: https://drive.google.com/file/d/1MeFj10YIq6RNXfU5CVGKmDLVLoA2inRU/view?usp=sharing
If the incorrect baudrates are arrived at after querying the core clock using mailbox message 0x30002 (GET_CLOCK_RATE) and the problem started appearing with https://github.com/raspberrypi/firmware/commit/11e3c314bc2b64f7d862bac00ff3d9f42f3c5a50 then that is due to a separate matter under discussion here at the moment.
By the way, dtoverlay=upstream should be implied by upstream_kernel=1.
great, thanks a lot!
First successful straight SSD boot of FreeBSD. I notice very slow kernel loading, but after the dtb is loaded, the boot speeds up to normal...
dmesg incl. uart_2ndstage=1 :
https://dmesgd.nycbug.org/index.cgi?do=view&id=5711
U-Boot 2020.10-rc5 (Oct 05 2020 - 03:08:23 +0000)
DRAM: 7.9 GiB
RPI 4 Model B (0xd03114)
MMC: mmc@7e300000: 1, emmc2@7e340000: 0
Loading Environment from FAT... In: serial
Out: vidconsole
Err: vidconsole
Net: eth0: ethernet@7d580000
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
starting USB...
Bus xhci_pci: probe failed, error -110
No working controllers found
..
I am not sure whether it is desirable to include the issue here, but the same disk that booted successfully on the 4GB model fails at an early boot stage on the 8GB model, while the following patches do not seem to be upstreamed yet:
https://patchwork.ozlabs.org/project/linux-pci/patch/20200629161845.6021-4-nsaenzjulienne@suse.de/
https://patchwork.ozlabs.org/project/linux-pci/patch/20200629161845.6021-5-nsaenzjulienne@suse.de/
These notes are for a 4 GiByte RPi4B with the MSD USB eeprom update, avoiding the issue of the status of u-boot 2020.10's handling of the xHCI for the 8 GiByte RPi4B.
My github.com/pftf/RPi4 uefi/ACPI v1.20 USB3 SSD (no microsd card) context booted just fine with this but has been tolerating sufficiently recent firmware versions before this. (This and the below all involve EFI/BOOT/BOOTAA64.EFI so I'll not mention it explicitly after this.)
As for armstub8-gic.bin and u-boot 2020.10:
Booting via an MBR microsd card that has the EFI msdos file system material and a FreeBSD kernel on a ufs file system (where the kernel in turn uses the USB3 SSD for later stages and general operation), it boots all the way just fine.
Booting via USB3-SSD-only context boots all the way just fine (now that I have the right partition type for the msdos file system involved).
Overall: original FreeBSD 4 GiByte RPi4B problem fixed.
@pelwell, is there any chance you could also provide a test firmware for the Pi 3?
While the original issue (failure to boot into TF-A) was mostly opened against the Pi 4, I've been testing the subsequent issue (Device Tree corruption) on the Pi 3, so, if possible, I'd like to validate that your planned fix addresses what I've seen there. Thanks.
Of course - the archive (reachable from the same download link) now includes start and startx (and fixups).
Thanks a lot.
The new start.elf looks good to me.
I can see that the reported Device Tree size now accounts for the alterations carried out (0x73EB bytes vs 0x6FB7 bytes previously) and our Device Tree parsing code is happy to declare the data as valid (with no more truncated keywords in the hex dump) which means that both GRUB and the Linux kernel now boot as expected. Once again, your quick work in identifying the bug and providing a fix is much appreciated!
As to the serial baudrate change, we are indeed using mailbox GET_CLOCK_RATE to set our serial base clock, so I guess we'll just have to wait for the result of your internal discussions on that. Obviously, this is not as major an issue as the Device Tree one and we don't mind changing our baudrate setup code if needed, as long as the new method of computing the miniUART baudrate is properly documented.
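To illustrate why a change in the reported clock moves the serial speed, here is a rough sketch. It assumes the mini UART divider formula from the BCM2835 ARM Peripherals datasheet, baud = core_clock / (8 * (reg + 1)). The clock values used are hypothetical and are not measurements from this issue.

```python
def mini_uart_divisor(core_clock_hz: float, target_baud: float) -> int:
    """Divider register value for the wanted baud rate at the given core clock."""
    return round(core_clock_hz / (8 * target_baud)) - 1


def mini_uart_baud(core_clock_hz: float, divisor: int) -> float:
    """Baud rate actually produced by `divisor` at the given core clock."""
    return core_clock_hz / (8 * (divisor + 1))


# Hypothetical numbers: the divider is computed from the clock rate that was
# reported, while the hardware runs from a different, faster clock.
reported_hz = 250_000_000
actual_hz = 400_000_000

divisor = mini_uart_divisor(reported_hz, 115200)
print(divisor, mini_uart_baud(reported_hz, divisor), mini_uart_baud(actual_hz, divisor))
```

The resulting baud rate scales with the ratio between the actual and the reported clock, so any disagreement between the two shows up directly as a shifted serial speed.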
A patch to change the GET_CLOCK_RATE behaviour has gone into the internal firmware repo in the last few minutes. An archive of the same four firmware variants including the patch is available here: https://drive.google.com/file/d/1Rb_hhJ1t2L9gEhPuxqk8V3Q2IoJVZ7XZ/view?usp=sharing
The latest rpi-update firmware has the two potential fixes mentioned here.
I confirm both the baudrate and Device Tree issues we were seeing on the Pi 3 are fixed with the latest from https://github.com/raspberrypi/firmware/tree/master/boot. Thanks.
Is there anything outstanding on this issue?
Not from my perspective. Both the original problem reported when the issue was opened and the subsequent matter we raised 9 days ago have been addressed. In other words, I am no longer aware of any regression that needs fixing, as far as my testing is concerned.
FYI: The following may be related background information on what is enabled vs. what is not yet so, not necessarily as something 1445 should be dealing with.
As far as I can tell from what I've recently learned, the 8 GiByte RPI4B VL805 handling vs. u-boot 2020.10 xHCI/USB3 issue is u-boot waiting on material from github.com/torvalds/linux/commits/master/arch/arm/boot/dts/bcm2711-rpi-4-b.dts (2020-Aug-18 as 258f92d). It has been reported by klaus4 that the updated .dtb file that would result would lead to triggering "RASPBERRYPI_FIRMWARE_RESET_ID_USB: bcm2711_notify_vl805_reset()" in u-boot. (This report was in a FreeBSD context, not here.)
Hi, I have the same problem, running an RPi 4 with 4 GB. I have updated the EEPROM, though I may be doing it wrong (I flash the card with the imager, then I put it in the Pi, the screen turns green and I turn it off when the screen turns off), but I can't seem to get it to work. I get the following error (pretty much the same as his at the beginning)
Does it boot if you comment out start_x=1 in config.txt ?
edit @daniframinan that looks like a separate issue, please create a new bug report with full details of firmware / rpiupdate and any connected hardware
Indeed, I seriously doubt that this is the same issue, because the one I reported was about bl31.bin freezing before producing the expected output on the serial console, and it has been properly addressed with a start.elf update (along with the subsequent .dtb corruption issue we picked up).
Unless you are monitoring serial output, and using the same config.txt as the one from my first post, as well as using bl31.bin as the armstub and find that your serial output freezes before it produces the lines:
NOTICE: BL31: v2.3():v2.3
NOTICE: BL31: Built : 10:40:51, Apr 21 2020
then I'm afraid that what you are observing is something that is not related to this specific issue, and you should open a separate report.
...which is a useful reminder to...
Hi, I have the same problem, running an RPi 4 with 4 GB. I have updated the EEPROM, though I may be doing it wrong (I flash the card with the imager, then I put it in the Pi, the screen turns green and I turn it off when the screen turns off), but I can't seem to get it to work. I get the following error (pretty much the same as his at the beginning)
Did you resolve it?
When I replaced the kernel on the 5.1 image with the 6.1 kernel, I also encountered the problem that the SDCARD_CONTROL_POWER pin was not defined. Is this a problem with U-Boot? Do I need to download the latest image and replace its kernel with the 6.1 one, or do something else?
`MESS:00:00:05.901070:0: brfs: File read: 79 bytes
MESS:00:00:07.900671:0: brfs: File read: /mfs/sd/kernel8.img
MESS:00:00:07.903228:0: Loaded 'kernel8.img' to 0x80000 size 0x167f200
MESS:00:00:07.931545:0: Kernel relocated to 0x200000
MESS:00:00:07.933397:0: Device tree loaded to 0x2eff2200 (size 0xdd8f)
MESS:00:00:07.941094:0: uart: Set PL011 baud rate to 103448.300000 Hz
MESS:00:00:07.948720:0: uart: Baud rate change done...
MESS:00:00:07.950742:0: uart: Baud rate change done...
MESS:00:00:07.975566:0: gpioman: gpioman_get_pin_num: pin SDCARD_CONTROL_POWER not defined
`
Posting on a 4-year-old, unrelated, closed issue is not the best option. The forum may be a better place for this question.
OK, thank you.
A regression appears to have been introduced with the latest firmware in that we are no longer able to boot binaries such as the Trusted Firmware or the UEFI firmware. Reverting to an older version of the RPi firmware solves this issue.
Steps to replicate
bl31.bin (A Raspberry Pi 4 build of the ARM Trusted Firmware binary) from https://github.com/tianocore/edk2-non-osi/blob/master/Platform/RaspberryPi/RPi4/TrustedFirmware/bl31.bin and save it to SD/USB
start4.elf, fixup4.dat and bcm2711-rpi-4-b.dtb
config.txt:
Result
The boot freezes after the following output on the serial console, with the multicoloured screen staying on:
Then, if you replace start4.elf and fixup4.dat with a version published before 2020.07.14, everything works as expected, and you get the expected output:
Note that commenting out the device_tree_address=0x1f0000, device_tree_end=0x200000 has no effect, so it's not a device tree setup issue.
We are in the process of pinpointing the exact revision of start4.elf that introduced this issue, but since we didn't expect such a major regression and now have to scramble to fix our UEFI firmware downloads as a result, it might be a while before we can do so...