raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License

Start main binary at 0x10001000 to allow for standalone second stage loader? #84

Open swetland opened 3 years ago

swetland commented 3 years ago

Since the (Q)SPI flash bootloader is possibly part specific, it would be nice if (for simple cases) firmware could "just work" with a pre-flashed second stage, rather than having to be compiled for a specific flash part.

The first step to enable this would be to align the main image to start at the next erase unit (0x1000 offset) so it can be reflashed without disrupting the second stage.
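
To make the erase-unit arithmetic concrete, here is a minimal host-side sketch. It assumes a 0x1000-byte erase sector and a 0x100-byte boot2 at flash offset 0; the helper names are made up for illustration and are not part of the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 0x1000u /* erase unit assumed in this thread */
#define BOOT2_SIZE        0x100u  /* boot2 occupies flash offsets 0x000-0x0ff */

/* Round a flash offset down to the start of its erase sector. */
static uint32_t sector_floor(uint32_t offset) {
    return offset & ~(FLASH_SECTOR_SIZE - 1u);
}

/* Number of erase sectors touched when rewriting [start, start + len). */
static uint32_t sectors_to_erase(uint32_t start, uint32_t len) {
    if (len == 0) return 0;
    assert(len <= UINT32_MAX - start); /* range must not wrap */
    uint32_t end = start + len;
    assert(end <= UINT32_MAX - (FLASH_SECTOR_SIZE - 1u));
    uint32_t end_ceil = sector_floor(end + FLASH_SECTOR_SIZE - 1u);
    return (end_ceil - sector_floor(start)) / FLASH_SECTOR_SIZE;
}

/* True if erasing the sectors that hold the image also erases boot2. */
static bool reflash_clobbers_boot2(uint32_t start, uint32_t len) {
    return len != 0 && sector_floor(start) < BOOT2_SIZE;
}
```

With an image at offset 0x100 the first erased sector is sector 0, which also holds boot2; with an image at offset 0x1000 the erase starts one sector later and boot2 is left alone.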

For more complex projects that need to access the flash other than just reading from it via the XIP window, a function table could be provided for optimized, part-specific low level flash access (replacements for the ROM flash functions).

lurch commented 3 years ago

> it can be reflashed without disrupting the second stage.

I assume you're aware that the BOOTSEL mode (which provides the UF2 flashing over MSD) is built into the ROM of the RP2040, and as such can't be changed? :confused: (Unless I've misunderstood what you're asking for?)

Wren6991 commented 3 years ago

> The first step to enable this would be to align the main image to start at the next erase unit (0x1000 offset) so it can be reflashed without disrupting the second stage

So in its simplest form, you would just want a chainloader at +0x100 that immediately vectors through a table at +0x1000, and have the current image start at +0x100 moved up to there?

That would be a fairly modest linker script addition, we are looking into templating our linker scripts at some point (as they are a bit copy/pastey) and that would make it pretty easy to add something like this.

Until we have something useful to go in that alignment hole, the default build will probably stay as it is -- people would miss the ~4k of flash they would immediately lose.

> optimized, part-specific low level flash access (replacements for the ROM flash functions).

This is a little dicey because you can't do XIP execution whilst programming is in progress. This would lead to you copying 4k of code into RAM.

swetland commented 3 years ago

No changes to the ROM loader are needed, just to the second stage (boot2) which is built by the SDK and lives at offset 0 in the SPI flash.

The SPI flash's erase unit size is 0x1000, so with the main binary at offset 0x100, it's not possible to update the "app" without erasing and replacing boot2 at the same time (one could adjust tooling to save and restore boot2 but that gets kinda fiddly).

What I'm suggesting is that boot2 instead of transferring control to 0x10000100, transfer control to 0x10001000 once it has configured XIP mode. The extra space could also be used to provide a function table of optimized flash io functions to the app, similar to how the boot rom provides generic flash io functions.
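
A rough sketch of what such a function table might look like. Everything here is hypothetical: the magic value, the struct layout, and the function signatures are invented for illustration and do not match the boot ROM's or the SDK's API. On real hardware the functions behind the pointers would have to run from SRAM, since XIP is unavailable while the flash is being programmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_OPS_MAGIC 0x464c4f50u /* hypothetical tag: ASCII "FLOP" */
#define APP_FLASH_OFFSET 0x1000u    /* first sector after the boot2 sector */

/* Hypothetical table a board-specific boot2 could export to the app. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    void (*range_erase)(uint32_t flash_offset, size_t count);
    void (*range_program)(uint32_t flash_offset, const uint8_t *data, size_t count);
} flash_ops_table_t;

/* Return the table if it looks usable, otherwise NULL so the caller
 * can fall back to the generic ROM routines. */
static const flash_ops_table_t *find_flash_ops(const flash_ops_table_t *candidate) {
    if (candidate == NULL) return NULL;
    if (candidate->magic != FLASH_OPS_MAGIC || candidate->version == 0) return NULL;
    if (candidate->range_erase == NULL || candidate->range_program == NULL) return NULL;
    return candidate;
}

/* Erase one 4K sector of the app region, refusing to touch the boot2 sector. */
static void erase_app_sector(const flash_ops_table_t *ops, uint32_t flash_offset) {
    assert(ops != NULL && flash_offset >= APP_FLASH_OFFSET);
    ops->range_erase(flash_offset, 0x1000u);
}

/* Stand-in implementations so the sketch can be exercised on a host. */
static unsigned stub_calls;
static void stub_erase(uint32_t flash_offset, size_t count) {
    (void)flash_offset; (void)count;
    stub_calls++;
}
static void stub_program(uint32_t flash_offset, const uint8_t *data, size_t count) {
    (void)flash_offset; (void)data; (void)count;
    stub_calls++;
}
```

The magic and version fields are there so an app can tell a real table from erased flash or an older boot2 and fall back gracefully.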

The combination of the above then simplifies development for arbitrary rp2040 based boards by no longer requiring the app to have a board-specific flash "driver" compiled in. Of course it doesn't prevent that, either, if that's desirable in a particular instance.

One could also imagine providing some table of board hardware info, though before long that spirals out into some madness like devicetree or ACPI, so maybe simpler is better.

Wren6991 commented 3 years ago

Think we got our messages crossed!

swetland commented 3 years ago

Yeah! Saw your reply moments after I clicked "comment".

swetland commented 3 years ago

Good point about the XIP helpers needing to be SRAM loaded, so not quite as trivial, though still doable.

4K out of a typical several MB flash didn't strike me as a huge cost against the possibility of making arbitrary dev boards "just work" if they have a compatible boot2 installed from the factory. Obviously, since the end user has full control of what they're flashing (which is fantastic), a separately updateable boot2 + app could be discarded if space is at a premium, etc.

And I'm half-joking, half-not about some kind of HW descriptor table. There's already that firmware info table telling users what GPIO assignments are what. With sufficient cleverness one could allow for the two to be resolved against each other with a little helper routine to run at startup and then you start getting self-configuring systems.

lurch commented 3 years ago

> No changes to the ROM loader are needed, just to the second stage (boot2) which is built by the SDK and lives at offset 0 in the SPI flash. The SPI flash's erase unit size is 0x1000, so with the main binary at offset 0x100, it's not possible to update the "app" without erasing and replacing boot2 at the same time (one could adjust tooling to save and restore boot2 but that gets kinda fiddly).

I'm obviously not as familiar with the low-level details as you and Luke, but I guess my concern is that (if I'm understanding this correctly) there'd then be some apps that do have an embedded boot2, and some apps that don't have an embedded boot2 (because they're relying on there already being a suitable boot2 in flash), and how much confusion this could cause users? :man_shrugging:

> 4K out of a typical several MB flash didn't strike me as a huge cost

Me neither, but we've already had users asking for 48 bytes back! https://github.com/raspberrypi/pico-sdk/pull/78

swetland commented 3 years ago

> I'm obviously not as familiar with the low-level details as you and Luke, but I guess my concern is that (if I'm understanding this correctly) there'd then be some apps that do have an embedded boot2, and some apps that don't have an embedded boot2 (because they're relying on there already being a suitable boot2 in flash), and how much confusion this could cause users? :man_shrugging:

That is a point to consider. It may be that, having launched as it is, it's too late to explore such a proposal. On the other hand, if the no-onboard-flash variant of the part is the most common one (the datasheet indicates onboard flash is at least a possibility, based on the p/n scheme), and a diverse ecosystem of devboards explodes (yay, success!), dealing with "what flash do I need to compile support for" becomes more and more of a headache for developers and/or SDK maintainers.

Having been through a few OS/platform launches, what I do know is the longer you wait, the more difficult it becomes to make a change like this, and sometimes taking a hit early on can save on pain down the road.

> > 4K out of a typical several MB flash didn't strike me as a huge cost

> Me neither, but we've already had users asking for 48 bytes back! #78

Well, I do have to applaud frugality. The way people burn through memory nowadays blows my mind.

lurch commented 3 years ago

> dealing with "what flash do I need to compile support for" becomes more and more of a headache for developers and/or SDK maintainers.

I've never written any low-level flash code, but how "incompatible" are different flash chips? Or looking at it from the other angle, how likely is it that 3rd-party RP2040 devboards (intended for general public use) would choose a flash-chip which isn't already supported by the current SDK?

swetland commented 3 years ago

The SDK currently includes 4 different boot2 flash XIP implementations (following info from the header comments in the assembly source files):

I don't know how exhaustively that covers popular, active parts.

Even if the SDK supports a part, figuring out which part is on your board is another step, and not immediately obvious. Presumably one could install a helper using the generic driver or just a copy-to-ram boot2 and attempt to read the part number from the SPI flash.

I haven't yet stumbled over a document that told me exactly what flash part was on my Pico board(s) -- I'm guessing one of those supported by boot2_w25q080.S based on that being the default boot2 version selected by CMakeLists.txt. The Pico Data Sheet and all the marketing literature I've seen simply mentions 2MB of QSPI flash and I assume that the exact part may change from batch to batch based on availability, pricing, etc.

Wren6991 commented 3 years ago

> I haven't yet stumbled over a document that told me exactly what flash part was on my Pico board(s) -- I'm guessing one of those supported by boot2_w25q080.S based on that being the default boot2 version selected by CMakeLists.txt. The Pico Data Sheet and all the marketing literature I've seen simply mentions 2MB of QSPI flash and I assume that the exact part may change from batch to batch based on availability, pricing, etc.

Good point, it's a W25Q16JV (if you scroll down in the Pico datasheet you will see the schematic I clipped here). I'll make sure the part number is mentioned higher up in the datasheet too.

[image: schematic excerpt from the Pico datasheet]

> Having been through a few OS/platform launches, what I do know is the longer you wait, the more difficult it becomes to make a change like this, and sometimes taking a hit early on can save on pain down the road.

Yes, appreciate this, we jumped on #10 for similar reasons.

> I don't know how exhaustively that covers popular, active parts.

You can include boot2 files in your project, I guess an example of this would be helpful, and yes there needs to be better tooling for discovering what is on your board.

Will wait for @kilograham to get back before making any changes here, I think one of the major challenges is how this fits into programming tools and how we get boot-from-0x100 binaries to play nicely with boot-from-0x1000 binaries (because people will be upset about that 4k) and he is the right person to weigh in on that aspect of it. I think he's just popped off for a few days' break as we've all been quite hard pressed around launch.

Wren6991 commented 3 years ago

> I don't know how exhaustively that covers popular, active parts.

It gives examples of the most common QSPI and DSPI continuous read formats (EBh/BBh), the remaining wrinkles are mostly around things like status register layout.

I would be interested in developing a generic e.g. SFDP extended boot2 that occupies the first 4k of flash, but my brief experience with SFDP (by buying a bunch of random devices off DigiKey to test their SFDP support) is that support is incredibly patchy, with a lot of broken implementations. Then again, 4k gives you a lot of space to work around the quirks.

lurch commented 3 years ago

> buying a bunch of random devices .... support is incredibly patchy, with a lot of broken implementations.

Sounds very similar to the situation with SD cards :grinning:

swetland commented 3 years ago

> > buying a bunch of random devices .... support is incredibly patchy, with a lot of broken implementations.

> Sounds very similar to the situation with SD cards

Same as it ever was... last time around for me was a big pile of NVME M.2 SSDs, crosschecking spec-vs-reality while bringing up a host driver.

Regarding "wasting" 4K... looking at some of the existing boot2 implementations which only have 2-3 unused words of their 256 byte allotment, I'd be nervous about having limited space to deal with some more complex boot situation down the road. Sure boot2 could read a larger boot3 that really knows how to turn on XIP, but there's no space for such a critter between the end of boot2 and the start of the app image.

kilograham commented 3 years ago

So the idea of an in flash stub has always been in the back of my mind which is why the ROM UF2 bootloader accepts flash binaries that don't start at 0x10000000 (even though ELF2UF2 doesn't for now). I certainly didn't want to require one, and there are a number of issues to work thru (especially how to not get users in a hole (and open up a support can of worms) where they don't have the stub). So I decided to not make too many set in stone/hasty-given-our-workload decisions until we saw how people started to use the device.

Some random thoughts in no particular order:

  1. There is the general question of whether you rely on there being a stub on the device. Obviously if you only support picotool loads you could take care of this, but one option to consider is just allowing plugging on the "stub" by picotool (i.e. switch out the stub on a UF2)

  2. The idea of self-configuring binaries already occurred to me as I was writing picotool/binary info stuff (which was actually a very very late addition). Sometimes we use #define-ed configuration values, but the application can certainly choose not to do this and so be (re-)configurable. I had potentially envisaged some of this via picotool (i.e. just modify the binary - as part of this I had even considered argc,argv :-) ).

  3. Additional use cases for a stub include perhaps:

    1. a grub-like thing for multiple binaries
    2. debugging firmware (e.g. a core 1 debugger impl for a core 0 only binary)
    3. as you say additional board "driver" firmware
  4. As @Wren6991 says, actually changing the binary layout is easy enough

  5. Would this all confuse picotool/binary_info?

    No, because the stub could just leave a "forwarding/chain" binary_info reference to the next executable header (we would need to decide how to display - in picotool - the relationship between information from the stub binary and the application binary)

  6. Not having to prefix a second stage is helpful though (given the copy-able .S files) not critical for other languages.

  7. The 4K (at least with smaller flash) was very important to me for squeezing stuff in, so we certainly want to leave this as an option even if it is not the default (reminds me that I am meaning to make an example INTERFACE library that pares out as much runtime functionality/space as possible to show how to get a small binary if you really want)

  8. Sometimes the application has a requirement on a particular second stage (mostly today when hugely overclocking, when you need a large SSI clock divider). This can be solved in other ways though, including the configuration mentioned above, or of course such a specific binary should just include the boot_stage2

After saying all this, getting back to the issue at hand, in order of things we could do:

  1. Let the user/downstream do this in a bespoke fashion; this obviously hurts interoperability.
  2. We can add support for inserting space into the binary (using new templated linker scripts to make this easier). We would have to support including a forwarding trampoline VTABLE at +0x100. We would make the default be a space up until 0x10001000, but configurable by build defines.
  3. Start building ELFs/UF2s that don't include 0x10000000-0x10001000. This is where it gets a bit tricky and we need more discussion about how this should really work from a logistical point of view. Two areas of concern:

    1. Downloading a partial binary over a "full" binary if these still exist (which I think they will). Well, full binaries might be able to (with an updated SDK) check offset 0x10001000 and recognize a valid vector table there. If found, it could assume it has been overwritten and forward.
    2. The case where there is nothing on the device at all (or invalid boot stage2)
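
The "recognize a valid vector table" check could be sketched as follows. This is a heuristic only, under stated assumptions: an ARMv6-M vector table (word 0 is the initial stack pointer, word 1 is the reset handler with the Thumb bit set), RP2040's 264 KB of SRAM at 0x20000000, the 16 MB XIP window at 0x10000000, and a reset handler that lives in flash. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XIP_BASE_ADDR  0x10000000u
#define XIP_END_ADDR   0x11000000u /* end of the 16 MB XIP window */
#define APP_BASE_ADDR  0x10001000u /* proposed app start from this thread */
#define SRAM_BASE_ADDR 0x20000000u
#define SRAM_END_ADDR  0x20042000u /* 264 KB of SRAM */

/* Heuristic: do the first two words at APP_BASE_ADDR look like a
 * Cortex-M vector table for an app that runs from flash? */
static bool looks_like_vector_table(const uint32_t *vt) {
    assert(vt != NULL);
    uint32_t initial_sp = vt[0];
    uint32_t reset_handler = vt[1];

    /* The stack pointer must be word-aligned and land in SRAM; the top
     * of SRAM itself is allowed because the stack grows downwards. */
    bool sp_ok = (initial_sp & 3u) == 0 &&
                 initial_sp > SRAM_BASE_ADDR && initial_sp <= SRAM_END_ADDR;

    /* The reset handler must have the Thumb bit set and point into the
     * app region of flash. Erased flash (0xffffffff) fails this test. */
    bool reset_ok = (reset_handler & 1u) != 0 &&
                    reset_handler >= APP_BASE_ADDR && reset_handler < XIP_END_ADDR;

    return sp_ok && reset_ok;
}
```

A check like this cannot prove the app is intact; a header with a checksum or a binary_info entry would be more robust, which is part of what still needs discussing.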

Obviously the board "firmware" needs more discussion, but perhaps we should split out some new issues once we have discussed the basics here more.

swetland commented 3 years ago

I think having the image start at 0x10000000 for a "full" image (boot2+app bundled together) and start at 0x10001000 for a standalone "app" image expecting to work with a separate "boot2" has the advantage of being straight forward and clear in both cases, and very easy to detect by inspecting an ELF or UF2 file.
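
As a sketch of how easy the detection could be for a UF2 file: the code below walks the 512-byte UF2 blocks, finds the lowest target address, and classifies the image. The block layout and magic numbers follow the published UF2 format; the `image_kind_t` enum and `classify_uf2` name are hypothetical, and the 0x10001000 convention is only the proposal from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UF2_BLOCK_SIZE   512u
#define UF2_MAGIC_START0 0x0A324655u
#define UF2_MAGIC_START1 0x9E5D5157u
#define UF2_MAGIC_END    0x0AB16F30u

typedef enum { IMAGE_INVALID, IMAGE_FULL, IMAGE_APP_ONLY, IMAGE_OTHER } image_kind_t;

/* Read a little-endian 32-bit value regardless of host endianness. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify a UF2 file by the lowest address any of its blocks targets. */
static image_kind_t classify_uf2(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len == 0 || len % UF2_BLOCK_SIZE != 0) return IMAGE_INVALID;

    uint32_t lowest = UINT32_MAX;
    for (size_t off = 0; off < len; off += UF2_BLOCK_SIZE) {
        const uint8_t *block = buf + off;
        if (read_le32(block) != UF2_MAGIC_START0 ||
            read_le32(block + 4) != UF2_MAGIC_START1 ||
            read_le32(block + 508) != UF2_MAGIC_END) {
            return IMAGE_INVALID;
        }
        uint32_t target_addr = read_le32(block + 12); /* targetAddr field */
        if (target_addr < lowest) lowest = target_addr;
    }

    if (lowest == 0x10000000u) return IMAGE_FULL;     /* boot2 + app */
    if (lowest == 0x10001000u) return IMAGE_APP_ONLY; /* expects a separate boot2 */
    return IMAGE_OTHER;
}
```

The same lowest-address test would work on the loadable segments of an ELF file.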

Full size (4K) "boot2" images could provide a small header at the very end to allow them to be identified as such and/or do this via binary_info (I haven't looked into exactly where all that goes and what it looks like yet).

It seems like the place where the workflow is trickiest is around UF2 usage. Picoboot could easily observe that the binary is "app" only and check to see if there's a "boot2" present or not (and even if the "boot2" represents itself as the "standard" second stage for a device vs a custom one) and warn the user.

But this comes back around to the whole issue of "assuming there's a variety of devices such that the SDK's default boot2 is not compatible with all of them, how do users avoid confusion and frustration when trying to build the right thing for the right device?"

kilograham commented 3 years ago

one other consideration is whether we can build a "multi-flash" binary (again this would be a more than 256 byte header), but the 256 byte header would try to detect the flash type (IDK if this is possible @Wren6991) and then load a fresh boot stage 2

swetland commented 3 years ago

> one other consideration is whether we can build a "multi-flash" binary (again this would be a more than 256 byte header), but the 256 byte header would try to detect the flash type (IDK if this is possible @Wren6991) and then load a fresh boot stage 2

Probing for supported parts by part number (and if that fails, as @Wren6991 suggested above maybe trying SFDP) seems doable. A "large" boot2 could presumably even just copy the rest of itself to RAM and then have plenty of room to do more complex flash device detection and configuration.
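
Probing by part number might look roughly like the lookup below, which takes the three bytes returned by the JEDEC "read ID" command (9Fh) and picks a boot2 variant. The table entries are illustrative and should be checked against the vendor datasheets; the enum names and the fallback to a plain 03h read are assumptions, and the "capacity is a power of two" decoding is a common convention rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for the boot2 variant to load. */
typedef enum { BOOT2_GENERIC_03H, BOOT2_W25Q080 } boot2_variant_t;

typedef struct {
    uint8_t id[3]; /* manufacturer, memory type, capacity code */
    boot2_variant_t variant;
} flash_part_t;

/* Illustrative entries only; verify IDs against the datasheets. */
static const flash_part_t known_parts[] = {
    { { 0xEF, 0x40, 0x14 }, BOOT2_W25Q080 }, /* Winbond W25Q80 */
    { { 0xEF, 0x40, 0x15 }, BOOT2_W25Q080 }, /* Winbond W25Q16JV */
};

/* Pick a boot2 variant for a JEDEC ID, falling back to the slow but
 * widely supported 03h serial read when the part is unknown. */
static boot2_variant_t select_boot2(const uint8_t id[3]) {
    assert(id != NULL);
    for (size_t i = 0; i < sizeof known_parts / sizeof known_parts[0]; i++) {
        const uint8_t *known = known_parts[i].id;
        if (known[0] == id[0] && known[1] == id[1] && known[2] == id[2]) {
            return known_parts[i].variant;
        }
    }
    return BOOT2_GENERIC_03H;
}

/* Many parts encode capacity as log2(bytes) in the third ID byte.
 * Returns 0 when the code is outside a plausible range. */
static uint32_t jedec_capacity_bytes(uint8_t capacity_code) {
    if (capacity_code < 8 || capacity_code > 31) return 0;
    return (uint32_t)1u << capacity_code;
}
```

An SFDP-based probe would replace the static table with parameters read from the device, at the cost of having to handle the broken implementations mentioned earlier.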

Another board-specific thing that might be nice to do earlier (at stage2) to speed up whatever comes next is clock configuration. Not just flash read speed, but CPU speed remains very low until clock init happens a little ways into runtime init (unless I'm misunderstanding the boot ROM).

Wren6991 commented 3 years ago

> A "large" boot2 could presumably even just copy the rest of itself to RAM

That makes sense, we wanted to avoid stomping on too much memory unconditionally in the bootrom, but you can loosen your belt a bit in a flash-resident bootloader.

> Another board-specific thing that might be nice to do earlier (at stage2) to speed up whatever comes next is clock configuration

Yes currently you run on the ringosc until post-crt0, which slows down the .data copy-down considerably. There are a lot of issues around the initial clock setup, e.g. we flat out assume that a crystal is present, whereas you may want to just bump up the ringosc frequency. Not convinced yet it needs to be boot2, but definitely before data_cpy_loop

There was also a suggestion in #6 about rewriting boot2 in C, which seems pretty fair.

I'm still not sure how we make the toolchain play nicely with the board. If a board comes flashed with a boot2+, and you build a self-contained image, that is immediately stomped on (though presumably you would write-protect that sector in the board manufacture/test environment). On the other hand, if a board does not have a boot2+, and you build a binary that expects one, you also have a bad day.
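
The two failure cases can be written down as a small decision table that a flashing tool could consult before writing. This is only a sketch of the logic described above; the enum and function names are hypothetical and nothing like this exists in picotool today.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LAYOUT_SELF_CONTAINED, LAYOUT_EXPECTS_BOOT2_PLUS } image_layout_t;
typedef enum { PLAN_OK, PLAN_OVERWRITES_BOOT2_PLUS, PLAN_MISSING_BOOT2_PLUS } plan_verdict_t;

/* Decide whether flashing this image onto this board is safe. */
static plan_verdict_t check_flash_plan(image_layout_t layout, bool board_has_boot2_plus) {
    switch (layout) {
    case LAYOUT_SELF_CONTAINED:
        /* A boot2+app image starts at 0x10000000 and would stomp on a
         * factory-installed boot2+ unless that sector is write-protected. */
        return board_has_boot2_plus ? PLAN_OVERWRITES_BOOT2_PLUS : PLAN_OK;
    case LAYOUT_EXPECTS_BOOT2_PLUS:
        /* An app-only image cannot boot without a boot2+ already in flash. */
        return board_has_boot2_plus ? PLAN_OK : PLAN_MISSING_BOOT2_PLUS;
    }
    assert(!"unknown image layout");
    return PLAN_MISSING_BOOT2_PLUS;
}
```

A tool talking to the board over picoboot could act on the verdict; a plain UF2 drag-and-drop has no such opportunity, which is why that path is the awkward one.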

tannewt commented 3 years ago

> One could also imagine providing some table of board hardware info, though before long that spirals out into some madness like devicetree or ACPI, so maybe simpler is better.

I've started a TOML "database" of flash settings here: https://github.com/adafruit/nvm.toml It's meant to be a human and machine readable source for flash info. It allows you to write code to generate the particular form of flash settings that you need for a given platform.

CircuitPython supports a number of platforms that support flash chips in different ways. This database allows us to centralize the config settings. My work-in-progress branch for it is here: https://github.com/adafruit/circuitpython/compare/main...tannewt:rp2040_flash

> There was also a suggestion in #6 about rewriting boot2 in C, which seems pretty fair.

I've done this and have a jinja templated version that handles both 0x03 setup and quad setup. Will be testing it this afternoon: https://github.com/adafruit/circuitpython/compare/main...tannewt:rp2040_flash#diff-444a0d4fe61a801ad3d5a4747537a27029db4fa7ef1090fa96f4a6d2f0fe9d92