gambrose commented 2 years ago

Is it possible to use this HAL to read/write to the flash storage on the Pico? I would like to use the flash space that is not occupied by my program to store and retrieve data.

WeirdConstructor commented 2 years ago

Same question crossed my mind recently. That is why I looked into using an SD card with my Raspberry Pi Pico board - for stuff I would usually put into the EEPROM of my Arduino.

Storing stuff in flash has its limits on write cycles, but for occasionally storing user settings this seems fine (in contrast to logging data, which could also be implemented with some care).

The main downside is, of course, that every firmware update could potentially move the first unused byte offset of the flash, so keeping settings across firmware updates requires some more care.

I would love to have some control and reserve the first 1MB of the flash for program data and the remaining space for storing data.

gambrose commented 2 years ago

As far as I can see, you can use memory.x to have some control over the memory layout.

I was planning on using a tiny board which has 8MB of flash so if I can restrict the size of the program to say 2MB it would leave plenty of room for data.

I would prefer to restrict the number of components I use and I don't need that many write cycles so using the flash seems like the optimal solution.

thejpster commented 2 years ago

To read from flash, you can just core::ptr::read_volatile() the appropriate address, cast to a pointer of the appropriate type (e.g. *const u32). The start of flash is at 0x1000_0000.
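A minimal sketch of such a read, in case it helps. The helper names read_u32 and flash_word_ptr are made up for illustration and are not part of the HAL. On the Pico you would pass the pointer from flash_word_ptr to read_u32; the offset must be 4-byte aligned and inside the fitted flash chip.

```rust
use core::ptr;

/// Start of the XIP-mapped flash window on the RP2040.
pub const XIP_BASE: u32 = 0x1000_0000;

/// Hypothetical helper: volatile read of one `u32` through a raw pointer.
///
/// # Safety
/// `addr` must be 4-byte aligned and point to readable memory.
pub unsafe fn read_u32(addr: *const u32) -> u32 {
    unsafe { ptr::read_volatile(addr) }
}

/// Hypothetical helper: pointer to the word `offset` bytes into flash,
/// as seen through the XIP window.
pub fn flash_word_ptr(offset: u32) -> *const u32 {
    (XIP_BASE + offset) as usize as *const u32
}
```

Only read_u32 touches memory, so the address arithmetic can be checked on a desktop machine without dereferencing anything.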

The HAL currently doesn't have support for writing or erasing flash. You could look at the pico-sdk for inspiration, but I believe the steps would be:

Jump to a function in RAM
Disable interrupts
Disable the XIP engine
Send the appropriate flash write commands over QSPI (they vary depending on the chip and its size)
Flush the XIP caches
Re-enable the XIP
Re-enable interrupts

An alternative to disabling all interrupts would be to re-enable during the (relatively expensive) erase cycle, by temporarily replacing the interrupt vector table with a copy in RAM where every vector:

Suspends the erase operation
Flushes the XIP cache
Re-enables XIP
Jumps to the original IRQ handler
Disables XIP
Resumes the erase operation

This is assuming that QSPI flash chips can suspend erase operations - I haven't checked, but I know that parallel NOR flash chips can, and I've seen this approach used on other MCUs that have NOR flash (albeit only on a chip that had only two IRQ handlers).

Yes, you would want to change memory.x to ensure that at least part of the flash chip is unaffected by programming your application, and is guaranteed to not contain program data. I guess in theory you could read/modify/write the currently running program, but that's probably not advisable.
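To make the layout idea concrete, here is a rough sketch of the arithmetic for the 8MB board mentioned earlier, with the program capped at 2MB. The memory.x in the comment and all constant and function names are assumptions for illustration, not something the HAL provides.

```rust
// Assumed memory.x, limiting the program (boot2 + application) to 2 MiB:
//
// MEMORY {
//     BOOT2 : ORIGIN = 0x10000000, LENGTH = 0x100
//     FLASH : ORIGIN = 0x10000100, LENGTH = 2048K - 0x100
//     RAM   : ORIGIN = 0x20000000, LENGTH = 256K
// }

pub const FLASH_SIZE: u32 = 8 * 1024 * 1024; // assumption: 8 MiB flash chip
pub const PROGRAM_SIZE: u32 = 2 * 1024 * 1024; // assumption: matches memory.x above
pub const SECTOR_SIZE: u32 = 4096; // smallest erasable unit

/// Offset (from the start of flash) of the region reserved for data.
pub const DATA_OFFSET: u32 = PROGRAM_SIZE;
/// Size of the region reserved for data.
pub const DATA_SIZE: u32 = FLASH_SIZE - PROGRAM_SIZE;

/// Hypothetical helper: flash offset of the `index`-th data sector,
/// or `None` if the index lies outside the reserved region.
pub const fn data_sector_offset(index: u32) -> Option<u32> {
    if index < DATA_SIZE / SECTOR_SIZE {
        Some(DATA_OFFSET + index * SECTOR_SIZE)
    } else {
        None
    }
}
```

With these numbers the data region holds 1536 sectors, and the linker will refuse to build a program that grows past the 2 MiB limit.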

thejpster commented 2 years ago

Here's a pico-sdk example written in C: https://github.com/raspberrypi/pico-examples/blob/master/flash/program/flash_program.c

thejpster commented 2 years ago

Oh, hey, there's even ROM funcs to do most of the work: https://github.com/raspberrypi/pico-sdk/blob/2062372d203b372849d573f252cf7c6dc2800c0a/src/rp2_common/hardware_flash/flash.c

9names commented 2 years ago

Yeah, they still have a function in RAM for coordinating the whole thing though. Also note it's pretty much impossible to do this safely while using both cores unless we have some way of ensuring that the second core is parked. Ditto for DMA accessing flash.

gambrose commented 2 years ago

Thanks, I have had a look at what you linked. If I understand correctly, I can use something like this:

unsafe {
   hal::rom_data::flash_range_program(addr, data.as_ptr(), data.len());
}

to write a u8 array data (with a length which is a multiple of 256) to location addr, aligned to a 256-byte boundary.

That function should handle the XIP and cache flushing. I would still be responsible for disabling interrupts and not using two cores.
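The size and alignment constraints mentioned above could be checked before calling into the ROM, roughly like this. The function and error names are hypothetical; only the 256-byte page size comes from the discussion.

```rust
/// Flash page size: programming works in units of 256 bytes.
pub const PAGE_SIZE: u32 = 256;

/// Hypothetical error type for rejected arguments.
#[derive(Debug, PartialEq, Eq)]
pub enum ProgramArgError {
    /// The target offset is not on a 256-byte boundary.
    UnalignedOffset,
    /// The data length is not a multiple of 256 bytes.
    BadLength,
}

/// Hypothetical pre-flight check for a `flash_range_program`-style call.
pub fn check_program_args(offset: u32, len: usize) -> Result<(), ProgramArgError> {
    if offset % PAGE_SIZE != 0 {
        return Err(ProgramArgError::UnalignedOffset);
    }
    if len % PAGE_SIZE as usize != 0 {
        return Err(ProgramArgError::BadLength);
    }
    Ok(())
}
```

Running a check like this first means a bad argument shows up as an error value instead of a silent failure or a hang on the device.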

jannic commented 2 years ago

hal::rom_data::flash_range_program(addr, data.as_ptr(), data.len());

That function should handle the XIP and cache flushing. I would still be responsible for disabling interrupts and not using two cores.

That's not enough. See the datasheet, section 2.8.3.1.3. Flash Access Functions. You need to call more than one function, and: "Note that, in between the first and last calls in this sequence, the SSI is not in a state where it can handle XIP accesses, so the code that calls the intervening functions must be located in SRAM. The SDK hardware_flash library hides these details."

gambrose commented 2 years ago

Thanks, I was getting confused thinking I was calling this function rather than the actual function in the ROM.

So I would need something more like this:

unsafe {
    let connect_internal_flash = hal::rom_data::connect_internal_flash;
    let flash_exit_xip = hal::rom_data::flash_exit_xip;
    let flash_range_program = hal::rom_data::flash_range_program;
    let flash_flush_cache = hal::rom_data::flash_flush_cache;
    let flash_enter_cmd_xip = hal::rom_data::flash_enter_cmd_xip;

    connect_internal_flash();
    flash_exit_xip();
    flash_range_program(addr, data.as_ptr(), data.len());
    flash_flush_cache();
    flash_enter_cmd_xip();
}

The datasheet says that I should avoid calling flash_enter_cmd_xip, as it is very slow, and should instead call into the flash second stage. But that looks to be board-specific, as it depends on the flash chip. I think I need to do some more reading.

thejpster commented 2 years ago

Yes that's why there's multiple boot2 binaries. Their job is to enable high speed read mode and XIP.

9names commented 2 years ago

I spent a little bit of time reading how the pico-sdk handles the second core during flash writes. They hook the SIO interrupt up to a RAM function! On receipt of a magic lockout value they disable interrupts, write the magic value back to the sender over the FIFO to let them know they're blocked, then they loop until they receive an unlock message or a timeout occurs. Pretty clever! https://github.com/raspberrypi/pico-sdk/blob/2062372d203b372849d573f252cf7c6dc2800c0a/src/rp2_common/pico_multicore/multicore.c#L171

thejpster commented 2 years ago

scratches head

Wait, how does this work? I don't get it.

jannic commented 2 years ago

Wait, how does this work? I don't get it.

Does https://raspberrypi.github.io/pico-sdk-doxygen/group__multicore__lockout.html help?

thejpster commented 2 years ago

Oh, ok. So Core B hooks the SIO FIFO interrupt with a ram func, and when Core A wants to enter a critical section it writes to the FIFO, which triggers an interrupt on Core B. The IRQ handler on Core B pops a reply in the FIFO and spins until it gets an "all clear" at which point the interrupt ends and Core B resumes what it was doing.

The bit I was missing was that A can trigger an interrupt on B with a FIFO write. Got it!
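For anyone else trying to picture the handshake, here is a desktop simulation of it. Two threads stand in for the two cores and two channels stand in for the two directions of the SIO FIFO. Nothing here is pico-sdk or HAL code: the magic values and the function name are invented, and the timeout is left out.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical magic values for this simulation; not taken from the pico-sdk.
const LOCKOUT_START: u32 = 0xA5A5_0001;
const LOCKOUT_END: u32 = 0xA5A5_0002;

/// Simulates the lockout handshake and returns the events seen by
/// "core A" and "core B", each in the order they happened.
pub fn simulate_lockout() -> (Vec<&'static str>, Vec<&'static str>) {
    let (a_to_b, b_inbox) = mpsc::channel::<u32>();
    let (b_to_a, a_inbox) = mpsc::channel::<u32>();

    // "Core B": on real hardware this body would be a FIFO IRQ handler in RAM.
    let core_b = thread::spawn(move || {
        let mut events = Vec::new();
        if b_inbox.recv().unwrap() == LOCKOUT_START {
            b_to_a.send(LOCKOUT_START).unwrap(); // acknowledge: "I am parked"
            events.push("parked");
            // Wait here until the "all clear" arrives.
            while b_inbox.recv().unwrap() != LOCKOUT_END {}
            events.push("released");
        }
        events
    });

    // "Core A": request the lockout, wait for the acknowledgement, do the work.
    let mut events = Vec::new();
    a_to_b.send(LOCKOUT_START).unwrap();
    if a_inbox.recv().unwrap() == LOCKOUT_START {
        events.push("peer parked");
        events.push("flash write"); // placeholder for the erase/program sequence
    }
    a_to_b.send(LOCKOUT_END).unwrap();
    events.push("peer released");

    (events, core_b.join().unwrap())
}
```

The important property is that core A does not start the flash write until it has seen the acknowledgement, so core B is known to be executing from RAM by then.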

thejpster commented 2 years ago

Also, that is totally going to knock out the video on a Neotron. Note to self - the screen will go blank during a self-update of the firmware!

jannic commented 2 years ago

You could replace the lockout function on the second core with something that provides video instead of busy looping while waiting for the release message, as long as it runs from RAM.

thejpster commented 2 years ago

Do we have a good mechanism for ensuring an entire call stack is in RAM, and not just the top function?

riskable commented 2 years ago

Just as an FYI: I tried putting something together and it runs without hanging/panicking but it doesn't actually seem to write anything:

pub const BLOCK_SIZE: u32 = 65536;
pub const SECTOR_SIZE: usize = 4096;
pub const PAGE_SIZE: u32 = 256;
pub const SECTOR_ERASE: u8 = 0x20;
pub const BLOCK32_ERASE: u8 = 0x52;
pub const BLOCK64_ERASE: u8 = 0xD8;
pub const FLASH_START: u32 = 0x1000_0000;
pub const FLASH_END: u32 = 0x1020_0000; // It's a 2MByte flash chip

#[inline(never)]
#[link_section = ".data.ram_func"]
fn write_flash() {
    // Temp hard-coded locations for testing purposes:
    let addr = FLASH_END - 4096;
    let encoded: [u8; 4] = 22_u32.to_le_bytes(); // Just a test
    let mut buf = [200; 4096];
    buf[0] = encoded[0];
    buf[1] = encoded[1];
    buf[2] = encoded[2];
    buf[3] = encoded[3];
    unsafe {
        cortex_m::interrupt::free(|_cs| {
            rom_data::connect_internal_flash();
            rom_data::flash_exit_xip();
            rom_data::flash_range_erase(addr, SECTOR_SIZE, BLOCK_SIZE, SECTOR_ERASE);
            rom_data::flash_range_program(addr, buf.as_ptr(), buf.len());
            rom_data::flash_flush_cache(); // Get the XIP working again
            rom_data::flash_enter_cmd_xip(); // Start XIP back up
        });
    }
    defmt::println!("write_flash() Complete"); // TEMP
}

#[inline(never)]
#[link_section = ".data.ram_func"]
fn read_flash() -> &'static mut [u8] {
    // Temp hard-coded locations for testing purposes:
    let addr = (FLASH_END - 4096) as *mut u8;
    let my_slice = unsafe { slice::from_raw_parts_mut(addr, 256) };
    my_slice
}

When I was fooling around I swear I got it to write stuff but that was back when it was crashing/hanging like crazy. Now I can't seem to get it to write any data at all. I even verified by dumping the entire flash to a file using picotool (doesn't seem to be writing out my little buf data).

thejpster commented 2 years ago

Are you sure the rom_data::X stuff is inlined? Maybe grab all the function pointers first, then use those inside the critical section.

thejpster commented 2 years ago

Also, the read_flash function doesn't need to be in RAM.

thejpster commented 2 years ago

Sorry, me again:

flash_range_erase(addr, SECTOR_SIZE, BLOCK_SIZE, SECTOR_ERASE);

Is that right? I think you're telling ROM there's a special way to erase BLOCK_SIZE bytes at once, which is to use the SECTOR_ERASE command? Pretty sure SECTOR_ERASE is only going to erase SECTOR_SIZE bytes, which is the default. Also, a block erase not on a block boundary is not going to work.

riskable commented 2 years ago

I've got it working! My problem was that when you use rp2040-hal::rom_data::flash_range_*() functions it expects the address space to start at 0x0000_0000 but if you want to read that data in using something like slice::from_raw_parts_mut() you have to use 0x1000_0000 (aka "XIP base"). Man that was confusing! Wish the docs were more clear about that. Actually, just a working example would be great haha.

Anyway, here's the code that works:

pub const BLOCK_SIZE: u32 = 65536;
pub const SECTOR_SIZE: usize = 4096;
pub const PAGE_SIZE: u32 = 256;
// These _ERASE commands are highly dependent on the flash chip you're using
pub const SECTOR_ERASE: u8 = 0x20; // Tested and works with W25Q16JV flash chip
pub const BLOCK32_ERASE: u8 = 0x52;
pub const BLOCK64_ERASE: u8 = 0xD8;
/* IMPORTANT NOTE ABOUT RP2040 FLASH SPACE ADDRESSES:
When you pass an `addr` to a `rp2040-hal::rom_data` function it wants
addresses that start at `0x0000_0000`. However, when you want to read
that data back using something like `slice::from_raw_parts()` you
need the address space to start at `0x1000_0000` (aka `FLASH_XIP_BASE`).
*/
pub const FLASH_XIP_BASE: u32 = 0x1000_0000;
pub const FLASH_START: u32 = 0x0000_0000;
pub const FLASH_END: u32 = 0x0020_0000;
pub const FLASH_USER_SIZE: u32 = 4096; // Amount dedicated to user prefs/stuff

#[inline(never)]
#[link_section = ".data.ram_func"]
fn write_flash(data: &[u8]) {
    let addr = FLASH_END - FLASH_USER_SIZE;
    unsafe {
        cortex_m::interrupt::free(|_cs| {
            rom_data::connect_internal_flash();
            rom_data::flash_exit_xip();
            rom_data::flash_range_erase(addr, SECTOR_SIZE, BLOCK_SIZE, SECTOR_ERASE);
            rom_data::flash_range_program(addr, data.as_ptr(), data.len());
            rom_data::flash_flush_cache(); // Get the XIP working again
            rom_data::flash_enter_cmd_xip(); // Start XIP back up
        });
    }
    defmt::println!("write_flash() Complete"); // TEMP
}

fn read_flash() -> &'static mut [u8] {
    let addr = (FLASH_XIP_BASE + FLASH_END - FLASH_USER_SIZE) as *mut u8;
    let my_slice = unsafe { slice::from_raw_parts_mut(addr, FLASH_USER_SIZE as usize) };
    my_slice
}

...and here's the code I was using to test it out (I bound it to a keystroke on my numpad):

let data = crate::read_flash();
defmt::println!("Flash data[0]: {:?}", data[0]);
defmt::println!("Incrementing data[0] by 1...");
let mut buf = [0; 256];
if data[0] == u8::MAX {
    buf[0] = 0;
} else {
    buf[0] = data[0] + 1;
}
crate::write_flash(&buf);
let data2 = crate::read_flash();
defmt::println!("Flash data[0]: {:?}", data2[0]);

The output of which looks like this:

Flash data[0]: 137
Incrementing data[0] by 1...
write_flash() Complete
Flash data[0]: 138

...and I confirmed that the data survives reboots/power cycle (so it wasn't just a trick of optimization). Speaking of optimization, I had a lot of trouble trying to get this to work until I specified lto = 'fat' in my Cargo.toml:

[profile.release]
codegen-units = 1
debug = 2
debug-assertions = false
incremental = false
lto = 'fat' # <-- HERE
opt-level = 3
overflow-checks = false

However, to be thorough I just tested all the lto options:

lto = 'thin': Works
lto = false: Causes hang
lto = true: Works (it's the same as 'fat')
lto = 'off': Causes hang

Note that you can put defmt::println!() calls inside of write_flash() but not if you use format strings. So printing static text like, "foo" would work fine but trying to print out a variable, "foo {:?}" would cause it to hang indefinitely.

Other notes:

I'm using RTIC and my keystroke-bound function is actually calling write_flash() from within a spawn_at() call (and I'm using a monotonic timer a la rp2040-monotonic). When I first started I was getting panics until I put #[link_section = ".data.ram_func"] in front of all the dispatchers and the function that calls write_flash() but now that I've worked everything out that doesn't seem to be necessary (I've since removed those lines that force functions into RAM).
I'm using PIO (ws2812-pio) in the background while these write_flash() calls are taking place and it doesn't seem to be bothered. Not getting any flickering or anything like that either (nice and smooth 👍)
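Since the two address spaces described above are easy to mix up, a small wrapper type can keep them apart. This is only a sketch: FlashOffset and its methods are hypothetical names, and the 16 MiB window size is my assumption about the XIP address range.

```rust
/// Base of the XIP-mapped flash window (what you read through).
pub const FLASH_XIP_BASE: u32 = 0x1000_0000;
/// Assumption: the XIP window covers 16 MiB of address space.
pub const FLASH_XIP_WINDOW: u32 = 16 * 1024 * 1024;

/// Hypothetical newtype: an offset from the start of the flash chip,
/// which is what the ROM erase/program functions take as `addr`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FlashOffset(pub u32);

impl FlashOffset {
    /// Address to use when reading the same bytes back through XIP.
    pub const fn xip_addr(self) -> u32 {
        FLASH_XIP_BASE + self.0
    }

    /// Converts an XIP address back into a flash offset, or `None`
    /// if the address is outside the XIP window.
    pub const fn from_xip_addr(addr: u32) -> Option<FlashOffset> {
        if addr >= FLASH_XIP_BASE && addr - FLASH_XIP_BASE < FLASH_XIP_WINDOW {
            Some(FlashOffset(addr - FLASH_XIP_BASE))
        } else {
            None
        }
    }
}
```

With this, write_flash() would take a FlashOffset and read_flash() would call xip_addr() on the same value, so the two can no longer disagree about where the data lives.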

riskable commented 2 years ago

Sorry, me again:
flash_range_erase(addr, SECTOR_SIZE, BLOCK_SIZE, SECTOR_ERASE);
Is that right? I think you're telling ROM there's a special way to erase BLOCK_SIZE bytes at once, which is to use the SECTOR_ERASE command? Pretty sure SECTOR_ERASE is only going to erase SECTOR_SIZE bytes, which is the default. Also, a block erase not on a block boundary is not going to work.

Well you have to pass something as the 3rd and 4th argument and that's what worked :shrug: . Don't assume I know what I'm doing haha.

jannic commented 2 years ago

Well you have to pass something as the 3rd and 4th argument and that's what worked shrug . Don't assume I know what I'm doing haha.

The comment in the bootrom source code explains those parameters:

// block_size must be a power of 2.
// Generally block_size > 4k, and block_cmd is some command which erases a block
// of this size. This accelerates erase speed.
// To use sector-erase only, set block_size to some value larger than flash,
// e.g. 1ul << 31.
// To override the default 20h erase cmd, set block_size == 4k.
void __noinline flash_range_erase(uint32_t addr, size_t count, uint32_t block_size, uint8_t block_cmd) {
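Here is a desktop model of how I read that comment, which may help explain why the call above worked anyway. It is not the bootrom code. It assumes the ROM uses block_cmd only when the current address is block-aligned and at least block_size bytes remain, and falls back to the default 20h sector erase otherwise. plan_erase is a made-up name.

```rust
const SECTOR_SIZE: u32 = 4096;
const SECTOR_ERASE_CMD: u8 = 0x20; // default sector-erase command

/// Hypothetical model: lists the (address, command) pairs that an erase of
/// `count` bytes starting at `addr` would issue. `block_size` must be a
/// power of two.
pub fn plan_erase(addr: u32, count: u32, block_size: u32, block_cmd: u8) -> Vec<(u32, u8)> {
    let mut ops = Vec::new();
    let mut addr = addr;
    let goal = addr + count;
    while addr < goal {
        let block_aligned = addr & (block_size - 1) == 0;
        if block_aligned && goal - addr >= block_size {
            // A whole block fits: use the faster block-erase command.
            ops.push((addr, block_cmd));
            addr += block_size;
        } else {
            // Otherwise erase a single 4k sector with the default command.
            ops.push((addr, SECTOR_ERASE_CMD));
            addr += SECTOR_SIZE;
        }
    }
    ops
}
```

Under this model, erasing one 4k sector at 0x1F_F000 with block_size = 65536 never takes the block path, so the fourth argument is not used at all for that call.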

MathiasKoch commented 2 years ago

Perhaps it would be possible to add abstractions based on https://github.com/rust-embedded-community/embedded-storage for this, to make it a bit easier for everyone to use?

jannic commented 2 years ago

I am working on some functions which cover the 'needs to run from RAM' requirement: https://github.com/jannic/rp2040-flash/ Just a work in progress, and still missing documentation. But perhaps it's already useful?

afaber999 commented 2 years ago

lto = 'thin': Works

lto = false: Causes hang

lto = true: Works (it's the same as 'fat')

lto = 'off': Causes hang

I think the issue is that all functions have to be executed from RAM. However, depending on your optimization settings, the HAL definition of rom_table_lookup can be compiled into flash, since it is not marked #[inline(always)] (the same holds for the rom_hword_as_ptr function).

When I compile the following code snippet:

#[inline(never)]
#[link_section = ".data.ram_func"]
fn flash_experiment() {
    unsafe {
        flash_enter_cmd_xip();
    }
}

With lto = 'off' this hangs, since the code in RAM jumps to code in flash:

20000000 <__sdata>:
20000000: 80 b5 push {r7, lr}
20000002: 00 af add r7, sp, #0
20000004: 00 f0 02 f8 bl 0x2000000c <__Thumbv6MABSLongThunk_rp2040_hal::rom_data::flash_enter_cmd_xip::he084f9a4ab71acef> @ imm = #4
20000008: 80 bd pop {r7, pc}
2000000a: d4 d4 bmi 0x1fffffb6 <__veneer_limit+0xfff8c76> @ imm = #-88

2000000c <__Thumbv6MABSLongThunk_rp2040_hal::rom_data::flash_enter_cmd_xip::he084f9a4ab71acef>:
2000000c: 03 b4 push {r0, r1}
2000000e: 01 48 ldr r0, [pc, #4] @ 0x20000014 <$d>
20000010: 01 90 str r0, [sp, #4]
20000012: 01 bd pop {r0, pc}

20000014 <$d>:
20000014: b1 3d 00 10 .word 0x10003db1

With lto = 'thin', there is no jump to flash memory:

20000000 <__sdata>:
20000000: 80 b5 push {r7, lr}
20000002: 00 af add r7, sp, #0
20000004: 18 20 movs r0, #24
20000006: 02 88 ldrh r2, [r0]
20000008: 14 20 movs r0, #20
2000000a: 00 88 ldrh r0, [r0]
2000000c: 01 49 ldr r1, [pc, #4] @ 0x20000014 <$d.24>
2000000e: 90 47 blx r2
20000010: 80 47 blx r0
20000012: 80 bd pop {r7, pc}

20000014 <$d.24>:
20000014: 43 58 00 00 .word 0x00005843

So I think this has to be fixed in the HAL by adding the #[inline(always)] attribute.

jannic commented 2 years ago

I don't think inline attributes can guarantee that no flash accesses are inserted by the compiler. Rust currently just doesn't provide a way to say "this function must be in RAM and must not depend on any other memory section". Of course #[inline(always)] may work (it usually does) - but there is no guarantee that it always will.

That's why I implemented the relevant parts in assembly: https://github.com/jannic/rp2040-flash/blob/master/src/lib.rs#L189

werediver commented 2 years ago

@jannic On the higher-level API of your rp2040-flash crate (using it in an ongoing project; appreciate your work). I wrote a slightly more ergonomic contraption based on your original example and can prepare a pull request with either an example update or to include the suggested interface into the crate, if you find it suitable. Open to discuss improvements you may find necessary.

https://github.com/werediver/escale/blob/b0fb37f120edd2cc3f8145f326f218a94ad06d69/escale_fw_rs/app/src/flash.rs

jannic commented 2 years ago

Hi @werediver, as I haven't used the rp2040-flash library in a real application context yet, any feedback on its usability is very welcome! Pull requests, ideas, anything. We are just discussing the topic on the matrix channel, https://matrix.to/#/#rp-rs:matrix.org. (start of discussion), join in if you like.

rp-rs / rp-hal

Reading/writing to flash #257
