raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License

can't reboot into code in RAM with watchdog_reboot() #1548

Closed jmswu closed 4 months ago

jmswu commented 11 months ago

I got my code running from RAM, but I cannot reboot into RAM from the code running there. Is this a bug, or am I using the SDK wrong?

I have already asked the same question, and other users seem to have the same problem: link

Here is the code I use. In gdb I can see that the four watchdog scratch registers are set to the correct values:

            const auto dummy = save_and_disable_interrupts();
            (void)dummy;

            watchdog_reboot(0x2000'0000, 0x2004'2000, 0);
            while (1);
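
For comparison with what gdb shows, the expected contents of the four scratch registers can be worked out on the host. This is only a sketch under stated assumptions: the layout (magic 0xb007c0d3 in scratch 4, entry point XORed with the negated magic in scratch 5, stack pointer in scratch 6, entry point in scratch 7) follows RP2040 datasheet section 2.8.1.1 as far as I can tell, setting the Thumb bit on the entry point is an assumption about what watchdog_reboot() does, and WatchdogBootScratch and expected_scratch are hypothetical names, not SDK APIs.

```cpp
#include <cstdint>

// Assumed magic value for a watchdog boot (datasheet section 2.8.1.1).
constexpr std::uint32_t kBootMagic = 0xb007c0d3u;

// Hypothetical container for the upper four watchdog scratch registers.
struct WatchdogBootScratch {
    std::uint32_t scratch4;  // magic number
    std::uint32_t scratch5;  // entry point XOR (-magic)
    std::uint32_t scratch6;  // stack pointer
    std::uint32_t scratch7;  // entry point with the Thumb bit set
};

// Hypothetical helper: the values watchdog_reboot(pc, sp, ...) is assumed to write.
constexpr WatchdogBootScratch expected_scratch(std::uint32_t pc, std::uint32_t sp) {
    return WatchdogBootScratch{
        kBootMagic,
        (pc | 1u) ^ (0u - kBootMagic),  // unsigned negation wraps modulo 2^32
        sp,
        pc | 1u,
    };
}

// Values for the call above: watchdog_reboot(0x2000'0000, 0x2004'2000, 0).
static_assert(expected_scratch(0x20000000u, 0x20042000u).scratch7 == 0x20000001u,
              "entry point carries the Thumb bit");
```

If the values read back in gdb differ from these, the problem is in how the registers are written. If they match, it lies later in the boot path.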

I also tried using my beginner-level asm to set the pc and sp to the correct addresses, but it still doesn't work.

    // put pc on r0
    asm volatile("mov r0, #0x20");
    asm volatile("lsl r0, #0x18");
    asm volatile("add r0, #0x01");

    // put sp on r1
    asm volatile("mov r1, #0x20");
    asm volatile("lsl r1, #0x08");
    asm volatile("add r1, #0x04");
    asm volatile("lsl r1, #0x08");
    asm volatile("add r1, #0x1f"); // or 0x20
    asm volatile("lsl r1, #0x08");

    asm volatile("msr msp, r1");
    asm volatile("blx r0");
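
To double-check the shift-and-add arithmetic in the assembly above, the same steps can be replayed on the host. This is a sketch only; asm_pc and asm_sp are hypothetical names that mirror the instruction sequences for r0 and r1.

```cpp
#include <cstdint>

// Mirrors the r0 sequence: mov #0x20, lsl #0x18, add #0x01.
constexpr std::uint32_t asm_pc() {
    return (0x20u << 0x18) + 0x01u;
}

// Mirrors the r1 sequence: mov #0x20, lsl 8, add #0x04, lsl 8, add last_add, lsl 8.
// last_add is 0x1f or 0x20, matching the "or 0x20" comment in the assembly.
constexpr std::uint32_t asm_sp(std::uint32_t last_add) {
    return ((((0x20u << 8) + 0x04u) << 8) + last_add) << 8;
}

static_assert(asm_pc() == 0x20000001u, "start of RAM with the Thumb bit set");
static_assert(asm_sp(0x1fu) == 0x20041f00u, "stack pointer with 0x1f");
static_assert(asm_sp(0x20u) == 0x20042000u, "stack pointer with 0x20");
```

So the register values themselves come out as intended: blx r0 targets 0x20000000 in Thumb state, and the stack pointer is either 0x20041f00 or 0x20042000.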

Here is my stack trace with my assembly code:

Resetting target with halt
Successfully halted device on reset
--Type <RET> for more, q to quit, c to continue without paging--
0x40004000:     0x00000000
0xe000e010:     0x00000000
0xe000e180:     0x00000000
0xe000e280:     0x00018000
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
isr_hardfault () at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:98
98      decl_isr_bkpt isr_hardfault
(gdb) bt
#0  isr_hardfault () at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:98
#1  <signal handler called>
#2  0x00000000 in ?? ()
#3  0x200036b4 in __aeabi_double_init () at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_double/double_init_rom.c:51
#4  0x20001d38 in runtime_init () at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_runtime/runtime.c:105
#5  0x20000028 in platform_entry () at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_standard_link/crt0.S:258
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p/a *(uint32_t[8] *)$msp
$1 = {0x20005280 <sd_table>, 0x24c, 0x80, 0x0, 0x34000040, 0x200036b5 <__aeabi_double_init+88>, 0x0, 0x20000000 <_entry_point>}
(gdb) 

Here is the point where it enters the ROM functions and eventually hits the hard fault:

(gdb) c
Continuing.
warning: Breakpoint 1 address previously adjusted from 0x200036b5 to 0x200036b4.

Breakpoint 1, __aeabi_double_init () at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_double/double_init_rom.c:52
52              if (rom_version == 2) {
(gdb) s
64          if (rom_version < 3) {
(gdb) s
69          sf_clz_func = rom_func_lookup(ROM_FUNC_CLZ32);
(gdb) s
rom_func_lookup (code=code@entry=13132) at /home/jms/projects/id_6045_pico/pico-sdk/src/rp2_common/pico_bootrom/bootrom.c:12
12          return rom_func_lookup_inline(code);

andygpz11 commented 11 months ago

@jmswu How are you building your RAM image, please? Could you share your linker file? And how are you loading that image into RAM? I'm also wondering where the Arm VTOR register gets set up. It must point to the start of the image's vector table before any interrupts are used. I will look. Thanks.

jmswu commented 11 months ago

Hi @andygpz11, I am building my code as per the pico-sdk instructions: I add set(PICO_NO_FLASH ON) to the CMakeLists.txt file. I have not changed the linker script. With the above CMake flag, I think I should be using this linker script from the pico-sdk: ./pico-sdk/src/rp2_common/pico_standard_link/memmap_no_flash.ld

In my application, a host MCU connects to the RP2040's SWD pins. The host MCU then acts as a CMSIS-DAP probe and loads the firmware into the RP2040 on every boot. Here is how my main MCU loads the code:

When debugging, I am using GDB to load the .elf file with the load command. Here is how I do it:

I do find that using gdb is not as reliable as using my host MCU's CMSIS-DAP interface or picotool over USB.

As for VTOR: on an RP2040 build compiled for RAM, the VTOR points to 0x2000'0100. It seems that there is some boot code running before the reset vector is hit. Here is my disassembled code:

...
20000100 <__VECTOR_TABLE>:
20000100:   20042000    .word   0x20042000
20000104:   2000000f    .word   0x2000000f
20000108:   200001c3    .word   0x200001c3
2000010c:   200001c5    .word   0x200001c5
20000110:   200001c1    .word   0x200001c1
20000114:   200001c1    .word   0x200001c1
20000118:   200001c1    .word   0x200001c1
2000011c:   200001c1    .word   0x200001c1
20000120:   200001c1    .word   0x200001c1
...
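
For reading the listing above: on Cortex-M, the first word of the vector table is the initial stack pointer and the second is the reset handler address with the Thumb bit set. A small sketch of decoding those two words follows; ResetInfo and decode_vector_table are hypothetical names, and the table contents are copied from the disassembly.

```cpp
#include <cstdint>

// Hypothetical result type for the first two vector table entries.
struct ResetInfo {
    std::uint32_t initial_sp;     // word 0: initial main stack pointer
    std::uint32_t reset_handler;  // word 1 with the Thumb bit cleared
};

// Decodes the first two words of a Cortex-M vector table.
constexpr ResetInfo decode_vector_table(const std::uint32_t* table) {
    return ResetInfo{table[0], table[1] & ~1u};
}

// First two words of __VECTOR_TABLE at 0x20000100 in the listing.
constexpr std::uint32_t kVectorTable[] = {0x20042000u, 0x2000000fu};

static_assert(decode_vector_table(kVectorTable).initial_sp == 0x20042000u,
              "initial stack pointer");
static_assert(decode_vector_table(kVectorTable).reset_handler == 0x2000000eu,
              "reset handler address");
```

Note that the reset handler at 0x2000000e is not the same address as the 0x20000000 passed to watchdog_reboot(), which the backtrace labels _entry_point.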

My pico-sdk is:

pico-sdk git:(05d41fd) git status
HEAD detached from 1.5.1

Compiler version:

arm-none-eabi-c++ --version
arm-none-eabi-c++ (15:9-2019-q4-0ubuntu1) 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599]

adamgreen commented 11 months ago

I was able to reproduce this problem in the debugger and put what I learned in the original forum post here: https://forums.raspberrypi.com/viewtopic.php?t=359490#p2157263

The Boot ROM does appear to try to do the right thing and start executing the code in RAM by setting the PC and SP to the correct values, but it triggers a hard fault as soon as it tries to execute the first instruction from RAM. I see there is code in the Boot ROM, enable_clocks, that re-enables the clocks for WDT reboots. Even though enable_clocks is executed before check_wdog, maybe the clock for SRAM0 hasn't stabilized yet, and that leads to this hard fault?

daveythacher commented 10 months ago

What is the use case for this?

Edit: Is the watchdog_reboot not for exception handling or calling into flash boot? Part of me wanted to believe it was for a bootloader, but I am not sure that is a justified premise.

jmswu commented 10 months ago

@daveythacher So that my app running in RAM can quickly reboot without being reloaded.

jmswu commented 10 months ago

Sorry, I pressed the wrong button and closed this accidentally.

jmswu commented 10 months ago

On 19/11/2023, at 1:29 PM, David Thacher wrote: What causes it to need to reboot?

When I call the watchdog_reboot() API. What are you trying to get to?

andygpz11 commented 10 months ago

@jmswu I think your problem is that running object code copied to RAM by a Pico-SDK program resident in Flash is not the same as building an image specifically to be located to RAM.

It might sound like the same thing but I believe that the CRT code that initialises the .data section of the program to be run does not itself get copied to RAM.

So SDK-built code can be run from RAM once, but it cannot be successfully re-run in RAM because it is not actually complete.

I'm not quite sure what this RAM re-boot buys you anyway?

jmswu commented 10 months ago

@andygpz11 Thanks for the reply. What is the best way to fix this? If I understand correctly, I could build the image as normal (without set(PICO_NO_FLASH ON)) but change the linker script to put the code into the RAM region, so that I have a full image. Do you think this will work?

daveythacher commented 10 months ago

All it takes is one bad write and this is done for.

Overall, I would ask for the documentation to clarify watchdog_reboot's intended purpose. I do not believe this request is consistent with it. I have seen this come up a few times on the forums.

Edit: Section 2.8.1.1 of Datasheet:

Watchdog boot allows users to install their own boot handler, and divert control away from the main boot sequence on non-POR/BOR resets. It also simplifies running code over the JTAG test interface. It recognises the following values written to the watchdog’s upper scratch registers:

kilograham commented 4 months ago

Closing this in favor of the forum thread; if there is a specific issue/bug then please re-open a new issue