riscv-collab / riscv-openocd

Fork of OpenOCD that has RISC-V support

Avoid resetting the target when loading an elf #662

Closed fcuzzocrea closed 3 months ago

fcuzzocrea commented 2 years ago

I am not sure where to ask this, so I am posting it here in the hope of getting some guidance.

I am using the Microchip PolarFire SoC Icicle Kit with the version of OpenOCD that ships with the latest version of SoftConsole.

Versions: Microchip SoftConsole is v2021.1-6.6.0.507 and OpenOCD is 0.10.0+dev-00859-g95a8cd9b5-dirty (2020-10-21-21:16).

Problem: I would like to attach to a running program without resetting the board, so that I can load an ELF into memory (L2LIM) and start executing it.

In the normal use case scenario what I do is this:

Sadly, however, when OpenOCD connects it resets the SoC, so I lose the initialization done by the bootloader, and when the app starts executing I end up trapped in here

By default the microsemi-riscv.cfg file is doing

proc do_board_reset_init {}

What I have tried so far is to modify this file and, instead of the aforementioned line, put:

reset_config none

but this did not work either.

The commands I am using with GDB to load the ELF are:

    set arch riscv:rv64
    set mem inaccessible-by-default off
    target extended-remote localhost:3333
    monitor halt
    load
    monitor resume
    monitor shutdown
    quit

Is that behavior intended (so there is no way to attach OpenOCD to a running program on the board without resetting the board itself), or am I doing something wrong?

JanMatCodasip commented 2 years ago

Hi @fcuzzocrea, OpenOCD should be able to accomplish the connection to the target without reset.

AFAIK, OpenOCD does not trigger reset of the target unless you explicitly instruct it to do so.

Please double-check whether any of your OpenOCD configuration files contains a reset [halt|run|init] command (which would reset the CPU target) or adapter assert|deassert [...] (which may trigger the SRST signal).

Alternatively, run OpenOCD with higher verbosity (-d3 on the command line) and look at which TCL commands are executed, and whether reset or adapter assert is among them.

fcuzzocrea commented 2 years ago

Ciao @JanMatCodasip! Thanks a lot for your quick reply!

I checked my OpenOCD cfg files and found that scripts/target/microsemi-riscv.cfg contains the following lines

        $_TARGETNAME_1 configure -event reset-init init_regs
        $_TARGETNAME_2 configure -event reset-init init_regs
        $_TARGETNAME_3 configure -event reset-init init_regs
        $_TARGETNAME_4 configure -event reset-init init_regs

and when running OpenOCD in debug mode I see the following output


Debug: 169 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event reset-init board_reset_init
Debug: 170 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event reset-init init_regs
Debug: 171 1 command.c:143 script_debug(): command - reset_config reset_config trst_only
Debug: 173 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event gdb-detach 
    # resume execution on debugger detach
    resume

Debug: 174 1 command.c:143 script_debug(): command - reset_config reset_config none
User : 176 1 options.c:63 configuration_output_handler(): none separate
User : 177 1 options.c:63 configuration_output_handler(): 
Info : 178 1 server.c:310 add_service(): Listening on port 6666 for tcl connections
Info : 179 1 server.c:310 add_service(): Listening on port 4444 for telnet connections
Debug: 180 1 command.c:143 script_debug(): command - init init
Debug: 182 1 command.c:143 script_debug(): command - target target init
Debug: 184 1 command.c:143 script_debug(): command - target target names
Debug: 185 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 cget -event gdb-flash-erase-start
Debug: 186 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event gdb-flash-erase-start reset init
Debug: 187 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 cget -event gdb-flash-write-end
Debug: 188 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event gdb-flash-write-end reset halt
Debug: 189 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 cget -event gdb-attach
Debug: 190 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event gdb-attach halt
Debug: 191 1 target.c:1428 handle_target_init_command(): Initializing targets...
Debug: 192 1 riscv.c:473 riscv_init_target(): riscv_init_target()
Debug: 193 1 semihosting_common.c:97 semihosting_common_init():  

However, commenting out the aforementioned lines in the cfg does not help :( The output I see after making the edits is this one: https://pastebin.com/BZeYx2au

JanMatCodasip commented 2 years ago

It seems you have reset commands in one of your configuration files, which will be triggered when GDB loads an application binary to flash. I wonder if that could be the issue. See the following lines from your log:

Debug: 186 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event gdb-flash-erase-start reset init
...
Debug: 188 1 command.c:143 script_debug(): command - mpfs.hart0_e51 mpfs.hart0_e51 configure -event gdb-flash-write-end reset halt

You can find in which .cfg file the commands are located, and change them to plain halt. Then check if it helps:

mpfs.hart0_e51 configure -event gdb-flash-erase-start halt
mpfs.hart0_e51 configure -event gdb-flash-write-end halt

Or you may try loading the application binary to the RAM memory directly via OpenOCD's load_image command (that is, without GDB).

fcuzzocrea commented 2 years ago

Thanks again for the help!

Sadly I wasn't able to work around the problem. This is what I tried:

$_TARGETNAME_0 configure -event gdb-flash-erase-start {
    # halt execution
    halt
}

$_TARGETNAME_0 configure -event gdb-flash-write-end {
    # resume execution after write
    resume
}

$_TARGETNAME_0 configure -event gdb-attach {
    # halt execution on debugger attach
    halt
}

My application does not start. It no longer traps, but it is stuck somewhere in the code. I can say that because if I start OpenOCD again and connect through GDB I can see where it is stuck:

Reading symbols from build/debug/app.elf...
(gdb) set arch riscv:rv64
The target architecture is set to "riscv:rv64".
(gdb) set mem inaccessible-by-default off
(gdb) target extended-remote localhost:3333
Remote debugging using localhost:3333
0x0000000008004d78 in prvSocInfoCommand (pcWriteBuffer=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    xWriteBufferLen=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    pcCommandString=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ../../src/cli/cmd/util_cmd.c:222
222     buf += snprintf(pcWriteBuffer + buf, xWriteBufferLen - buf,

TommyMurphyTM1234 commented 2 years ago

For what it's worth I used to work for Microchip on SoftConsole so have some knowledge of what you're working on here. However I'm a bit confused. I presume that this is the specific problem?

> In the other debug use case scenario what I would like to do is this:
>
>   • the same bootloader initializes the SoC, but instead of loading an application in RAM and jumping to the entry point, sits in a while loop after the SoC has been initialized
>   • with OpenOCD I load the same ELF directly into the LIM through GDB and the application starts executing

But I still find this a bit confusing.

  1. Is the while loop after the SoC has been initialized part of your basic bootloader?
  2. "With OpenOCD I load the same ELF directly into the LIM..." - I don't know what you mean by "same" here? The bootloader again or something else?

Have you tried the following from SoftConsole:

  1. Configure the Icicle board to use PolarFire SoC boot mode 1 to execute your bootloader from eNVM (I am assuming that you have done this already but if not then refer to .../Microchip/SoftConsole-v2021.1/extras/mpfs/mpfs-bootmodes-readme.txt for instructions).
  2. Create a debug launch configuration for your "main" program using an existing bare metal example program as a guide
  3. In the debug launch configuration go to Startup > Initialization Commands > Initial Reset and uncheck that to prevent the debugger from doing a reset [init] after connecting - that should leave the target "undisturbed" following the execution of the bootloader.

If you do that then power cycle the Icicle board it should boot your bootloader, it will (presumably) sit in your busy while loop, you launch the SoftConsole debug session, it will NOT reset the target and then the debugger will load the program and start executing/debugging it.

(Obviously all of this can be done without using SoftConsole and using just command line tools but it might be worth trying this SoftConsole based approach first).

If that does not help/work then please clarify what exactly happens/does not work as described and if I have misunderstood anything.

Hope this helps.

fcuzzocrea commented 2 years ago

Hi @TommyMurphyTM1234, thanks a lot for your answer!

Regarding your first question: yes, the SoC is initialized by my basic bootloader (that is, I initialize the PLIC, enable IRQs, and initialize the RTC, GPIO and SPI drivers).

The bootloader is flashed into the eNVM (with boot mode 1 - non-secure boot from eNVM) and it is executed from the eNVM.

After it starts, the bootloader can either load an application from SPI flash (by copying the application from the flash to the LIM and then jumping to the entry point of the ELF), or, if a button is pressed when the board is powered up, sit in a busy while loop waiting for the debugger to upload an ELF file directly to the L2LIM.

By "same" I mean that the ELF that I am trying to program with OpenOCD is the same ELF that I store in my SPI flash and that gets loaded by the bootloader if no button is pressed when the board is power cycled. So only the way it is loaded into the LIM is different. I wrote this just to point out that one way the ELF gets loaded correctly (when I copy it from the SPI flash to the LIM and then jump to the entry point), while the other way (when I write it to the LIM using OpenOCD) it does not start.

Of course my application is built using IMAGE_LOADED_BY_BOOTLOADER=1

Anyway thanks for your suggestions! Sadly I have a custom setup to work without the need of softconsole, just using cli tools, but yeah, I will give a try to the softconsole based approach.

Let me know if you find this still confusing!

TommyMurphyTM1234 commented 2 years ago

Hi @fcuzzocrea - thanks for the reply and clarifications.

> Sadly I have a custom setup to work without the need of softconsole, just using cli tools, but yeah, I will give a try to the softconsole based approach.

OK - I'm pretty sure that that should still be possible even using just command line tools and without changing the OpenOCD target/microsemi-riscv.cfg script.

Off the top of my head (so I could be missing something here)...

  1. Please ensure that any changes that you may have made to <path-to-softconsole>/openocd/share/openocd/scripts/target/microsemi-riscv.cfg have been undone first.

  2. Run the SoftConsole OpenOCD from the command line as follows:

    cd <path-to-softconsole>/openocd/bin
    openocd --command "set DEVICE MPFS" --file board/microsemi-riscv.cfg

  3. Run the SoftConsole RISC-V GDB from the command line as follows:

    cd <path-to-softconsole>/riscv-unknown-elf-gcc/bin
    riscv64-unknown-elf-gdb
    (gdb) set mem inaccessible-by-default off
    (gdb) set $target_riscv=1
    (gdb) set arch riscv:rv64
    (gdb) source <path-to-softconsole>/gdbinit/softconsole.gdbinit
    (gdb) target remote localhost:3333
    (gdb) load yourprogram.elf
    (gdb) thread apply all set $pc=_start
    (gdb) tb main
    (gdb) continue

Maybe you can try the above and post back with the results? As I say I think that that should be the gist here, but I could have overlooked something and, unfortunately, don't have an Icicle board to try it out myself. The key thing here is that there is no reset of the target (e.g. monitor reset init) via GDB so the post bootloader execution state of the target should not be disturbed before your program is loaded and debugged.

Regards Tommy

fcuzzocrea commented 2 years ago

Thanks again for your help.

I tried what you suggested using an untouched openocd copy taken from softconsole:

fcuzzocrea@Latitude-5420:~$ /opt/openocd/bin/openocd --command "set DEVICE MPFS" --command "set COREID 0" --file /opt/openocd/share/openocd/scripts/board/microsemi-riscv.cfg 
xPack OpenOCD (Microchip SoftConsole build), x86_64 Open On-Chip Debugger 0.10.0+dev-00859-g95a8cd9b5-dirty (2020-10-21-21:16)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
MPFS
0
Info : only one transport option; autoselect 'jtag'
do_board_reset_init
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : Embedded FlashPro6 (revision B) found (USB_ID=1514:200b path=/dev/hidraw1)
Info : Embedded FlashPro6 (revision B) CM3 firmware version: F4.0
Info : clock speed 6000 kHz
Info : JTAG tap: mpfs.cpu tap/device found: 0x0f81a1cf (mfg: 0x0e7 (GateField), part: 0xf81a, ver: 0x0)
Info : datacount=2 progbufsize=16
Info : Disabling abstract command reads from CSRs.
Info : Examined RISC-V core; found 5 harts
Info :  hart 0: XLEN=64, misa=0x8000000000101105
Info :  hart 1: currently disabled
Info :  hart 2: currently disabled
Info :  hart 3: currently disabled
Info :  hart 4: currently disabled
Info : Listening on port 3333 for gdb connections

While from GDB

(gdb) file build/debug/c3app.elf
Reading symbols from build/debug/c3app.elf...
(gdb) set mem inaccessible-by-default off
(gdb) set $target_riscv=1
(gdb) set arch riscv:rv64
The target architecture is set to "riscv:rv64".
(gdb) source softconsole.gdbinit
(gdb) target remote localhost:3333
Remote debugging using localhost:3333
0x00000000202224e4 in ?? ()
(gdb) load build/debug/c3app.elf
Loading section .text, size 0x22230 lma 0x8000000
Loading section .sdata, size 0x70 lma 0x8022230
Loading section .data, size 0x3930 lma 0x80222a0
Loading section .sdram, size 0x1388 lma 0x8025bd0
Start address 0x0000000008000000, load size 159576
Transfer rate: 9 KB/sec, 13298 bytes/write.
(gdb) thread apply all set $pc=_start

Thread 1 (Remote target):
(gdb) tb e51
Temporary breakpoint 1 at 0x8005f4c: file ../../ext/pfsoc_platform/mpfs_hal/common/mss_plic.h, line 719.
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
trap_from_machine_mode (regs=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    dummy=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    mepc=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ../../ext/pfsoc_platform/mpfs_hal/common/mss_mtrap.c:755
755             i++;        /* added some code as SC debugger hangs if in loop doing nothing */
(gdb) 

and then nothing happens on my UART terminal because I end up in the trap handler :(

TommyMurphyTM1234 commented 2 years ago

> While from GDB
>
> trap_from_machine_mode (regs=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
>     dummy=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
>     mepc=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ../../ext/pfsoc_platform/mpfs_hal/common/mss_mtrap.c:755
> 755               i++;        /* added some code as SC debugger hangs if in loop doing nothing */
> (gdb)

You should not be getting those "Corrupted DWARF expression" error messages. Are you sure that you compiled the program with the SoftConsole RISC-V GCC toolchain and not some other RISC-V toolchain? Using a different compiler may result in mismatches with the SoftConsole GDB.

> and then nothing happens on my UART terminal because I end up in the trap handler :(

Ignoring DWARF errors, it looks to me like your program is getting an exception - probably in the startup code since the temporary breakpoint at the e51() "main" function never fires - which leaves it in the trap_from_machine_mode() default trap handler. You need to debug the program to find out how/when/why that happens. E.g. at least look at the mcause CSR to see what kind of trap/exception is happening and maybe also debug the program from _start through the startup code to the point at which things go wrong.

TommyMurphyTM1234 commented 2 years ago

BTW - probably related to the DWARF messages but this looks wrong because the e51() function is almost certainly not in that header file if you are using the PolarFire SoC bare metal library code for your program:

> (gdb) tb e51
> Temporary breakpoint 1 at 0x8005f4c: file ../../ext/pfsoc_platform/mpfs_hal/common/mss_plic.h, line 719.

How exactly are you compiling your program and what toolchain are you using?

fcuzzocrea commented 2 years ago

Hi Tommy and thanks again for your help!

What leaves me very confused is the following. If, instead of loading the ELF directly into the LIM with OpenOCD, I put it on an external SPI flash connected to the Icicle Kit, read and copy the ELF from the flash into the LIM using the MSS SPI driver, and then jump to the entry point (which I extract from the ELF header) using this function, the application starts and runs correctly (the application is linked against the LIM).

If instead I program it directly into the LIM using OpenOCD, the program traps. The program is the same and is linked against the LIM, and the bootloader is also the same in both experiments.

I would expect the program to behave the same once it is in the LIM, regardless of the way I load it.

Regarding the toolchain, I previously used the one bundled with SoftConsole, but I switched to a self-compiled version of this toolchain, which AFAIK should be the official RISC-V one? I prefer to build the tools I use myself when possible (in that regard, it would be nice to have the sources of the OpenOCD version which is shipped with SoftConsole, as standard OpenOCD does not support the onboard FP6).

As for the second question, my program is compiled with the aforementioned toolchain, it uses the bare metal library (with IMAGE_LOADED_BY_BOOTLOADER 1 in mss_sw_config.h), and it is FreeRTOS based (I am using vanilla upstream FreeRTOS 10).

The CFLAGS I am using are the ones I extracted from SoftConsole projects:

  - '-fdata-sections'
  - '-ffunction-sections'
  - '-fmessage-length=0'
  - '-fsigned-char'
  - '-mabi=lp64'
  - '-march=rv64imac'
  - '-mcmodel=medany'
  - '-mno-save-restore'
  - '-msmall-data-limit=8'
  - '-mstrict-align'
  - '-mtune=sifive-5-series'
  - '-Os'
  - '-D__DYNAMIC_REENT__'
  - '-DDDR_INIT'
  - '-DMSS_CAN_USER_ISR=1'
  - '-DUSING_FREERTOS'

As well as the ASFLAGS

  - '-fdata-sections'
  - '-ffunction-sections'
  - '-fmessage-length=0'
  - '-fsigned-char'
  - '-mabi=lp64'
  - '-march=rv64imac'
  - '-mcmodel=medany'
  - '-mno-save-restore'
  - '-msmall-data-limit=8'
  - '-mstrict-align'
  - '-mtune=sifive-5-series'

and the LDFLAGS

  - '--specs=nano.specs'
  - '-mabi=lp64'
  - '-march=rv64imac'
  - '-nostartfiles'
  - '-Wl,--gc-sections'

I can give another try at the toolchain shipped with SoftConsole though

TommyMurphyTM1234 commented 2 years ago

Did you debug the code to see what exception/fault is occurring and where? That's what I would do regardless of the fact that the program runs OK in one scenario but not in this one.

fcuzzocrea commented 2 years ago

Alright, will dig more into my code and report back.

Just to be sure, can I use the bundled OpenOCD cfg files as they are? They do not reset the SoC when OpenOCD starts? A reset of the SoC will not be done unless explicitly requested through GDB, right?

TommyMurphyTM1234 commented 2 years ago

> Alright, will dig more into my code and report back.

You don't need to dig into the code to get the exception type. Just Ctrl-C break into the program when it's stuck in the default fault handler and check the mcause CSR (p/x $mcause) to see what it is. And maybe the mepc CSR to see at what PC/instruction the trap occurred.

That should shed some light on the problem. But if it's still not obvious what's happening then you can single step debug from _start rather than continue after loading the program to see exactly where things go wrong.

> Just to be sure, can I use the bundled OpenOCD cfg files as they are?

Correct.

> They do not reset the SoC when OpenOCD starts? A reset of the SoC will not be done unless explicitly requested through GDB, right?

Correct. A target reset will only occur if you explicitly do monitor reset from GDB.

fcuzzocrea commented 2 years ago

I tried to follow your suggestion and put a breakpoint at _start, but I am even more confused: it seems to trap right after reset_vector :(

This is what I did:

(gdb) file build/debug/c3app.elf
Reading symbols from build/debug/c3app.elf...
(gdb) set mem inaccessible-by-default off
(gdb) set $target_riscv=1
(gdb) set arch riscv:rv64
The target architecture is set to "riscv:rv64".
(gdb) target remote localhost:3333
0x0000000020222358 in ?? ()
Loading section .text, size 0x22230 lma 0x8000000
Loading section .sdata, size 0x70 lma 0x8022230
Loading section .data, size 0x3930 lma 0x80222a0
Loading section .sdram, size 0x1388 lma 0x8025bd0
Start address 0x0000000008000000, load size 159576
Transfer rate: 9 KB/sec, 13298 bytes/write.
(gdb) thread apply all set $pc=_start

Thread 1 (Remote target):
(gdb) 

Thread 1 (Remote target):
(gdb) b _start
Breakpoint 1 at 0x8000008
(gdb) step
Single stepping until exit from function reset_vector,
which has no line number information.

Breakpoint 1, 0x0000000008000008 in reset_vector ()
(gdb) step
Single stepping until exit from function reset_vector,
which has no line number information.
0x00000000080000ac in trap_vector ()
(gdb) step
Single stepping until exit from function trap_vector,
which has no line number information.
trap_from_machine_mode (regs=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    dummy=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    mepc=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ../../ext/pfsoc_platform/mpfs_hal/common/mss_mtrap.c:731
731     volatile uintptr_t mcause = read_csr(mcause);
(gdb) p/x $mcause
$1 = 0x2
(gdb) p/x $mepc  
$2 = 0x8000018
(gdb) p/x $mstatus
$3 = 0x200001880

So mcause indicates an illegal instruction?

TommyMurphyTM1234 commented 2 years ago

You do not need to put a breakpoint at _start. After loading the program and using thread apply all set $pc=_start the program is ready to run and you should use si (single instruction stepping: https://sourceware.org/gdb/download/onlinedocs/gdb/Continuing-and-Stepping.html) rather than C code line stepping.

However, again your logs suggest that your program is compiled and linked in a way that the debugging symbolic information may not be correct so any debugging is going to be difficult until you sort that out.

As per the RISC-V Privileged Specification (https://riscv.org/technical/specifications/) $mcause == 0x2 means illegal instruction. And $mepc == 0x8000018 is the program counter at which the offending instruction resides.

Have you manually checked the contents of LIM (at least the start addresses) against the list file for your program to see if there's any mismatch?

Or at least do disassemble 0x08000000 to see what the start of LIM disassembles as.

fcuzzocrea commented 2 years ago

No, actually I didn't check the contents of the LIM to see whether they are what I expect (I used to do this with Lauterbach Trace32; GDB is very new to me, so I need to learn how to do it properly)

TommyMurphyTM1234 commented 2 years ago

Normally you could do compare-sections in GDB to check the contents of memory against your local ELF file but (a) my recollection is that the Microchip OpenOCD script is not set up for this to work (no RAM work area defined which may be required) and (b) if your debug info is incorrect then such a comparison will almost certainly fail.

TommyMurphyTM1234 commented 2 years ago

> No, actually I didn't check the contents of the LIM to see whether they are what I expect (I used to do this with Lauterbach Trace32; GDB is very new to me, so I need to learn how to do it properly)

Try disassemble 0x08000000 and see what it gives and whether it matches the list file for your program.

TommyMurphyTM1234 commented 2 years ago

One thing that I noticed:

> Start address 0x0000000008000000, load size 159576
> ...
> (gdb) b _start
> Breakpoint 1 at 0x8000008

Is this correct? I.e. is your program actually linked so that _start is at 0x08000008 and not the start of LIM which is 0x08000000? This doesn't match with the start address displayed when loading the program.

What happens if, instead of thread apply all set $pc = _start you do thread apply all set $pc = 0x08000000 and then continue?

fcuzzocrea commented 2 years ago

Hi Tommy, sorry for the late reply.

On Friday I did some more debugging using Lauterbach debugging tools, and I think that the issue could be related to the way I am preparing the board for loading the ELF with the debugger.

In particular I found that programming the ELF into the LIM using Lauterbach led me to the same result I had when doing it with OpenOCD (ending up trapped). However, if with Lauterbach I put a breakpoint at this line:

https://github.com/polarfire-soc/polarfire-soc-bare-metal-examples/blob/3e45221cb287978a35213d7687ab050861e4bd9a/driver-examples/mss/mpfs-hal/mpfs-hal-ddr-demo/src/application/hart0/e51.c#L175

and then load the ELF, the application starts correctly. So I presume that I am doing something wrong in preparing the board to accept a program loaded through OpenOCD at this point. I tried searching through the HSS code, but I don't really understand where the logic is implemented that allows the HSS to execute an ELF built with IMAGE_LOADED_BY_BOOTLOADER 1.

For reference, the code which I am using to implement my while loop (shamelessly copied from the jump_to_application_example) is this one:

void wait_for_debugger(HLS_DATA* hls)
{
    /* Store current hardid */
    uint32_t hartid = read_csr(mhartid);

    /* Restore PLIC to known state */
    __disable_irq();
    PLIC_init();

    /* Disable all interrupts: */
    write_csr(mie, 0);

    while (true) {
        static volatile uint64_t counter = 0U;
        /* Added some code as debugger hangs if in loop doing nothing */
        counter = counter + 1U;
    }

    register unsigned long a0 asm("a0") = hartid;
    register unsigned long a1 asm("a1") = (unsigned long)hls;
    __asm__ __volatile__("mret" : : "r"(a0), "r"(a1));
    __builtin_unreachable();
}

As for your questions: running the disassembly confirmed that the code in the LIM is actually the code I compiled.

As for the _start breakpoint, this leaves me very confused. Using nm on the ELF gives me:

0000000008000000 T _start
0000000008000000 t _start_non_bootloader_image

At address 0x8000008 objdump tells me that I have:

 0000000008000000 <_start>:
 8000000:       00000717                auipc   a4,0x0
 8000004:       0ac70713                addi    a4,a4,172 # 80000ac <trap_vector>
 **8000008:       30571073                csrw    mtvec,a4**
 800000c:       305027f3                csrr    a5,mtvec
 8000010:       fef71ee3                bne     a4,a5,800000c <_start+0xc>
 8000014:       00050663                beqz    a0,8000020 <_start+0x20>

Which matches what mss_entry.S is doing here.

So, in the end, I believe that my while loop probably isn't enough to put the board into a state in which it can execute a file programmed using OpenOCD.

A simple way to work around this could be, I think, to load through the jump_to_application() function a simple ELF written in assembly like this:

_start:
    nop
    nop
    nop
    j _start

And then overwrite it with OpenOCD + GDB. I believe this should be enough to make it load the application, as the board would be in a state in which it should be ready to run software (?)

TommyMurphyTM1234 commented 2 years ago

I don't really understand why you have, once again, ignored my suggestion that you debug the program execution from 0x08000000 through the startup code to the point at which the illegal instruction exception occurs in order to actually understand what's happening here instead of guessing and proposing workaround hacks such as the endless loop stub program with the nops?

fcuzzocrea commented 2 years ago

I tried to debug the execution from 0x08000000 through the startup code, both with OpenOCD and with the Lauterbach debugger.

All I was able to find is that the code goes through reset_vector, then through trap_vector, and right after that I end up trapped in trap_from_machine_mode. I wasn't able to pinpoint where, in the assembly code that runs before trap_vector, the actual illegal instruction exception occurs.

All I was trying to say is that the while loop I proposed in the previous comment, which I implemented in my bootloader, is probably not sufficient to prepare the SoC for an application that is programmed into the LIM through OpenOCD and that expects to be loaded by the bootloader.

I believe that the startup code expects to find several registers populated correctly (for instance a0 and a1).

Sorry if it seemed that I wanted to ignore your suggestion, wasn't my intention.

TommyMurphyTM1234 commented 2 years ago

As I said before when you end up in the trap handler with $mcause == 0x2 (illegal instruction) then $mepc gives the address of the instruction that caused the exception. According to your earlier post you had $mepc == 0x08000018 so what instruction is that in the disassembly and what is the value actually in memory at that address? If the first few instructions of the program are executing ok then I'm not sure that there's any evidence to place the blame on the bootloader busy wait loop for subsequent erroneous execution.

The other issue regarding what looks like a mismatch between the symbolic debugging information and the actual program also remains. E.g. _start is actually at 0x08000000 but your symbolic debugging information seems to think that it's at 0x08000008.

fcuzzocrea commented 2 years ago

If you are referring to the assembly instruction, what I see when disassembling the memory in GDB matches what is in the ELF file as shown by objdump:

GDB Output:

(gdb) disassemble 0x8000018
Dump of assembler code for function reset_vector:
   0x0000000008000000 <+0>: auipc   a4,0x0
   0x0000000008000004 <+4>: addi    a4,a4,172 # 0x80000ac <trap_vector>
   0x0000000008000008 <+8>: csrw    mtvec,a4
   0x000000000800000c <+12>:    csrr    a5,mtvec
   0x0000000008000010 <+16>:    bne a4,a5,0x800000c <reset_vector+12>
   0x0000000008000014 <+20>:    beqz    a0,0x8000020 <reset_vector+32>
   0x0000000008000018 <+24>:    csrwi   mideleg,0
   0x000000000800001c <+28>:    csrwi   medeleg,0
   0x0000000008000020 <+32>:    csrw    mscratch,zero
   0x0000000008000024 <+36>:    csrw    mcause,zero
   0x0000000008000028 <+40>:    csrw    mepc,zero
   0x000000000800002c <+44>:    beqz    a0,0x8000030 <reset_vector+48>
   0x0000000008000030 <+48>:    csrr    t0,misa
   0x0000000008000034 <+52>:    bltz    t0,0x800003c <reset_vector+60>
   0x0000000008000038 <+56>:    j   0x8000030 <reset_vector+48>
   0x000000000800003c <+60>:    auipc   gp,0x23
   0x0000000008000040 <+64>:    addi    gp,gp,-1564 # 0x8022a20 <local_irq_handler_u54_1_table+16>
   0x0000000008000044 <+68>:    auipc   a4,0x49
   0x0000000008000048 <+72>:    addi    a4,a4,-68 # 0x8049000
   0x000000000800004c <+76>:    auipc   a5,0x4b
   0x0000000008000050 <+80>:    addi    a5,a5,-76 # 0x804b000
   0x0000000008000054 <+84>:    auipc   sp,0x4b
   0x0000000008000058 <+88>:    addi    sp,sp,-84 # 0x804b000
   0x000000000800005c <+92>:    sd  zero,0(a4)
   0x0000000008000060 <+96>:    addi    a4,a4,8
   0x0000000008000064 <+100>:   blt a4,a5,0x800005c <reset_vector+92>
   0x0000000008000068 <+104>:   auipc   a4,0x31
   0x000000000800006c <+108>:   addi    a4,a4,-1832 # 0x8030940
   0x0000000008000070 <+112>:   auipc   a5,0x49
   0x0000000008000074 <+116>:   addi    a5,a5,-1840 # 0x8048940
   0x0000000008000078 <+120>:   sd  zero,0(a4)
   0x000000000800007c <+124>:   addi    a4,a4,8
   0x0000000008000080 <+128>:   blt a4,a5,0x8000078 <reset_vector+120>
   0x0000000008000084 <+132>:   bnez    a1,0x8000098 <reset_vector+152>
   0x0000000008000088 <+136>:   addi    sp,sp,-64
   0x000000000800008c <+140>:   mv  tp,sp
   0x0000000008000090 <+144>:   mv  a0,tp
   0x0000000008000094 <+148>:   j   0x80023f8 <u54_single_hart>
   0x0000000008000098 <+152>:   mv  a0,a1
   0x000000000800009c <+156>:   j   0x80023f8 <u54_single_hart>
   0x00000000080000a0 <+160>:   nop
   0x00000000080000a4 <+164>:   nop
   0x00000000080000a8 <+168>:   j   0x80000a0 <reset_vector+160>
End of assembler dump.

objdump output:

build/debug/c3app.elf:     file format elf64-littleriscv

Disassembly of section .text:

0000000008000000 <_start>:
 8000000:       00000717                auipc   a4,0x0
 8000004:       0ac70713                addi    a4,a4,172 # 80000ac <trap_vector>
 8000008:       30571073                csrw    mtvec,a4
 800000c:       305027f3                csrr    a5,mtvec
 8000010:       fef71ee3                bne     a4,a5,800000c <_start+0xc>
 8000014:       00050663                beqz    a0,8000020 <_start+0x20>
 8000018:       30305073                csrwi   mideleg,0
 800001c:       30205073                csrwi   medeleg,0
 8000020:       34001073                csrw    mscratch,zero
 8000024:       34201073                csrw    mcause,zero
 8000028:       34101073                csrw    mepc,zero
 800002c:       00050263                beqz    a0,8000030 <_start+0x30>
 8000030:       301022f3                csrr    t0,misa
 8000034:       0002c463                bltz    t0,800003c <_start+0x3c>
 8000038:       ff9ff06f                j       8000030 <_start+0x30>
 800003c:       00023197                auipc   gp,0x23
 8000040:       9e418193                addi    gp,gp,-1564 # 8022a20 <__global_pointer$>
 8000044:       00049717                auipc   a4,0x49
 8000048:       fbc70713                addi    a4,a4,-68 # 8049000 <__app_stack_bottom>
 800004c:       0004b797                auipc   a5,0x4b
 8000050:       fb478793                addi    a5,a5,-76 # 804b000 <__app_stack_top>
 8000054:       0004b117                auipc   sp,0x4b
 8000058:       fac10113                addi    sp,sp,-84 # 804b000 <__app_stack_top>
 800005c:       00073023                sd      zero,0(a4)
 8000060:       00870713                addi    a4,a4,8
 8000064:       fef74ce3                blt     a4,a5,800005c <_start+0x5c>
 8000068:       00031717                auipc   a4,0x31
 800006c:       8d870713                addi    a4,a4,-1832 # 8030940 <__bss_end>
 8000070:       00049797                auipc   a5,0x49
 8000074:       8d078793                addi    a5,a5,-1840 # 8048940 <__heap_end>
 8000078:       00073023                sd      zero,0(a4)
 800007c:       00870713                addi    a4,a4,8
 8000080:       fef74ce3                blt     a4,a5,8000078 <_start+0x78>
 8000084:       00059a63                bnez    a1,8000098 <_start+0x98>
 8000088:       fc010113                addi    sp,sp,-64
 800008c:       00010213                mv      tp,sp
 8000090:       00020513                mv      a0,tp
 8000094:       3640206f                j       80023f8 <u54_single_hart>
 8000098:       00058513                mv      a0,a1
 800009c:       35c0206f                j       80023f8 <u54_single_hart>
 80000a0:       00000013                nop
 80000a4:       00000013                nop
 80000a8:       ff9ff06f                j       80000a0 <_start+0xa0>

Actually, I was also able to replicate the issue with this example from Microchip, an application that is meant to be loaded by the bootloader (so compiled with IMAGE_LOADED_BY_BOOTLOADER 1), which is exactly what I am trying to do in my use case.

Prior to this experiment I flashed reference design version 2021.11 (which also ships HSS, so the board was loaded with HSS). All I did was open the latest release of the example in SoftConsole and build it using the build button. Afterwards, I manually invoked OpenOCD:

fcuzzocrea@Latitude-5420:~$ /opt/openocd/bin/openocd --command "set DEVICE MPFS" --command "set COREID 1" --file /opt/openocd/share/openocd/scripts/board/microsemi-riscv.cfg 
xPack OpenOCD (Microchip SoftConsole build), x86_64 Open On-Chip Debugger 0.10.0+dev-00859-g95a8cd9b5-dirty (2020-10-21-21:16)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
MPFS
1
Info : only one transport option; autoselect 'jtag'
do_board_reset_init
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : Embedded FlashPro6 (revision B) found (USB_ID=1514:200b path=/dev/hidraw1)
Info : Embedded FlashPro6 (revision B) CM3 firmware version: F4.0
Info : clock speed 6000 kHz
Info : JTAG tap: mpfs.cpu tap/device found: 0x0f81a1cf (mfg: 0x0e7 (GateField), part: 0xf81a, ver: 0x0)
Info : datacount=2 progbufsize=16
Info : Disabling abstract command reads from CSRs.
Info : Examined RISC-V core; found 5 harts
Info :  hart 0: currently disabled
Info :  hart 1: XLEN=64, misa=0x800000000014112d
Info :  hart 2: currently disabled
Info :  hart 3: currently disabled
Info :  hart 4: currently disabled
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : Disabling abstract command writes to CSRs.

The arguments match what the default SoftConsole configuration sets.

After that I manually invoked the SoftConsole GDB and got trapped again:

fcuzzocrea@Latitude-5420:~/.local/microchip/SoftConsole-v2021.1/extras/home/polarfire-soc-bare-metal-examples/applications/mpfs-pmp-demo/mpfs-pmp-app-u54-1/DDR-Release$ /home/fcuzzocrea/.local/microchip/SoftConsole-v2021.1/riscv-unknown-elf-gcc/bin/riscv64-unknown-elf-gdb mpfs-pmp-app-u54-1.elf 
GNU gdb (xPack GNU RISC-V Embedded GCC (Microsemi SoftConsole build), 64-bit) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=riscv64-unknown-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/sifive/freedom-tools/issues>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Really redefine built-in command "remote"? (y or n) [answered Y; input not from terminal]
Reading symbols from mpfs-pmp-app-u54-1.elf...
(gdb) set mem inaccessible-by-default off
(gdb) set $target_riscv=1
(gdb) set arch riscv:rv64
The target architecture is assumed to be riscv:rv64
(gdb) target remote localhost:3333
0x000000000800c9ce in ?? ()
Loading section .text, size 0x2780 lma 0x80000000
Loading section .sdata, size 0x10 lma 0x80002780
Loading section .data, size 0xe30 lma 0x80002790
Start address 0x0000000080000000, load size 13760
Transfer rate: 11 KB/sec, 4586 bytes/write.
(gdb) thread apply all set $pc=_start

Thread 1 (Remote target):
(gdb) tb u54_1
Temporary breakpoint 1 at 0x8000138a: file ../src/application/hart1/u54_1.c, line 46.
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
trap_from_machine_mode (regs=0x80413e8, dummy=3149939, mepc=2147483696) at ../src/platform/mpfs_hal/common/mss_mtrap.c:806
806             if(i == 0x1000U)
(gdb) p/x $mcause
$1 = 0x2
(gdb) p/x $mepc
$2 = 0x80000030
(gdb) 
(gdb) disassemble 0x80000030
Dump of assembler code for function reset_vector:
   0x0000000080000000 <+0>: auipc   a4,0x0
   0x0000000080000004 <+4>: addi    a4,a4,176 # 0x800000b0 <trap_vector>
   0x0000000080000008 <+8>: csrw    mtvec,a4
   0x000000008000000c <+12>:    csrr    a5,mtvec
   0x0000000080000010 <+16>:    bne a4,a5,0x8000000c <reset_vector+12>
   0x0000000080000014 <+20>:    beqz    a0,0x80000020 <reset_vector+32>
   0x0000000080000018 <+24>:    csrwi   mideleg,0
   0x000000008000001c <+28>:    csrwi   medeleg,0
   0x0000000080000020 <+32>:    csrw    mscratch,zero
   0x0000000080000024 <+36>:    csrw    mcause,zero
   0x0000000080000028 <+40>:    csrw    mepc,zero
   0x000000008000002c <+44>:    beqz    a0,0x80000034 <reset_vector+52>
   0x0000000080000030 <+48>:    fscsr   zero
   0x0000000080000034 <+52>:    csrr    t0,misa
   0x0000000080000038 <+56>:    bltz    t0,0x80000040 <reset_vector+64>
   0x000000008000003c <+60>:    j   0x80000034 <reset_vector+52>
   0x0000000080000040 <+64>:    auipc   gp,0x3
   0x0000000080000044 <+68>:    addi    gp,gp,-192 # 0x80002f80 <local_irq_handler_u54_1_table+112>
   0x0000000080000048 <+72>:    auipc   a4,0x4
   0x000000008000004c <+76>:    addi    a4,a4,-72 # 0x80004000
   0x0000000080000050 <+80>:    auipc   a5,0x6
   0x0000000080000054 <+84>:    addi    a5,a5,-80 # 0x80006000
   0x0000000080000058 <+88>:    auipc   sp,0x6
   0x000000008000005c <+92>:    addi    sp,sp,-88 # 0x80006000
   0x0000000080000060 <+96>:    sd  zero,0(a4)
   0x0000000080000064 <+100>:   addi    a4,a4,8
   0x0000000080000068 <+104>:   blt a4,a5,0x80000060 <reset_vector+96>
   0x000000008000006c <+108>:   auipc   a4,0x4
   0x0000000080000070 <+112>:   addi    a4,a4,-1404 # 0x80003af0
   0x0000000080000074 <+116>:   auipc   a5,0x4
   0x0000000080000078 <+120>:   addi    a5,a5,-1412 # 0x80003af0
   0x000000008000007c <+124>:   sd  zero,0(a4)
   0x0000000080000080 <+128>:   addi    a4,a4,8
   0x0000000080000084 <+132>:   blt a4,a5,0x8000007c <reset_vector+124>
   0x0000000080000088 <+136>:   bnez    a1,0x8000009c <reset_vector+156>
   0x000000008000008c <+140>:   addi    sp,sp,-64
   0x0000000080000090 <+144>:   mv  tp,sp
   0x0000000080000094 <+148>:   mv  a0,tp
   0x0000000080000098 <+152>:   j   0x800003ee <u54_single_hart>
   0x000000008000009c <+156>:   mv  a0,a1
   0x00000000800000a0 <+160>:   j   0x800003ee <u54_single_hart>
   0x00000000800000a4 <+164>:   nop
   0x00000000800000a8 <+168>:   nop
   0x00000000800000ac <+172>:   j   0x800000a4 <reset_vector+164>

I also tried single stepping, but it is not clear (at least to me) why I end up in trap_from_machine_mode. The only thing I can think of is that no valid data are found in a0 and a1.

(gdb) thread apply all set $pc = _start

Thread 1 (Remote target):
(gdb) bt
#0  reset_vector () at ../src/platform/mpfs_hal/startup_gcc/mss_entry.S:300
(gdb) si
0x0000000080000004  300     la a4, trap_vector
(gdb) si
301     csrw mtvec, a4          # initalise machine trap vector address
(gdb) si
304     csrr    a5, mtvec
(gdb) si
305     bne a4, a5, 2b
(gdb) si
311     beqz a0, 3f
(gdb) si
312     csrw mideleg, 0
(gdb) si
313     csrw medeleg, 0
(gdb) si
316     csrw mscratch, zero
(gdb) si
317     csrw mcause, zero
(gdb) si
318     csrw mepc, zero
(gdb) si
323     beqz a0, 1f
(gdb) si
325     fscsr x0
(gdb) si
trap_vector () at ../src/platform/mpfs_hal/startup_gcc/mss_entry.S:391
391     addi sp, sp, -INTEGER_CONTEXT_SIZE     # moves sp down stack to make I
(gdb) si
394     STORE sp, 2*REGBYTES(sp)               # sp
(gdb) si
395     STORE a0, 10*REGBYTES(sp)              # save a0,a1 in the created CONTEXT
(gdb) si
396     STORE a1, 11*REGBYTES(sp)
(gdb) si
397     STORE ra, 1*REGBYTES(sp)
(gdb) si
398     STORE gp, 3*REGBYTES(sp)
(gdb) si
399     STORE tp, 4*REGBYTES(sp)
(gdb) si
400     STORE t0, 5*REGBYTES(sp)
(gdb) si
401     STORE t1, 6*REGBYTES(sp)
(gdb) si
402     STORE t2, 7*REGBYTES(sp)
(gdb) si
403     STORE s0, 8*REGBYTES(sp)
(gdb) si
404     STORE s1, 9*REGBYTES(sp)
(gdb) si
405     STORE a2,12*REGBYTES(sp)
(gdb) si
406     STORE a3,13*REGBYTES(sp)
(gdb) si
407     STORE a4,14*REGBYTES(sp)
(gdb) si
408     STORE a5,15*REGBYTES(sp)
(gdb) si
409     STORE a6,16*REGBYTES(sp)
(gdb) si
410     STORE a7,17*REGBYTES(sp)
(gdb) si
411     STORE s2,18*REGBYTES(sp)
(gdb) si
412     STORE s3,19*REGBYTES(sp)
(gdb) si
413     STORE s4,20*REGBYTES(sp)
(gdb) si
414     STORE s5,21*REGBYTES(sp)
(gdb) si
415     STORE s6,22*REGBYTES(sp)
(gdb) si
416     STORE s7,23*REGBYTES(sp)
(gdb) si
417     STORE s8,24*REGBYTES(sp)
(gdb) si
418     STORE s9,25*REGBYTES(sp)
(gdb) si
419     STORE s10,26*REGBYTES(sp)
(gdb) si
420     STORE s11,27*REGBYTES(sp)
(gdb) si
421     STORE t3,28*REGBYTES(sp)
(gdb) si
422     STORE t4,29*REGBYTES(sp)
(gdb) si
423     STORE t5,30*REGBYTES(sp)
(gdb) si
424     STORE t6,31*REGBYTES(sp)
(gdb) si
426     mv a0, sp                          # a0 <- regs
(gdb) si
432     csrr a1, mbadaddr                 # useful for anaysis when things go wrong
(gdb) si
433     csrr a2, mepc
(gdb) si
434     jal trap_from_machine_mode
(gdb) si
trap_from_machine_mode (regs=0x8041258, dummy=3149939, mepc=2147483696) at ../src/platform/mpfs_hal/common/mss_mtrap.c:760
760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
0x0000000080000a6a  760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
0x0000000080000a6c  760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
0x0000000080000a6e  760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
0x0000000080000a70  760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
0x0000000080000a72  760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
760     volatile uintptr_t mcause = read_csr(mcause);
(gdb) si
762     if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  > 15U)&& ((mcause & MCAUSE_CAUSE)  < 64U))
(gdb) si
0x0000000080000a7a  762     if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  > 15U)&& ((mcause & MCAUSE_CAUSE)  < 64U))
(gdb) si
766     else if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  == IRQ_M_EXT))
(gdb) si
0x0000000080000aaa  766     else if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  == IRQ_M_EXT))
(gdb) si
770     else if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  == IRQ_M_SOFT))
(gdb) si
0x0000000080000ac2  770     else if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  == IRQ_M_SOFT))
(gdb) si
774     else if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  == IRQ_M_TIMER))
(gdb) si
0x0000000080000ada  774     else if (((mcause & MCAUSE_INT) == MCAUSE_INT) && ((mcause & MCAUSE_CAUSE)  == IRQ_M_TIMER))
(gdb) si
778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000af2  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000af4  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000af6  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000af8  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000afa  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000afe  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000b02  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000b04  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000b06  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000b0a  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
0x0000000080000b0c  778     else if ((mcause == CAUSE_STORE_ACCESS) | (mcause == CAUSE_LOAD_ACCESS) | (mcause == CAUSE_FETCH_ACCESS))
(gdb) si
805             i++;        /* added some code as SC debugger hangs if in loop doing nothing */
(gdb) si
806             if(i == 0x1000U)
(gdb) si
805             i++;        /* added some code as SC debugger hangs if in loop doing nothing */

TommyMurphyTM1234 commented 2 years ago

If the illegal instruction exception is happening on the fscsr instruction at 0x80000030 in the third disassembly listing, then it suggests to me that the target hart doesn't support floating point (F/D extensions), or maybe that these extensions have been disabled via $misa? However, I'm a bit confused because there are three disassembly listings and I'm not sure which one relates to the debug session and trap scenario. The first two show a csrr at 0x08000030, while the third shows fscsr at 0x80000030. And earlier in the thread you said that the trap was happening at 0x08000018, so it's difficult to keep track of what the issue is and what scenario is being exercised, tested and debugged.

I think you probably need to take this up with Microchip customer support. It's not an OpenOCD issue as far as I can see.

fcuzzocrea commented 2 years ago

The first disassembly listing is from my custom app (the one for which 0x08000030 was flagged as the offending instruction). The second disassembly listing is what I get when I run objdump against the ELF I am loading through OpenOCD (again my custom app). I posted both just to show that OpenOCD is loading the ELF correctly (so what is in memory matches what I have compiled).

The third disassembly listing is what I get when playing with the Microchip example.

In both cases I get trapped (although the offending instruction is different for the two cases). I tried the Microchip example just to try some code which is supposed to be tested and working, and noticed it traps too (but the offending instruction is another one).

Anyway, thanks for all your help! I'll try to get in touch with Microchip support!

TommyMurphyTM1234 commented 2 years ago

Seems that, regardless of what code you're using, the problem is always an illegal instruction exception. That being the case, I doubt that any code that runs before your program proper (e.g. the bootloader) is relevant, unless perhaps it is disabling features/extensions by changing $misa (e.g. switching off the floating point F/D extensions), thus disabling instructions on which your code depends. That assumes that the target supports a "dynamic" $misa in the first place. So either your program is using instructions that the target doesn't support (or for which support has been disabled by changing the default $misa), or the instruction fetched from memory is simply invalid or corrupted.

fcuzzocrea commented 2 years ago

Just a brief update to let you know that I was able to achieve my goal using ebreak. It is probably just a hack, but it might be useful for someone else in the future.

Basically I have created a wait_for_debugger() function:

void wait_for_debugger(HLS_DATA* hls, MODE_CHOICE mode_choice,
                       uint64_t next_addr)
{
    /* Store current hartid */
    uint32_t hartid = read_csr(mhartid);

    /* Restore PLIC to known state */
    __disable_irq();
    PLIC_init();

    /* Disable all interrupts: */
    write_csr(mie, 0);

    switch (mode_choice) {
    default:
    case M_MODE:
        /**
         * User application execution should now start and never return
         * here....
         */
        write_csr(mepc, next_addr);
        break;
    case S_MODE:
        /**
         * User application execution should now start and never return
         * here....
         */
        write_csr(mepc, next_addr);
        break;
    }

    register unsigned long a0 asm("a0") = hartid;
    register unsigned long a1 asm("a1") = (unsigned long)hls;

    /* Hold for debugger to upload the app */
    __asm__("ebreak");

    __asm__ __volatile__("mret" : : "r"(a0), "r"(a1));
    __builtin_unreachable();
}

Of course I know the value of next_addr as the entry point of the elf I want to load is the start of the LIM, so I can just hardcode it.

When my bootloader is started from a debugging environment (Trace32 or OpenOCD+GDB), it executes the function and hits the ebreak, then waits for the application to be loaded into the LIM.

With GDB I do:

fcuzzocrea@Latitude-5420:~$ riscv64-unknown-elf-gdb build/debug/c3app.elf 
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=riscv64-unknown-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Really redefine built-in command "remote"? (y or n) [answered Y; input not from terminal]
Reading symbols from build/debug/c3app.elf...
(gdb) set mem inaccessible-by-default off
(gdb) set $target_riscv=1
(gdb) set arch riscv:rv64
The target architecture is set to "riscv:rv64".
(gdb) target extended-remote localhost:3333
Remote debugging using localhost:3333
0x000000002022149a in ?? ()
(gdb) start
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
Starting program: /home/fcuzzocrea/Documenti/Progetti/core3_template_app/build/debug/c3app.elf 
Disabling abstract command writes to CSRs.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x0000000020221ed8 in ?? ()
(gdb) load
Loading section .text, size 0x22200 lma 0x8000000
Loading section .sdata, size 0x70 lma 0x8022200
Loading section .data, size 0x3930 lma 0x8022270
Loading section .sdram, size 0x1388 lma 0x8025ba0
Start address 0x0000000008000000, load size 159528
Transfer rate: 9 KB/sec, 13294 bytes/write.
(gdb) continue
Continuing.

And my application loads correctly.

As I said, this is probably just a hack and can hardly be integrated into an IDE to allow interactive debugging, but at least it works and I can use GDB from the CLI.

Also, regarding my DWARF messages: I was able to fix them. I was missing the -gdwarf-2 compiler flag.

abelixrev commented 3 months ago

Hello,

Could you please let me know the file and line number where you added the wait_for_debugger() function, as well as where you called it? Thanks a lot for your investigation.

Best regards,

TommyMurphyTM1234 commented 3 months ago

Closing this issue as it's not an OpenOCD issue but something to do with the code executing on the target and possibly related to details of how the Microchip PolarFire SoC integrates the RISC-V (a SiFive U54-MC variant) and implements that platform's custom boot modes, how they interact with debugging etc.