riscv-collab / riscv-openocd

Fork of OpenOCD that has RISC-V support

Unable to halt hart should not cause abort() #195

Open mwachs5 opened 6 years ago

mwachs5 commented 6 years ago

If a hart makes a bad memory access and the bus blocks, the debugger can't halt the hart (depending on the debug module implementation). This currently leads to "Unable to halt hart" and OpenOCD calling abort().

However, the debugger could still try reset halt and potentially halt the hart successfully. I am not sure how this option can be offered to the user, but it would be nicer to print a message such as "Unable to halt hart. Try 'reset halt' to return to a good state." and not abort().

timsifive commented 6 years ago

In the current mainline there are no calls to abort() anywhere in the src/target/riscv/ directory. What version of the code are you using?

mwachs5 commented 6 years ago

It may be an assertion failure vs abort(). I'll get you more info though.

timsifive commented 6 years ago

Closing due to lack of information. Please reopen this if you encounter the problem again.

mwachs5 commented 6 years ago

I am still having this problem. Here is the message from current master when trying to connect to a target which is blocked on a bad offchip memory access. It would be nice to be able to do monitor reset halt from the debugger for this:

Open On-Chip Debugger 0.10.0+dev-00152-gdf7ba7d-dirty (2018-04-03-14:58)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
adapter speed: 3000 kHz
Info : auto-selecting first available session transport "jtag". To override use 'transport select <transport>'.
Info : clock speed 3000 kHz
Info : JTAG tap: riscv.cpu tap/device found: 0x20000913 (mfg: 0x489 (SiFive, Inc.), part: 0x0000, ver: 0x2)
Info : datacount=2 progbufsize=16
Error: unable to halt hart 0
Error:   dmcontrol=0x80000001
Error:   dmstatus =0x00030c82
Error: Fatal: Hart 0 failed to halt during examine()
Info : Listening on port 3333 for gdb connections
Error: Target not examined yet

[And then OpenOCD quits, meaning I can't connect to it and execute a reset halt command.]
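The two register values in the log can be decoded by hand. The sketch below assumes the Debug Module follows the field layout of the RISC-V External Debug Support specification, version 0.13; the helper names `set_flags` and `dm_version` are made up for illustration and are not part of OpenOCD.

```python
# Bit positions assume the RISC-V External Debug Support spec, version 0.13.
# Only single-bit status/control flags are listed; multi-bit fields are omitted.
DMSTATUS_FLAGS = {
    7: "authenticated",
    8: "anyhalted",
    9: "allhalted",
    10: "anyrunning",
    11: "allrunning",
    12: "anyunavail",
    13: "allunavail",
    14: "anynonexistent",
    15: "allnonexistent",
    16: "anyresumeack",
    17: "allresumeack",
    18: "anyhavereset",
    19: "allhavereset",
}
DMCONTROL_FLAGS = {0: "dmactive", 1: "ndmreset", 30: "resumereq", 31: "haltreq"}


def set_flags(value, flags):
    """Return the names of all flags whose bit is set in value (hypothetical helper)."""
    return [name for bit, name in sorted(flags.items()) if (value >> bit) & 1]


def dm_version(dmstatus):
    """Return the 4-bit version field in dmstatus[3:0] (2 means spec 0.13)."""
    return dmstatus & 0xF


dmstatus = 0x00030C82   # value from the log above
dmcontrol = 0x80000001  # value from the log above

print(dm_version(dmstatus))
print(set_flags(dmstatus, DMSTATUS_FLAGS))
print(set_flags(dmcontrol, DMCONTROL_FLAGS))
```

Under that assumption, dmcontrol shows haltreq and dmactive set, while dmstatus reports the hart as running (anyrunning, allrunning) with neither anyhalted nor allhalted set. In other words, the halt request was issued but the hart never acknowledged it.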

timsifive commented 6 years ago

I'm not sure how to solve this. The problem is that you want to run a command from gdb, but gdb gets the information about the target (list of registers, 32/64 bit) from OpenOCD when it first connects. Without examine() passing OpenOCD doesn't know those things, so gdb will be guessing whether the target is 32- or 64-bit.

There's a similar chicken-and-egg problem within OpenOCD. If examine() (called from init) fails, OpenOCD won't even expose the reset command. I'll ask the list about that. Maybe there's a way around it.

timsifive commented 6 years ago

I could make OpenOCD just do a reset from examine() if the target doesn't look cooperative. Does that seem reasonable? Having your target reset unexpectedly could be annoying, but it would only happen if you can't connect to it with OpenOCD otherwise. Not sure what the best answer is there.

TommyMurphyTM1234 commented 6 years ago

I don't think that would fit the OpenOCD way of doing things, which should allow for examine() with no reset, e.g. when one wants to debug by connecting to a running target in order to see what the running program is doing (or why it is not working correctly). In OpenOCD, reset should only be driven by reset [run|halt|init] commands and not implicitly, e.g. by examine().

timsifive commented 6 years ago

@TommyMurphyTM1234 OpenOCD would only do a reset if examine() won't work otherwise.

mwachs5 commented 6 years ago

Is there no stage between "init" and "examine"? "init" being "yes, I was able to connect to a RISC-V Debug Module" and "examine" being "ok, I know everything I need to know about every hart in the system"? What if you connect to a platform with lots of unavailable harts which later become available? Can you not "re-examine" or something to learn the new status of the system?

mwachs5 commented 6 years ago

What if I don't care about running the command from GDB, but I just want to write my .cfg script in a way that if examine() fails, I can still run reset halt? Sort of do what you are saying, but as explicitly directed by the config file. My main issue is that OpenOCD just quits after failing to examine the hart instead of allowing any other action (of which a GDB connection is just one possible thing one might do).

timsifive commented 6 years ago

init is the TCL command that ends up calling examine(). You can only run init once. The code enforces this explicitly.

I agree that what you want is a good idea, I just don't see how to achieve it without modifying target-independent OpenOCD code. It's probably a non-trivial change.

TommyMurphyTM1234 commented 6 years ago

"init is the TCL command that ends up calling examine(). You can only run init once. The code enforces this explicitly."

I'm confused by this. I'm not aware of any "standalone" init command and don't see any such command documented in the OpenOCD user manual. Are you sure you don't mean "reset init"? OpenOCD can connect to and interact with a target without ANY reset [run|halt|init] command being used. In that case examine() still happens.

Anyway this may all be moot given your last post that what Megan is looking for may not be possible without extensive changes to OpenOCD, including non RISC-V target specific code?

timsifive commented 6 years ago

The init command is documented at http://openocd.org/doc/html/Server-Configuration.html#enteringtherunstage

TommyMurphyTM1234 commented 6 years ago

Thanks Tim - missed that. But it's odd because none of the scripts that I have ever used for RISC-V or Cortex-M explicitly call init and yet examine() does get called otherwise nothing would work...

timsifive commented 6 years ago

The documentation says:

This command terminates the configuration stage and enters the run stage. This helps when you need to have the startup scripts manage tasks such as resetting the target, programming flash, etc. To reset the CPU upon startup, add "init" and "reset" at the end of the config script or at the end of the OpenOCD command line using the -c command line switch. If this command does not appear in any startup/configuration file OpenOCD executes the command for you after processing all configuration files and/or command line options.

TommyMurphyTM1234 commented 6 years ago

Oh - thanks - missed that too. Sorry about that!

timsifive commented 6 years ago

It turns out it is possible to deal with this from OpenOCD, though I'm not sure how it all works together.

I hacked my OpenOCD to fail examine() until the target is reset. If I do that, then the following sequence gets you to a state where gdb can attach and debug:

init
reset halt
reset halt

The init is what would usually happen, and fails. The first reset is ignored. I'm not sure why. Then the second reset works as usual, and somehow examine() gets called as part of it.
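The sequence above can also be passed on the OpenOCD command line with repeated `-c` switches, as the documentation quoted earlier in this thread describes. The following is a minimal sketch that only builds and prints such a command line; `board.cfg` is a placeholder config file name and `openocd_argv` is a hypothetical helper. Whether a given OpenOCD build keeps running after a failed init is not guaranteed, so treat this as something to try rather than a confirmed fix.

```python
import shlex


def openocd_argv(config_file, commands):
    """Build an openocd argument list: one -f for the config, one -c per command."""
    argv = ["openocd", "-f", config_file]
    for command in commands:
        argv += ["-c", command]
    return argv


# "board.cfg" is a placeholder; substitute the config file for your target.
argv = openocd_argv("board.cfg", ["init", "reset halt", "reset halt"])

# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(argv))
```

To actually launch OpenOCD from Python, the same list could be handed to `subprocess.run(argv)`.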

fabiovito commented 5 years ago

Hi timsifive, could you share the hacked code that you used to get OpenOCD working?

timsifive commented 5 years ago

@fabiovito, I hacked OpenOCD to fail examine() so that I could reproduce the failure. I didn't make any changes to handle it any better.

mwelling commented 4 years ago

Any updates on this issue? I am not able to do the initial programming of LoFive R1 modules with the latest SDK and pre-compiled OpenOCD binary.

It fails with this exact same error and I assume it is because the blank flash is triggering it.

timsifive commented 4 years ago

My last April 5 comment indicates a possible work-around. Does that work for you? If not, can you share the full output of openocd -d when trying this sequence?

mwelling commented 4 years ago

It does not seem to help.

See debugging output of openocd here: https://pastebin.com/4qjbxBW5

mwelling commented 4 years ago

I just noticed something: gdb was failing to launch because the prebuilt version was looking for libncurses5. I installed the legacy package for libncurses and it programmed the bootloader successfully.

I think the magic is using 'reset halt' instead of the GPIO-toggling reset method.

aksunlight commented 2 years ago

I am still having this problem. Here is the message from current master when trying to connect to a target which is blocked on a bad offchip memory access. It would be nice to be able to do monitor reset halt from the debugger for this:

Open On-Chip Debugger 0.10.0+dev-00152-gdf7ba7d-dirty (2018-04-03-14:58)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
adapter speed: 3000 kHz
Info : auto-selecting first available session transport "jtag". To override use 'transport select <transport>'.
Info : clock speed 3000 kHz
Info : JTAG tap: riscv.cpu tap/device found: 0x20000913 (mfg: 0x489 (SiFive, Inc.), part: 0x0000, ver: 0x2)
Info : datacount=2 progbufsize=16
Error: unable to halt hart 0
Error:   dmcontrol=0x80000001
Error:   dmstatus =0x00030c82
Error: Fatal: Hart 0 failed to halt during examine()
Info : Listening on port 3333 for gdb connections
Error: Target not examined yet

[And then OpenOCD quits, meaning I can't connect to it and execute a reset halt command.]

On an FPGA, you can download the bitstream to your hardware device again and then use OpenOCD again. I think other devices can use a similar method.

timsifive commented 2 years ago

I agree this sucks. There's a bit of a chicken and egg problem due to the way OpenOCD initializes. You can't run reset until init has happened, and init requires examine which needs to be able to halt the target. The hack I'm converging on is to make it acceptable for examine to fail early on, but I'm not actively working on that.

sifiverobert commented 2 years ago

I was working around it with a custom OpenOCD script (which suspended polling ...). Good debuggers should treat a 'bad core' (or an always-running core) as an acceptable state. Unfortunately GDB only knows two states, running and stopped (as it was designed to debug high-level applications on Linux). Changing that can be very painful and very costly.

Spegs21 commented 2 years ago

Having the same issue on a custom RocketChip. I've tried overriding jtag_init with customized TRST and SRST logic per the OpenOCD documentation but it doesn't seem to help much. It is also more prevalent at higher clock speeds. 150 MHz works somewhat reliably but 300 MHz never works.

sifiverobert commented 2 years ago

@Spegs21 - which clock are you talking about? Core clock or JTAG clock? The JTAG front-end clock (TCK) is usually decoupled from the core clocks. When the core runs fast, actions from JTAG have less time to act. There is a parameter to control the delay in the Run-Test/IDLE state. In general, if halt-in-reset is implemented, you should always be able to 'catch' the core. BTW: try NOT to use nTRST (to make things simpler). Resetting the JTAG state machine with 5+ TCK cycles (with TMS=high) is enough.
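The claim about five TCK cycles with TMS held high can be checked against the standard IEEE 1149.1 TAP controller state diagram. The sketch below is a plain software model of that diagram, not OpenOCD code; the names `TAP`, `clock` and `clocks_to_reset` are made up for illustration.

```python
# Next-state table of the IEEE 1149.1 TAP controller:
# state -> (next state when TMS=0, next state when TMS=1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock(state, tms_bits):
    """Advance the TAP model by one TCK cycle per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


def clocks_to_reset(state):
    """Count the TCK cycles with TMS=1 needed to reach Test-Logic-Reset."""
    count = 0
    while state != "Test-Logic-Reset":
        state = TAP[state][1]
        count += 1
    return count


# Worst case over all 16 states, and a direct check of the five-cycle rule.
worst_case = max(clocks_to_reset(state) for state in TAP)
always_reset = all(clock(state, [1] * 5) == "Test-Logic-Reset" for state in TAP)
print(worst_case, always_reset)
```

In this model no state needs more than five cycles, which is why the TMS-high sequence works as a reset without a dedicated nTRST pin.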

Spegs21 commented 2 years ago

@sifiverobert I'm talking about core clock in this case. We provide it externally. I've got it set up as a fixed clock in the DTS. We are providing a separate TCK at 1000 kHz.

I figured it had something to do with catching it, I just haven't had much luck figuring it out. For the Run-Test/IDLE state parameter and halt-in-reset function, are these OpenOCD options or dmcontrol options?

jimaandro commented 1 year ago

I had the same problem. It appeared after I rebooted the server that hosts OpenOCD and has the FPGA connected to it. OpenOCD was running at the time of the reboot and was writing a Linux image to my CVA6 design. I had previously released the CVA6 core from reset, so it was executing some bare-metal code that I had written to memory earlier. OpenOCD got stuck writing the image because the HBM was busy, so I thought rebooting my machine would fix the problem. Instead, I got the error you are talking about.

The solution was to connect my Debug Module reset to an AXI-to-GPIO pin, which was connected to a JTAG-to-AXI IP. Then I reset the Debug Module and the error was gone. So the connection was: JTAG-to-AXI -> AXI-to-GPIO -> reset pin of the Debug Module (and then, with Tcl commands from the Vivado Hardware Manager, you can reset the Debug Module).

I noticed that reprogramming the FPGA, rebooting the server, and reinstalling OpenOCD didn't help.

aap-sc commented 5 months ago

@mwachs5 it's been a while since this issue was filed. However, I want to note that the current OpenOCD version does not abort when it is unable to halt the hart. In fact, it is even possible to reset the hart in some cases. I assume that the issue is fixed.

@mwachs5 are you still able to double-check whether this is the case in your environment?