Unable to halt hart, then telnet reset, causes segmentation fault

likewise commented 3 years ago

I am running an unstripped OpenOCD 0.11.0 (release version) build under GDB to capture information about a segfault that occurs when OpenOCD resets a RISC-V target that is in an unknown, corrupt state. (I am not sure whether this bug report belongs at riscv/riscv-openocd or at mainline OpenOCD; I am starting here.)

The target is a RISC-V core, a variation of the lowRISC Ibex simple_system running on a Xilinx FPGA, with DMI access via BSCANE2.

In very rare cases I get that target into a condition where OpenOCD can be segfaulted. This bug report is about the OpenOCD segfault.

(gdb) run
Starting program: /opt/openocd/bin/openocd -f ./openocd_zynqmp_bscane2.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Open On-Chip Debugger 0.11.0
Licensed under GNU GPL v2
For bug reports, read
        http://openocd.org/doc/doxygen/bugs.html
force hard breakpoints
none separate

Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
[New Thread 0x7ffff79d8700 (LWP 98959)]
Info : clock speed 5000 kHz
Info : JTAG tap: uscale.tap tap/device found: 0x5ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x5)
Info : JTAG tap: uscale.ps tap/device found: 0x24738093 (mfg: 0x049 (Xilinx), part: 0x4738, ver: 0x2)
Info : JTAG tap: uscale.tap tap/device found: 0x5ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x5)
Info : JTAG tap: uscale.ps tap/device found: 0x24738093 (mfg: 0x049 (Xilinx), part: 0x4738, ver: 0x2)
Info : datacount=2 progbufsize=8
Error: unable to halt hart 0
Error:   dmcontrol=0x80000001
Error:   dmstatus =0x00030c82
Error: Fatal: Hart 0 failed to halt during examine()
Warn : target uscale.ps examination failed
Info : starting gdb server for uscale.ps on 3333
Info : Listening on port 3333 for gdb connections

At this point OpenOCD is waiting for a connection. From another terminal I run:

telnet localhost 4444

and inside that telnet session I type

reset

OpenOCD now additionally reports "Hart 0 doesn't exist." and then segfaults:

Info : accepting 'telnet' connection on tcp/4444
Info : JTAG tap: uscale.tap tap/device found: 0x5ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x5)
Info : JTAG tap: uscale.ps tap/device found: 0x24738093 (mfg: 0x049 (Xilinx), part: 0x4738, ver: 0x2)
Info : datacount=2 progbufsize=8
Error: Hart 0 doesn't exist.
Error: unable to halt hart 0
Error:   dmcontrol=0x80000001
Error:   dmstatus =0x00030c82
Error: Fatal: Hart 0 failed to halt during examine()

Thread 1 "openocd" received signal SIGSEGV, Segmentation fault.
register_cache_invalidate (cache=0x0) at ../src/target/register.c:109
109             struct reg *reg = cache->reg_list;
(gdb)

Here is a further backtrace (this is a default build, so OpenOCD was probably compiled with optimizations):

(gdb) bt
#0  register_cache_invalidate (cache=0x0) at ../src/target/register.c:109
#1  0x0000555555613079 in riscv_invalidate_register_cache (target=target@entry=0x555555972ad0) at ../src/target/riscv/riscv.c:2933
#2  0x000055555561311d in riscv_assert_reset (target=0x555555972ad0) at ../src/target/riscv/riscv.c:1131
#3  0x00005555555f8013 in jim_target_reset (interp=0x55555592a2e0, argc=<optimized out>, argv=<optimized out>) at ../src/target/target.c:5360
#4  0x000055555562892a in command_unknown (interp=0x55555592a2e0, argc=<optimized out>, argv=0x7fffffffa410) at ../src/helper/command.c:1055
#5  0x0000555555779213 in JimInvokeCommand (interp=interp@entry=0x55555592a2e0, objc=objc@entry=4, objv=objv@entry=0x7fffffffa410)
    at ../../jimtcl/jim.c:10161

https://sourceforge.net/p/openocd/code/ci/v0.11.0/tree/src/target/register.c#l107

/** Marks the contents of the register cache as invalid (and clean). */
void register_cache_invalidate(struct reg_cache *cache)
{
    struct reg *reg = cache->reg_list;

Apparently target->reg_cache is set to NULL somewhere in this situation, and nothing guards against target->reg_cache == NULL, which causes the segfault:

void riscv_invalidate_register_cache(struct target *target)
{
    RISCV_INFO(r);

    LOG_DEBUG("[%d]", target->coreid);
    register_cache_invalidate(target->reg_cache);
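
The missing guard described above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the actual OpenOCD fix: the structs are simplified stand-ins reduced to the fields used here, and the function names `cache_invalidate` and `invalidate_register_cache_guarded` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for OpenOCD's structures (assumption: only the
 * fields needed for this illustration are modelled). */
struct reg { bool valid; bool dirty; };
struct reg_cache { struct reg *reg_list; unsigned num_regs; };
struct target { struct reg_cache *reg_cache; };

/* Same shape as the excerpt above: dereferences cache unconditionally,
 * so it must never be called with cache == NULL. */
void cache_invalidate(struct reg_cache *cache)
{
    struct reg *reg = cache->reg_list;

    for (unsigned n = cache->num_regs; n != 0; n--, reg++) {
        reg->valid = false;
        reg->dirty = false;
    }
}

/* Hypothetical guarded caller: if no register cache exists (for example
 * because examine() failed), report it and return false instead of
 * dereferencing a NULL pointer. */
bool invalidate_register_cache_guarded(struct target *target)
{
    if (target->reg_cache == NULL) {
        fprintf(stderr, "no register cache, skipping invalidate\n");
        return false;
    }
    cache_invalidate(target->reg_cache);
    return true;
}
```

Whether the guard belongs in the caller, as here, or inside register_cache_invalidate() itself is a design choice this sketch does not settle.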

While my RISC-V target is in this rare condition, I will rebuild OpenOCD unoptimized and unstripped for more debugging.

likewise commented 3 years ago

Also see #195 which might be related.

timsifive commented 3 years ago

I tried to reproduce this by hacking my source to return ERROR_FAIL at the same point, with the result being that OpenOCD exits when it encounters the failure. This sounds like a problem that has been addressed in riscv-openocd already, but that has not been merged upstream yet.

en-sc commented 7 months ago

@likewise, we do intend to upstream and align this repository with mainline OpenOCD. However, RISC-V OpenOCD is currently almost on top of mainline (the mainline version used is about a week old). Please consider using RISC-V OpenOCD for now if you are connecting to RISC-V targets. On this note, can you please check whether the issue is present in RISC-V OpenOCD?

riscv-collab / riscv-openocd

Unable to halt hart, then telnet reset, causes segmentation fault #630