pyocd / pyOCD

Open source Python library for programming and debugging Arm Cortex-M microcontrollers
https://pyocd.io
Apache License 2.0
1.13k stars 484 forks source link

gdb drops connection (invalid hex digit 120) #1000

Open patrislav1 opened 4 years ago

patrislav1 commented 4 years ago

I can't get pyocd gdbserver to work. When I try to connect with arm-none-eabi-gdb, gdb says Reply contains invalid hex digit 120 and drops the connection. Since debugging the same application with openocd works, I assume the problem is somewhere in pyocd. The strange thing is that gdb complains about the invalid hex digit when pyocd sends b'$Text=0;Data=0;Bss=0#04' which is a constant so it's not clear to see where this invalid value comes from.

pyocd version is the current dev version (5a52ccc). Full pyocd log is attached. pyocd.log

$ arm-none-eabi-gdb path/to/application.elf

GNU gdb (GNU Arm Embedded Toolchain 9-2020-q2-update) 8.3.1.20191211-git
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=arm-none-eabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from dmmc-stamp-breakout/dmmc-stamp-breakout.elf...
(gdb) tar ext :3333
Remote debugging using :3333
Reply contains invalid hex digit 120
(gdb) 
$ pyocd gdbserver
...
0012986:DEBUG:packet_io:--<<<< GDB send 101 bytes: b'$l<?xml version="1.0"?><!DOCTYPE feature SYSTEM "threads.dtd"><threads><thread id="1" /></threads>#41'
0012987:DEBUG:packet_io:-->>>> GDB read 13 bytes: b'$qAttached#8f'
0012987:DEBUG:gdbserver:GDB received query: [b'Attached#8f']
0012987:DEBUG:packet_io:--<<<< GDB send 5 bytes: b'$1#31'
0012988:DEBUG:packet_io:-->>>> GDB read 8 bytes: b'$Hc-1#09'
0012988:DEBUG:packet_io:--<<<< GDB send 6 bytes: b'$OK#9a'
0012989:DEBUG:packet_io:-->>>> GDB read 12 bytes: b'$qOffsets#4b'
0012989:DEBUG:gdbserver:GDB received query: [b'Offsets#4b']
0012990:DEBUG:packet_io:--<<<< GDB send 23 bytes: b'$Text=0;Data=0;Bss=0#04'
0012991:DEBUG:packet_io:GDB packet thread: other side closed connection
0012991:DEBUG:packet_io:GDB packet thread stopping
0013090:INFO:gdbserver:Client disconnected!
patrislav1 commented 4 years ago

Here's a Wireshark capture of the pyocd <---> gdb traffic. Can't see anything that isn't already shown in the pyocd log. If the 120 value gdb complains about, is not hex, but decimal, then the x in $Text=0;Data=0;Bss=0#04 could be the offending character. However in gdb remote protocol this looks like a valid response to qOffsets.

pyocd-gdbserver-fail.zip

patrislav1 commented 4 years ago

Fun Fact: OpenOCD uses the very same reply string to answer gdb's qOffsets command, but its session doesn't get shut down. I see zero difference. Maybe it uses a different GDB session setup?

image

patrislav1 commented 4 years ago

The watchdog timer of the SAM4L MCU was still active (you can configure it to stop while debugging, but I forgot that), so when pyocd connected and halted the chip, it ran into timeout. Before I could connect gdb to pyocd, it already went through watchdog reset, losing the connection to the debugger.

When I disable the watchdog, pyocd gdbserver works as expected.

Losing connection on watchdog reset could still be an interesting edge case for pyocd to catch and display a more helpful error message.

flit commented 4 years ago

Hi @patrislav1 , thanks for all the logs! The "Reply contains invalid hex digit 120" reported by gdb is definitely a bug somewhere. Sometimes gdb reports errors for packets prior to the most recently received one.

Why would the debug connection be lost on a watchdog reset? That should be a warm reset, which shouldn't reset debug logic according the the v7-M specification. But it's not uncommon for silicon vendors to mess up reset implementation on MCU. I don't know enough about the SAM4L to know whether this is the case here.

flit commented 4 years ago

Reopening this issue due to the invalid hex digit error.

patrislav1 commented 4 years ago

Why would the debug connection be lost on a watchdog reset? That should be a warm reset, which shouldn't reset debug logic according the the v7-M specification. But it's not uncommon for silicon vendors to mess up reset implementation on MCU.

I can only speculate that the watchdog reset works differently than other resets on that chip. For example, if I reset the chip in software, using CMSIS NVIC_SystemReset(), the debugger keeps working. Maybe it's not a breaking connection - that was just my ad-hoc interpretation. Maybe something different is happening that makes pyocd choke, leading to the invalid hex thing. I'm ready to collect more data or do other tests if that helps (I guess you can't reproduce this without that particular chip).

flit commented 4 years ago

When I get a chance, I'll check the SAM4L reference manual to see what it says about the reset hierarchy. I'll also dig into the logs you supplied, and try to think of additional data to gather. Thanks providing such detailed info!

flit commented 4 years ago

I just encountered this same issue on an nRF5340, and figured out what the root cause is. The short answer is that the core is not halted when gdb connected.

When gdb first connects to a target, it requests the target status (? command with T stop reply packet response). For Cortex-M, this means that the core must be halted since you can only read core registers in that state. When pyocd can't read a core register when gdb asks for it, it returns a "not available" value of all "x"s instead of hex digits. Normally this is a good thing, since gdb reports to the user that the register's value is not currently available. However, the code in gdb that parses the very first stop reply packet (of course it's not the same as for other stop reply packets…) doesn't handle the "not available" value. So when it tries to parse the register's value in hex, it bails when it sees an x character.

In the trace you posted above, it's packet 51 that is the culprit:

$T1107:xxxxxxxx;0d:xxxxxxxx;0e:xxxxxxxx;0f:xxxxxxxx;thread:1;#ce

The reason the error isn't reported until the qOffsets command is (I believe) that gdb caches the command replies on connect and then parses multiple replies at once.

So, there are two things to do here:

  1. Figure out why your SAM4L is not halted when gdb connects. Note that pyocd will always attempt to halt the target when the gdbserver starts up.
  2. Implement a fix for pyocd so it returns a fake value (like 0) for the cases where gdb can't handle the "not available" register value in stop reply packets.

(In my case with the nRF5340, the second M33 core fails to halt for some reason that I haven't had time to investigate yet.)

patrislav1 commented 4 years ago
  1. Figure out why your SAM4L is not halted when gdb connects. Note that pyocd will always attempt to halt the target when the gdbserver starts up.

Remember this issue is caused by a watchdog reset (at least in my case). So the sequence probably is: 1) I start pyocd gdbserver, it halts the target 2) watchdog bites and resets the target 3) I start gdb, it connects and requests status, after target got kicked out of halt state by watchdog, et voilà ;)

ecstrema commented 4 months ago

FWIW, I can reproduce with OpenOCD and pyOCD, so GDB very probably is the culprit.

In my case, it runs for a while and randomly stops at some point, with the following error and log:

(This is happening inside a custom python function).

As you can see, the traceback and the error have nothing to do with the error, so this very probably happens async.

Traceback (most recent call last):
  File "../cortex-profiler/remote-profiler.py", line 335, in invoke
    function_writer.process(backtrace)
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "../cortex-profiler/remote-profiler.py", line 184, in sample
    if key not in self.countmap:
                       ^^^^^^^^^^
  File "../cortex-profiler/remote-profiler.py", line 133, in get_frame
    print("\nStart traceback")
                              ^
gdb.error: Invalid hex digit 116