jeanthom / openocd-dirtyjtag

OpenOCD fork with DirtyJTAG support (WIP)
GNU General Public License v2.0

DirtyJTAG progress with the ESP32S2 #3

Open · brainstorm opened this issue 3 years ago

brainstorm commented 3 years ago

Intro

This is a follow-up on https://github.com/jeanthom/openocd-dirtyjtag/issues/2#issuecomment-769485719, focusing on getting a working setup with DirtyJTAG and an Espressif ESP32S2 Saola-1 dev board connected via its JTAG pins.

I am using a Chinese knock-off of an STM32 (a.k.a. CK32); please follow this link for instructions if you also found that IC inside your STLink v2 clone... in other words, we are dealing with a clone of a clone here, so here be dragons & YMMV ;)

For this to work with Espressif's ICs, I had to merge Espressif's own OpenOCD fork (which has been quite out of sync with upstream since 2017) with DirtyJTAG's fork... neither a fun nor a clean git merge exercise. But it worked in the end, despite me being sleep-deprived and focusing only on getting DirtyJTAG+Xtensa/Espressif to work.

Clone my dirty OpenOCD merge from here: https://github.com/brainstorm/openocd-esp32

Current status

Here's a photo:

[Photo: IMG_1127]

And here's what OpenOCD and GDB have to say:

$ ./src/openocd -f tcl/interface/dirtyjtag.cfg -f tcl/target/esp32s2.cfg
Open On-Chip Debugger 0.11.0-rc2+devv0.10.0-esp32-20201202-613-g436e659c-dirty (2021-01-30-01:08)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 100 kHz

Warn : Transport "jtag" was already selected
Info : FreeRTOS creation
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 100 kHz
Info : JTAG tap: esp32s2.cpu tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Info : starting gdb server for esp32s2 on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : esp32s2: Target halted, PC=0x40091BEA, debug_reason=00000000
Info : Detected ESP32-S2 chip
Warn : No symbols for FreeRTOS!
Info : esp32s2: Target halted, PC=0x400314BA, debug_reason=00000001
Info : Flash mapping 0: 0x10020 -> 0x3f000020, 25 KB
Info : Flash mapping 1: 0x20020 -> 0x40080020, 73 KB
Info : esp32s2: Target halted, PC=0x400314BA, debug_reason=00000001
Info : Auto-detected flash bank 'esp32s2.flash' size 4096 KB
Info : Using flash bank 'esp32s2.flash' size 4096 KB
Info : esp32s2: Target halted, PC=0x400314BA, debug_reason=00000001
Info : Flash mapping 0: 0x10020 -> 0x3f000020, 25 KB
Info : Flash mapping 1: 0x20020 -> 0x40080020, 73 KB
Info : Using flash bank 'esp32s2.irom' size 76 KB
Info : esp32s2: Target halted, PC=0x400314BA, debug_reason=00000001
Info : Flash mapping 0: 0x10020 -> 0x3f000020, 25 KB
Info : Flash mapping 1: 0x20020 -> 0x40080020, 73 KB
Info : Using flash bank 'esp32s2.drom' size 28 KB
Warn : keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (92079 ms). Workaround: increase "set remotetimeout" in GDB
Warn : Prefer GDB command "target extended-remote 3333" instead of "target remote 3333"
Warn : keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (2090 ms). Workaround: increase "set remotetimeout" in GDB

Flash enumeration takes several seconds, so this is not performant; it appears to be bottlenecked by OpenOCD's remote bitbang protocol on the host computer. More on that point below...
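For reference, the `JTAG tap` line in the OpenOCD log above is simply the 32-bit IEEE 1149.1 IDCODE split into its fields. A minimal Python sketch (`decode_idcode` is a hypothetical helper, not an OpenOCD function) reproduces the mfg/part/ver values OpenOCD prints for 0x120034e5:

```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit IEEE 1149.1 IDCODE into its fields."""
    if (idcode & 1) != 1:
        raise ValueError("bit 0 of a valid IDCODE is always 1")
    return {
        "version": (idcode >> 28) & 0xF,        # bits 31..28
        "part": (idcode >> 12) & 0xFFFF,        # bits 27..12
        "manufacturer": (idcode >> 1) & 0x7FF,  # bits 11..1 (JEDEC JEP106 code)
    }


fields = decode_idcode(0x120034E5)
print({name: hex(value) for name, value in fields.items()})
# prints {'version': '0x1', 'part': '0x2003', 'manufacturer': '0x272'}
```

OpenOCD turns the 11-bit manufacturer code (0x272) into a name via the JEDEC JEP106 table, which is where "Tensilica" comes from.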

$ /Users/romanvg/.espressif/tools/xtensa-esp32s2-elf/esp-2020r3-8.4.0/xtensa-esp32s2-elf/bin/xtensa-esp32s2-elf-gdb -q /Users/romanvg/dev/esp32s2_can/twai_network_slave/build/twai_network_slave.elf
Reading symbols from /Users/romanvg/dev/esp32s2_can/twai_network_slave/build/twai_network_slave.elf...done.
(gdb) target remote :3333
Remote debugging using :3333
0x40091bea in esp_pm_impl_waiti () at /Users/romanvg/dev/esp-idf/components/esp_pm/pm_impl.c:533
533     asm("waiti 0");
(gdb) bt
#0  0x40091bea in esp_pm_impl_waiti () at /Users/romanvg/dev/esp-idf/components/esp_pm/pm_impl.c:533
#1  0x400870b6 in esp_vApplicationIdleHook () at /Users/romanvg/dev/esp-idf/components/esp_common/src/freertos_hooks.c:63
#2  0x4002713d in prvIdleTask (pvParameters=0x0) at /Users/romanvg/dev/esp-idf/components/freertos/tasks.c:3835
#3  0x4002802c in vPortTaskWrapper (pxCode=0x40027134 <prvIdleTask>, pvParameters=0x0)
    at /Users/romanvg/dev/esp-idf/components/freertos/port/xtensa/port.c:168
(gdb) l
528          */
529         esp_pm_impl_idle_hook();
530         s_skipped_light_sleep[core_id] = false;
531     }
532 #else
533     asm("waiti 0");
534 #endif // CONFIG_FREERTOS_USE_TICKLESS_IDLE
535 }
536
537 #if CONFIG_FREERTOS_USE_TICKLESS_IDLE
(gdb)

Unfortunately, after the initial connection, breakpoints do not seem to work (yet):

(gdb) b app_main
Breakpoint 1 at 0x400852d4: file ../main/twai_network_example_slave_main.c, line 122.
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.
keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (1099 ms). Workaround: increase "set remotetimeout" in GDB

And there are several timeout issues on the OpenOCD side:

(...)
Info : esp32s2: Target halted, PC=0x400314BA, debug_reason=00000001
Info : Flash mapping 0: 0x10020 -> 0x3f000020, 25 KB
Info : Flash mapping 1: 0x20020 -> 0x40080020, 73 KB
Info : Using flash bank 'esp32s2.drom' size 28 KB
Warn : keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (64461 ms). Workaround: increase "set remotetimeout" in GDB
Warn : keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (1411 ms). Workaround: increase "set remotetimeout" in GDB
Warn : keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (1099 ms). Workaround: increase "set remotetimeout" in GDB

Oscilloscope

@jeanthom, I hope this complements your tests with the cheap FX2 probe! To my untrained eye, there don't seem to be any significant electrical or signal-integrity issues: the signal is clean without extra effort, as opposed to my experiments with Glasgow:

[Oscilloscope screenshots: RigolDS47, RigolDS44, RigolDS43]

Conclusions

At this point DirtyJTAG performs a bit worse than my remote bitbang Glasgow board experiments. On the upside, none of the pull-up issues I encountered with Glasgow show up here, since the STLinkv2 clone already handles pull-ups well and rise times look good out of the box, without modification.

Judging by the screenshots above, TCK seems to max out at 1 MHz even when the configured adapter speed is much higher (I tested from 100 kHz to 26 MHz with the same results).

I suspect this is a case where we should optimize in the right place, namely OpenOCD's remote bitbang driver: the next step is to find out where that driver is bottlenecking here.

What needs to be done (or not)

I understand the reasons and I usually follow that advice myself, but:

  1. This is a (weird yet fun) hobby for me :)
  2. Also a way to (hopefully) contribute to more accessible and affordable programmers. What could be more convenient than a fully functional and performant STLinkv2 clone of a clone at an unbeatable sub-dollar, shipping-included price? ... and no, I won't design a new board for that. REUSE, which brings me to the next point:
  3. Avoid waste and e-pollution: People "needing" to buy yet another programmer board just because the software and accompanying toolchain(s) are not good enough, even if the hardware is perfectly capable and performant, is a big shame: let's fix that.

Here’s my plea to all Embedded Developers: Please stop using Bit Banging! I hope this article has given you plenty of reasons. (And this article has wasted your precious time, since you wouldn’t be reading it if OpenOCD were using SPI already)

I bow before that last statement... so perhaps the best idea is to give the DJTAG2 PR a go with the DirtyJTAG+ESP32S2 board :) ... and/or perhaps try to merge and use @lupyuen's https://github.com/lupyuen/openocd-spi, if I'm masochistic enough to merge Espressif's fork with it again. We shall see ;)

brainstorm commented 3 years ago

Hm, bummer, after reading some code and accompanying discussions it seems like the stlinkv2 clones are not suitable for the DJTAG2/SPI boost:

The other boards are not compatible because the SPI cannot use the pins defined for TCK, TDI, TDO.

:/

EDIT: Hold on @phdussud... on an LQFP48 package, which is what STLinkv2 clones (usually?) have:

  1. PB12 is SPI2_(NSS) on pin 25. Which corresponds to TDO/SWDIO.
  2. PB13 is SPI2_(SCK) on pin 26. Which corresponds to CLK/SWCLK.
  3. PB8 is TIM4_CH4 on pin 46. Which corresponds to TDI/SWIM.

... can't all those functions also be remapped via software? What am I missing here, some obscure datasheet footnote or STM32 gotcha?

phdussud commented 3 years ago

Actually, SPI isn't the most significant factor in the speedup; I think it accounts for around 10-15% on top of the other improvements. I would think that with a non-SPI device you will get around an 8x performance increase over the oldest release. And no, I don't think you can remap PB8 to become an SPI MOSI, or PB12 to become an SPI MISO.

phdussud commented 3 years ago

Another note, to get this speedup you need to change the way you drive the DJTAG. Adding the NOREAD bit to the CMD_XFER command when you don't need to read TDO will make a big difference.

brainstorm commented 3 years ago

Thanks for the insight!

Another note, to get this speedup you need to change the way you drive the DJTAG. Adding the NOREAD bit to the CMD_XFER command when you don't need to read TDO will make a big difference.

Can you elaborate a bit more on that point? I'm a bit new to the codebase and JTAG protocol itself :-S

With the changes currently on master, the probe is not fast enough for general use with the ESP32: GDB times out, and so does the debug adapter in IDEs (e.g. VSCode's DAP). I've traced this issue to bottlenecks in remote_bitbang (used when issuing target extended-remote :3333), as detailed above, so I think I'll tackle that next. A SiFive debug engineer I contacted about this last issue noted that:

(...) add an interface to jtag/core.c that allows the target to tell the JTAG code whether it is going to take immediate action on the returning values, or whether those values can come later.

E.g. in RISC-V memory read code we assemble a batch to all scan at once, and we don't actually need any of the return values until after the whole batch has been scanned.

And at Espressif they actually implemented two custom alternatives to remote bitbang as workarounds; I'll read those implementations today and gather more insight into all this.
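The SiFive engineer's suggestion quoted above boils down to queuing scans and paying for a single host/probe round trip only when a result is actually consumed. Below is a minimal Python sketch of that idea; `ScanQueue` and the `transport` callable are hypothetical stand-ins for the JTAG queue and the USB link, not OpenOCD APIs (in OpenOCD's C code, the closest existing mechanism is that queued scans only execute when `jtag_execute_queue()` runs).

```python
from typing import Callable, List


class ScanQueue:
    """Queue JTAG scans and defer the USB round trip until a result is needed."""

    def __init__(self, transport: Callable[[List[int]], List[int]]) -> None:
        self._transport = transport  # sends a whole batch, returns one TDO value per scan
        self._pending: List[int] = []
        self._results: List[int] = []
        self.round_trips = 0

    def add_scan(self, tdi_value: int) -> int:
        """Queue one scan and return a handle; the TDO value is not fetched yet."""
        self._pending.append(tdi_value)
        return len(self._results) + len(self._pending) - 1

    def flush(self) -> None:
        """Send every pending scan in a single round trip."""
        if self._pending:
            self._results.extend(self._transport(self._pending))
            self._pending = []
            self.round_trips += 1

    def result(self, handle: int) -> int:
        """Return a scan's TDO value, flushing only if it is still queued."""
        if handle >= len(self._results):
            self.flush()
        return self._results[handle]


def fake_transport(batch: List[int]) -> List[int]:
    # Hypothetical target that returns each TDI byte inverted.
    return [value ^ 0xFF for value in batch]


queue = ScanQueue(fake_transport)
handles = [queue.add_scan(value) for value in range(16)]
values = [queue.result(h) for h in handles]
print(queue.round_trips)  # prints 1: sixteen scans, one round trip
```

The design point is that callers which do not need TDO immediately (e.g. a batched memory read) never force a flush mid-batch, so latency is paid once per batch rather than once per scan.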

phdussud commented 3 years ago

It all depends on the JTAG protocol of the device you are trying to program, but most devices do not return anything on the TDO line while you are feeding the code through TDI. In that case, sampling TDO and sending it back to the host over USB slows down programming, because the programming process is suspended during that time. DJTAG1 always returns the TDO sampled while the programming data is fed to the device through TDI (the CMD_XFER command in DJTAG). If you use CMD_XFER | NO_READ, defined in DJTAG2, it avoids sending the sampled TDO back across USB to the host.
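To make the NO_READ idea concrete, here is a minimal Python sketch of building an XFER command on the host side. The opcode and flag values (`CMD_XFER = 0x03`, `NO_READ = 0x80`) and the one-byte bit-count field are assumptions for illustration, not taken from this thread; check the DirtyJTAG protocol documentation before relying on them. `build_xfer` and `expected_reply_len` are hypothetical helpers.

```python
# ASSUMED values for illustration; check the DirtyJTAG protocol docs.
CMD_XFER = 0x03
NO_READ = 0x80  # DJTAG2 modifier: "don't send TDO back"


def build_xfer(tdi: bytes, nbits: int, read_tdo: bool) -> bytes:
    """Assemble one XFER command: opcode (optionally | NO_READ), bit count, TDI data."""
    # A one-byte bit-count field is an assumption of this sketch.
    if not 0 < nbits <= 255 or len(tdi) != (nbits + 7) // 8:
        raise ValueError("tdi must hold exactly nbits bits (1..255)")
    opcode = CMD_XFER if read_tdo else CMD_XFER | NO_READ
    return bytes([opcode, nbits]) + tdi


def expected_reply_len(nbits: int, read_tdo: bool) -> int:
    """Bytes the host has to read back over USB for this XFER (assumed 1 bit per TCK)."""
    return (nbits + 7) // 8 if read_tdo else 0


cmd = build_xfer(b"\xaa" * 4, 32, read_tdo=False)
print(cmd.hex(), expected_reply_len(32, read_tdo=False))  # prints 8320aaaaaaaa 0
```

With DJTAG1 semantics every XFER implies a USB read of the sampled TDO; with NO_READ set, the host can skip that read entirely whenever it only needs to drive TDI.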

brainstorm commented 3 years ago

I see what you mean now, thanks!

I'm actually not using the probe to program the IC's flash (i.e. sending data through TDI), since I upload the firmware via UART instead (a CP2102 IC connected to another USB hub port on the dev board). I guess your PR does include that improvement, right? I'm using the current master right now; I'll merge that PR and see if things improve.

For now I'd like to just use the adapter as a JTAG debugger, not necessarily flashing, but hw breakpoints do not seem to be working... yet.

jeanthom commented 3 years ago

Hi @brainstorm, thanks for your thorough write-up! As @phdussud pointed out, bitbanging JTAG is not necessarily what slows things down here; it's more about optimizing the DJTAG1 USB protocol.

The openocd-dirtyjtag repository currently only supports the DJTAG1 protocol, so even if you were to use a DJTAG2-compatible dongle with it, you'd still end up bottlenecked by USB latency.

brainstorm commented 3 years ago

bitbanging JTAG is not necessarily what slows things down here; it's more about optimizing the DJTAG1 USB protocol.

The openocd-dirtyjtag repository currently only supports the DJTAG1 protocol, so even if you were to use a DJTAG2-compatible dongle with it, you'd still end up bottlenecked by USB latency.

Are you sure about that? Did you profile it? My profiling suggests that OpenOCD's remote_bitbang might be the bottleneck here, since the GlasgowEmbedded probe (iCE40 FPGA + FX2 USB front end) was also slow on the same JTAG operation(s).

See this speedup from 3 years ago on OpenOCD:

http://openocd.zylin.com/#/c/4312/

According to macOS's Instruments (roughly a UNIX gprof equivalent), remote_bitbang takes ~14 seconds during the initial target remote :3333 phase, and most of that time is spent in OpenOCD's queues... so I believe it's the host side of things that is the bottleneck at this point?

phdussud commented 3 years ago

If you can hack the DJTAG interface of OpenOCD, you can set the DJTAG2 speed to 15000 kHz, which sets the bitbang speed to the maximum (no wait states between pin toggles). Also, if you know that you don't need to read back, don't issue the CMD_GETSIG command and don't issue a USB read at that point. You can also try to buffer as many CMD_SETSIG commands as possible before issuing a CMD_STOP. CMD_STOP is a boundary for USB packets (DJTAG will not read past a CMD_STOP in a USB buffer, so the next command will come from another USB packet). If you collapse as many USB sends as possible into one (up to 64 bytes for DJTAG2), you will minimize the USB overhead, which can be very significant because USB latency is not small and the packets are very short (<= 64 bytes).
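The batching advice above amounts to greedy packing: put as many whole commands as fit into one 64-byte USB packet and close each packet with CMD_STOP. Here is a minimal Python sketch, assuming CMD_STOP is opcode 0x00 and a hypothetical 3-byte SETSIG encoding (neither value comes from this thread); `pack_commands` is a hypothetical helper.

```python
from typing import List

USB_PACKET_SIZE = 64  # DJTAG2 packet limit mentioned above
CMD_STOP = 0x00       # ASSUMED opcode value


def pack_commands(commands: List[bytes]) -> List[bytes]:
    """Greedily pack whole commands into USB packets, each closed by CMD_STOP."""
    packets: List[bytes] = []
    current = bytearray()
    for cmd in commands:
        if len(cmd) + 1 > USB_PACKET_SIZE:  # +1 keeps room for CMD_STOP
            raise ValueError("command does not fit in a single packet")
        if len(current) + len(cmd) + 1 > USB_PACKET_SIZE:
            current.append(CMD_STOP)
            packets.append(bytes(current))
            current = bytearray()
        current += cmd
    if current:
        current.append(CMD_STOP)
        packets.append(bytes(current))
    return packets


# Hypothetical 3-byte SETSIG encoding (opcode, mask, value) for illustration.
setsig = bytes([0x04, 0x01, 0x01])
packets = pack_commands([setsig] * 40)
print([len(p) for p in packets])  # prints [64, 58]: 2 USB sends instead of 40
```

Commands are never split across packets, since the probe stops parsing at CMD_STOP and would otherwise see a truncated command.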

phdussud commented 3 years ago

I just looked at the profile mentioned earlier, and it seems that all of the time is spent in I/O latency, which includes the execution time of a command on the DJTAG side. I don't know what bitbang_scan does, but I suspect it also involves an IO read from DJTAG. I may be wrong...
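The claim that I/O latency dominates can be sanity-checked with a back-of-the-envelope model: total time ≈ USB round trips × latency per round trip + bits ÷ TCK frequency. The 1 ms round-trip latency, the 1 Mbit workload, and the 8- vs 480-bit batch sizes below are illustrative assumptions, not measurements from this thread; `transfer_time_s` is a hypothetical helper.

```python
def transfer_time_s(total_bits: int, bits_per_round_trip: int,
                    latency_s: float, tck_hz: float) -> float:
    """Rough time to shift total_bits: USB round trips plus raw clocking time."""
    round_trips = -(-total_bits // bits_per_round_trip)  # ceiling division
    return round_trips * latency_s + total_bits / tck_hz


# Illustrative only: 1 Mbit of scans, 1 ms per USB round trip, TCK = 1 MHz.
one_byte_per_trip = transfer_time_s(1_000_000, 8, 1e-3, 1e6)
batched = transfer_time_s(1_000_000, 480, 1e-3, 1e6)
print(f"{one_byte_per_trip:.1f} s vs {batched:.2f} s")  # prints 126.0 s vs 3.08 s
```

Under these assumptions, the clocking itself costs only 1 s in both cases; almost everything else is round trips, which is why raising the adapter speed alone barely helps while batching does.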

portasynthinca3 commented 3 years ago

Wow! Turns out I'm not the only one who's trying to make DirtyJTAG work with ESP32 :laughing: After fixing a couple of build errors, I finally got it to work! Partially... Here's what the output looks like:

Open On-Chip Debugger 0.11.0-rc2+devv0.10.0-esp32-20201202-632-gfc16ceb2 (2021-02-15-20:00)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 1000 kHz

Warn : Transport "jtag" was already selected
Info : FreeRTOS creation
Info : FreeRTOS creation
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 1000 kHz
Info : JTAG tap: esp32.cpu0 tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Info : JTAG tap: esp32.cpu1 tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Warn : target esp32.cpu1 examination failed
Info : starting gdb server for esp32.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : esp32.cpu0: Target halted, PC=0x400801CB, debug_reason=00000000
Info : Set GDB target to 'esp32.cpu0'
Warn : No symbols for FreeRTOS!
Info : esp32.cpu0: Debug controller was reset.
Info : esp32.cpu0: Core was reset.
Error: timed out while waiting for target halted / 4 - 2
Info : esp32.cpu0: Target halted, PC=0x400801DA, debug_reason=00000000
Error: xtensa_wait_algorithm: not halted 0, pc 0x400801da, ps 0x60020
Error: Failed to wait algorithm (-302)!
Error: Algorithm run failed (-302)!
/* *** around 500 lines of "flash mappings" that actually appear to be random numbers. It's probably too much to post here, so: https://pastebin.com/GRMKzBpL *** */

I tried frequencies in the range of 100 to 5000 kHz, and every one of them leads to this. Also, the output above is produced when I don't connect DJTAG's TRST pin to the EN pin. When I do, it looks like this:

Open On-Chip Debugger 0.11.0-rc2+devv0.10.0-esp32-20201202-632-gfc16ceb2 (2021-02-15-20:00)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 100 kHz

Warn : Transport "jtag" was already selected
Info : FreeRTOS creation
Info : FreeRTOS creation
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 100 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: Check JTAG interface, timings, target power, etc.
Error: Trying to use configured scan chain anyway...
Error: esp32.cpu0: IR capture error; saw 0x1f not 0x01
Warn : Bypassing JTAG setup events due to errors
Warn : target esp32.cpu0 examination failed
Warn : target esp32.cpu1 examination failed
Info : starting gdb server for esp32.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
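The `IR capture error; saw 0x1f not 0x01` line comes from a sanity check: IEEE 1149.1 requires the two least-significant bits loaded by Capture-IR to be `01`, and OpenOCD checks this by default (`-ircapture 0x01`, `-irmask 0x03`). With a 5-bit instruction register (assumed here for the ESP32), reading back 0x1f is all ones, consistent with the "all ones" scan-chain error just above it; that pattern often means TDO is floating or pulled high because the target is not responding. One plausible explanation, not confirmed in this thread, is that wiring TRST to EN holds the chip in reset. A minimal Python sketch of both checks (hypothetical helpers, not OpenOCD code):

```python
def ir_capture_ok(captured: int, ir_len: int) -> bool:
    """IEEE 1149.1: the two LSBs loaded by Capture-IR must read 0b01."""
    captured &= (1 << ir_len) - 1
    return (captured & 0b11) == 0b01


def looks_stuck_high(captured: int, ir_len: int) -> bool:
    """All ones is what a floating or pulled-up TDO line reads as."""
    return captured == (1 << ir_len) - 1


IR_LEN = 5  # ESP32 instruction-register length (assumed here)
for value in (0x01, 0x1F):
    print(hex(value), ir_capture_ok(value, IR_LEN), looks_stuck_high(value, IR_LEN))
# prints:
# 0x1 True False
# 0x1f False True
```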
portasynthinca3 commented 3 years ago

Oh, by looking at the UART output, I can see that the system gets reset by the RTC WDT. Let me disable it and test again.

portasynthinca3 commented 3 years ago

No. Now it crashes with a

Fatal exception (0): IllegalInstruction
epc1=0x40090912, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

in the console

zoobab commented 2 years ago

Just ordered an ESP32-S2 board to try it out...