Open brainstorm opened 3 years ago
Hm, bummer, after reading some code and accompanying discussions it seems like the stlinkv2 clones are not suitable for the DJTAG2/SPI boost:
The other boards are not compatible because the SPI cannot use the pins defined for TCK, TDI, TDO.
:/
EDIT: Hold on @phdussud... on an LQFP48 package, which is what STLinkv2 clones (usually?) have:
... can't all those functions also be remapped via software? What am I missing here, some obscure datasheet footnote or STM32 gotcha?
Actually, SPI isn't the most significant factor for the speedup. I think it adds around 10-15% on top of the other improvements. I would think that with a non-SPI device, you will get around an 8x performance increase over the oldest release. And no, I don't think you can remap PB8 to become any SPI MOSI and PB12 to become any SPI MISO.
Another note, to get this speedup you need to change the way you drive the DJTAG. Adding the NOREAD bit to the CMD_XFER command when you don't need to read TDO will make a big difference.
Thanks for the insight!
Another note, to get this speedup you need to change the way you drive the DJTAG. Adding the NOREAD bit to the CMD_XFER command when you don't need to read TDO will make a big difference.
Can you elaborate a bit more on that point? I'm a bit new to the codebase and JTAG protocol itself :-S
With the changes present right now on master, the probe is not fast enough for general use on the ESP32: GDB times out and so does the debug adapter on IDEs (i.e. VSCode's DAP). I've traced this issue to remote_bitbang (used when issuing target extended-remote :3333) bottlenecks as I detailed above so I think I'll be attacking that next. A SiFive debug engineer I contacted about this last issue noted that:
(...) add an interface to jtag/core.c that allows the target to tell the JTAG code whether it is going to take immediate action on the returning values, or whether those values can come later.
E.g. in RISC-V memory read code we assemble a batch to all scan at once, and we don't actually need any of the return values until after the whole batch has been scanned.
And at Espressif they actually implemented two custom alternatives to remote bitbang as workarounds, I'll read those implementations today and gather more insight into all this.
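The batching idea quoted above can be sketched roughly as follows. This is only a toy model in Python, not OpenOCD code: the `ScanBatch` class, its `queue`/`flush` methods and the `transport` callable are all hypothetical names made up for illustration. The point is that the caller queues several scans and pays for a single round trip when it finally needs the values.

```python
class ScanBatch:
    """Toy model of deferred JTAG scans (hypothetical, not an OpenOCD API)."""

    def __init__(self, transport):
        # transport: callable taking a list of TDI values and returning
        # a list of captured TDO values, one per scan (assumed interface).
        self._transport = transport
        self._pending = []

    def queue(self, tdi_value):
        """Queue a scan; return its index into the results of flush()."""
        self._pending.append(tdi_value)
        return len(self._pending) - 1

    def flush(self):
        """Send every queued scan in one round trip and return all results."""
        scans, self._pending = self._pending, []
        return self._transport(scans) if scans else []
```

With an interface shaped like this, a memory read that needs N scans would cost one trip to the probe instead of N, as long as none of the intermediate values are needed before the flush.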
It all depends on the JTAG protocol of the device you are trying to program, but most devices do not return anything on the TDO line while you are feeding the code through TDI. In this case, sampling TDO and sending it back to the host through USB slows down the programming, because the programming process is suspended during that time. DJTAG1 always returns the TDO sampled while the programming data is fed to the device through TDI (the CMD_XFER command for DJTAG). If you use CMD_XFER | NO_READ as defined in DJTAG2, it will avoid sending the sampled TDO back across USB to the host.
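As a rough illustration of the point above, here is a minimal Python sketch of how a host could choose between the two forms of the command. The numeric values of `CMD_XFER` and `NO_READ`, and the byte layout (opcode, bit count, TDI payload), are assumptions made for the example; check the DirtyJTAG protocol documentation for the real ones. `build_xfer` and `needs_usb_read` are hypothetical helper names.

```python
# Assumed values for illustration only; verify against the DJTAG2 protocol.
CMD_XFER = 0x03
NO_READ = 0x80


def build_xfer(tdi_bytes: bytes, bit_count: int, want_tdo: bool) -> bytes:
    """Build one transfer command (assumed layout: opcode, bit count, TDI)."""
    opcode = CMD_XFER if want_tdo else (CMD_XFER | NO_READ)
    return bytes([opcode, bit_count]) + tdi_bytes


def needs_usb_read(command: bytes) -> bool:
    """The host only has to issue a USB read when NO_READ is clear."""
    return (command[0] & NO_READ) == 0
```

When the flag is set, the host can skip the USB read entirely and move straight on to the next write.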
I see what you mean now, thanks!
I'm actually not using the probe for programming the IC's flash (sending data through TDI), since I upload the firmware via UART instead (CP2102 IC connected to another USB hub port on the dev board). I guess your PR does include that improvement, right? I'm using the current master right now; I'll merge that PR and see if things improve.
For now I'd like to just use the adapter as a JTAG debugger, not necessarily flashing, but hw breakpoints do not seem to be working... yet.
Hi @brainstorm, thanks for your thorough write-up! As @phdussud pointed out, bitbanging JTAG is not necessarily slowing down things here, it's more about optimizing the DJTAG1 USB protocol.
The openocd-dirtyjtag repository currently only supports the DJTAG1 protocol, so even if you were to use a DJTAG2-compatible dongle with it, you'd still end up being bottlenecked by the USB latency.
bitbanging JTAG is not necessarily slowing down things here, it's more about optimizing the DJTAG1 USB protocol.
The openocd-dirtyjtag repository currently only supports the DJTAG1 protocol, so even if you were to use a DJTAG2-compatible dongle with it, you'd still end up being bottlenecked by the USB latency.
Are you sure about that? Did you profile it? My profiling shows that OpenOCD's remote_bitbang is what might be bottlenecking here since the GlasgowEmbedded probe (ICE40 FPGA+FX2 USB frontend) was slow on the same JTAG operation(s).
See this speedup from 3 years ago on OpenOCD:
http://openocd.zylin.com/#/c/4312/
According to OSX's Instruments (UNIX gprof equivalent-ish), remote_bitbang is taking ~14 seconds during the initial target remote :3333 phase, and most of that time is spent on OpenOCD's queues... I thus believe it's the host side of things that's bottlenecking at this point?
If you can hack the DJTAG interface of OpenOCD, you can set the DJTAG2 speed to 15000 kHz, which will set the bitbang speed to max (no wait state between pin toggles). Also, if you know that you don't need to read back, don't issue the CMD_GETSIG command and don't issue a USB read at that point. You can also see if you can buffer as many CMD_SETSIG commands as you can before issuing a CMD_STOP. CMD_STOP is a boundary for USB packets (DJTAG will not read past a CMD_STOP in a USB buffer, so the next command will come from another USB packet). If you can collapse as many USB sends as possible into one (up to 64 bytes for DJTAG2), you will minimize the USB overhead, which can be very significant because the USB latency is not small and the packets are very short (<= 64 bytes).
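The packing strategy described above could look something like the following Python sketch. The opcode values for `CMD_STOP` and `CMD_SETSIG` and the 3-byte CMD_SETSIG layout used in the usage note are assumptions for illustration, and `pack_commands` is a hypothetical helper, not part of any existing driver. Only the 64-byte limit and the role of CMD_STOP as a packet boundary come from the explanation above.

```python
# Assumed opcode values for illustration; verify against the DJTAG2 protocol.
CMD_STOP = 0x00
CMD_SETSIG = 0x04
MAX_PACKET = 64  # maximum USB packet size mentioned above for DJTAG2


def pack_commands(commands, max_packet=MAX_PACKET):
    """Greedily pack whole commands into packets, each ending with CMD_STOP."""
    packets = []
    current = bytearray()
    for cmd in commands:
        if len(cmd) + 1 > max_packet:
            raise ValueError("command does not fit in a single packet")
        # Flush when this command plus the trailing CMD_STOP would overflow.
        if len(current) + len(cmd) + 1 > max_packet:
            current.append(CMD_STOP)
            packets.append(bytes(current))
            current = bytearray()
        current += cmd
    if current:
        current.append(CMD_STOP)
        packets.append(bytes(current))
    return packets
```

For example, thirty 3-byte commands such as `bytes([CMD_SETSIG, 0x01, 0x01])` end up in two USB sends instead of thirty, which is where the latency saving would come from.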
I just looked at the profile mentioned earlier and it seems that all of the time is spent in I/O latency, which includes the execution time of a command on the DJTAG side. I don't know what bitbang_scan does, but I suspect that it is also an I/O read from DJTAG. I may be wrong...
Wow! Turns out I'm not the only one who's trying to make DirtyJTAG work with ESP32 :laughing: After fixing a couple of build errors, I finally got it to work! Partially... Here's what the output looks like:
Open On-Chip Debugger 0.11.0-rc2+devv0.10.0-esp32-20201202-632-gfc16ceb2 (2021-02-15-20:00)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 1000 kHz
Warn : Transport "jtag" was already selected
Info : FreeRTOS creation
Info : FreeRTOS creation
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 1000 kHz
Info : JTAG tap: esp32.cpu0 tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Info : JTAG tap: esp32.cpu1 tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Warn : target esp32.cpu1 examination failed
Info : starting gdb server for esp32.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
Info : accepting 'gdb' connection on tcp/3333
Info : esp32.cpu0: Target halted, PC=0x400801CB, debug_reason=00000000
Info : Set GDB target to 'esp32.cpu0'
Warn : No symbols for FreeRTOS!
Info : esp32.cpu0: Debug controller was reset.
Info : esp32.cpu0: Core was reset.
Error: timed out while waiting for target halted / 4 - 2
Info : esp32.cpu0: Target halted, PC=0x400801DA, debug_reason=00000000
Error: xtensa_wait_algorithm: not halted 0, pc 0x400801da, ps 0x60020
Error: Failed to wait algorithm (-302)!
Error: Algorithm run failed (-302)!
/* *** around 500 lines of "flash mappings" that actually appear to be random numbers. It's probably too much to post here, so: https://pastebin.com/GRMKzBpL *** */
I tried frequencies in the range of 100 to 5000 kHz. Every one of them leads to this. Also, the output above is produced when I don't connect DJTAG's TRST pin to the EN pin. When I do, it looks like this:
Open On-Chip Debugger 0.11.0-rc2+devv0.10.0-esp32-20201202-632-gfc16ceb2 (2021-02-15-20:00)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 100 kHz
Warn : Transport "jtag" was already selected
Info : FreeRTOS creation
Info : FreeRTOS creation
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 100 kHz
Error: JTAG scan chain interrogation failed: all ones
Error: Check JTAG interface, timings, target power, etc.
Error: Trying to use configured scan chain anyway...
Error: esp32.cpu0: IR capture error; saw 0x1f not 0x01
Warn : Bypassing JTAG setup events due to errors
Warn : target esp32.cpu0 examination failed
Warn : target esp32.cpu1 examination failed
Info : starting gdb server for esp32.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
Oh, by looking at the UART output, I can see that the system gets reset by the RTC WDT. Let me disable it and test again
No. Now it crashes with a
Fatal exception (0): IllegalInstruction
epc1=0x40090912, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
in the console
Ordered now an ESP32-S2 board to try it out...
Intro
This is a followup on https://github.com/jeanthom/openocd-dirtyjtag/issues/2#issuecomment-769485719, focusing on a working setup with DirtyJTAG+Espressif ESP32S2 Saola-1 dev board connected to its JTAG pins.
I am using a Chinese knock-off of an STM32 (a.k.a. CK32), please follow this link for instructions if you found that IC as well inside your STLink v2 clone... in other words: we are dealing with a clone of a clone here, so here be dragons & YMMV ;)
For this to work with Espressif's ICs, I had to merge their own fork of OpenOCD which is quite out-of-sync from upstream since 2017 and DirtyJTAG's one... not a fun nor clean git merge exercise. But it worked in the end despite being sleep deprived and just focusing on DirtyJTAG+Xtensa/Espressif to work.
Clone my dirty OpenOCD merge from here: https://github.com/brainstorm/openocd-esp32
Current status
Here's a photo:
And here's what OpenOCD and GDB have to say:
The flash enumeration takes several seconds, so it's not performant: it is bottlenecking on OpenOCD's remote bitbang protocol on the host (computer), more on that point below...
Unfortunately, after the initial connection, breakpoints do not seem to work (yet):
And there are several timeout issues on the OpenOCD side:
Oscilloscope
@jeanthom, I hope this complements your tests with the cheap FX2 probe! To my untrained eye, there doesn't seem to be significant electrical or signal integrity issues, clean signal without extra efforts, as opposed to my experiments with Glasgow:
Conclusions
At this point DirtyJTAG performs a bit worse than my remote bitbang Glasgow board experiments but on the upside, none of the pullup issues I encountered show up here since the STLinkv2 clone already handles that well and rise times look good out of the box without modification.
Given the screenshots above, TCK seems to max out at 1MHz even if the configured adapter speed is much higher (I tested from 100kHz to 26MHz with same results).
I suspect that this is a case where we should be optimizing things in the right place, which is OpenOCD's remote bitbang driver (see the profiling above for where that driver is bottlenecking).
What needs to be done (or not)
I understand the reasons and I usually follow that advice myself, but:
I bow before that last statement... so perhaps the best idea is to give the DJTAG2 PR a go w/ DirtyJTAG+ESP32S2 board :) ... and/or perhaps try to merge and use @lupyuen's https://github.com/lupyuen/openocd-spi if I'm masochistic enough to merge Espressif's fork again with it, we shall see ;)