Closed alexian79 closed 2 months ago
Hi, first of all, I don't think the issue is clock speed related. Currently the Orbtrace supports clock speeds up to 25MHz, and requests for higher values will be rejected. If the host side software doesn't check the return code on the call to set clock speed, this might make it appear to work at a higher speed than it is actually running at. The STM32F7 mule is tested to work at all supported speeds up to and including 25MHz.
I suspect this is a corner case between how IAR is driving CMSIS-DAP and how we're handling it, potentially coupled with the fact that it's using CMSIS-DAP v1. The most common softwares nowadays supports v2, so v1 is seeing much less testing.
You say you've also tried PyOCD and OpenOCD, how did they behave?
Hi Alexian,
Thanks for the excellent logging, that is really helpful on an initial bug report!
First things first - let me ask the dumb question; Is your device correctly powered up, and where is it getting power from? It's essential that VTRef on the probe is powered from somewhere otherwise you can see these kinds of effects...Its easy to forget about that. Assuming your target is 3.3v then you can enable power to VTRef with a command like orbtrace -p vtref,3.3 -e vtref,on
. Given that you're using the mule chances are you've got a second usb plugged in, or perhaps already powering the target directly. It's unlikely any of this is your problem, but let's rule it out before we chase the more complicated stuff. The mule displays one green LED when powered on. You might reasonably expect that the thing wouldn't work at all without power, but we've seen these sorts of symptoms in the past, and it's very easy to get confused by them.
If that renders no benefit then we need to understand if the problem is your specific hardware (its shouldn't be, we checked the production test logs already for your device and it was perfectly fine) or something in the interaction between the IAR CMSIS-DAP driver and the probe. We can reasonably confidently rule out a 'general' problem with SWD because it's been working well for pretty much everyone for the past year, although they do tend to use v2. Can we try connecting using pyocd (you may need to download specific drivers for the target with pyocd pack install STM32F730V8Tx
) and erasing the chip directly with something like;
$ orbtrace -p vtref,3.3 -e vtref,on
$ pyocd erase --chip -t stm32f730v8
0000987 I Erasing chip... [eraser]
0016737 I Chip erase complete [eraser]
$
That's directly from my machine right now, with the latest 1.3.1 gateware.
Please report back on this. Meantime I will check the CMSIS-DAPv1 support since we don't really use that much now and it's possible there's some edge problem in there.
(EDIT TO ADD: I note that your IAR driver is trying to enable SWO/ETM via the probe...we don't support that because it would be too slow so it's marked as unavailable by the driver. The SWO and parallel trace are managed via the orbtrace and orbuculum tools out of band. Of course, that might change in future).
DAVE
I've tested CMSIS-DAP v1 and v2 using the following config file;
source [find interface/cmsis-dap.cfg]
cmsis_dap_vid_pid 0x1209 0x3443
cmsis_dap_backend usb_bulk <OR hid>
transport select swd
source [find target/stm32f7x.cfg]
adapter speed 25000
$_TARGETNAME configure -event gdb-detach { shutdown }
I have no issues in either case, so I don't think we're looking at a problem in the core code of the probe. Assuming that it isn't a power problem or something similar then I suspect an edge case in the communication between the probe and the IAR driver...is there a way we can get a log of that?
DAVE
Hi zyp, hi Dave very touched by your responses. I fully understand that this HW is probably more oriented for use with open source toolchain, but wanted to give it a go with what I use every day, and I saw in the description that you are welcome feedback for some non standard use cases:). First you actually already answered one of my question I having to myself about SWO: "(EDIT TO ADD: I note that your IAR driver is trying to enable SWO/ETM via the probe...we don't support that because ...."
About powering the f7 mule board: I was using this command 'orbtrace -p vtpwr,3.3 -e vtpwr,on'. I will do some testing with external USB cable as well. I have one other board with stm32f405 that has trace populated. I also can get my hands on IJET-TRACE to compare results, as a last resort I can do USB side com trace if needed.
About my adventures with PyOCD and OpenOCD. I did full circle, I started playing with CMSIS-DAP it did not work well, then I moved to GDB driver with PyOCD. And things started to be very strange. At first PyOCD timed out on IAR side on first 2 gdb requests: logs from both sides cspycomm_failed.log pyocd_failed_log.log After that I enable full logging on PyOCD side, and you know what? counterintuitively PyOCD started to work faster! and it worked somehow. cspycomm_pyocd_success.log cspycomm_pyocd_success.log For this to work I had to disable flash download from IAR, I was doing FW flashing directly from PyOCD. Working command in the end is something like that: python -m pyocd gdbserver -t stm32f730v8 -f24M --persist -L"gdbserver=debug" By trial and error I have a feeling that iar has 2 tier timeouts. 10ms for initial '+' and 100ms for full packet, but I don't know how long it will wait from receiving '+' to rest of the command. I did filed questions with IAR regarding their GDB driver behavior and I did some comparisons with JLINK gdb server and IAR, that one was pretty stable.
And then I jumped to OpenOCD that one stuck pretty fast. It looks like timeout is not an issue with OpenOcd but register support for M4, M7 is. version 11 was stuck on ### Write register: 0x0D first, then I recompiled v12. that one stuck on ### Write register: 0x19 what I was able to figure out comparing pyOCD and openOCD it is one of the FPU registers. + another issue ### Read all CPU registers command is not returning full 67 internal registers as I understood in Openodc
cspycomm_pyocd_success.log cspycomm_openocd_faile_on_reg_0x19.log
Now you see that this rabbits hole goes very deep, and I decided to go back to CMSSIS-DAP driver and posting traces here regarding software, other bugs might be not related to orbrace at all :)
We are digging in at this end (more specifically, zyp has downloaded IAR :-) ). We are more than happy to support anything that is using CMSIS-DAP, so we would like to get to the bottom of this if we can.
Can you please have a go at the pyocd erase one-liner above from a 'clean' state and let's see if that works so we can rule out a hardware issue? Your power sequence is fine; You're powering via the 20-pin VTpwr connection which will go via the on-mule 3v3 regulator to provide VTref, so that's all good.
Probe speeds are (currently) set by 50/n MHz where n>1, and they round down, so setting 24MHz would actually result in 16.66MHz. TBH on most chips there is little advantage (beyond marketing) in much faster than 10MHz when you're working with the flash...you just end up with more WAITs.
Sorry no luck even after erasing, tried at 10Mhz ❯ python -m pyocd erase --chip -t stm32f730v8 0003632 I Erasing chip... [eraser] 0020150 I Chip erase complete [eraser]
That's OK, the test wasn't to see if IAR would behave any better, it was mostly just to ensure that the erase worked correctly....that exercises the CMSIS-DAP link quite hard with continuous polls of the chip, so the fact that it's worked correctly gives us some confidence that your hardware (probe, mule, connecting leads, PSU etc) are all OK.
We will pick this up again tomorrow ... all the obvious things have checked out so it's time to have a bit of a hard state at the IAR driver.
Oh, thank you I am pretty sure it works fine! I was using it for programming from pyOCD
I've installed IAR here and given it a test, and so far it appears to connect fine regardless of speed setting. Here's a log of it attaching at 25MHz: https://paste.jvnv.net/view/Nez3V
What I haven't tried yet is building, loading and running some actual code through it. I don't know my way around IAR, so it'd be very helpful if you could provide an example project that I could just open, load and run.
Here is the code I was playing with lately. It was generated by cubeMX. https://github.com/alexian79/mule-f7 You will need to set debugger to DAP yourself; this configuration is not saved. Another option is to create a new project with the menu, but that one will lack proper clock settings.
My version is the following:
If you confirm that you don't see the issue I might download the freshest one. Thank you very much.
PS. you will need to open .eww file to open workspace
Hi,
When I compare the logs I see that the 'Emulation Layer' for zyp's working example is 5.21 and your non-working one is 4.97....and the problems only seem to start after the point where that is printed out.
Is it possible IAR found some hiccups in there and fixed them in the updated version (although I don't see any mention of this change in the release notes for 9.30, 9.32 or 9.40) ?
I will try with 9.40.1, thanks Update: I did the test with new IAR. and I face the same behavior. cspycomm_success_at_2MHZ.log cspycomm_failed_at_25MHZ.log
Update2: looks like even for successful connect at 2Mhz, it fails first time, and there is a small difference in startup marked in blue
Well, this gets odd! Can you please try slower, perhaps 500KHz? Looking at the source code you should be able to get down as far as 200KHz so that could be worth trying. Trying to see if there's perhaps a race in the IAR handler code somewhere...if there is then a slower machine or configuration might expose it (i.e. it'll start working correctly or break totally). Sorry to be running you around, but while we can't create the problem here it's difficult to make progress without involving you in the loop.
Absolutely no problem, I am enjoying this as "a project not as a product" :) Tried at 500k, it works time to time it shows 1 error, or no errors 100k always fail. cspycomm_always_fail_at_100k.log cspycomm_4th_time.log cspycomm_at_500kHz_third_time.log cspycomm_at_500kHz_second_time.log cspycomm_at_500kHz.log
Will there be benefit to get USB side trace to see timings there? I can attempt to do scope tracing on SWD side at some point.
I did tried that on 2 machines, laptop and pc, both had problems.
Something a bit unrelated: do you have a reference bit file? I was a bit scared to reprogram the probe with rebuild gateware. By the way is it enough to just build with --no-compile-software? not yet was able to rebuild with bios. Thanks.
Hi. 100K is below the minimum limit of 200K, so that will indeed fail.
USB side trace is indeed what we need .... If you can generate that it would be perfect. We don't need SWD trace at the moment (depending on what the USB one tells us).
There is gateware and instructions for programming it at the ORBTrace GitHub repository...the version that is already on your probe is in the 'Releases'. Indeed you don't need the software side to do your own compilation (we aren't using the RISC-V at the moment).
Thanks for the help
Dave
On Tue, 11 Jul 2023, 01:51 Oleksiy Bondarenko, @.***> wrote:
Absolutely no problem, I am enjoying this as "a project not as a product" :) Tried at 500k, it works time to time it shows 1 error, or no errors 100k always fail. cspycomm_always_fail_at_100k.log https://github.com/orbcode/orbtrace/files/12009575/cspycomm_always_fail_at_100k.log cspycomm_4th_time.log https://github.com/orbcode/orbtrace/files/12009576/cspycomm_4th_time.log cspycomm_at_500kHz_third_time.log https://github.com/orbcode/orbtrace/files/12009577/cspycomm_at_500kHz_third_time.log cspycomm_at_500kHz_second_time.log https://github.com/orbcode/orbtrace/files/12009578/cspycomm_at_500kHz_second_time.log cspycomm_at_500kHz.log https://github.com/orbcode/orbtrace/files/12009579/cspycomm_at_500kHz.log
Will there be benefit to get USB side trace to see timings there? I can attempt to do scope tracing on SWD side at some point.
I did tried that on 2 machines, laptop and pc, both had problems.
Something a bit unrelated: do you have a reference bit file? I was a bit scared to reprogram the probe with rebuild gateware. By the way is it enough to just build with --no-compile-software? not yet was able to rebuild with bios. Thnaks.
— Reply to this email directly, view it on GitHub https://github.com/orbcode/orbtrace/issues/18#issuecomment-1629932222, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJTBD2WUUKAFIIENTP4M5LXPSPQ7ANCNFSM6AAAAAA2CPDQQ4 . You are receiving this because you commented.Message ID: @.***>
Hi, I did capture traces for failed connected and connected with warning. I created a bundle that includes viewing software, if you are not happy with that view I can try to parse it in python or process it some other way.
It looks like it cannot classify and parse data for CMSIS-DAP class device.
https://www.dropbox.com/sh/m3npxi9s1pzindf/AADC8xgeqx2IeFlWa4hBxo5_a?dl=0
Ah, you've got a Beagle, that's great. 🙂 I already have the software installed and know my way around it, so the files are fine as they are, but I'd like to also have the corresponding logs from IAR from the same runs, so I can correlate what it's logging with what's going on on the bus.
Sorry guys, I left probe with 5V PWR on thru the Beagle to the board on and left for an hour for dinner and get back and Beagle let the smoke out with one on the hub ports. :( It is all we have for today. I can do trace with another sniffer but that will not be today, and it won't be beagle. Looks like orbtrace is fine. Just couple days ago I ordered Cynthion(Luna), who would've kne I will need it soon).
Update: looks like hub is OK after all it went to protection. But this is second beagle died similar way, one died similarly about 8 yars ago :)
Ouch, that's a shame.
I had a quick glance through one of the logs, and see that IAR is poking various stuff and doing several successful transfers before the first failing one. One thing it's doing is setting waitRetry
to a much lower setting than our default (100 vs 4096), so a theory is that we might need a larger number of retries than other probes because we're scheduling them faster. I'm not sure I believe in that theory, but it's an easy thing to test. I couldn't find a config option to set it in IAR, so I've simply made a gateware build that ignores the setting and keeps our default.
Nice, looks like you are on the right path! It works every time now, up untill 25Mhz!!! And shows new error about flash loading at 25.
cspycomm_works_at_20Mhz!!!.log cspycomm_25MHZ_new_fail.log
Update: it looks like there might be some ways to configure with config files: Is this something like low level setup?
or something like this
or even like this
Ah, that was a good late-night-hunch then. We couldn't quite figure out how a DAP access would be subject to that retries setting, but if it works then let's not dig too much further :-)
We did figure there might be some deep setting for this kind of thing, and that looks like it might be what you've found. The setting you are looking for is TransferConfigure ( See https://arm-software.github.io/CMSIS_5/latest/DAP/html/group__DAP__TransferConfigure.html ), which is command 0x04. In the default gateware that is set to 4096, but we allow the host to override it (the patch zyp sent ignores those overrides).
Your problem at 25MHz looks slightly different but could still be the same issue since we get though 4096 retries pretty quickly at 25MHz...OTOH I would have expected to see that on someone else's device in the real world by now, and we haven't, so we may need to be looking for a different problem for that one.
Anyway, let's see if we can find a way for IAR to set the retries via config, that we can hunt down this 25MHz issue once we've done that if we find a high retries setting doesn't make it go away.
Sounds like a plan.
Is there any chance you could have a hack at it? Neither of us are 'Windows guys' by preference, and we don't really want to go changing the code on the probe as a permanent solution to address shortfalls in implementations (we already established our position in that regard with the whole CMSIS-DAP-name-in-an-interface thing). If we get a clear/clean work-around then it could form the basis of something to present to IAR to get them to put a fixup into their shipping code, but unfortunately we don't see the problem at this end :-/
Sure, I will try to figure this out with configs, first myself, and if I stuck I will ask IAR for help. I also have some questions to IAR, regarding GDB driver implementation. It is to restrictive as well with default config. I am having IDCODE error, similar to cmsis-dap one sometimes.
One more thing though, you are not able to reproduce this issue with that template project I have created? PS, there is also IAR for Linux now, they are coming:) did not had a chance to use it though.
Hi Guys, I have interesting update on this case. I did submit case with IAR about configuring retry count, and while was waiting I wanted to recreate behavior's I was able to see and it looks like you are not able to recreate with your IAR installation (at least you did not confirmed this). So I programmed release FW from github release page 1.3.1 and issue went back, Ok. I looked at the code base and wanted to recreate Zyp's FW I found the line where 0x4 transfer configure request is handled, but before changing anything I just rebuild v1.3.1 as is an programmed it(size of the FW was different from that of the release page build). But now it works fine in IAR up to 20Mhz like with custom FW. So I see the same behavior as you guys, no connection error. I can see 2 possible reasons: yosis used is different from mine. 3rd party submodules for usb etc. are different. What will be nice is you program version 1.3.1 from release page and try connect with IAR. Thanks.
Update, I tried other versions from release page: 1.3.0 works fine up to 20Mhz 1.2.0 works up to 2 Mhz
Hi, I haven't had time to look further into this yet. It's still on my TODO list, but I've got a few other tasks I need to finish up first, so I probably won't get to it for another couple of days.
Different behaviors between different builds starts to sound like some sort of timing issue. Timings are going to be slightly different even between builds from identical sources, since the PnR process is based on randomness.
"builds from identical sources, since the PnR process is based on randomness" did not expect that, good to know, from my previous experience for Zynq in Vavado, system build from the same sources was repeatable, checked on repeating build on different machines.
One other sign that it might be not retry count, In pyocd source default value for it is 0x50, even smaller than in IAR,
Not pushing anyone, just sharing what I did :) I tried to understand CMSIS-DAP protocol and analyze those USB logs that I shared earlier, comparing failed 10mhz, 2mhz success and 2Mhz with warning. It looks like this is related to random NO-Ack for target transfer, and actually those situations then it fails in the right spot throw warnings or fail, at it looks like some timing related like you say.
I also tried to move on to that second issue related to FW download at 25Mhz. but dont recovered my USB sniffer yet but the plan is to try to reproduce this issue with sending same sequence from python script and try to see if I can reproduce something.
I also have noticed one strange behavior on the scope at 2Mhz only: it does not happen on other frequencies.
Will keep you posted.
Decoding your scope capture, that's a write of 0x1e
to DP_ABORT, i.e. what would be generated by the CMSIS-DAP command 08 00 1E 00 00 00
.
The rising slope you see is the turnaround cycle between the ACK from the target and the data to write from the host. The line is undriven during this cycle and weakly pulled up, so the way it looks is normal/expected behavior. At higher clock rates it'll have less time to rise, so it's harder to notice.
Oh, thank you for explanation, I've just looked up ARM debug interface v5.2 to get more details about this.
Is cmsis_test.py is command sequences that are sent thru USB to real device for test purpose?
Yes, that utility will send canned messages to the probe and assess the responses for equivalence to the last field. It's not pretty and it's very clunky, but it does work OK...and it's very useful for that job.
There is no 'nice set' of messages in that utility though, I tend to hack at them as/when I need to test something across the interface.
DAVE
Update, I tried other versions from release page: 1.3.0 works fine up to 20Mhz 1.2.0 works up to 2 Mhz
This is really interesting. Sounds like we've got something unconstrained somewhere which is sometimes happening to be OK and sometimes not. I don't think we've seen any target failing all the way to 25MHz...SWD is async and just issues more WAITs if it can't keep up.
I did some more digging on both issues, NO_ACKs and flash loader error at 25Mhz. First of all on my own rebuild FW I had no NO_ACKs responses at 25Mhz, not a single one! So this is great. I have tried to parse data from 25 Mhz download, I remembered that I can use Wireshark to dump USB packets! If I am parsing correctly after first block download to memory we read DP CTRL/STAT reg and in OK case we have 0xf8000040 response with READOK bit set, in other case we have 0xf80000C0 with both READOK and WDATAERR bits set, any idea what it means? 25mhz_flash_loader_error.json.csv.txt success at 2Mhz.csv.txt
WDATAERR is set when the DP detects a parity error in the data payload of a write command.
This thing here? Hm, so it can happen in any of those blocks right? Error happens to stable always on first set of data sent :( I looks like I can write script to stress it with TRANSFER_BLOCK and check that bit after each block transfer. UPDATE maybe even use scope to see
Since the data is sent after the status field, it is too late to signal a FAULT for the write that caused the parity error, but since the WDATAERR flag is sticky, the following AP access (or certain DP accesses) in the sequence will get a FAULT status code until the error bit is cleared through the DP_ABORT register. I don't see that happening in your log, so at a glance it appears that either the parity error is happening on the very last transfer in the sequence, or we're not handling/reporting errors properly in DAP_TRANSFER_BLOCK.
Today wanted to compare IAR to pyocd comm then writing same tiny flash loader to internal SRAM. did monitor the difference between DAP V2 bulk and V1 HID. Much different, for some reason PyOCD does mix of block transfer write and huge transfer write, interleaving them, 1 block 1 huge transfer 1 block 1 huge transfer ...., a bit strange. Very rare DP STATUS checks. Noticed strange INFO behavior, for some reason sending extra bytes in some of them, same on V1 and V2
IAR for some reason is asking for SWO buffer size even though it is not supported and gets something wrong because it is not implemented code in orbtrace.
I did progressed a little bit further, I was able to create script that replays back Wireshark json usbpcap trace, host side and collects new set of replies. After that I check where that trace deviates. Based on that json file I have crafted one that sends DP STATUS register more often after each block transfer, and right before first block transfer. It looks like error happens on first block transfer.
Just to verify that there is no issue with mule itself, I did used another trace probe I-jet to see what speed it will connect: It did 50Mhz SWD reliably and about 50 for trace I will try to program debug sync strobe on orbtrace then it executes block transfer and try to capture this on the scope. will also try to reduce block transfer + replay this sequence over CMSIS-DAP v2 protocol.
Gentlemen, I did implemented capture of block transfer on the scope and there is something strange happening , I am comparing 2 transfers top one is 20Mhz setting that is working and bottom one that 25 Mhz broken one, I am capturing first block transfer for ram writes to address 0x20010000
and here is a bit more from same packet. Is it something related with wait responses? Will try to parse those scope traces to dap exchange
Here is scope data with python viewer if you are interested scope.zip
Yes, the shorter sequences are commands terminated by WAIT responses.
Command bits are 1101110 followed by 11 for park and turnaround. This decodes to Write AP 0xc. Next three bits are response, either 100 for ACK or 010 for WAIT. In the latter case we clock out a few bits to let the DP state machine finish, idle for a moment, and then retry the command, which succeeds the second time.
0xc would be pointing at the DRW register of the AP, and I expect before this capture, CSW and TAR were written to configure the AP for writes starting at the given address with autoincrement. A series of writes to DRW back to back is then what we'd expect to see, and when we're writing data faster than the AP can keep up with, it's also normal to get WAIT responses as a method of flow control backpressure.
Could you share the replay script? Does it reliably reproduce the issue?
I actually suspect this is the same issue as blackmagic-debug/blackmagic#1117 (fixed by blackmagic-debug/blackmagic#1182).
Refer to the section in ADIv5.2 on write buffering:
The short story is that when you read CTRL/STAT, if there's any pending writes, they'll be cancelled and the WDATAERR flag set. This explains why despite there not being any FAULT responses, you still see a WDATAERR as soon as you check for it; it's because checking for it is what causes the error. At lower speeds the writes have time to finish before the CTRL/STAT read is performed, and therefore aren't cancelled.
Thank you, I did experimented a bit more so if I craft block transfer that sends 1 word next STATUS read shows no error. Then I craft block transfer with 2 words status read shows error but more over next block transfer return FAIL, so it detects error on next transfer, 3 words same thing, detects error. original with 14 words does not detect error. parse_orbtrace.zip replay stuff is attached, for replay I was messing with this file
I suggest adding a dummy read of RDBUFF in front of every CTRL/STAT read. You could either add a 05 00 01 0e
in front of each 05 00 01 06
, or replace each with 05 00 02 0e 06
to do both with a single command.
I will definitely try but if wee look at the scope it stalls with wait exactly one time for each word (we are writing to ram here) and we can assume there is space for 2 words, no stalls for 2 words. And then I can see on the scope next transfer, block transfer in my case and turnaround on USB is way way bigger than those stalls. I can agree that that issue can happen if we will use huge transfer command and inject DP status read at the end of multiple AP 3 writes, isn't it?
I did try, and surprise surprise it somehow improved things. It read OK status on 3 words transfer. Maybe it needs more active clocks after transfer ? Don't know what is the reason actually
Yeah, I guess it could be an issue with not running enough idle cycles after a transfer so that a write gets stuck pending until the next command resumes the clock.
I'm counting four or five in your capture, and ADIv5.2 calls for eight:
Hi, first, thanks for the great project, just received my new orbtrace-mini with M7 mule.
Trying to use probe in IAR, with different drivers, GDB(different backends: PyOCD, OpenOCD) and CMSYS-DAP. At first I was not successful with CMSYS-DAP, but later was able to make it work with poking some settings. After some trial and error It looks like it mostly worked with 2Mhz clock. It does not work with autodetect. I think One time it worked at 30Mhz!!!
I have an impression if it connected at 2Mhz you can increase clock higher next time. but not wise versa. I have attached some logs of communication for CMSIS-DAP driver. Maybe you can spot something important there. It will be nice to make this HW flawless:)
cspycomm_success_with_download_2Mhz.log cspycomm_success_with_some_errors.log cspycomm_failed_at_autodetect.log cspycomm_success_connect_at_2MHz.log cspycomm_failed_with_enabled_flash_load.log