Closed andresv closed 3 years ago
Does this issue also occur with rtt-target
+ cargo-embed
? If yes, then this could be an upstream issue and should be reported to the probe-rs issue tracker.
I'm only familiar with the low level details of CMSIS-DAP + SWD (I'm not sure which probe you are using. st-link?) but RTT running in parallel to device-side Flash operations seems like it should be allowed at the architecture (ARM Cortex-M) level. Now if the stm32 chips in particular disallow this I wouldn't know.
Hi @andresv,
Does this problem persist when using the v0.2.0
of probe-run
and defmt
?
I am currently running into this with v0.1.x. I'll try to upgrade my stack to v0.2.0, and check if it still persists.
I would expect so though, as i had a chat with the probe.rs guys, who told me that it might be an ST-link v2 issue, that cannot be resolved by them.
Confirmed, this is still an issue with
probe-run --version
0.2.1
supported defmt version: 0.2
and defmt = "0.2.0"
Okay. Good to know it still persists.
I would expect so though, as i had a chat with the probe.rs guys, who told me that it might be an ST-link v2 issue, that cannot be resolved by them.
@MathiasKoch: Could you please quickly summarize what the probe-rs people told you? And if you think this could be resolved on our side?
MathiasKoch
(HOST) DEBUG Scanning at exact address: 0x20000000 └─ probe_rs_rtt::rtt @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-rtt-0.10.1/src/rtt.rs:163 (HOST) DEBUG Read mem 8 bit, address=20000000, length=24 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:859 (HOST) DEBUG Read mem 8 bit, address=20000018, length=24 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:859 (HOST) DEBUG Read mem 8 bit, address=08008dfc, length=64 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:859 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN too many retries, giving up └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:715 (HOST) DEBUG Read mem 32 bit, address=e0002000, length=4 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:793 (HOST) DEBUG Current device mode: Jtag └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:402 Error: Error communicating with probe: An error with the usage of the probe occured Caused by: 0: An error with the usage of the probe occured 1: An error specific to a probe type occured 2: Command failed with status SwdDpWait
dirbaio
https://github.com/probe-rs/probe-rs/pull/370/commits/e889c664ca4dda039c153b27aad51c1f125065d0 fixed in 0.10
yatekii
yep, as you can see, new probe-rs retries :) but apparently not long enough
dirbaio
still fails? O_o @yatekii:matrix.org yeah, see log =)
MathiasKoch
for attempt in 0..13 { why 13 ? :p @yatekii:matrix.org
MathiasKoch
That changed it a bit
(HOST) DEBUG Scanning at exact address: 0x20000000 └─ probe_rs_rtt::rtt @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-rtt-0.10.1/src/rtt.rs:163 (HOST) DEBUG Read mem 8 bit, address=20000000, length=24 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:859 (HOST) DEBUG Read mem 8 bit, address=20000018, length=24 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:859 (HOST) DEBUG Read mem 8 bit, address=08008dfc, length=64 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:859 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN too many retries, giving up └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:715 (HOST) DEBUG Read mem 32 bit, address=e0002000, length=4 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:793 (HOST) DEBUG Current device mode: Jtag └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:402 Error: Error communicating with probe: An error with the usage of the probe occured
Caused by: 0: An error with the usage of the probe occured 1: An error specific to a probe type occured 2: Command failed with status SwdDpWait
this one
dirbaio
it does exponential backoff it's supposed to wait for up to ~1s very strange that the core stays busy for THAT long
MathiasKoch
feels like they are instant.. defo not 1 sec between the last two log statements atleast
dirbaio
maybe it's buggy, or maybe what's failing is in another code path? ah no, it is retrying is it not sleeping?
MathiasKoch
the uC? it basically happens in
fn wait(&mut self) -> Result<(), Error> { while self.sr.sr().read().bsy().bit_is_set() {} self.status() }
dirbaio
no, probe-rs https://github.com/probe-rs/probe-rs/blob/master/probe-rs/src/probe/stlink/mod.rs#L714
MathiasKoch
doesn't feel like it.. Will try to patch probe-rs and have a check
dirbaio
the last sleep is 100<<11 which is ~0.2s the total sleeps should add up to ~0.4s
MathiasKoch
ahh, in that case it might be sleeping, not sure i can distinguish at that level ;)
dirbaio
maybe increase the counter :P the retry limit* in that case it'll be obvious?
MathiasKoch
i'll se it that helps
dirbaio
or maybe sleeping is not the right approach? maybe it's better to retry in a hot loop? like, if the core is busy 99% of the time retrying in a tight loop maximizes the chances of hitting the 1%
MathiasKoch
doesn't seem like it will ever get ok? now i have increased it to 20, and there are VERY long between.. Still retrying
dirbaio
:( try making it a hot loop?
MathiasKoch
Same result
for attempt in 0..250000 { self.device.write(cmd, write_data, read_data, timeout)?; match Status::from(read_data[0]) { Status::JtagOk => return Ok(()), Status::SwdDpWait => { // log::warn!("send_jtag_command {} got SwdDpWait, retrying", cmd[0]) } Status::SwdApWait => { // log::warn!("send_jtag_command {} got SwdApWait, retrying", cmd[0]) } status => { // log::warn!("send_jtag_command {} failed: {:?}", cmd[0], status); return Err(From::from(StlinkError::CommandFailed(status))); } } // Sleep with exponential backoff. std::thread::sleep(Duration::from_micros(100)); }
too many retries, giving up after 25 seconds
Guess the issue lies elsewhere?
yatekii
yeah I mean the problem is that we have no contro9l over the API here the ST-Link just tells us some error code and we have to trust it and believe that it fixes the issue MathiasKoch** it basically happens in
fn wait(&mut self) -> Result<(), Error> { while self.sr.sr().read().bsy().bit_is_set() {} self.status() } what if your read here never resolves to bit set and thus it is in waiting state forever?
MathiasKoch
Yeah, that might be a valid issue.. But would that result in RTT crashing? i would have expected it to just be kinda idle?
dirbaio
yeah the mcu hanging somewhere shouldn't result in probe-rs exploding
MathiasKoch
Well, commenting out that wait doesn't seem to change anything
yatekii
well, that is something the toplevel user has to handle :) that's why we return an error := also if it does not resolve that wait for that long, it might as well just shut down and display a hint I'll def regard that in cargo embed v2 :)
dirbaio
the DP getting stuck is something the user has to handle? how should the user handle it? also https://developer.arm.com/documentation/ddi0413/c/debug-access-port/sw-dp/protocol-description Normally, when a debugger receives a WAIT response it retries the same operation. This enables it to process data as quickly as possible. However, if several retries have been attempted, and time permitted for a slow interconnection and memory system to respond, if appropriate, the debugger might write to the ABORT register. This signals to the active access port that it must terminate the transfer that it is currently attempting. An access port implementation might be unable to terminate a transfer on its ASIC interface. However, on receiving an ABORT request the access port must free up the SWD interface. we're doing the retry thing, but not the ABORT thing
MathiasKoch
How would i go about debugging this further? My knowledge of debugger probes is zero to none btw :)
dirbaio
the DP getting stuck is something the user has to handle? well, how should we handle it? ;) I mean we can return an error "DP stuck". but what if it indeed takes longer? no way to get around. not nice!
dirbaio
pyocd does do ABORT https://github.com/pyocd/pyOCD/blob/master/pyocd/probe/stlink/stlink.py#L72 https://github.com/pyocd/pyOCD/blob/db238bb515f8cc4d5d8825ade67cec015b1377a3/pyocd/coresight/dap.py#L798
yatekii
well, the thing is that the HLA of the ST-Link should already do this. we do it for J-Link already. so maybe we have to do it for ST-Link too, I am not sure it's a blckbox
dirbaio
yeah it's horrible
yatekii
I mean, some registers litterally can't be written via ST-Link. We should definitely try if we can fix it like this :) but I do not believe in success tbh. because for the J-Link the wait/fault responses were crazy amounts and it didn't work before tiwalun implemented all the bits to handle them properly. we never had that problem with ST-Link/CMSIS-DAP because they do it for us already so I do not have high hopes but it's worth a shot
dirbaio
you think the stlink itself already does the retry/abort?
yatekii
yap
dirbaio
and if it returns SwdDpWait to us it means it already did retries and gave up? hmm :S the retry code I added did fix issues for me though with nrf52+stlinkv2 haven't personally seen a SwdDpWait error since
yatekii
I mean I was always in the belief that it works like this. I wont rule out I am dead wrong lol
dirbaio
https://github.com/pyocd/pyOCD/issues/714 stuff's broken on pyocd too lol
dirbaio
let's port hs-probe-firmware to the chips used in stlinks, and tell people to flash them 😜 no more stlink nonsense
yatekii
^ I already discussed that with tiwalun a few times :) I mean there is no good way unfortunately :D what sets me off about this error is this: flashing works fine without issue but that uses the flash too so I feel like this is really some kind of user error that keeps the bus busy forever. then again idk at all :D
dirbaio
you mean something like: the fw running inside the mcu is spamming the bus, so swd never has a chance to do stuff? could be 🤷♂️
yatekii
that is my assumption
dirbaio
it's horrible design if the mcu has higher bus priority than swd though O_o
yatekii
yeah, but there is different layers in the SWD layer too :) I believe some have prio over others the thing is: we might evenm be able to abort the flash procedure from SWD but do we want that? that'd lead to absolutely unpredictable FW weirdness in some cases involving the debugger
MathiasKoch
Hmm.. Updated the stlink FW.. No luck
dirbaio
I thought the ABORT aborts ap/dp transaction, not mcu transactions
yatekii
ah, I didn't mean that. I mean we could with some other command.
dirbaio
halting the core? but probe-rs does that already right?
dirbaio
I thought the ABORT aborts ap/dp transaction, not mcu transactions ABORT does not really help if the CPU locks forever. so yeah ^^
dirbaio
halting the core? but probe-rs does that already right? yeah but at points where specifically instructed to do so. not just randomly on some timeout ^^ or rather: halting is okay but not altering the control flow at some random point imagine trying to use flash from your FW and the debugger always cancels the transfer because it takes a split second too long :D I would become mad nuts :D
dirbaio
yeah it sucks 💩 I'd try the ABORT in the retry loop though, it's something we haven't tried and the ARM docs explicitly say to do that
yatekii
yes, ofc, I agree it's worth a try!
So the consensus is to try the ABORT
, but more generally just not use this debugger?
Basically, the ST-Links are unpredictable because they do not have a documented interface. The issues we have in probe-rs are issues with other libs sometimes too and sometimes they handle things better. It's something that will get better over time with more time thrown onto the problem. As long as probe-rs is unpaid work, I doubt anyone is willing to invest substantial amounts of time for the remaining 5% of issues when one can use a properly documented probe and invest that time into solving actually unsolved problems.
We can try the ABORT fix; it should not be hard. We'll see.
Thanks for your answer @Yatekii! To help people stumbling upon this issue, could you maybe tell what probe would you recommend from a user experience and developer experience perspective?
Else I'd close this here and defer to probe-rs
. Please send a small update in case there is progress!
So, hardware wise, the ST-Link is not bad. The same goes for the J-Link. But they both have a closed source interface which was partially reverse engineered. The third probe type of the ones we support at the moment is the CMSIS-DAP. There is v1 and v2. v1 is limited in speed by USB HID (~20kBit actual transfer speed). v2 is actually really fast and can, if the chip allows it go to a few hundred kbit/s. Then it hits the limit of SWD and the on chip flash. The cool thing about CMSIS-DAP is that it is open source. So we decide about every detail of the dependency chain basically :) The problem with CMSIS-DAP is that there is more or less only "devkit-probes" which kind of require you to flash the firmware yourself and are a barebones PCB. For most people this is not an issue.
At this point I'd like tho throw in the hs-probe: https://docs.google.com/forms/d/1id0tnkM3Ail0QE4CNcp19bbWmHBvWrO-tu5cQtQ8Azc/ into the ring. We are currently finalizing the orders for the first batch which got only held up by Chinese NY. The hs-probe features a 100% Rust firmware with an implementation of CMSIS-DAP v1 and v2. It can also be used to program FPGAs. Currently we make no money of this (we have a little bit of safety margin, in just in case which would go into the probe-rs fund), so I hope it's ok to advertise this.
Thank you for your thoughts on the probes!
And hs-probe
seems like an awesome project 🤯 Props for this 👏🏾
All props to @korken89, @disasm and @adamgreig :)
defmt
version:overlap-fix#0ceade69
I am using stm32g081. In my app first couple of flash pages are allocated to bootloader and configuration. When using
defmt
and at the same time stm32 tries to write to configuration page, communication with the probe will be lost:Maybe it is possible to reattach to the target without flashing it again? Currently I run it as
cargo rrb um8
Edit:
Unfortunately currently it is impossible:
probe-run um8 --chip STM32G081KBUxN --defmt --no-flash