knurling-rs / defmt

Efficient, deferred formatting for logging on embedded systems
https://defmt.ferrous-systems.com/
Apache License 2.0
804 stars 73 forks source link

probe-rs loses connection with stm32 if stm performs flash write #140

Closed andresv closed 3 years ago

andresv commented 4 years ago

defmt version: overlap-fix#0ceade69

I am using stm32g081. In my app first couple of flash pages are allocated to bootloader and configuration. When using defmt and at the same time stm32 tries to write to configuration page, communication with the probe will be lost:

0.000000 INFO app info: v0.9.2 hw: 8 ts: 0
Error: A core architecture specific error occured

Caused by:
    0: Failed to read register DRW at address 0x0000000c because: An error specific to a probe type occured
    1: An error specific to a probe type occured
    2: Command failed with status SwdDpWait

Maybe it is possible to reattach to the target without flashing it again? Currently I run it as cargo rrb um8

Edit:

Unfortunately currently it is impossible: probe-run um8 --chip STM32G081KBUxN --defmt --no-flash

error: The argument '--defmt' cannot be used with '--no-flash'
japaric commented 4 years ago

Does this issue also occur with rtt-target + cargo-embed? If yes, then this could be an upstream issue and should be reported to the probe-rs issue tracker.

I'm only familiar with the low level details of CMSIS-DAP + SWD (I'm not sure which probe you are using. st-link?) but RTT running in parallel to device-side Flash operations seems like it should be allowed at the architecture (ARM Cortex-M) level. Now if the stm32 chips in particular disallow this I wouldn't know.

Urhengulas commented 3 years ago

Hi @andresv, Does this problem persist when using the v0.2.0 of probe-run and defmt?

MathiasKoch commented 3 years ago

I am currently running into this with v0.1.x. I'll try to upgrade my stack to v0.2.0, and check if it still persists.

I would expect so though, as i had a chat with the probe.rs guys, who told me that it might be an ST-link v2 issue, that cannot be resolved by them.

MathiasKoch commented 3 years ago

Confirmed, this is still an issue with

probe-run --version
0.2.1
supported defmt version: 0.2

and defmt = "0.2.0"

Urhengulas commented 3 years ago

Okay. Good to know it still persists.

I would expect so though, as i had a chat with the probe.rs guys, who told me that it might be an ST-link v2 issue, that cannot be resolved by them.

@MathiasKoch: Could you please quickly summarize what the probe-rs people told you? And if you think this could be resolved on our side?

MathiasKoch commented 3 years ago

MathiasKoch

 (HOST) DEBUG Scanning at exact address: 0x20000000
└─ probe_rs_rtt::rtt @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-rtt-0.10.1/src/rtt.rs:163
(HOST) DEBUG Read mem 8 bit, address=20000000, length=24
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:859
(HOST) DEBUG Read mem 8 bit, address=20000018, length=24
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:859
 (HOST) DEBUG Read mem 8 bit, address=08008dfc, length=64
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:859
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
(HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  send_jtag_command 242 got SwdDpWait, retrying
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:700
 (HOST) WARN  too many retries, giving up
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:715
 (HOST) DEBUG Read mem 32 bit, address=e0002000, length=4
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:793
 (HOST) DEBUG Current device mode: Jtag
└─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs->0.10.1/src/probe/stlink/mod.rs:402
Error: Error communicating with probe: An error with the usage of the probe occured

Caused by:
   0: An error with the usage of the probe occured
   1: An error specific to a probe type occured
   2: Command failed with status SwdDpWait

dirbaio

https://github.com/probe-rs/probe-rs/pull/370/commits/e889c664ca4dda039c153b27aad51c1f125065d0 fixed in 0.10

yatekii

yep, as you can see, new probe-rs retries :) but apparently not long enough

dirbaio

still fails? O_o @yatekii:matrix.org yeah, see log =)

MathiasKoch

for attempt in 0..13 { why 13 ? :p @yatekii:matrix.org

MathiasKoch

That changed it a bit

(HOST) DEBUG Scanning at exact address: 0x20000000 └─ probe_rs_rtt::rtt @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-rtt-0.10.1/src/rtt.rs:163 (HOST) DEBUG Read mem 8 bit, address=20000000, length=24 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:859 (HOST) DEBUG Read mem 8 bit, address=20000018, length=24 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:859 (HOST) DEBUG Read mem 8 bit, address=08008dfc, length=64 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:859 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN send_jtag_command 242 got SwdDpWait, retrying └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:700 (HOST) WARN too many retries, giving up └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:715 (HOST) DEBUG Read mem 32 bit, address=e0002000, length=4 └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:793 (HOST) DEBUG Current device mode: Jtag └─ probe_rs::probe::stlink @ /home/mathias/.cargo/registry/src/github.com-1ecc6299db9ec823/probe-rs-0.10.1/src/probe/stlink/mod.rs:402 Error: Error communicating with probe: An error with the usage of the probe occured

Caused by: 0: An error with the usage of the probe occured 1: An error specific to a probe type occured 2: Command failed with status SwdDpWait

this one

dirbaio

it does exponential backoff it's supposed to wait for up to ~1s very strange that the core stays busy for THAT long

MathiasKoch

feels like they are instant.. defo not 1 sec between the last two log statements atleast

dirbaio

maybe it's buggy, or maybe what's failing is in another code path? ah no, it is retrying is it not sleeping?

MathiasKoch

the uC? it basically happens in

fn wait(&mut self) -> Result<(), Error> {
   while self.sr.sr().read().bsy().bit_is_set() {}
   self.status()
}

dirbaio

no, probe-rs https://github.com/probe-rs/probe-rs/blob/master/probe-rs/src/probe/stlink/mod.rs#L714

MathiasKoch

doesn't feel like it.. Will try to patch probe-rs and have a check

dirbaio

the last sleep is 100<<11 which is ~0.2s the total sleeps should add up to ~0.4s

MathiasKoch

ahh, in that case it might be sleeping, not sure i can distinguish at that level ;)

dirbaio

maybe increase the counter :P the retry limit* in that case it'll be obvious?

MathiasKoch

i'll se it that helps

dirbaio

or maybe sleeping is not the right approach? maybe it's better to retry in a hot loop? like, if the core is busy 99% of the time retrying in a tight loop maximizes the chances of hitting the 1%

MathiasKoch

doesn't seem like it will ever get ok? now i have increased it to 20, and there are VERY long between.. Still retrying

dirbaio

:( try making it a hot loop?

MathiasKoch

Same result

for attempt in 0..250000 {
    self.device.write(cmd, write_data, read_data, timeout)?;

    match Status::from(read_data[0]) {
        Status::JtagOk => return Ok(()),
        Status::SwdDpWait => {
            // log::warn!("send_jtag_command {} got SwdDpWait, retrying", cmd[0])
        }
        Status::SwdApWait => {
            // log::warn!("send_jtag_command {} got SwdApWait, retrying", cmd[0])
        }
        status => {
            // log::warn!("send_jtag_command {} failed: {:?}", cmd[0], status);
            return Err(From::from(StlinkError::CommandFailed(status)));
        }
    }

    // Sleep with exponential backoff.
    std::thread::sleep(Duration::from_micros(100));
}

too many retries, giving up after 25 seconds

Guess the issue lies elsewhere?

yatekii

yeah I mean the problem is that we have no contro9l over the API here the ST-Link just tells us some error code and we have to trust it and believe that it fixes the issue MathiasKoch** it basically happens in

fn wait(&mut self) -> Result<(), Error> { while self.sr.sr().read().bsy().bit_is_set() {} self.status() } what if your read here never resolves to bit set and thus it is in waiting state forever?

MathiasKoch

Yeah, that might be a valid issue.. But would that result in RTT crashing? i would have expected it to just be kinda idle?

dirbaio

yeah the mcu hanging somewhere shouldn't result in probe-rs exploding

MathiasKoch

Well, commenting out that wait doesn't seem to change anything

yatekii

well, that is something the toplevel user has to handle :) that's why we return an error := also if it does not resolve that wait for that long, it might as well just shut down and display a hint I'll def regard that in cargo embed v2 :)

dirbaio

the DP getting stuck is something the user has to handle? how should the user handle it? also https://developer.arm.com/documentation/ddi0413/c/debug-access-port/sw-dp/protocol-description Normally, when a debugger receives a WAIT response it retries the same operation. This enables it to process data as quickly as possible. However, if several retries have been attempted, and time permitted for a slow interconnection and memory system to respond, if appropriate, the debugger might write to the ABORT register. This signals to the active access port that it must terminate the transfer that it is currently attempting. An access port implementation might be unable to terminate a transfer on its ASIC interface. However, on receiving an ABORT request the access port must free up the SWD interface. we're doing the retry thing, but not the ABORT thing

MathiasKoch

How would i go about debugging this further? My knowledge of debugger probes is zero to none btw :)

dirbaio

the DP getting stuck is something the user has to handle? well, how should we handle it? ;) I mean we can return an error "DP stuck". but what if it indeed takes longer? no way to get around. not nice!

dirbaio

pyocd does do ABORT https://github.com/pyocd/pyOCD/blob/master/pyocd/probe/stlink/stlink.py#L72 https://github.com/pyocd/pyOCD/blob/db238bb515f8cc4d5d8825ade67cec015b1377a3/pyocd/coresight/dap.py#L798

yatekii

well, the thing is that the HLA of the ST-Link should already do this. we do it for J-Link already. so maybe we have to do it for ST-Link too, I am not sure it's a blckbox

dirbaio

yeah it's horrible

yatekii

I mean, some registers litterally can't be written via ST-Link. We should definitely try if we can fix it like this :) but I do not believe in success tbh. because for the J-Link the wait/fault responses were crazy amounts and it didn't work before tiwalun implemented all the bits to handle them properly. we never had that problem with ST-Link/CMSIS-DAP because they do it for us already so I do not have high hopes but it's worth a shot

dirbaio

you think the stlink itself already does the retry/abort?

yatekii

yap

dirbaio

and if it returns SwdDpWait to us it means it already did retries and gave up? hmm :S the retry code I added did fix issues for me though with nrf52+stlinkv2 haven't personally seen a SwdDpWait error since

yatekii

I mean I was always in the belief that it works like this. I wont rule out I am dead wrong lol

dirbaio

https://github.com/pyocd/pyOCD/issues/714 stuff's broken on pyocd too lol

dirbaio

let's port hs-probe-firmware to the chips used in stlinks, and tell people to flash them 😜 no more stlink nonsense

yatekii

^ I already discussed that with tiwalun a few times :) I mean there is no good way unfortunately :D what sets me off about this error is this: flashing works fine without issue but that uses the flash too so I feel like this is really some kind of user error that keeps the bus busy forever. then again idk at all :D

dirbaio

you mean something like: the fw running inside the mcu is spamming the bus, so swd never has a chance to do stuff? could be 🤷‍♂️

yatekii

that is my assumption

dirbaio

it's horrible design if the mcu has higher bus priority than swd though O_o

yatekii

yeah, but there is different layers in the SWD layer too :) I believe some have prio over others the thing is: we might evenm be able to abort the flash procedure from SWD but do we want that? that'd lead to absolutely unpredictable FW weirdness in some cases involving the debugger

MathiasKoch

Hmm.. Updated the stlink FW.. No luck

dirbaio

I thought the ABORT aborts ap/dp transaction, not mcu transactions

yatekii

ah, I didn't mean that. I mean we could with some other command.

dirbaio

halting the core? but probe-rs does that already right?

dirbaio

I thought the ABORT aborts ap/dp transaction, not mcu transactions ABORT does not really help if the CPU locks forever. so yeah ^^

dirbaio

halting the core? but probe-rs does that already right? yeah but at points where specifically instructed to do so. not just randomly on some timeout ^^ or rather: halting is okay but not altering the control flow at some random point imagine trying to use flash from your FW and the debugger always cancels the transfer because it takes a split second too long :D I would become mad nuts :D

dirbaio

yeah it sucks 💩 I'd try the ABORT in the retry loop though, it's something we haven't tried and the ARM docs explicitly say to do that

yatekii

yes, ofc, I agree it's worth a try!

Urhengulas commented 3 years ago

So the consensus is to try the ABORT, but more generally just not use this debugger?

Yatekii commented 3 years ago

Basically, the ST-Links are unpredictable because they do not have a documented interface. The issues we have in probe-rs are issues with other libs sometimes too and sometimes they handle things better. It's something that will get better over time with more time thrown onto the problem. As long as probe-rs is unpaid work, I doubt anyone is willing to invest substantial amounts of time for the remaining 5% of issues when one can use a properly documented probe and invest that time into solving actually unsolved problems.

We can try the ABORT fix; it should not be hard. We'll see.

Urhengulas commented 3 years ago

Thanks for your answer @Yatekii! To help people stumbling upon this issue, could you maybe tell what probe would you recommend from a user experience and developer experience perspective?

Else I'd close this here and defer to probe-rs. Please send a small update in case there is progress!

Yatekii commented 3 years ago

So, hardware wise, the ST-Link is not bad. The same goes for the J-Link. But they both have a closed source interface which was partially reverse engineered. The third probe type of the ones we support at the moment is the CMSIS-DAP. There is v1 and v2. v1 is limited in speed by USB HID (~20kBit actual transfer speed). v2 is actually really fast and can, if the chip allows it go to a few hundred kbit/s. Then it hits the limit of SWD and the on chip flash. The cool thing about CMSIS-DAP is that it is open source. So we decide about every detail of the dependency chain basically :) The problem with CMSIS-DAP is that there is more or less only "devkit-probes" which kind of require you to flash the firmware yourself and are a barebones PCB. For most people this is not an issue.

At this point I'd like tho throw in the hs-probe: https://docs.google.com/forms/d/1id0tnkM3Ail0QE4CNcp19bbWmHBvWrO-tu5cQtQ8Azc/ into the ring. We are currently finalizing the orders for the first batch which got only held up by Chinese NY. The hs-probe features a 100% Rust firmware with an implementation of CMSIS-DAP v1 and v2. It can also be used to program FPGAs. Currently we make no money of this (we have a little bit of safety margin, in just in case which would go into the probe-rs fund), so I hope it's ok to advertise this.

Urhengulas commented 3 years ago

Thank you for your thoughts on the probes!

And hs-probe seems like an awesome project 🤯 Props for this 👏🏾

Yatekii commented 3 years ago

All props to @korken89, @disasm and @adamgreig :)