knurling-rs / probe-run

Run embedded programs just like native ones

ST-Link v2.1 crashing? #116

Closed: ianrrees closed this 3 years ago

ianrrees commented 3 years ago

Describe the bug

I'm using a ST-Link v2.1 with an ATSAMD21; this combination has been working fine for a few weeks with probe-run and defmt. The first time I run probe-run --verbose --chip ATSAMD21G18AU path/to/firmware, it flashes and successfully opens the defmt-rtt console. After about 20 seconds, probe-run exits with:

<normal defmt RTT output>
RTT error: Error communicating with probe: An error with the usage of the probe occured
Error: An error with the usage of the probe occured

Caused by:
    0: An error specific to a probe type occured
    1: Command failed with status JtagNoDeviceConnected

Once this failure has happened, subsequent probe-run invocations give:

  (HOST) DEBUG RAM region: 0x20000000-0x20007FFF
└─ probe_run @ src/main.rs:144
  (HOST) WARN  insufficient DWARF info; compile your program with `debug = 2` to enable location info
└─ probe_run @ src/main.rs:169
  (HOST) DEBUG section `.data` is in RAM at 0x20000000-0x20000147
└─ probe_run @ src/main.rs:204
  (HOST) DEBUG section `.bss` is in RAM at 0x20000150-0x20004C07
└─ probe_run @ src/main.rs:204
  (HOST) DEBUG section `.uninit` is in RAM at 0x20004C08-0x20005007
└─ probe_run @ src/main.rs:204
  (HOST) DEBUG vector table: VectorTable { location: 0, initial_sp: 20008000, reset: b1, hard_fault: 55a9 }
└─ probe_run @ src/main.rs:268
  (HOST) DEBUG found 1 probes
└─ probe_run @ src/main.rs:298
  (HOST) DEBUG opened probe
└─ probe_run @ src/main.rs:303
Error: An error with the usage of the probe occured

Caused by:
    0: An error specific to a probe type occured
    1: Command failed with status JtagNoDeviceConnected

To get a successful flash, disconnecting and reconnecting the USB to the ST-Link seems necessary.
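
As a side check on the log above, the vector table values can be sanity-checked against the reported RAM region. The following is a minimal sketch in Rust; the `VectorTable` and `RamRegion` structs and the `check_vector_table` function are hypothetical names made up for illustration, not probe-run's internal types. It assumes a Cortex-M target, where the stack grows downwards and handler addresses carry the Thumb bit.

```rust
/// Values copied from the `VectorTable { .. }` debug line in the log.
/// Hypothetical stand-in, not probe-run's internal type.
#[derive(Debug, Clone, Copy)]
pub struct VectorTable {
    pub initial_sp: u32,
    pub reset: u32,
    pub hard_fault: u32,
}

/// Inclusive RAM bounds, as printed in the "RAM region" log line.
#[derive(Debug, Clone, Copy)]
pub struct RamRegion {
    pub start: u32,
    pub end: u32,
}

/// Returns a list of problems found; an empty list means the table looks sane.
pub fn check_vector_table(vt: &VectorTable, ram: &RamRegion) -> Vec<String> {
    let mut problems = Vec::new();

    // The stack grows downwards, so the initial SP may sit one byte past the
    // last RAM address, but it must not be at or below the start of RAM.
    let sp = u64::from(vt.initial_sp);
    if sp <= u64::from(ram.start) || sp > u64::from(ram.end) + 1 {
        problems.push(format!("initial SP {:#010x} is outside RAM", vt.initial_sp));
    }

    // The stack pointer must be word-aligned.
    if vt.initial_sp % 4 != 0 {
        problems.push(format!("initial SP {:#010x} is not 4-byte aligned", vt.initial_sp));
    }

    // Cortex-M cores only execute Thumb code, so handler addresses in the
    // vector table are expected to have bit 0 set.
    for &(name, addr) in [("reset", vt.reset), ("hard_fault", vt.hard_fault)].iter() {
        if addr & 1 == 0 {
            problems.push(format!("{} handler {:#x} lacks the Thumb bit", name, addr));
        }
    }

    problems
}
```

With the values from the log (RAM 0x20000000-0x20007FFF, initial_sp 0x20008000, reset 0xb1, hard_fault 0x55a9) this check reports no problems, which suggests the firmware image itself is laid out as expected.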

Expected and observed behavior

I'd expect the session not to fail, but if it did, I would expect more detailed logging with the --verbose flag specified, so that I could dig deeper into the issue.

config.toml

[build]
target = "thumbv6m-none-eabi"

[target.thumbv6m-none-eabi]
runner = "probe-run --verbose --chip ATSAMD21G18AU"
rustflags = [
  "-C", "link-arg=-Tlink.x",
  "-C", "link-arg=-Tdefmt.x"
]

Probe details

ST-Link v2.1, just updated to firmware v2j37m26, which appears to be the latest. The upgrade didn't change the symptoms.

probe-rs-cli list
The following devices were found:
[0]: STLink V2-1 (VID: 0483, PID: 3752, Serial: 0670FF505055877267123015, STLink)

Operating System: Ubuntu 20.04.1 LTS

japaric commented 3 years ago

After about 20 seconds, probe-run exits with:

are you using the WFI (Wait For Interrupt) or WFE (Wait For Event) instructions in your program?

Those instructions are available in the cortex-m crate as cortex_m::asm::wfi and cortex_m::asm::wfe. If you are using the RTIC framework, WFI is automatically called when no idle background task is defined.

I ask because I have read before that some STM32 chips have a debug register to enable / disable debugging when the device is in sleep mode (WFI / WFE), and the default value of the register is "disable". That may be related to:

To get a successful flash, disconnecting and reconnecting the USB to the ST-Link seems necessary.


is your program writing to Flash memory?

We have had reports before saying that flash writes interfere with probe operation. See knurling-rs/defmt#140


this combination has been working fine for a few weeks with probe-run and defmt

last week we released probe-run 0.1.6 which uses probe-rs 0.10.0. The previous version of probe-run (0.1.5) used probe-rs 0.8.0. probe-rs did a full rewrite of their st-link backend between 0.9.0 and 0.10.0; that may be related.

could you try probe-run 0.1.5 and report if the problems persist with that version?

ianrrees commented 3 years ago

Thanks @japaric, sorry I haven't had a chance to follow up today, but I can say that the program is not writing to flash. I'll check for WFI/WFE tomorrow, and try probe-run 0.1.5.

ianrrees commented 3 years ago

are you using the WFI (Wait For Interrupt) or WFE (Wait For Event) instructions in your program?

No

could you try probe-run 0.1.5 and report if the problems persist with that version?

SAMD21 support was added to probe-rs after version 0.9.0; I had been using commit 21b26ad020cf07cd797593c503a7c8f3a8530fca successfully, but now rebuilding probe-run 0.1.5 (or latest main) with that version of probe-rs (and probe-rs-rtt pointing at it as well), I'm seeing the same problem. So, the cause of the failure after ~20 seconds is a mystery...

Does JtagNoDeviceConnected mean that the target chip is not connected, or the probe is not connected? If I disconnect the target chip while the ST-Link is in a good state and then run probe-run, I see JtagGetIdcodeError, however once the probe is in the bad state, probe-run results in JtagNoDeviceConnected regardless of whether the target chip is connected or not. The chip seems to be running fine through all of this.
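
The observations above can be summarised as a small lookup. This is only a sketch in Rust: the `Status` and `Diagnosis` enums and both functions are hypothetical and merely mirror the two status names seen in this thread; they are not probe-rs APIs, and the mapping reflects what was observed on this one setup rather than documented ST-Link behavior.

```rust
/// The two status names that appear in this thread (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    JtagNoDeviceConnected,
    JtagGetIdcodeError,
}

/// What to try next, based only on the behavior reported in this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Diagnosis {
    /// Seen with a healthy probe when the target was unplugged.
    CheckTargetWiring,
    /// Seen once the probe was in the bad state, target attached or not.
    ReplugProbe,
}

/// Extracts the status from a line such as
/// "    1: Command failed with status JtagNoDeviceConnected".
pub fn parse_status(line: &str) -> Option<Status> {
    const MARKER: &str = "Command failed with status ";
    let start = line.find(MARKER)? + MARKER.len();
    match line[start..].trim() {
        "JtagNoDeviceConnected" => Some(Status::JtagNoDeviceConnected),
        "JtagGetIdcodeError" => Some(Status::JtagGetIdcodeError),
        _ => None,
    }
}

/// Maps a status to the next step suggested by the observations above.
pub fn diagnose(status: Status) -> Diagnosis {
    match status {
        Status::JtagGetIdcodeError => Diagnosis::CheckTargetWiring,
        Status::JtagNoDeviceConnected => Diagnosis::ReplugProbe,
    }
}
```

For example, feeding the last "Caused by" line of the failing run into `parse_status` and then `diagnose` yields `Diagnosis::ReplugProbe`, matching the need to reconnect the ST-Link over USB.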

@Dirbaio shared a modification to get more logging out of probe-rs. With that in place and the ST-Link in the bad state, I ran probe-run --verbose /path/to/bin and got this output: https://gist.github.com/ianrrees/47fbfb8e0915ed1f0c0825464eddeec9

ianrrees commented 3 years ago

I've borrowed a ST-Link v2, it's working fine with the same target. So, it seems there is either some hardware issue with my probe, or a software/firmware problem with the ST-Link v2.1. Will try to investigate a bit further.

japaric commented 3 years ago

So, the cause of the failure after ~20 seconds is a mystery...

Does it always fail after 20 seconds regardless of the program behavior? e.g. if your program does loop { asm::nop() } does it still fail after 20 seconds? If yes, this would help us discard the possibility of the error being triggered by the device itself.

Does JtagNoDeviceConnected mean that the target chip is not connected, or the probe is not connected?

"NoDevice" sounds like the (ST-LINK) probe did not find the STM32 micro. If the probe was not detected then you would probably get a USB related error early on.

"Jtag" sounds a bit suspicious. If the ST-LINK is connected to the STM32 with 2 wires (+ground): SWIO and SWCLK then only SWD communication is possible between the two -- JTAG requires more wires. It could also be that probe-rs tries SWD first and then falls back and tries JTAG if SWD didn't work.

I see JtagGetIdcodeError, however once the probe is in the bad state

iirc, fetching the IDCODE of the device occurs after initialization so in this case the probe seems to detect the STM32 chip but its SWD subsystem is not responding accordingly to SWD commands issued by the probe. It could be that probe-rs is not clearing the error state on the probe itself during initialization; that would explain why running probe-run again still fails and power cycling the probe fixes the issue.

I've borrowed a ST-Link v2, it's working fine with the same target.

Would be interesting to get logs from this successful setup and compare them to the logs from the setup that fails.

So, it seems there is either some hardware issue with my probe, or a software/firmware problem with the ST-Link v2.1.

There are a lot of ST-Link clones out there so I wouldn't be surprised if there are many versions of the ST-Link firmware and each behaves slightly differently.

ianrrees commented 3 years ago

Today, I learned that a colleague has another ST Nucleo dev board, still in the original packaging, identical to the one that my problematic probe came from. I intend to borrow it next week, to see if there might be something physically wrong with this one.

Does it always fail after 20 seconds regardless of the program behavior?

Yes, though the 20 seconds is not exact at all. I have two other debuggers (a ST-Link v2 and a J-Link Ultra+) that work perfectly with the same probe-run, target hardware, and firmware.

Would be interesting to get logs from this successful setup and compare them to the logs from the setup that fails.

I'll attach some here. They were collected with probe-run at commit bb4b9c716876b40438e12045ff87f4e7a6ec64dd, except that src/logger.rs was modified to display all logging in verbose mode. I've also included a log of OpenOCD failing in a similar way with the problematic probe: logs.tar.gz
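
To compare a log from a working probe with one from the failing probe, a first step could be to find the line where the two diverge. The Rust sketch below does a plain line-by-line comparison; `first_divergence` is a hypothetical helper written for this thread, and it assumes the logs contain no timestamps or other run-specific values that would need to be normalised first.

```rust
/// Returns the 1-based line number and the two differing lines at the first
/// point where the logs diverge, or `None` if the logs are identical.
/// A side is `None` when that log ended before the other one did.
pub fn first_divergence<'a>(
    good: &'a str,
    bad: &'a str,
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let mut good_lines = good.lines();
    let mut bad_lines = bad.lines();
    let mut line_no = 0;

    loop {
        line_no += 1;
        match (good_lines.next(), bad_lines.next()) {
            // Both logs ended at the same time without any difference.
            (None, None) => return None,
            // Same content on this line: keep scanning.
            (g, b) if g == b => continue,
            // First mismatch, including one log being shorter.
            (g, b) => return Some((line_no, g, b)),
        }
    }
}
```

The two inputs could be loaded with `std::fs::read_to_string` from whichever files hold the logs of the working and failing runs.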

There are a lot of ST-Link clones out there

Understood, however I am reasonably confident that these are both genuine ST probes, and I have updated their firmware to the latest on ST's site.

ianrrees commented 3 years ago

Apologies for the slow follow-up. The second ST-Link v2.1 (AFAICT identical to the first) works perfectly, so it seems I must've somehow damaged the electronics in the first one.

My first attempts with the second probe didn't connect; --verbose said that the probe was in mass storage mode. I used the ST link007 upgrade tool to put the probe into firmware upgrade mode, closed the program, then retried, and it went into Jtag mode as desired.

fralalonde commented 2 years ago

I've been experiencing a very similar issue, also with SAMD21 (Adafruit Trinket M0). I fixed it twice, once by switching to a new STLINKv2 clone, and the other time by building my own probe from a Bluepill board (with latest STLINKv2 firmware). Both replacements also failed eventually. I can still flash my binary and get RTT log output for a few (~5) seconds before the link fails, after which the probe needs to be re-plugged into USB to be detectable by cargo-embed.

I've been powering the Trinket with 5V from the STLINK as I am using it to power an external USB device (a small MIDI controller). I've noticed that openocd will often report the target voltage as being in the 3.0-3.2V range before a crash and 2.4V after a crash. I don't know enough about the probe hardware to guess how damage could occur in such conditions.

While I understand this may not be a cargo-embed issue per se, I see no other place than here to track it for now. I would like to keep collecting more data from others. It might be something SAMD21-specific since I've had no such issue with STM32 boards.