pyocd / pyOCD

Open source Python library for programming and debugging Arm Cortex-M microcontrollers
https://pyocd.io
Apache License 2.0

Problems flashing a TI LP MSPM0G3507 via XDS110 #1585

Closed davidjohnsummers closed 1 year ago

davidjohnsummers commented 1 year ago

Hi, I've a new Texas Instruments LP MSPM0G3507 board, that uses the XDS110 chip to program the M0+.

I'm using pyocd 0.35.1. It writes the first 35% of the code OK, but then barfs:

pyocd flash ./mspm0_sdk_1_00_00_04/examples/nortos/LP_MSPM0G3507/cookbooks/pwm_led_driver/gcc/pwm_led_driver.hex --pack=TexasInstruments.MSPM0G_DFP.1.1.0.pack -t mspm0g3507 -u MG350001
0004561 I Loading /home/summers/ti/mspm0_sdk_1_00_00_04/examples/nortos/LP_MSPM0G3507/cookbooks/pwm_led_driver/gcc/pwm_led_driver.hex [load_cmd]
[=================                                 ]  35%Exception in thread Thread-2 (rx_task):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/site-packages/pyocd/probe/pydapaccess/interface/pyusb_backend.py", line 156, in rx_task
    read_data = self.ep_in.read(self.ep_in.wMaxPacketSize,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0015462 E Error during board uninit: [session]
  File "/usr/lib/python3.11/site-packages/usb/core.py", line 423, in read
    return self.device.read(self, size_or_buffer, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/usb/core.py", line 1029, in read
    ret = fn(
          ^^^
  File "/usr/lib/python3.11/site-packages/usb/backend/libusb1.py", line 864, in intr_read
    return self.__read(self.lib.libusb_interrupt_transfer,
0015466 E Probe error during disconnect: [session]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/usb/backend/libusb1.py", line 954, in __read
    _check(retval)
  File "/usr/lib/python3.11/site-packages/usb/backend/libusb1.py", line 602, in _check
    raise USBTimeoutError(_strerror(ret), ret, _libusb_errno[ret])
usb.core.USBTimeoutError: [Errno 110] Operation timed out
0015481 C Device MG350001 read thread exited [__main__]

Now the Texas Instruments command for flashing via the UniFlash command line also produces an error, but does flash the device:

/home/summers/ti/uniflash_8.3.0/deskdb/content/TICloudAgent/linux/ccs_base/DebugServer/bin/DSLite flash --config=MSPM0G3507.ccxml ../mspm0_sdk_1_00_00_04/examples/nortos/LP_MSPM0G3507/cookbooks/pwm_led_driver/gcc/pwm_led_driver.hex --verbose
DSLite version 12.3.0.3041
Configuring Debugger (may take a few minutes on first launch)...
    Initializing Register Database...
    Initializing: CS_DAP_0
    Executing Startup Scripts: CS_DAP_0
    Initializing: CORTEX_M0P
    Executing Startup Scripts: CORTEX_M0P
    Initializing: SEC_AP
    Executing Startup Scripts: SEC_AP
Connecting...
CORTEX_M0P: GEL Output: Memory Map Initialization Complete
Loading Program: ../mspm0_sdk_1_00_00_04/examples/nortos/LP_MSPM0G3507/cookbooks/pwm_led_driver/gcc/pwm_led_driver.hex
    Preparing ... 
    0 of 1280 at 0x0
error: CORTEX_M0P: Flash Programmer: Device init failed
    Finished
    Setting PC to entry point.
Success
flit commented 1 year ago

That's a USB transfer error. For some reason, reading from the XDS110 probe fails with a USB timeout.

What host are you running on? If it's a Raspberry Pi, people have reported various issues with the RPi USB stack (not just with pyocd).

Also, I think I remember someone reporting a previous issue with XDS110 USB communication?

To really debug this would require a USB bus trace. Unfortunately, I don't have an MSP dev board with an XDS110, so I can't debug it myself.

davidjohnsummers commented 1 year ago

I'm running it on a 15-year-old AMD E350 computer, so yes, it's an old low-powered machine. But it programs an NXP LPC845-BRK device via pyocd fine, so it's only the XDS110 device that has problems.

Yes, TI only seems to have started using the XDS110 programming chips on the new Cortex-M0+ chips they are moving to. The development boards only started becoming available a few weeks ago. That means you can get an XDS110 programming device for $15 or so, whereas an external XDS110 programming device is over $100 IIRC.

Is there any way of increasing the USB timeout? On the whole I prefer programming a device with pyocd to the TI version, although that isn't bad.

davidjohnsummers commented 1 year ago

If it helps, here is the usbmon output from when the timeout happens. What is the best program to read it with to get the information you require? 5.mon.out.gz

flit commented 1 year ago

I just ordered an LP-MSPM0G3507 board, so hopefully it will be here within a week or so.

For usbmon, it seems Wireshark might be able to read the output? You should also be able to record directly within Wireshark. Although, I've never tried this for USB. If I have a chance this weekend, I'll try to review it, but there are several other issues I'm working on, too.

It's not possible to change the USB timeout from the command line, though there is a constant in the code.

https://github.com/pyocd/pyOCD/blob/dc9f2c00f761e36527ff96598bb6a7f698307e62/pyocd/probe/pydapaccess/interface/interface.py#L25

However, it's already set to 10 seconds… If the probe isn't responding within this time, then there's something seriously wrong.

flit commented 1 year ago

Something to try: you can set the cmsis_dap.limit_packets session option to prevent multiple outstanding commands (eg via -Ocmsis_dap.limit_packets). This is sometimes required in a VM.
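For anyone scripting this, here is a minimal sketch of assembling the same `pyocd flash` command line with session options appended as `-O` arguments. The helper name `build_flash_command` and its parameters are made up for illustration; only the flags already shown in this thread (`--pack=`, `-t`, `-u`, `-O`) are used, and nothing is executed.

```python
import shlex


def build_flash_command(image, target, pack=None, probe_id=None, session_options=None):
    """Assemble an argv list for `pyocd flash` (hypothetical helper; runs nothing)."""
    argv = ["pyocd", "flash", image, "-t", target]
    if pack:
        argv.append(f"--pack={pack}")
    if probe_id:
        argv += ["-u", probe_id]
    for name, value in (session_options or {}).items():
        # A value of True is passed as a bare -Oname, as in the comment above;
        # anything else is passed as -Oname=value.
        argv.append(f"-O{name}" if value is True else f"-O{name}={value}")
    return argv


cmd = build_flash_command(
    "pwm_led_driver.hex",
    "mspm0g3507",
    pack="TexasInstruments.MSPM0G_DFP.1.1.0.pack",
    probe_id="MG350001",
    session_options={"cmsis_dap.limit_packets": True},
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` if you actually want to run it.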

davidjohnsummers commented 1 year ago

Thanks Chris, I'll try those. A timeout of 10s is huge! I've never known anything to take that long! Still, when pyocd gives an error, it takes at least several seconds. I didn't think it was as long as 10s, but I'll change that to test.

Thanks for ordering the TI MSPM0G3507 LP board. Like all TI LaunchPads, it's a nice little development board, with plenty of pinouts etc. The one unusual bit is the move to the XDS110 programming chip - it means the programming chip is twice as large as the Cortex-M0+ (I guess the XDS110 is a 144-pin QFP, and the M0+ a 48-pin QFP). I guess you ordered direct from TI (it seems to be the only place it's available right now). For me it meant the board was delivered from Singapore (board made in China), so shipping took five days or so, but it was very hassle-free.

Just tried the -Ocmsis_dap.limit_packets option - that ran immediately, and without error - but didn't seem to write anything!

pyocd flash ./mspm0_sdk_1_00_00_04/examples/nortos/LP_MSPM0G3507/cookbooks/pwm_led_driver/gcc/pwm_led_driver.hex --pack=TexasInstruments.MSPM0G_DFP.1.1.0.pack -t mspm0g3507 -u MG350001 -Ocmsis_dap.limit_packets
0006691 I Loading /home/summers/ti/mspm0_sdk_1_00_00_04/examples/nortos/LP_MSPM0G3507/cookbooks/pwm_led_driver/gcc/pwm_led_driver.hex [load_cmd]
[==================================================] 100%
0007764 I Erased 0 bytes (0 sectors), programmed 0 bytes (0 pages), skipped 2048 bytes (2 pages) at 1.96 kB/s [loader]
flit commented 1 year ago

The timeout is used in this case mostly to prevent hangs from a stuck device. It could very well be that the Linux kernel has a shorter timeout for USB than 10 seconds.

By default, pyocd will read the target's memory and compare with the data being programmed to prevent unnecessary flash erase/write cycles. In this case, you were reprogramming the same image so it didn't need to do anything. That's why it says "skipped 2048 bytes".

To force it to always program flash, pass -Osmart_flash=0.
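As a rough illustration of the idea (not pyocd's actual implementation), the skip decision can be sketched as a page-by-page comparison between what is read back from the target and the new image. The function name `plan_pages` and the 1024-byte page size are assumptions chosen to match the "2048 bytes (2 pages)" in the log above.

```python
def plan_pages(current, new_image, page_size=1024):
    """Decide which pages of new_image need programming.

    `current` stands in for the data read back from target flash.
    Returns (offsets to program, offsets skipped).
    """
    to_program, skipped = [], []
    for offset in range(0, len(new_image), page_size):
        page = new_image[offset:offset + page_size]
        if current[offset:offset + len(page)] == page:
            skipped.append(offset)      # identical contents: no erase/write needed
        else:
            to_program.append(offset)   # contents differ: page must be reprogrammed
    return to_program, skipped


image = bytes(range(256)) * 8        # 2048-byte stand-in for the .hex contents
same_flash = bytes(image)            # target already holds this image
blank_flash = b"\xff" * len(image)   # erased target

print(plan_pages(same_flash, image))    # every page skipped
print(plan_pages(blank_flash, image))   # every page programmed
```

Passing -Osmart_flash=0 corresponds to ignoring the comparison and treating every page as needing programming.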

davidjohnsummers commented 1 year ago

Bingo! And yes the -Osmart_flash=0 worked! Yes I was passing the same .hex file that I had flashed before, so I could compare the TI software to pyocd!

So the problem was the multiple outstanding commands. Oh yes, I looked at the XDS110 part of the board more closely yesterday: it's just an MSP432 chip, so an older TI Arm processor. If it helps, the software on the XDS110/MSP432 device is the 03.00.00.25 CMSIS-DAP software, so I guess that is where the issue lies.

flit commented 1 year ago

Wow, I'm somewhat surprised that worked! I'll add a note to the docs.

Is this ok?

XDS110 firmware version 03.00.00.25 is known to have an issue when using multiple outstanding packets (the default setting). To work around this, set the cmsis_dap.limit_packets session option, e.g. -Ocmsis_dap.limit_packets=1 on the command line. Earlier firmware versions most likely exhibit the issue; it is unknown whether it is fixed in more recent versions.

Also, is it ok to close this issue now? Thanks!

davidjohnsummers commented 1 year ago

Yes - happy for you to close the issue, as you found the cause of the problem and have a workaround. So yes, it's a fault with the XDS110. I wonder if we should escalate the problem to Texas Instruments, so they know about the issue and will maybe solve it with a future update?

Don't know, maybe it's caused by the MSP432 chip that provides the XDS110 connection; maybe it's just underpowered?

davidjohnsummers commented 1 year ago

And Closing the ticket

flit commented 1 year ago

Thanks for closing.

There's no reason the MSP432 can't handle the task. We are able to run DAPLink (our open source CMSIS-DAP implementation) on far less powerful chips. Someone even has a custom CMSIS-DAP implementation running on an ultra cheap 8-bit 8051-compatible MCU! So it's almost certainly just a software bug.

Reporting the issue would be good, even if they don't take action. When I get my board, I'll try to find time to write a simple reproducer script that sends a valid command sequence which causes the device to hang.

davidjohnsummers commented 1 year ago

I've started a thread on the TI forum: https://e2e.ti.com/support/microcontrollers/msp-low-power-microcontrollers-group/msp430/f/msp-low-power-microcontroller-forum/1250134/xds110-programming-device-msp432-based-on-03-00-00-25-cmsis-dap-software-can-t-handle-multiple-requests

The TI staff are pretty responsive there, so suspect it will be rapidly passed onto the right person.

flit commented 1 year ago

That's refreshing to hear!

flit commented 1 year ago

Well, I got the LP-MSPM0G3507 board… and immediately bricked it. Actually, it's almost certainly just locked, but in any case pyocd can't connect anymore. This happened because I did a chip erase, and now the DebugDeviceUnlock sequence from the CMSIS-Pack fails to run. Seems like the device got locked, and the access port for the CPU is disabled, causing the connection to fail. 🤦🏽

davidjohnsummers commented 1 year ago

When you say bricked it, do you mean the MSP432 or the MSPM0G3507?

The MSPM0G3507 can go into a low power mode, and then it's hard to talk to. The TI flashing program can write to it though, and that brings it back to life. (IIRC if you hit the reset button at the same time as trying to write with pyocd, then pyocd catches the MSPM0G3507 during boot-up, when it can be talked to!)

The MSP432, I think, goes into an old-style flash disk mode that you can copy new firmware to, and again the TI software for the XDS110 can connect and upgrade the MSP432 firmware.

I'm off to work now, but I'll post more when I get back.

flit commented 1 year ago

The MSPM0G3507 is bricked. It's not in a low power mode… It responds to SWD. But the AP for the CPU is missing, indicating it's in a debug-protected secure state. Anyway, it can probably be recovered using the TI tools. It's just a pain to get everything installed and understand it when I only use it for testing.

Btw, you should be able to use --connect=under-reset to wake the target from sleep in order to connect.

davidjohnsummers commented 1 year ago

Sorry to hear that Chris. It sounds a bit strange that you can still do SWD, but not enough to write to the flash, so something odd must be going on. I haven't read in the TRM that the TI MSPM0G3507 has a protected state, e.g. it doesn't seem like the NXP devices where you can lock them down hard ...

The TI tool I use to write to the device is UniFlash. It's mainly a GUI and has similar functionality to pyocd, but not everything (pyocd is more transparent to use). UniFlash has always been able to connect to the MSPM0G3507 in my experience, even when I was "bricking" it when I first got the device. That said, as soon as I got my own code on the device, it's been very reliable with either pyocd or UniFlash. My code is currently simple but evolving, so it ends up in a big loop just flashing an LED while I get an understanding of the TI SDK.

Oh yes, in post above, where I reported to TI, the link got garbled, and wasn't showing - I've added it back. TI say this isn't an issue (I think because pyocd isn't a supported application) - bit sad, (and I tried pushing) but is what it is. Don't know if you at ARM have other routes into TI you can use.

I think it would be nice to get DAPLink working on the MSP432, as it sets everything up as common, and would move CMSIS-DAP onto version 2. Then again, I don't know how DAPLink compares to the TI software, and on the whole the TI software and SDK are pretty good, better than those of most other Arm vendors, with better support. So maybe TI are wedded to their own software ...

davidjohnsummers commented 1 year ago

Oh yes, meant to ask: did you try typing the pyocd command but not hitting return, then pushing and holding S3 (next to the USB cable on the LP-MSPM0G3507), hitting return, and immediately releasing S3? That usually got me into the device.

flit commented 1 year ago

Just tried the reset button, as well as trying --connect=under-reset (which should have the same effect). Neither worked. It's not a big deal, I can try using the TI tools to recover it if needed. And I can test the XDS110 using another target via a cable.

Honestly, it doesn't surprise me that TI won't look at the issue. They seem to still have the old-school silicon vendor mindset regarding tools.

DAPLink should compare pretty well against XDS110. There don't appear to be any proprietary features in TI's probe. Just CMSIS-DAPv1 plus a serial port.

davidjohnsummers commented 1 year ago

Pity about that. Yes, last night I was reading the TRM (slau846), and 1.4 does say that you can switch off SWD, but BSL may be open on the UART. However, if you do a factory reset, you need to rewrite the NONMAIN data structure - otherwise it may kill the machine.

Now I guess your chip erase must have overwritten this (and everything else) with 0xff. The strange thing is that the TRM says you need a password to change anything, and the password is never 0xff; so if it isn't allowed by hardware, how did you write it? It suggests that SWD bypasses the check and writes 0xff, and the effect of that is what you found.

Does this mean there is a problem with TexasInstruments.MSPM0G_DFP.1.1.0.pack, which shouldn't have allowed you to do an erase there?

flit commented 1 year ago

Yep, I figured it was something similar to that. I just ran pyocd erase --chip. It appears the MSPM0G flash algo in the CMSIS pack doesn't have a chip erase entry point. When pyocd sees this condition, it just performs a sector erase over all the flash regions—including the config area in this case.

It actually looks like there might be some issues with pyocd's generation of memory regions from the pack data. There is a default flag, which when set to 0 should prevent the region from being created (it's meant to then allow the user to enable the region in a GUI). The MSPM0G pack sets default=0 for the NONMAIN flash region. I was certain that pyocd checked this flag, but it could be broken somehow… ☹️

flit commented 1 year ago

There is a bug, but it's more in the chip erase functionality which currently doesn't check the default flag before erasing a region automatically.

Pyocd will go ahead and create non-default flash regions as long as there is a memory region + flash algo defined together (memory regions and flash algos are separate definitions in CMSIS-Packs). The intent is to allow the user access to anything defined in the pack that doesn't cause conflicts.

It's arguable whether it would be better to create a memory region without a flash algo (so it can't be programmed) if the algo is marked as non-default. Unfortunately, there's not a strictly correct way to handle it since vendors all have slightly different interpretations and intended usages.

Thanks for helping me work through this! 😄

davidjohnsummers commented 1 year ago

Yes - and it's good that we understand that the XDS110 can only handle one command at a time, and that it isn't safe to erase the NONMAIN memory region. And as you say, you still have the XDS110 probe, so you can still use that - at least it has an 8-pin output IIRC.

When I finish my current project with the MSPM0G I may look at converting the MSP432 to DAPLink - but won't do this until the project is finished.

Oh yes, I went through the TRM to see what effect writing 0xff everywhere in the NONMAIN region has: it's only the first few bytes that are important. Basically both SWD and BSL are disabled unless two demi words have exactly the right bits - and 0xFF isn't correct, so both BSL and SWD are disabled. You'll probably also have a CRC fault, which kicks off a slightly different route through the code. There is a slim chance you may be able to do an SWD factory reset at power-on, but as SWD is disabled, probably even that is not allowed.

So my reading is it's a total brick now. Quite a surprise, but it does say there really should be no writing to NONMAIN.

flit commented 1 year ago

Wow, it's surprising how dangerous erasing NONMAIN is!

If you do get around to working on an MSP432 port of DAPLink, there's a DAPLink channel on the pyOCD Slack workspace. The join link is on the home page of pyocd.org.

davidjohnsummers commented 1 year ago

Oh pooh - I think I may have bricked my MSPM0G3507 when trying the unlock command. Afterwards it has just frozen, and nothing (even TI software) can connect. Here is the cmd session:

pyocd> reset
Resetting target
pyocd> unlock
pyocd> reset
Resetting target
0151137 W Core #0 is not accessible after reset [cortex_m]
Transfer failed: Memory transfer fault @ 0xe000edf0-0xe000edf3
pyocd> list
  #   Probe/Board                                                   Unique ID   Target
  0   NXP Semiconductors LPC11U3x CMSIS-DAP v1.0.7                  0F009015    n/a
  1   Texas Instruments XDS110 (03.00.00.25) Embed with CMSIS-DAP   MG350001    n/a
pyocd> halt
Transfer failed: Memory transfer fault

So it looks like I'll get to try DAPLink sooner rather than later, and need to take a view on ordering a second MSPM0G3507. These boards seem to brick easily.

And I just read https://pyocd.io/docs/security.html. Oh pooh - unlocking does a mass erase - and as we have learnt in this thread, unlike most devices where that unlocks a device, for the MSPM0G3507 it bricks the device ...

flit commented 1 year ago

Yes, that's the same symptom I have. Really sorry to hear that!

I'd double check with TI support to make sure there's not a way to recover a device which had its NONMAIN flash erased accidentally using their tools. If not, complain vigorously that it's a broken design. 😉

From my own experience working on MCU architecture… having the default flash state be secured is a Very Bad Idea. The NXP Kinetis devices work like this, although in that case the mass erase does unlock the device by writing the unlocked bit pattern into flash for you (it's the normal chip erase that enables security). Still dangerous and troublesome at best.

I'll also work more on making pyocd honour the default flag for flash. Unfortunately, that will make some other devices harder to use, but it's probably a tradeoff worth making.

davidjohnsummers commented 1 year ago

Yes I've started a thread on the ti forums:

https://e2e.ti.com/support/microcontrollers/msp-low-power-microcontrollers-group/msp430/f/msp-low-power-microcontroller-forum/1253733/lp-mspm0g3507-lauchpad-in-funny-state-how-to-reset

and, following your security page, have confirmed I can access the additional access ports (guess you found the same).

TI has at least now given information on what AP=1-4 do; not clear if one can enable SWD again. If I get SWD alive, I can write the bytes I need to to NONMAIN.

The logic of the words in NONMAIN controlling SWD and BSL is a bit strange: there is a word for enabled, and another word for disabled. But so that accidental bit flips don't give access to a locked machine, if any other word value is set, then SWD/BSL are also disabled. The net effect of this is that there are two or three words in NONMAIN that must have specific values.
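A minimal sketch of that decode logic, to show why an erased word ends up locked. The two constants are placeholders invented for illustration (the real patterns are defined in TI's TRM), and `decode_policy` is a made-up name.

```python
# Placeholder patterns only, NOT the real NONMAIN values from the TRM.
HYPOTHETICAL_ENABLED = 0xAABB
HYPOTHETICAL_DISABLED = 0x5566


def decode_policy(word):
    """Grant access only for the exact 'enabled' pattern; anything else is locked."""
    if word == HYPOTHETICAL_ENABLED:
        return "enabled"
    if word == HYPOTHETICAL_DISABLED:
        return "disabled"
    return "disabled (invalid pattern)"


for word in (
    HYPOTHETICAL_ENABLED,
    HYPOTHETICAL_DISABLED,
    0xFFFF,                          # what an erased flash word reads back as
    HYPOTHETICAL_ENABLED ^ 0x0001,   # a single bit flip of the enabled pattern
):
    print(f"{word:#06x} -> {decode_policy(word)}")
```

So an erase that leaves 0xFFFF behind lands in the "invalid pattern" branch, which behaves the same as an intentional lock.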

Anyway, the fact that I can access other Access Points shows the boot ROM is still booting, and does give some access - the only question is what can be done with it. The list is posted on the TI site, and none of the Access Points seem to give access to resetting the machine; but let's see what TI says ....

The list of AP points given are:

AP=0; AHB-AP; MCPUSS debug access port; Debug of the processor and peripherals
AP=1; CFG-AP; Configuration access port; Access device type information
AP=2; SEC-AP; Security access port; Access the debug mailbox (DSSM)
AP=3; ET-AP; Energy Trace technology access port; Read the power state data from power aware debug
AP=4; PWR-AP; Power access port; Configure the device power states (interfaces with PMCU/SYSCTL)

My gut feeling on pyocd erasing NONMAIN: at the least a warning should probably be given that NONMAIN is being erased, asking for confirmation (as default=0). That should then work with other machines as well, e.g. you can force the erase of a default=0 sector, but need to make an active choice to do that. What caused my problem was the "unlock" command: only the documentation mentions that it means a mass erase, so it is easily missed.

Thanks though for your time on this.

flit commented 1 year ago

Having all values except certain ones cause a locked state is a common way to prevent things like physical glitching attacks from forcibly unlocking an intentionally-protected device. (This is why security for the NXP Kinetis devices works similarly.) But there are far better ways to handle this using things like a redundant lifecycle state setting.

What really sucks is that, unless there's a way to use the SEC-AP to unlock the device, then there is no way for customers to temporarily lock a device for production testing and then recover it to continue development.

Agreed regarding the erase subcommand. I was thinking of changing it to only erase default regions unless an --nondefault option is provided. Perhaps also only erase the boot memory unless an "--all" or similar option is provided. And report which regions are erased or not.
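A rough sketch of what that selection could look like. This is not existing pyocd code: the `FlashRegion` fields, the `regions_to_erase` function, and the `DATA` region are hypothetical, and `include_nondefault` / `erase_all` merely stand in for the options being proposed above.

```python
from dataclasses import dataclass


@dataclass
class FlashRegion:
    name: str
    is_boot_memory: bool
    is_default: bool   # mirrors the pack's default flag


def regions_to_erase(regions, include_nondefault=False, erase_all=False):
    """Return (selected, skipped) region names so both can be reported to the user."""
    selected, skipped = [], []
    for region in regions:
        if not region.is_default and not include_nondefault:
            skipped.append(region.name)    # non-default regions need an explicit opt-in
        elif not region.is_boot_memory and not erase_all:
            skipped.append(region.name)    # only boot memory unless everything is requested
        else:
            selected.append(region.name)
    return selected, skipped


regions = [
    FlashRegion("MAIN", is_boot_memory=True, is_default=True),
    FlashRegion("NONMAIN", is_boot_memory=False, is_default=False),
    FlashRegion("DATA", is_boot_memory=False, is_default=True),  # made-up extra region
]

print(regions_to_erase(regions))
print(regions_to_erase(regions, erase_all=True))
print(regions_to_erase(regions, include_nondefault=True, erase_all=True))
```

With the defaults, NONMAIN is never touched; it takes both opt-ins to reach it.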

davidjohnsummers commented 1 year ago

Hi Chris,

After a bit of moaning - TI has said they will send another MSPM0G3507 - which is good of them; so can still continue to write code.

Thought a bit about the above, with much reading. Yes, algorithm default=0 doesn't mean you can't use it; it's only default=1 that means that you have to ( https://open-cmsis-pack.github.io/Open-CMSIS-Pack-Spec/main/html/pdsc_family_pg.html#element_algorithm )! So probably the main problem is the name "unlock", as it unlocks some machines but locks the TI MSPM0. How about renaming the command to "masserase"? Then it describes what it does, and users have to know what that means. I think that's the simplest way of making people aware.

I've dug into:

AP=1; CFG-AP; Configuration access port; Access device type information
AP=2; SEC-AP; Security access port; Access the debug mailbox (DSSM)

which are still running. CFG-AP just gives information, so

pyocd> readap 1 0x1000000
AP register 0x1000000 = 0x1bb8802f

tells you about what CPU is there. It says version=1, which is interesting, as the same info in flash says version=2!

SEC-AP is more difficult: it writes to DSSM, and that sends mail to the CPU. So:

writeap 2 0x2000000 0x020

should take you back to where you started, but:

pyocd> readap 2 0x2000004
AP register 0x2000004 = 0x00000001

means you have sent mail, but the CPU hasn't yet picked it up. It goes back to 0 when picked up.

AP Register 0x2000000 = DSSM Register TX_DATA
AP Register 0x2000004 = DSSM Register TXCTL
AP Register 0x2000008 = DSSM Register RX_DATA
AP Register 0x200000C = DSSM Register RXCTL

08 is for messages back to you, and 0c that you have read it.
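To make the handshake concrete, here is a toy model of the debugger-to-CPU half of the mailbox as I understand it from the above. It is a simulation only, not real hardware access and not TI's specification: the class and method names are invented, and the offsets are the low bytes of the AP register addresses listed above (with RXCTL taken as 0x0C).

```python
# Offsets within SEC-AP, as described in this comment (assumed, not verified against the TRM).
TX_DATA, TXCTL, RX_DATA, RXCTL = 0x00, 0x04, 0x08, 0x0C


class SimulatedMailbox:
    """Toy model of the mailbox handshake; no probe or target is involved."""

    def __init__(self):
        self.regs = {TX_DATA: 0, TXCTL: 0, RX_DATA: 0, RXCTL: 0}

    def debugger_write(self, value):
        """Model of `writeap 2 0x2000000 <value>`: post mail for the CPU."""
        self.regs[TX_DATA] = value
        self.regs[TXCTL] = 1    # mail posted, not yet picked up

    def cpu_pickup(self):
        """Model of the CPU reading the mail, which clears the pending flag."""
        value = self.regs[TX_DATA]
        self.regs[TXCTL] = 0
        return value


mbox = SimulatedMailbox()
mbox.debugger_write(0x020)
pending_after_write = mbox.regs[TXCTL]    # the state I am stuck in on the real board
received = mbox.cpu_pickup()
pending_after_pickup = mbox.regs[TXCTL]
print(pending_after_write, hex(received), pending_after_pickup)
```

On my board the second step never seems to happen, which is why TXCTL keeps reading back as 1.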

Hope this helps,

David

flit commented 1 year ago

Good to hear they'll send another!

Probably the best change would be to remove the fallback to erasing all regions when there isn't a special unlock routine. You're right that it's not really an unlock. The issue comes from "unlock" falling back to "mass erase" which falls back to "erase all regions" which erases non-default regions.

Thanks again for your help discussing and working through this!

davidjohnsummers commented 1 year ago

Hi Chris, I'm just closing down the thread on the TI site, and in the write-up realised something. I guess you use MSPM0G_NONMAIN.FLM for doing the mass erase of NONMAIN, specifically calling the EraseSector command in the algorithm.

Just wondering, what if those functions did nothing and returned status=1 or failure. Could you still write to the chip with ProgramPage?

Now this kinda makes sense, one should never erase NONMAIN on these boards, and only ever write occasional words.

The other beauty is that MSPM0G_NONMAIN.FLM is maintained by TI; it's a TI executable, so it's TI that sets the policy on how you write to the chips, and how they require you to use the chip. So I like this approach, as it's clear that TI is responsible for both.

Does this make sense?

[Edit - just disassembled MSPM0G_NONMAIN.FLM - sod me, it contains what you are meant to set the NONMAIN memory to in nonMainDefaultBCR @ 0x564 - but can't see it referenced in the code ...]

[Edit - and reading above, I realise you have also said something similar about the pack algorithm: EraseSector exists, but there is no EraseChip entry point, which I also saw when decompiling the .flm algorithm - so I guess you have already had the same thoughts. Oh yes, I think I've worked out how to program the MSP432, so I'm getting close to trying DAPLink on the device]