grant-h / omap_loader

A USB BootROM uploading utility for TI ARM OMAP3 family processors. Rewrite of omap3_usbload. Supports USBLOAD functionality in TI's X-Loader.
GNU General Public License v2.0
6 stars 5 forks source link

Tool may fail on some machines #1

Open grant-h opened 9 years ago

grant-h commented 9 years ago

There is a particularly annoying issue where omap_loader will fail to upload beyond the first stage (x-loader) as x-loader communication fails. I'm not sure what the issue is or where in the stack it lies, but this needs to be investigated as it only affects certain machines.

The failure that occurs is a USB PIPE or TIMEOUT error. On some machines the issue is complete reproducable. On others it may only happen every 1 in 5 times. The testing target - a Nest Thermostat - has a 10-15 second restart time which requires a button to be held down. In order to track down this bug, debug output from the x-loader needs to be printed and the error needs to be reproducible 100% of the time.

Checkout and compile off of branch fix-no-upload https://github.com/grant-h/omap_loader/tree/fix-no-upload To continue where I left off...

dmitry-bodnya commented 8 years ago

Hello grant-h!

I have several machines on which my board doesn't want to boot at all(100% times). Regular output is: $ sudo ./omap_loader -v -f x-load.bin OMAP Loader 1.0.0 File 'x-load.bin' at 0x40200000, size 15200 [+] scanning for USB device matching 0451:d00e... [+] successfully opened 0451:d00e (Texas Instruments OMAP3630) [+] got ASIC ID - Num Subblocks [05], Device ID Info [01050136300707], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [15090113bf3eef00000000] [-] fatal transfer error (BULK_OUT) for 15200 bytes (0 made it): LIBUSB_ERROR_PIPE [-] failed to send file 'x-load.bin' (size 15200) [-] failed to transfer the first stage file 'x-load.bin'

Are you interested to investigate this issue?

dmitry-bodnya commented 8 years ago

I'd recompiled with staging changes:

$ sudo ./omap_loader -v -f x-load.bin OMAP Loader 1.0.0 [+] Will load 'x-load.bin' at 0x40200000 with size of 15200 bytes [+] scanning for USB device matching 0451:d00e... [D] using configuration 1 [+] successfully opened 0451:d00e (Texas Instruments OMAP3630) [+] got ASIC ID - Num Subblocks [05], Device ID Info [01050136300707], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [15090113bf3eef00000000] [+] WARNING: BULK_OUT transfer timed out. Progress 0/4. Try 1/2 [+] WARNING: BULK_OUT transfer timed out. Progress 0/4. Try 2/2 [+] WARNING: BULK_OUT transfer timed out. Progress 0/4. Try 3/2 [-] device timed out while transfering out 4 bytes (0 made it) [-] failed to length specifier of 15200 to OMAP BootROM [-] failed to transfer the first stage file 'x-load.bin'

grant-h commented 8 years ago

Thanks for looking in to this issue. I dont have a device I can test with anymore (unfortunately). This whole issue seems limited to certain USB ports / computers. I really can't put my finger on it.

With the fix-no-upload branch compiled, please add a -d to your command line arguments. Then go ahead repost your output in a code block (start and end with ```)

dmitry-bodnya commented 8 years ago

$ sudo ./omap_loader -d -v -f x-load.bin OMAP Loader 1.0.0 [+] Will load 'x-load.bin' at 0x40200000 with size of 15200 bytes [+] scanning for USB device matching 0451:d00e... [D] USB device 0451:d00e info [D] USB BCD 0x0210 [D] Endpoint 0 Max Pkt Size 64 [D] Number of configurations: 1 [D] Configuration Descriptor 0 [D] Interfaces 1 [D] Interface Descriptor 0 [D] Alternate Setting 0 [D] using configuration 1 [D] auto detaching kernel driver on interface claim [D] interface successfully claimed [+] successfully opened 0451:d00e (Texas Instruments OMAP3630) [+] got ASIC ID - Num Subblocks [05], Device ID Info [01050136300707], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [15090113bf3eef00000000] [D] sent peripheral boot command f0030002... [D] sent length of 'x-load.bin' (15200)... [-] fatal transfer error (BULK_OUT) for 15200 bytes (0 made it): LIBUSB_ERROR_PIPE [-] failed to send file 'x-load.bin' (size 15200) [-] failed to transfer the first stage file 'x-load.bin'

dmitry-bodnya commented 8 years ago

I found that if I am loading only one file - sometimes it succeeds(rare). But when I specify several files, somehow even first failed to load.

grant-h commented 8 years ago
  1. Can you give me the output of lspci -v on your test machine?
  2. Can you use https://wireshark.org/ to perform a USB capture of a case where a file succeeds and one that fails?
  3. What device are you testing with?
  4. What is the exact model of the chip you are testing with?
  5. What type of code are you trying to upload to the device?

Unfortunately, without a chip to program myself, it will be difficult to fix this issue. Packet captures may help, but without a chip, this will be hard to debug.

dmitry-bodnya commented 8 years ago

$ sudo lspci -v 00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: hsw_uncore

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: f7e00000-f7efffff Prefetchable memory behind bridge: 00000000e0000000-00000000efffffff Capabilities: [88] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Capabilities: [100] Virtual Channel Capabilities: [140] Root Complex Link Capabilities: [d94] #19 Kernel driver in use: pcieport

00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05) (prog-if 30 [XHCI]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0, IRQ 25 Memory at f7f00000 (64-bit, non-prefetchable) [size=64K] Capabilities: [70] Power Management version 2 Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+ Kernel driver in use: xhci_hcd

00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, fast devsel, latency 0, IRQ 28 Memory at f7f1b000 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+ Kernel driver in use: mei_me

00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05) (prog-if 20 [EHCI]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0, IRQ 20 Memory at f7f18000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci-pci

00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05) Subsystem: ASUSTeK Computer Inc. Device 8576 Flags: bus master, fast devsel, latency 0, IRQ 29 Memory at f7f10000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Virtual Channel Kernel driver in use: snd_hda_intel

00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: 00002000-00002fff Memory behind bridge: f0100000-f02fffff Prefetchable memory behind bridge: 00000000f0300000-00000000f04fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport

00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: f7d00000-f7dfffff Prefetchable memory behind bridge: 00000000f0000000-00000000f00fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport

00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d5) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=04, subordinate=05, sec-latency=0 Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [a0] Power Management version 3

00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05) (prog-if 20 [EHCI]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0, IRQ 23 Memory at f7f17000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci-pci

00:1f.0 ISA bridge: Intel Corporation B85 Express LPC Controller (rev 05) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: lpc_ich

00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode](rev 05) (prog-if 01 [AHCI 1.0]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 26 I/O ports at f070 [size=8] I/O ports at f060 [size=4] I/O ports at f050 [size=8] I/O ports at f040 [size=4] I/O ports at f020 [size=32] Memory at f7f16000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: medium devsel, IRQ 14 Memory at f7f15000 (64-bit, non-prefetchable) [size=256] I/O ports at f000 [size=32]

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM](prog-if 00 [VGA controller]) Subsystem: PC Partner Limited / Sapphire Technology Device e204 Flags: bus master, fast devsel, latency 0, IRQ 30 Memory at e0000000 (64-bit, prefetchable) [size=256M] Memory at f7e20000 (64-bit, non-prefetchable) [size=128K] I/O ports at e000 [size=256] Expansion ROM at f7e00000 [disabled] [size=128K] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel driver in use: radeon

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] Subsystem: PC Partner Limited / Sapphire Technology Radeon HD 6450 1GB DDR3 Flags: bus master, fast devsel, latency 0, IRQ 31 Memory at f7e40000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel driver in use: snd_hda_intel

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) Subsystem: ASUSTeK Computer Inc. Device 8554 Flags: bus master, fast devsel, latency 0, IRQ 27 I/O ports at d000 [size=256] Memory at f7d00000 (64-bit, non-prefetchable) [size=4K] Memory at f0000000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Endpoint, MSI 01 Capabilities: [b0] MSI-X: Enable- Count=4 Masked- Capabilities: [d0] Vital Product Data Capabilities: [100] Advanced Error Reporting Capabilities: [140] Virtual Channel Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00 Capabilities: [170] Latency Tolerance Reporting Kernel driver in use: r8169

04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=04, secondary=05, subordinate=05, sec-latency=32 Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Device 8489

dmitry-bodnya commented 8 years ago

I am testing with my custom board. The processor is DM3725CBP(800MHz version, POP memory). Also I have SBC8140 board and I see similar behavior. I am loading x-loader. Compiled from x-loader-usb found in internet when searching omap usb load. Sources are modified for my board.

I have a hardware USB sniffer. Right now I have only log for failed loading. It is in specific format so I am attaching a picture.

dmitry-bodnya commented 8 years ago

capture2

dmitry-bodnya commented 8 years ago

And on machine where usb load is mostly successful: $ lspci -v 00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, fast devsel, latency 0 Capabilities:

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 00003000-00003fff Memory behind bridge: d0900000-d09fffff Prefetchable memory behind bridge: 00000000b0000000-00000000bfffffff Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller]) Subsystem: Hewlett-Packard Company Device 17f4 Flags: bus master, fast devsel, latency 0, IRQ 49 Memory at d0000000 (64-bit, non-prefetchable) [size=4M] Memory at c0000000 (64-bit, prefetchable) [size=256M] I/O ports at 4000 [size=64] Expansion ROM at [disabled] Capabilities: Kernel driver in use: i915 Kernel modules: i915

00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04) (prog-if 30 [XHCI]) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, medium devsel, latency 0, IRQ 45 Memory at d0a00000 (64-bit, non-prefetchable) [size=64K] Capabilities: Kernel driver in use: xhci_hcd

00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, fast devsel, latency 0, IRQ 48 Memory at d0a14000 (64-bit, non-prefetchable) [size=16] Capabilities: Kernel driver in use: mei Kernel modules: mei

00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04) (prog-if 20 [EHCI]) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, medium devsel, latency 0, IRQ 16 Memory at d0a19000 (32-bit, non-prefetchable) [size=1K] Capabilities: Kernel driver in use: ehci-pci

00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, fast devsel, latency 0, IRQ 50 Memory at d0a10000 (64-bit, non-prefetchable) [size=16K] Capabilities: Kernel driver in use: snd_hda_intel Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 Memory behind bridge: d0800000-d08fffff Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp

00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 Memory behind bridge: d0700000-d07fffff Prefetchable memory behind bridge: 00000000af200000-00000000af2fffff Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp

00:1c.3 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 4 (rev c4) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 Memory behind bridge: d0600000-d06fffff Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp

00:1c.5 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 (rev c4) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 I/O behind bridge: 00002000-00002fff Memory behind bridge: d0500000-d05fffff Prefetchable memory behind bridge: 00000000d0400000-00000000d04fffff Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp

00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04) (prog-if 20 [EHCI]) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, medium devsel, latency 0, IRQ 16 Memory at d0a18000 (32-bit, non-prefetchable) [size=1K] Capabilities: Kernel driver in use: ehci-pci

00:1f.0 ISA bridge: Intel Corporation HM76 Express Chipset LPC Controller (rev 04) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, medium devsel, latency 0 Capabilities: Kernel driver in use: lpc_ich Kernel modules: lpc_ich

00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode](rev 04) (prog-if 01 [AHCI 1.0]) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 46 I/O ports at 4068 [size=8] I/O ports at 4074 [size=4] I/O ports at 4060 [size=8] I/O ports at 4070 [size=4] I/O ports at 4040 [size=32] Memory at d0a17000 (32-bit, non-prefetchable) [size=2K] Capabilities: Kernel driver in use: ahci Kernel modules: ahci

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7550M/7570M/7650M](prog-if 00 [VGA controller]) Subsystem: Hewlett-Packard Company Radeon HD 7650M Flags: bus master, fast devsel, latency 0, IRQ 51 Memory at b0000000 (64-bit, prefetchable) [size=256M] Memory at d0900000 (64-bit, non-prefetchable) [size=128K] I/O ports at 3000 [size=256] Expansion ROM at d0920000 [disabled] [size=128K] Capabilities: Kernel driver in use: fglrx_pci Kernel modules: fglrx_experimental_13, radeon

03:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller (rev 30) Subsystem: Hewlett-Packard Company Device 17f6 Flags: bus master, fast devsel, latency 0, IRQ 18 Memory at d0703000 (32-bit, non-prefetchable) [size=256] Expansion ROM at af200000 [disabled] [size=64K] Capabilities: Kernel driver in use: sdhci-pci Kernel modules: sdhci-pci

03:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller (rev 30) (prog-if 01) Subsystem: Hewlett-Packard Company Device 17f6 Flags: fast devsel, IRQ 18 Memory at d0702000 (32-bit, non-prefetchable) [size=256] Capabilities: Kernel modules: sdhci-pci

03:00.3 System peripheral: JMicron Technology Corp. MS Host Controller (rev 30) Subsystem: Hewlett-Packard Company Device 17f6 Flags: bus master, fast devsel, latency 0, IRQ 18 Memory at d0701000 (32-bit, non-prefetchable) [size=256] Capabilities: Kernel driver in use: jmb38x_ms Kernel modules: jmb38x_ms

04:00.0 Network controller: Ralink corp. RT3290 Wireless 802.11n 1T/1R PCIe Subsystem: Hewlett-Packard Company Device 18ec Flags: bus master, fast devsel, latency 0, IRQ 19 Memory at d0610000 (32-bit, non-prefetchable) [size=64K] Capabilities: Kernel driver in use: rt2800pci Kernel modules: rt2800pci

04:00.1 Bluetooth: Ralink corp. RT3290 Bluetooth Subsystem: Hewlett-Packard Company Device 18ec Flags: bus master, fast devsel, latency 0, IRQ 10 Memory at d0600000 (32-bit, non-prefetchable) [size=64K] Capabilities:

05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07) Subsystem: Hewlett-Packard Company Device 1846 Flags: bus master, fast devsel, latency 0, IRQ 47 I/O ports at 2000 [size=256] Memory at d0404000 (64-bit, prefetchable) [size=4K] Memory at d0400000 (64-bit, prefetchable) [size=16K] Capabilities: Kernel driver in use: r8169 Kernel modules: r8169

grant-h commented 8 years ago

Thanks for this info. Sorry, but I won't have much time to figure out this issue. I'd need to buy some more hardware to play with before I can try and debug.

grant-h commented 8 years ago

@dmitry-bodnya Hello again, do you know which generation Nest hardware you have? If you can find a version number or check out the information file when you plug in via USB, that'd be great.

dmitry-bodnya commented 8 years ago

@grant-h Hello. Sorry, but I don't have any kind of Nest hardware. I am working with our own board(with DM3725), which boots over usb for FW upgrading. And we have PCs with 100% times boot failed. It failed to boot on the very first stage, so I think it is something inside processor ROM, which is hard to investigate and impossible(for us) to change.

dmitry-bodnya commented 7 years ago

@grant-h Hi. May be it will be useful. I moved your code to Windows and after some changes(no getopt, no basename, and others) ran it. I met this issue again - ASIC ID received, x-loader load ok and then no "USBf"(file req) cmd receive. I put Sleep(500) in place, where you wrote about race condition in comments. After that, I got stable file-loading process(tested up to 6 files). Perhaps 500 ms is too much. But, just in theory, it is time for x-loader to boot and to write something to fifo before host side send IN token. If host is sending IN token before x-loader ready - then there is no successful exchange following. So, I guess delay is neccessary and transfer_other_files is not robust enough to work without delay. And also, when I began to transfer large files(uImage - 2MB, fs.bin - 25MB) I got usb timeouts. So I added timeout multiplier, which depends of tx size. Also, to work with OMAP on Windows, you should install driver(I found .inf file in internet, is uses WinUsb).

Anyway, I still have computer which fails to load even first stage.

grant-h commented 7 years ago

Dmitry, that's good news! I've tried adding sleeps all over the code for some machines, but even then they still manage to not boot. Also, would you be willing to consolidate your changes that allowed this tool to work on Windows and share them in a pull request? It would be nice to have this tool be cross platform as it's only real dependency is libusb.

dmitry-bodnya commented 7 years ago

Grant, I'm not good in git.. I can send you files, if You tell me where.

grant-h commented 7 years ago

@dmitry-bodnya Here you go. My email is encoded here. Decode it and send me an attachment.

dougg3 commented 5 months ago

I'm way late to the party here, but sharing my solution.

I've been tinkering with an OMAP3530 BeagleBoard lately, and was running into a bunch of the same issues described in this thread when trying to boot it through USB. The problem only happens on one of my computers. Both of my computers are running Ubuntu 22.04. The working one is an older Intel machine with only USB 2.0 support. The broken one is a newer AMD machine with USB 3.0 support. I have a USB analyzer so I was able to compare the raw USB traffic between them. The command I'm running is:

./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000

I'm still confused about some of the results I saw, but here are the changes I've made that have helped this program work on my AMD computer:

  1. Add usleep(1000) before both libusb_bulk_transfer() calls
  2. In omap_usb_write, limit writeAmt to a max of 512 per loop iteration (Edit: This doesn't actually seem necessary at all)

The two changes above will allow x-loader to load properly, but I still couldn't get it to load secondary files until I added a usleep(20000); after the "Note: this is a race between the target's processor getting X-loader running" comment. I can't explain why an arbitrary delay fixes it for me, but it does.

None of these changes seem to hurt anything on the older Intel computer that works with the stock code, by the way. So that's good...

I am struggling to fully understand why these changes fix it. It just seems like doing things to slow down and avoid NAKs/NYETs in the raw traffic helps with compatibility. As for the 20 ms delay: without it, I saw some very interesting (and repeatable) behavior when it tried to read from X-loader for the first time after loading it. No traffic appeared on the bus at all. A usbmon trace showed the bulk IN URB failed after about 5 ms with "-EPROTO". Then there was the 2 second delay, and then in my analyzer I actually saw an IN/DATA1/ACK on the bus with data from x-loader on the second attempt. But instead of reporting this data back to libusb, the host just kept doing more IN packets (which were NAKed because x-loader had nothing left to return). Eventually the URB was canceled after a second and the data was lost. For some reason the 20 ms delay completely fixes this issue.

I'm not sure if it's a bug in the Linux kernel or the host controller or what. I don't think it's a bug in the OMAP or x-loader. When a short data packet is received after the IN, shouldn't the Linux machine be saying "okay we got some data" and bailing, rather than trying to grab more data forever? If someone understands USB transactions in Linux better than me, I would love to deep dive further.

Edit: I tested another Intel laptop which doesn't need the usleep(1000) before libusb_bulk_transfer() but does need the usleep(20000). I also ran into an interesting problem on the Intel laptop where libusb doesn't detect the device being attached at all because of how quickly it disables USB. It seems to be an issue some people run into with libusb-1.0, which I solved by compiling a custom build of libusb with udev disabled (uses netlink instead).

grant-h commented 5 months ago

@dougg3 Awesome investigative work! It's cool that you have an USB analyzer at hand to actually get down to the PHY layer to figure what is going on. Do you happen to have a "smoking gun" screen shot of the protocol traces between the working and failing devices? What USB analyzer are you using by the way?

As for why this is happening, maybe since the target's USB stack is in flux during the initiated BULK transfer in combination with some flaky Linux USB stack code, things are going out of sync. Probably best to not race either stack. Since the changes are just sleep related I'm happy to accept a diff that adds them and get that commited.

dougg3 commented 5 months ago

Thanks @grant-h! I'm definitely excited to potentially improve the compatibility of this excellent utility and luckily I have a lot of computers to test with. I'm using Alex Taradov's open source USB sniffer which I built myself. It's really neat because it outputs files compatible with Wireshark. It's reliable for the most part, although sometimes I have to restart captures because it loses communication when I hotplug stuff. I think I might be using cables a little too long. I'm happy to show more detailed info from the sniffer here. I'm also happy to submit a PR to fix a few things (when I have a bit more free time).

Computer 1 (Core i3-2120)

Here is what my one perfectly working computer with the current codebase, a Core i3-2120, does. It doesn't support USB 3.0 at all, it has an EHCI host controller. It looks like when running this utility completely unmodified:

$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[+] uploading 'u-boot.bin' (size 777760) to 0x80800000
[+] jumping to address 0x80800000
[+] successfully transfered 2 files

Here's the start of the interesting part of the USB sniffing, after all of the descriptors are grabbed by Linux.

image

The length = 72 DATA0 is the initial info from the OMAP with the ASIC ID. You can see directly afterward, we do two OUT transactions with 7 bytes each (4 bytes of actual data in them). That's the boot from peripheral command, followed by the length. Finally we start sending the data in 512-byte packets. You'll notice that the first packet is ACKed, then the next packet is NAKed so we ping until getting an ACK, and then do the next OUT packet and so on. This continues until X-loader has been fully loaded.

image

The 335-byte packet here is the final packet, you can see it was NAKed the first time and had to be repeated. Then we start doing IN transactions after about 17 ms and they are NAKed for a while. I think this is while X-loader is still getting ready to go. Eventually we get a 16-byte response with DATA1 -- this is the USBffile req initial response from X-loader, which then kicks off the rest of the transmit process. It all continues great from that point on, in a pretty similar fashion. I won't bother with another screenshot, but it's more of the same stuff including lots of PING/NAKs.

So these traces seem to show pretty normal USB 2.0 high speed behavior. It handles the PINGs, NAKs, etc. just fine on this machine and happily finishes the whole thing from start to finish.

Computer 2 (Ryzen 5 3600)

Next, I'm going to move onto a Ryzen 5 3600. This has an XHCI host controller and supports USB 3.0 (same with all the remaining computers I'm testing). Here's the output I get when I run unmodified omap_loader on it:

$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] fatal transfer error (BULK_OUT) for 26956 bytes (0 made it): LIBUSB_ERROR_PIPE
[-] failed to send file 'x-load.bin' (size 26956)
[-] failed to transfer the first stage file 'x-load.bin'

As you can see it doesn't even succeed at sending anything. Here's the relevant part of the USB sniffer's trace:

image

What's going on here is we get the initial 72-byte ASIC ID. In return we send the two 7-byte (4-byte data) responses. The second one gets delayed a little bit with NAKs, but it eventually succeeds and we move onto sending data. The first chunk goes out fine, then we try the next chunk and it's NAKed. Eventually it returns a STALL packet. Wireshark does a poor job of showing it's a STALL, but that's what is going on, and that's why I highlighted it so you can see in the bottom left.

It's interesting that as soon as a new frame started, our first PING resulted in a STALL. It's almost like the OMAP bootloader isn't expecting a PING after a new frame has begun and is thinking we'll try sending a new packet instead. This almost does point to some kind of an edge case bug in the OMAP's on chip bootloader's USB implementation. Anyway, adding the 1 ms delay before libusb_bulk_transfer adds enough of a delay to allow X-loader to fully load.

However, now it starts failing on loading the next file:

$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] fatal transfer error (BULK_IN) for 512 bytes (got 0): LIBUSB_ERROR_IO
[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] failed to read command from X-Loader
[-] failed to transfer the additional files in to memory

Here's the USB trace where the failure starts happening:

image

It's stuck in an infinite loop of IN/NAKs for a while. Interestingly at some point we do an IN that gets no response at all and we try IN again. Still no response, I think this is when libusb bails. Then we wait for 2 seconds. Then we try again, and successfully do an IN/DATA1/ACK that gives us the USBffile req response from X-loader, but then it just seems to get hung up forever doing IN/NAK until we time out and bail on it.

image

I'm not sure whether to blame the device, or the XHCI driver, or both. Sending an IN packet and getting no response at all seems fishy and I'm tempted to blame the device for that. Maybe it's happening right when X-Loader is first starting up and the OMAP's USB controller is being reconfigured. The weird part is that clearly the device eventually gives us 16 bytes of data, but for some reason the host controller just continues asking for more IN data immediately even though we got a short packet. I think it should be returning the 16 byte packet to libusb at that point, but libusb never sees it. I did some tracing on this and I don't think it's libusb's fault. The kernel never reports it to libusb. So I think that part smells kind of like an issue in the XHCI driver or controller, but I'm not totally sure.

Adding the 20 ms delay before attempting to read from X-loader fixes this issue and it fully transmits. Here's the same spot in the trace with 20 ms of delay added before we try the IN transaction to read from X-loader:

image

Notice that it doesn't immediately do another IN after the ACK for the 16 byte DATA1 packet, which makes way more sense than what we saw earlier. From this point on there are a ton of IN/NAK and OUT/NAK wait delays, but everything transmits fine. I see PINGs at the start of a frame with no problem, so that is more evidence that the STALL issue we saw before the 1ms delay was added is likely some kind of on-chip bootloader issue.

Computer 3 (Ryzen 7 5700G)

The result is pretty similar to computer 2's original failure, although it happens earlier.

$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] device timed out while transfering out 4 bytes (0 made it)
[-] failed to length specifier of 26956 to OMAP BootROM
[-] failed to transfer the first stage file 'x-load.bin'

In this case the OMAP responds with NYET when we try to send the first response with the "boot from peripheral" command.

image

So this means the OMAP bootloader accepted the "boot from peripheral" packet but isn't ready for another packet. From this point on it gets stuck and never ACKs another packet even though we try brand new OUT packets later. I think this maybe also points to a bug in the OMAP on-chip bootloader? I don't see anything wrong with what the computer is trying to do here. The OMAP just gets hung up forever.

The 1 ms delay fixes that issue and then I get a completely different result than computer 2 when it starts trying to talk to X-loader:

[-] fatal transfer error (BULK_IN) for 512 bytes (got 0): LIBUSB_ERROR_IO
[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] failed to read command from X-Loader
[-] failed to transfer the additional files in to memory

image

It looks like we're sitting there doing nothing for 2 seconds before we finally read the "KUSBffile req" from X-loader, but I can confirm that we did try to read from it earlier. libusb returned an error. This seems to be a case where we asked the kernel to send out an IN request, but it refused and errored out instead. After the 2 second retry it finally sent an IN packet and got a response, but then it got stuck in a IN/NAK loop afterward and the data was never returned. Just like computer 2, this feels like some sort of kernel driver or XHCI design bug, because after we got the short data packet I don't think it should have continued trying to send IN requests. Simply adding a 20 ms delay before the IN causes the IN packet to show up correctly the first time, rather than a timeout with no packet at all -- which also seems like another kernel/XHCI bug.

image

While testing this out I realized that the reason I thought I needed the 512 byte limiter in my first comment (later crossed out) is because when I'm running my USB sniffer at the same time on the same computer, I end up with LIBUSB_ERROR_NO_MEM when I try to transmit u-boot unless I break it up into smaller transfers. This last USB log was done with an independent computer. So there might still be some value in breaking up large transfers into smaller pieces just to be safe. It doesn't seem to hurt anything, and possibly also limits the chances of a bunch of NAKs showing up between each chunk -- which has been demonstrated to cause issues in at least one place in these logs.

Computer 4 (Core i3-7100U laptop)

I'm running out of steam to keep showing USB traces, but this one has an interesting problem I have to bring up. libusb doesn't even detect the OMAP at all. I guess it takes too long before the OMAP decides nothing has tried to talk with it and disconnects. This seems to be caused by the issue brought up here on the libusb project. I can work around it by building a custom libusb 1.0 with udev disabled, so it uses netlink instead. Not sure what we can do to work around that other than bundling a custom libusb or bypassing libusb altogether for initial detection of the device.

Once that problem is out of the way, the rest of the testing is straightforward. The 1 ms delay isn't needed on this computer (but it doesn't hurt anything to add it). I do need the 20 ms delay though. Without it, I run into the same issue computer 3 where it sits there waiting without actually sending out a packet.

On this laptop, I also occasionally see the STALL issue that I saw on computer 2. The 1 ms delay seems to fix that. Further evidence that the 1 ms delay is probably a good idea in general. I think limiting outgoing transfers to 512 bytes at a time is also a good idea so that we're really cutting back on the NAKs/NYETs which seem to be problematic based on my captures.

Conclusion

I went pretty deep here. It would be interesting to see if Windows' USB stack has the same issues or not. I am tempted to put some of the blame on Linux and/or the USB host controller though -- I don't understand why waiting 20 ms suddenly allows the IN packet to actually go out successfully. This wouldn't be dependent at all on how far the OMAP X-loader -> u-boot transition has made it. That progress wouldn't be able to control whether or not the host can put an IN packet onto the bus.

In conclusion, here are the changes I think are good ideas to add:

I can definitely submit a PR for most of these. Whew, hope this info isn't overwhelming! It was definitely interesting to do all the USB tracing.