Open grant-h opened 9 years ago
Hello grant-h!
I have several machines on which my board doesn't want to boot at all(100% times). Regular output is: $ sudo ./omap_loader -v -f x-load.bin OMAP Loader 1.0.0 File 'x-load.bin' at 0x40200000, size 15200 [+] scanning for USB device matching 0451:d00e... [+] successfully opened 0451:d00e (Texas Instruments OMAP3630) [+] got ASIC ID - Num Subblocks [05], Device ID Info [01050136300707], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [15090113bf3eef00000000] [-] fatal transfer error (BULK_OUT) for 15200 bytes (0 made it): LIBUSB_ERROR_PIPE [-] failed to send file 'x-load.bin' (size 15200) [-] failed to transfer the first stage file 'x-load.bin'
Are you interested to investigate this issue?
I'd recompiled with staging changes:
$ sudo ./omap_loader -v -f x-load.bin OMAP Loader 1.0.0 [+] Will load 'x-load.bin' at 0x40200000 with size of 15200 bytes [+] scanning for USB device matching 0451:d00e... [D] using configuration 1 [+] successfully opened 0451:d00e (Texas Instruments OMAP3630) [+] got ASIC ID - Num Subblocks [05], Device ID Info [01050136300707], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [15090113bf3eef00000000] [+] WARNING: BULK_OUT transfer timed out. Progress 0/4. Try 1/2 [+] WARNING: BULK_OUT transfer timed out. Progress 0/4. Try 2/2 [+] WARNING: BULK_OUT transfer timed out. Progress 0/4. Try 3/2 [-] device timed out while transfering out 4 bytes (0 made it) [-] failed to length specifier of 15200 to OMAP BootROM [-] failed to transfer the first stage file 'x-load.bin'
Thanks for looking in to this issue. I dont have a device I can test with anymore (unfortunately). This whole issue seems limited to certain USB ports / computers. I really can't put my finger on it.
With the fix-no-upload
branch compiled, please add a -d
to your command line arguments. Then go ahead repost your output in a code block (start and end with ```)
$ sudo ./omap_loader -d -v -f x-load.bin OMAP Loader 1.0.0 [+] Will load 'x-load.bin' at 0x40200000 with size of 15200 bytes [+] scanning for USB device matching 0451:d00e... [D] USB device 0451:d00e info [D] USB BCD 0x0210 [D] Endpoint 0 Max Pkt Size 64 [D] Number of configurations: 1 [D] Configuration Descriptor 0 [D] Interfaces 1 [D] Interface Descriptor 0 [D] Alternate Setting 0 [D] using configuration 1 [D] auto detaching kernel driver on interface claim [D] interface successfully claimed [+] successfully opened 0451:d00e (Texas Instruments OMAP3630) [+] got ASIC ID - Num Subblocks [05], Device ID Info [01050136300707], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [15090113bf3eef00000000] [D] sent peripheral boot command f0030002... [D] sent length of 'x-load.bin' (15200)... [-] fatal transfer error (BULK_OUT) for 15200 bytes (0 made it): LIBUSB_ERROR_PIPE [-] failed to send file 'x-load.bin' (size 15200) [-] failed to transfer the first stage file 'x-load.bin'
I found that if I am loading only one file - sometimes it succeeds(rare). But when I specify several files, somehow even first failed to load.
lspci -v
on your test machine?Unfortunately, without a chip to program myself, it will be difficult to fix this issue. Packet captures may help, but without a chip, this will be hard to debug.
$ sudo lspci -v 00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: hsw_uncore
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: f7e00000-f7efffff Prefetchable memory behind bridge: 00000000e0000000-00000000efffffff Capabilities: [88] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Capabilities: [100] Virtual Channel Capabilities: [140] Root Complex Link Capabilities: [d94] #19 Kernel driver in use: pcieport
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05) (prog-if 30 [XHCI]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0, IRQ 25 Memory at f7f00000 (64-bit, non-prefetchable) [size=64K] Capabilities: [70] Power Management version 2 Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+ Kernel driver in use: xhci_hcd
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, fast devsel, latency 0, IRQ 28 Memory at f7f1b000 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+ Kernel driver in use: mei_me
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05) (prog-if 20 [EHCI]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0, IRQ 20 Memory at f7f18000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci-pci
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05) Subsystem: ASUSTeK Computer Inc. Device 8576 Flags: bus master, fast devsel, latency 0, IRQ 29 Memory at f7f10000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Virtual Channel Kernel driver in use: snd_hda_intel
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: 00002000-00002fff Memory behind bridge: f0100000-f02fffff Prefetchable memory behind bridge: 00000000f0300000-00000000f04fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: f7d00000-f7dfffff Prefetchable memory behind bridge: 00000000f0000000-00000000f00fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d5) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=04, subordinate=05, sec-latency=0 Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8534 Capabilities: [a0] Power Management version 3
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05) (prog-if 20 [EHCI]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0, IRQ 23 Memory at f7f17000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci-pci
00:1f.0 ISA bridge: Intel Corporation B85 Express LPC Controller (rev 05) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: lpc_ich
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode](rev 05) (prog-if 01 [AHCI 1.0]) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 26 I/O ports at f070 [size=8] I/O ports at f060 [size=4] I/O ports at f050 [size=8] I/O ports at f040 [size=4] I/O ports at f020 [size=32] Memory at f7f16000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Kernel driver in use: ahci
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05) Subsystem: ASUSTeK Computer Inc. Device 8534 Flags: medium devsel, IRQ 14 Memory at f7f15000 (64-bit, non-prefetchable) [size=256] I/O ports at f000 [size=32]
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM](prog-if 00 [VGA controller]) Subsystem: PC Partner Limited / Sapphire Technology Device e204 Flags: bus master, fast devsel, latency 0, IRQ 30 Memory at e0000000 (64-bit, prefetchable) [size=256M] Memory at f7e20000 (64-bit, non-prefetchable) [size=128K] I/O ports at e000 [size=256] Expansion ROM at f7e00000 [disabled] [size=128K] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel driver in use: radeon
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6400 Series] Subsystem: PC Partner Limited / Sapphire Technology Radeon HD 6450 1GB DDR3 Flags: bus master, fast devsel, latency 0, IRQ 31 Memory at f7e40000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel driver in use: snd_hda_intel
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) Subsystem: ASUSTeK Computer Inc. Device 8554 Flags: bus master, fast devsel, latency 0, IRQ 27 I/O ports at d000 [size=256] Memory at f7d00000 (64-bit, non-prefetchable) [size=4K] Memory at f0000000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Endpoint, MSI 01 Capabilities: [b0] MSI-X: Enable- Count=4 Masked- Capabilities: [d0] Vital Product Data Capabilities: [100] Advanced Error Reporting Capabilities: [140] Virtual Channel Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00 Capabilities: [170] Latency Tolerance Reporting Kernel driver in use: r8169
04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI Bridge (rev 03) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=04, secondary=05, subordinate=05, sec-latency=32 Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Device 8489
I am testing with my custom board. The processor is DM3725CBP(800MHz version, POP memory). Also I have SBC8140 board and I see similar behavior. I am loading x-loader. Compiled from x-loader-usb found in internet when searching omap usb load. Sources are modified for my board.
I have a hardware USB sniffer. Right now I have only log for failed loading. It is in specific format so I am attaching a picture.
And on machine where usb load is mostly successful:
$ lspci -v
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, fast devsel, latency 0
Capabilities:
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: d0900000-d09fffff
Prefetchable memory behind bridge: 00000000b0000000-00000000bfffffff
Capabilities:
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company Device 17f4
Flags: bus master, fast devsel, latency 0, IRQ 49
Memory at d0000000 (64-bit, non-prefetchable) [size=4M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 4000 [size=64]
Expansion ROM at
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04) (prog-if 30 [XHCI])
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, medium devsel, latency 0, IRQ 45
Memory at d0a00000 (64-bit, non-prefetchable) [size=64K]
Capabilities:
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, fast devsel, latency 0, IRQ 48
Memory at d0a14000 (64-bit, non-prefetchable) [size=16]
Capabilities:
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04) (prog-if 20 [EHCI])
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, medium devsel, latency 0, IRQ 16
Memory at d0a19000 (32-bit, non-prefetchable) [size=1K]
Capabilities:
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, fast devsel, latency 0, IRQ 50
Memory at d0a10000 (64-bit, non-prefetchable) [size=16K]
Capabilities:
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
Memory behind bridge: d0800000-d08fffff
Capabilities:
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
Memory behind bridge: d0700000-d07fffff
Prefetchable memory behind bridge: 00000000af200000-00000000af2fffff
Capabilities:
00:1c.3 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 4 (rev c4) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
Memory behind bridge: d0600000-d06fffff
Capabilities:
00:1c.5 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 6 (rev c4) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: d0500000-d05fffff
Prefetchable memory behind bridge: 00000000d0400000-00000000d04fffff
Capabilities:
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04) (prog-if 20 [EHCI])
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, medium devsel, latency 0, IRQ 16
Memory at d0a18000 (32-bit, non-prefetchable) [size=1K]
Capabilities:
00:1f.0 ISA bridge: Intel Corporation HM76 Express Chipset LPC Controller (rev 04)
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, medium devsel, latency 0
Capabilities:
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode](rev 04) (prog-if 01 [AHCI 1.0])
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 46
I/O ports at 4068 [size=8]
I/O ports at 4074 [size=4]
I/O ports at 4060 [size=8]
I/O ports at 4070 [size=4]
I/O ports at 4040 [size=32]
Memory at d0a17000 (32-bit, non-prefetchable) [size=2K]
Capabilities:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7550M/7570M/7650M](prog-if 00 [VGA controller])
Subsystem: Hewlett-Packard Company Radeon HD 7650M
Flags: bus master, fast devsel, latency 0, IRQ 51
Memory at b0000000 (64-bit, prefetchable) [size=256M]
Memory at d0900000 (64-bit, non-prefetchable) [size=128K]
I/O ports at 3000 [size=256]
Expansion ROM at d0920000 [disabled] [size=128K]
Capabilities:
03:00.0 System peripheral: JMicron Technology Corp. SD/MMC Host Controller (rev 30)
Subsystem: Hewlett-Packard Company Device 17f6
Flags: bus master, fast devsel, latency 0, IRQ 18
Memory at d0703000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at af200000 [disabled] [size=64K]
Capabilities:
03:00.2 SD Host controller: JMicron Technology Corp. Standard SD Host Controller (rev 30) (prog-if 01)
Subsystem: Hewlett-Packard Company Device 17f6
Flags: fast devsel, IRQ 18
Memory at d0702000 (32-bit, non-prefetchable) [size=256]
Capabilities:
03:00.3 System peripheral: JMicron Technology Corp. MS Host Controller (rev 30)
Subsystem: Hewlett-Packard Company Device 17f6
Flags: bus master, fast devsel, latency 0, IRQ 18
Memory at d0701000 (32-bit, non-prefetchable) [size=256]
Capabilities:
04:00.0 Network controller: Ralink corp. RT3290 Wireless 802.11n 1T/1R PCIe
Subsystem: Hewlett-Packard Company Device 18ec
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at d0610000 (32-bit, non-prefetchable) [size=64K]
Capabilities:
04:00.1 Bluetooth: Ralink corp. RT3290 Bluetooth
Subsystem: Hewlett-Packard Company Device 18ec
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at d0600000 (32-bit, non-prefetchable) [size=64K]
Capabilities:
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
Subsystem: Hewlett-Packard Company Device 1846
Flags: bus master, fast devsel, latency 0, IRQ 47
I/O ports at 2000 [size=256]
Memory at d0404000 (64-bit, prefetchable) [size=4K]
Memory at d0400000 (64-bit, prefetchable) [size=16K]
Capabilities:
Thanks for this info. Sorry, but I won't have much time to figure out this issue. I'd need to buy some more hardware to play with before I can try and debug.
@dmitry-bodnya Hello again, do you know which generation Nest hardware you have? If you can find a version number or check out the information file when you plug in via USB, that'd be great.
@grant-h Hello. Sorry, but I don't have any kind of Nest hardware. I am working with our own board(with DM3725), which boots over usb for FW upgrading. And we have PCs with 100% times boot failed. It failed to boot on the very first stage, so I think it is something inside processor ROM, which is hard to investigate and impossible(for us) to change.
@grant-h Hi. May be it will be useful. I moved your code to Windows and after some changes(no getopt, no basename, and others) ran it. I met this issue again - ASIC ID received, x-loader load ok and then no "USBf"(file req) cmd receive. I put Sleep(500) in place, where you wrote about race condition in comments. After that, I got stable file-loading process(tested up to 6 files). Perhaps 500 ms is too much. But, just in theory, it is time for x-loader to boot and to write something to fifo before host side send IN token. If host is sending IN token before x-loader ready - then there is no successful exchange following. So, I guess delay is neccessary and transfer_other_files is not robust enough to work without delay. And also, when I began to transfer large files(uImage - 2MB, fs.bin - 25MB) I got usb timeouts. So I added timeout multiplier, which depends of tx size. Also, to work with OMAP on Windows, you should install driver(I found .inf file in internet, is uses WinUsb).
Anyway, I still have computer which fails to load even first stage.
Dmitry, that's good news! I've tried adding sleeps all over the code for some machines, but even then they still manage to not boot. Also, would you be willing to consolidate your changes that allowed this tool to work on Windows and share them in a pull request? It would be nice to have this tool be cross platform as it's only real dependency is libusb
.
Grant, I'm not good in git.. I can send you files, if You tell me where.
@dmitry-bodnya Here you go. My email is encoded here. Decode it and send me an attachment.
I'm way late to the party here, but sharing my solution.
I've been tinkering with an OMAP3530 BeagleBoard lately, and was running into a bunch of the same issues described in this thread when trying to boot it through USB. The problem only happens on one of my computers. Both of my computers are running Ubuntu 22.04. The working one is an older Intel machine with only USB 2.0 support. The broken one is a newer AMD machine with USB 3.0 support. I have a USB analyzer so I was able to compare the raw USB traffic between them. The command I'm running is:
./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000
I'm still confused about some of the results I saw, but here are the changes I've made that have helped this program work on my AMD computer:
usleep(1000)
before both libusb_bulk_transfer() callsThe two changes above will allow x-loader to load properly, but I still couldn't get it to load secondary files until I added a usleep(20000);
after the "Note: this is a race between the target's processor getting X-loader running" comment. I can't explain why an arbitrary delay fixes it for me, but it does.
None of these changes seem to hurt anything on the older Intel computer that works with the stock code, by the way. So that's good...
I am struggling to fully understand why these changes fix it. It just seems like doing things to slow down and avoid NAKs/NYETs in the raw traffic helps with compatibility. As for the 20 ms delay: without it, I saw some very interesting (and repeatable) behavior when it tried to read from X-loader for the first time after loading it. No traffic appeared on the bus at all. A usbmon trace showed the bulk IN URB failed after about 5 ms with "-EPROTO". Then there was the 2 second delay, and then in my analyzer I actually saw an IN/DATA1/ACK on the bus with data from x-loader on the second attempt. But instead of reporting this data back to libusb, the host just kept doing more IN packets (which were NAKed because x-loader had nothing left to return). Eventually the URB was canceled after a second and the data was lost. For some reason the 20 ms delay completely fixes this issue.
I'm not sure if it's a bug in the Linux kernel or the host controller or what. I don't think it's a bug in the OMAP or x-loader. When a short data packet is received after the IN, shouldn't the Linux machine be saying "okay we got some data" and bailing, rather than trying to grab more data forever? If someone understands USB transactions in Linux better than me, I would love to deep dive further.
Edit: I tested another Intel laptop which doesn't need the usleep(1000)
before libusb_bulk_transfer() but does need the usleep(20000)
. I also ran into an interesting problem on the Intel laptop where libusb doesn't detect the device being attached at all because of how quickly it disables USB. It seems to be an issue some people run into with libusb-1.0, which I solved by compiling a custom build of libusb with udev disabled (uses netlink instead).
@dougg3 Awesome investigative work! It's cool that you have an USB analyzer at hand to actually get down to the PHY layer to figure what is going on. Do you happen to have a "smoking gun" screen shot of the protocol traces between the working and failing devices? What USB analyzer are you using by the way?
As for why this is happening, maybe since the target's USB stack is in flux during the initiated BULK transfer in combination with some flaky Linux USB stack code, things are going out of sync. Probably best to not race either stack. Since the changes are just sleep related I'm happy to accept a diff that adds them and get that commited.
Thanks @grant-h! I'm definitely excited to potentially improve the compatibility of this excellent utility and luckily I have a lot of computers to test with. I'm using Alex Taradov's open source USB sniffer which I built myself. It's really neat because it outputs files compatible with Wireshark. It's reliable for the most part, although sometimes I have to restart captures because it loses communication when I hotplug stuff. I think I might be using cables a little too long. I'm happy to show more detailed info from the sniffer here. I'm also happy to submit a PR to fix a few things (when I have a bit more free time).
Here is what my one perfectly working computer with the current codebase, a Core i3-2120, does. It doesn't support USB 3.0 at all, it has an EHCI host controller. It looks like when running this utility completely unmodified:
$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[+] uploading 'u-boot.bin' (size 777760) to 0x80800000
[+] jumping to address 0x80800000
[+] successfully transfered 2 files
Here's the start of the interesting part of the USB sniffing, after all of the descriptors are grabbed by Linux.
The length = 72 DATA0 is the initial info from the OMAP with the ASIC ID. You can see directly afterward, we do two OUT transactions with 7 bytes each (4 bytes of actual data in them). That's the boot from peripheral command, followed by the length. Finally we start sending the data in 512-byte packets. You'll notice that the first packet is ACKed, then the next packet is NAKed so we ping until getting an ACK, and then do the next OUT packet and so on. This continues until X-loader has been fully loaded.
The 335-byte packet here is the final packet, you can see it was NAKed the first time and had to be repeated. Then we start doing IN transactions after about 17 ms and they are NAKed for a while. I think this is while X-loader is still getting ready to go. Eventually we get a 16-byte response with DATA1 -- this is the USBffile req
initial response from X-loader, which then kicks off the rest of the transmit process. It all continues great from that point on, in a pretty similar fashion. I won't bother with another screenshot, but it's more of the same stuff including lots of PING/NAKs.
So these traces seem to show pretty normal USB 2.0 high speed behavior. It handles the PINGs, NAKs, etc. just fine on this machine and happily finishes the whole thing from start to finish.
Next, I'm going to move onto a Ryzen 5 3600. This has an XHCI host controller and supports USB 3.0 (same with all the remaining computers I'm testing). Here's the output I get when I run unmodified omap_loader on it:
$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] fatal transfer error (BULK_OUT) for 26956 bytes (0 made it): LIBUSB_ERROR_PIPE
[-] failed to send file 'x-load.bin' (size 26956)
[-] failed to transfer the first stage file 'x-load.bin'
As you can see it doesn't even succeed at sending anything. Here's the relevant part of the USB sniffer's trace:
What's going on here is we get the initial 72-byte ASIC ID. In return we send the two 7-byte (4-byte data) responses. The second one gets delayed a little bit with NAKs, but it eventually succeeds and we move onto sending data. The first chunk goes out fine, then we try the next chunk and it's NAKed. Eventually it returns a STALL packet. Wireshark does a poor job of showing it's a STALL, but that's what is going on, and that's why I highlighted it so you can see in the bottom left.
It's interesting that as soon as a new frame started, our first PING resulted in a STALL. It's almost like the OMAP bootloader isn't expecting a PING after a new frame has begun and is thinking we'll try sending a new packet instead. This almost does point to some kind of an edge case bug in the OMAP's on chip bootloader's USB implementation. Anyway, adding the 1 ms delay before libusb_bulk_transfer adds enough of a delay to allow X-loader to fully load.
However, now it starts failing on loading the next file:
$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] fatal transfer error (BULK_IN) for 512 bytes (got 0): LIBUSB_ERROR_IO
[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] failed to read command from X-Loader
[-] failed to transfer the additional files in to memory
Here's the USB trace where the failure starts happening:
It's stuck in an infinite loop of IN/NAKs for a while. Interestingly at some point we do an IN that gets no response at all and we try IN again. Still no response, I think this is when libusb bails. Then we wait for 2 seconds. Then we try again, and successfully do an IN/DATA1/ACK that gives us the USBffile req
response from X-loader, but then it just seems to get hung up forever doing IN/NAK until we time out and bail on it.
I'm not sure whether to blame the device, or the XHCI driver, or both. Sending an IN packet and getting no response at all seems fishy and I'm tempted to blame the device for that. Maybe it's happening right when X-Loader is first starting up and the OMAP's USB controller is being reconfigured. The weird part is that clearly the device eventually gives us 16 bytes of data, but for some reason the host controller just continues asking for more IN data immediately even though we got a short packet. I think it should be returning the 16 byte packet to libusb at that point, but libusb never sees it. I did some tracing on this and I don't think it's libusb's fault. The kernel never reports it to libusb. So I think that part smells kind of like an issue in the XHCI driver or controller, but I'm not totally sure.
Adding the 20 ms delay before attempting to read from X-loader fixes this issue and it fully transmits. Here's the same spot in the trace with 20 ms of delay added before we try the IN transaction to read from X-loader:
Notice that it doesn't immediately do another IN after the ACK for the 16 byte DATA1 packet, which makes way more sense than what we saw earlier. From this point on there are a ton of IN/NAK and OUT/NAK wait delays, but everything transmits fine. I see PINGs at the start of a frame with no problem, so that is more evidence that the STALL issue we saw before the 1ms delay was added is likely some kind of on-chip bootloader issue.
The result is pretty similar to computer 2's original failure, although it happens earlier.
$ sudo ./omap_loader -p 0xd009 -f x-load.bin -f u-boot.bin -a 0x80800000 -j 0x80800000 -v
OMAP Loader 1.0.0
File 'x-load.bin' at 0x40200000, size 26956
File 'u-boot.bin' at 0x80800000, size 777760
[+] scanning for USB device matching 0451:d009...
[+] successfully opened 0451:d009 (Texas Instruments OMAP3430)
[+] got ASIC ID - Num Subblocks [05], Device ID Info [01050134300757], Reserved [13020100], Ident Data [1215010000000000000000000000000000000000000000], Reserved [1415010000000000000000000000000000000000000000], CRC (4 bytes) [150901f7488f2800000000]
[-] device timed out while transfering out 4 bytes (0 made it)
[-] failed to length specifier of 26956 to OMAP BootROM
[-] failed to transfer the first stage file 'x-load.bin'
In this case the OMAP responds with NYET when we try to send the first response with the "boot from peripheral" command.
So this means the OMAP bootloader accepted the "boot from peripheral" packet but isn't ready for another packet. From this point on it gets stuck and never ACKs another packet even though we try brand new OUT packets later. I think this maybe also points to a bug in the OMAP on-chip bootloader? I don't see anything wrong with what the computer is trying to do here. The OMAP just gets hung up forever.
The 1 ms delay fixes that issue and then I get a completely different result than computer 2 when it starts trying to talk to X-loader:
[-] fatal transfer error (BULK_IN) for 512 bytes (got 0): LIBUSB_ERROR_IO
[-] device timed out while transfering in 512 bytes (got 0)
[-] device timed out while transfering in 512 bytes (got 0)
[-] failed to read command from X-Loader
[-] failed to transfer the additional files in to memory
It looks like we're sitting there doing nothing for 2 seconds before we finally read the "KUSBffile req" from X-loader, but I can confirm that we did try to read from it earlier. libusb returned an error. This seems to be a case where we asked the kernel to send out an IN request, but it refused and errored out instead. After the 2 second retry it finally sent an IN packet and got a response, but then it got stuck in a IN/NAK loop afterward and the data was never returned. Just like computer 2, this feels like some sort of kernel driver or XHCI design bug, because after we got the short data packet I don't think it should have continued trying to send IN requests. Simply adding a 20 ms delay before the IN causes the IN packet to show up correctly the first time, rather than a timeout with no packet at all -- which also seems like another kernel/XHCI bug.
While testing this out I realized that the reason I thought I needed the 512 byte limiter in my first comment (later crossed out) is because when I'm running my USB sniffer at the same time on the same computer, I end up with LIBUSB_ERROR_NO_MEM when I try to transmit u-boot unless I break it up into smaller transfers. This last USB log was done with an independent computer. So there might still be some value in breaking up large transfers into smaller pieces just to be safe. It doesn't seem to hurt anything, and possibly also limits the chances of a bunch of NAKs showing up between each chunk -- which has been demonstrated to cause issues in at least one place in these logs.
I'm running out of steam to keep showing USB traces, but this one has an interesting problem I have to bring up. libusb doesn't even detect the OMAP at all. I guess it takes too long before the OMAP decides nothing has tried to talk with it and disconnects. This seems to be caused by the issue brought up here on the libusb project. I can work around it by building a custom libusb 1.0 with udev disabled, so it uses netlink instead. Not sure what we can do to work around that other than bundling a custom libusb or bypassing libusb altogether for initial detection of the device.
Once that problem is out of the way, the rest of the testing is straightforward. The 1 ms delay isn't needed on this computer (but it doesn't hurt anything to add it). I do need the 20 ms delay though. Without it, I run into the same issue computer 3 where it sits there waiting without actually sending out a packet.
On this laptop, I also occasionally see the STALL issue that I saw on computer 2. The 1 ms delay seems to fix that. Further evidence that the 1 ms delay is probably a good idea in general. I think limiting outgoing transfers to 512 bytes at a time is also a good idea so that we're really cutting back on the NAKs/NYETs which seem to be problematic based on my captures.
I went pretty deep here. It would be interesting to see if Windows' USB stack has the same issues or not. I am tempted to put some of the blame on Linux and/or the USB host controller though -- I don't understand why waiting 20 ms suddenly allows the IN packet to actually go out successfully. This wouldn't be dependent at all on how far the OMAP X-loader -> u-boot transition has made it. That progress wouldn't be able to control whether or not the host can put an IN packet onto the bus.
In conclusion, here are the changes I think are good ideas to add:
I can definitely submit a PR for most of these. Whew, hope this info isn't overwhelming! It was definitely interesting to do all the USB tracing.
There is a particularly annoying issue where
omap_loader
will fail to upload beyond the first stage (x-loader) as x-loader communication fails. I'm not sure what the issue is or where in the stack it lies, but this needs to be investigated as it only affects certain machines.The failure that occurs is a USB PIPE or TIMEOUT error. On some machines the issue is complete reproducable. On others it may only happen every 1 in 5 times. The testing target - a Nest Thermostat - has a 10-15 second restart time which requires a button to be held down. In order to track down this bug, debug output from the x-loader needs to be printed and the error needs to be reproducible 100% of the time.
Checkout and compile off of branch
fix-no-upload
https://github.com/grant-h/omap_loader/tree/fix-no-upload To continue where I left off...