raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Unable to use Hailo-8L from AI Kit behind PCI Express Bridge/switch #6206

Closed by geerlingguy 4 months ago

geerlingguy commented 4 months ago

Describe the bug

When I boot my Pi with the Hailo-8L plugged into any PCI Express switch (e.g. a Dual-NVMe HAT board), I get the following errors when the Hailo driver tries to initialize the device:

pi@ai-pi:~ $ dmesg | grep hailo
[    3.431097] hailo: Init module. driver version 4.17.0
[    3.437216] hailo 0000:05:00.0: Probing on: 1e60:2864...
[    3.437227] hailo 0000:05:00.0: Probing: Allocate memory for device extension, 11600
[    3.437257] hailo 0000:05:00.0: enabling device (0000 -> 0002)
[    3.437265] hailo 0000:05:00.0: Probing: Device enabled
[    3.437283] hailo 0000:05:00.0: Probing: mapped bar 0 - 00000000731579ec 16384
[    3.438839] hailo 0000:05:00.0: Probing: mapped bar 2 - 00000000ae37d1ac 4096
[    3.438847] hailo 0000:05:00.0: Probing: mapped bar 4 - 00000000e95e07d5 16384
[    3.438853] hailo 0000:05:00.0: Probing: Force setting max_desc_page_size to 4096 (recommended value is 16384)
[    3.438866] hailo 0000:05:00.0: Probing: Enabled 64 bit dma
[    3.438869] hailo 0000:05:00.0: Probing: Using userspace allocated vdma buffers
[    3.438876] hailo 0000:05:00.0: Disabling ASPM L0s 
[    3.438882] hailo 0000:05:00.0: Successfully disabled ASPM L0s 
[    3.446839] hailo 0000:05:00.0: Failed to enable MSI -28
[    3.446849] hailo 0000:05:00.0: Failed Enabling interrupts -28
[    3.446912] hailo 0000:05:00.0: Failed activating board -28
[    3.446935] hailo: probe of 0000:05:00.0 failed with error -28

(Note: I'm testing a configuration with two Hailo-8 series chips behind a bridge; that's how this came up.)
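For anyone reading the log above: the trailing -28 is a negated Linux errno, and errno 28 is ENOSPC. In an MSI setup path that most likely means the kernel could not allocate an interrupt vector for the device, not that a disk is full. The sketch below pulls the failing lines out of dmesg-style text and decodes the code. The regular expression and the `decode_failures` helper are illustrative assumptions, not part of the Hailo driver or any Raspberry Pi tooling.

```python
import errno
import os
import re

# Matches kernel log lines that mention a failure and end in a negative code, e.g.
#   [    3.446839] hailo 0000:05:00.0: Failed to enable MSI -28
# The pattern is an assumption based only on the log lines quoted in this issue.
FAIL_RE = re.compile(r"\]\s+(?P<msg>hailo.*?[Ff]ailed.*?)\s(?P<code>-\d+)\s*$")


def decode_failures(dmesg_text):
    """Return (message, errno name, description) for each failing hailo line."""
    results = []
    for line in dmesg_text.splitlines():
        match = FAIL_RE.search(line)
        if not match:
            continue
        code = -int(match.group("code"))  # kernel logs the negated errno
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group("msg"), name, os.strerror(code)))
    return results


# Sample lines copied from the log in this issue.
SAMPLE = """\
[    3.438882] hailo 0000:05:00.0: Successfully disabled ASPM L0s
[    3.446839] hailo 0000:05:00.0: Failed to enable MSI -28
[    3.446849] hailo 0000:05:00.0: Failed Enabling interrupts -28
[    3.446935] hailo: probe of 0000:05:00.0 failed with error -28
"""

if __name__ == "__main__":
    for msg, name, text in decode_failures(SAMPLE):
        print(f"{name} ({text}): {msg}")
```

On a live system the same function could be fed the output of `dmesg` instead of `SAMPLE`.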

See below 'additional context' for a fix.

To reproduce

Plug a Hailo-8L M.2 Accelerator into any PCI Express switch/bridge, and plug that into the Pi 5. Boot the Pi, follow the Getting Started instructions for the AI Kit, and check dmesg or run hailortcli fw-control identify and observe the results.

Expected behaviour

The Hailo-8L should work as normal, and be usable by applications like rpicam-apps.

Actual behaviour

The Hailo-8L is not usable, and the driver logs the errors shown above.

System

Pi 5 model B 4GB
Raspberry Pi reference 2024-03-15
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f19ee211ddafcae300827f953d143de92a5c6624, stage4
2024/06/04 09:21:37 
Copyright (c) 2012 Broadcom
version 503a909a (release) (embedded)
Linux ai-pi 6.6.31-v8-16k+ #1766 SMP PREEMPT Fri May 24 12:15:34 BST 2024 aarch64 GNU/Linux

Logs

See above.

Additional context

If I add the following overlay to /boot/firmware/config.txt:

dtoverlay=pineboards-hat-ai

...and then reboot, the Hailo-8L initializes properly, shows up using identify, and can be used for inference by applications on the Pi. Multiple Hailo-8L devices can be used this way too :)

Related: https://github.com/raspberrypi/linux/pull/6126

timg236 commented 4 months ago

Moved to Linux repo because quirks for PCIe switches would be a kernel change.

P33M commented 4 months ago

This should be fixed by dtoverlay=pciex1-compat-pi5,no-mip.

See https://github.com/raspberrypi/firmware/blob/master/boot/overlays/README#L3544

geerlingguy commented 4 months ago

@P33M - Indeed, if I add just that overlay, the device is enabled and firmware loads successfully.
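To confirm which of the overlays discussed in this thread is actually active, a small check of `/boot/firmware/config.txt` can help. The sketch below is a simplified reader, not the firmware's real parser: it skips lines starting with `#` and ignores conditional section filters such as `[pi5]`. The helper names `enabled_overlays` and `has_overlay` and the sample config text are hypothetical.

```python
def enabled_overlays(config_text):
    """Return (overlay name, [parameters]) for each active dtoverlay= line.

    Simplification: section filters like [pi5] are not evaluated, so a line
    inside a non-matching section is still reported.
    """
    overlays = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if not line.startswith("dtoverlay="):
            continue
        value = line[len("dtoverlay="):]
        name, *params = [part.strip() for part in value.split(",")]
        if name:
            overlays.append((name, params))
    return overlays


def has_overlay(config_text, name, param=None):
    """True if the overlay is enabled, optionally requiring one parameter."""
    for overlay_name, params in enabled_overlays(config_text):
        if overlay_name == name and (param is None or param in params):
            return True
    return False


# Hypothetical config.txt contents used only for illustration.
SAMPLE_CONFIG = """\
# dtoverlay=pineboards-hat-ai
dtparam=audio=on
dtoverlay=pciex1-compat-pi5,no-mip
"""

if __name__ == "__main__":
    print(has_overlay(SAMPLE_CONFIG, "pciex1-compat-pi5", "no-mip"))
```

To check a real system, read the file with `open("/boot/firmware/config.txt").read()` and pass the text in place of `SAMPLE_CONFIG`.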

P33M commented 4 months ago

In future the pcie*-compat* overlay(s) are going to be the way that users can flip switches until they get something that works with their particular bus topology.

Enquiring minds want to know: what's the output of sudo lspci -v in this case?