home-assistant / operating-system

rpi5-64 (Raspberry Pi 5 64-bit OS) image and Pineboard Coral Edge TPU #3403

Open Zaleo80 opened 5 months ago

Zaleo80 commented 5 months ago

Describe the issue you are experiencing

I am not sure whether this is a Home Assistant Operating System issue, a Frigate issue, or a Google Coral issue, so I am also posting it here.

Basically, when a Raspberry mPCIe Coral Edge TPU or a Dual Edge Coral TPU board is installed with the corresponding Google Coral (Dual) Edge TPU on a Pi 5, I would expect it to work in Frigate after changing the detector from cpu to edgetpu in the Frigate config file.

I got the well-known "Permission checking failed" error for the apex device in dmesg, so I added an /etc/udev/rules.d/65-apex.rules file that sets the mode to 0660 for root:root. That error is now gone.

What operating system image do you use?

rpi5-64 (Raspberry Pi 5 64-bit OS)

What version of Home Assistant Operating System is installed?

Home Assistant OS 12.3

Did you upgrade the Operating System?

Yes

Steps to reproduce the issue

  1. Have a working Frigate add-on installation running a CPU detector.
  2. Connect a Pineboards Hat AI! Dual Edge Coral TPU Bundle for Raspberry Pi 5.
  3. Boot the Home Assistant Pi.
  4. Check that the Frigate add-on is working with the cpu detector type.
  5. Change the detector in Frigate.yaml from cpu to edgetpu: image ...
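
The detector change in step 5 can be pictured roughly as follows. This is only a sketch: the dictionaries mirror the `detectors` section of a Frigate config as described in the Frigate documentation, the detector name `coral1` is taken from the `Process detector:coral1:` log line in this issue, and `device: pci` is an assumption for a PCIe-attached Coral. `switch_to_edgetpu` is a hypothetical helper, not part of Frigate.

```python
import copy

# Assumed starting point: a single CPU detector, as in step 1.
cpu_config = {"detectors": {"coral1": {"type": "cpu"}}}


def switch_to_edgetpu(config, name, device="pci"):
    """Return a copy of config with detector `name` set to the Edge TPU type."""
    updated = copy.deepcopy(config)
    # Equivalent YAML: detectors -> <name> -> type: edgetpu, device: pci
    updated["detectors"][name] = {"type": "edgetpu", "device": device}
    return updated


edgetpu_config = switch_to_edgetpu(cpu_config, "coral1")
print(edgetpu_config)
```

If Frigate reports "No EdgeTPU was detected" with a config like this, the problem is more likely below Frigate (driver, interrupts, or device permissions) than in the config itself.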

Anything in the Supervisor logs that might be useful for us?

Using HassOS SSH port 22222 Configurator and PuTTY:

# ls /dev/apex_0
/dev/apex_0

# dmesg | grep apex
[    1.898889] apex 0000:03:00.0: enabling device (0000 -> 0002)
[    1.899761] apex 0000:03:00.0: Couldn't initialize interrupts: -28
[    7.132396] apex 0000:03:00.0: Apex performance not throttled due to temperature

# lspci
03:00.0 Class 0880: 1ac1:089a
02:07.0 Class 0604: 1b21:1182
00:00.0 Class 0604: 14e4:2712
02:03.0 Class 0604: 1b21:1182
01:00.0 Class 0604: 1b21:1182
00:00.0 Class 0604: 14e4:2712
04:00.0 Class 0108: 1987:5013
01:00.0 Class 0200: 1de4:0001

# modinfo apex
filename:       /lib/modules/6.6.28-haos-raspi/updates/apex.ko
author:         John Joseph <jnjoseph@google.com>
license:        GPL v2
version:        1.2
description:    Google Apex driver
srcversion:     700E8BBBE9CC23C6EC17712
alias:          pci:v00001AC1d0000089Asv*sd*bc*sc*i*
depends:        gasket
name:           apex
vermagic:       6.6.28-haos-raspi SMP preempt mod_unload modversions aarch64
parm:           allow_power_save:int
parm:           allow_sw_clock_gating:int
parm:           allow_hw_clock_gating:int
parm:           bypass_top_level:int
parm:           trip_point0_temp:int
parm:           trip_point1_temp:int
parm:           trip_point2_temp:int
parm:           hw_temp_warn1:int
parm:           hw_temp_warn2:int
parm:           hw_temp_warn1_en:bool
parm:           hw_temp_warn2_en:bool
parm:           temp_poll_interval:int
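
The `-28` in the `Couldn't initialize interrupts` line of the dmesg output above looks like a negative errno value, which is the usual kernel convention. A minimal sketch for decoding it, assuming the number really is an errno (`decode_kernel_errno` is a hypothetical helper):

```python
import errno
import os
import re


def decode_kernel_errno(line):
    """Return (number, symbolic name, message) for a trailing negative errno."""
    match = re.search(r":\s*-(\d+)\s*$", line)
    if match is None:
        return None
    number = int(match.group(1))
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


line = "[    1.899761] apex 0000:03:00.0: Couldn't initialize interrupts: -28"
print(decode_kernel_errno(line))
```

Errno 28 is `ENOSPC`. In an interrupt setup path that would be consistent with the driver not getting the interrupt vectors it asked for, although that reading is an inference and not something the log states directly.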

Anything in the Host logs that might be useful for us?

```txt
Process detector:coral1:
[2024-06-03 11:27:45] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
    raise ValueError(capture.message)
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
    object_detector = LocalObjectDetector(detector_config=detector_config)
  File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
    self.detect_api = create_detector(detector_config)
  File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
    return api(detector_config)
  File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 41, in __init__
    edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
  File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1.0
```
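
The failing call in the traceback can be tried outside Frigate to check whether the Edge TPU runtime can open the device at all. This is a sketch under assumptions: it uses `load_delegate` from `tflite_runtime.interpreter` exactly as the traceback does, but the `{"device": "pci"}` option is a guess at what Frigate passes as `device_config`, and `try_load_edgetpu` is a hypothetical helper.

```python
def try_load_edgetpu(device="pci"):
    """Try to load the Edge TPU delegate; return (ok, message)."""
    try:
        from tflite_runtime.interpreter import load_delegate
    except ImportError as exc:
        return False, f"tflite_runtime not available: {exc}"
    try:
        # Same library name as in the Frigate traceback.
        load_delegate("libedgetpu.so.1.0", {"device": device})
    except (ValueError, OSError) as exc:
        return False, f"delegate failed to load: {exc}"
    return True, "Edge TPU delegate loaded"


ok, message = try_load_edgetpu()
print(ok, message)
```

Run inside the Frigate container, a `False` result here would point at the driver or device node and not at the Frigate config.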

System information

System Information

version core-2024.6.1
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.2
os_name Linux
os_version 6.6.28-haos-raspi
arch aarch64
timezone Europe/Amsterdam
config_dir /config
Home Assistant Community Store

GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.34.0
Stage running
Available Repositories 1387
Downloaded Repositories 22

Home Assistant Cloud

logged_in true
subscription_expiration 15 April 2025 at 02:00
relayer_connected true
relayer_region eu-central-1
remote_enabled true
remote_connected true
alexa_enabled false
google_enabled true
remote_server eu-central-1-10.ui.nabu.casa
certificate_status ready
instance_id bc6c685200cb4e3dba25cc25f82aef2d
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Home Assistant Supervisor

host_os Home Assistant OS 12.3
update_channel stable
supervisor_version supervisor-2024.06.0
agent_version 1.6.0
docker_version 25.0.5
disk_total 228.5 GB
disk_used 123.8 GB
healthy true
supported true
host_connectivity true
supervisor_connectivity true
ntp_synchronized true
virtualization
board rpi5-64
supervisor_api ok
version_api ok
installed_addons Studio Code Server (5.15.0), Terminal & SSH (9.14.0), Samba share (12.3.1), Home Assistant Google Drive Backup (0.112.1), eufy-security-ws (1.8.0-2), Mosquitto broker (6.4.1), ESPHome (2024.5.5), HassOS SSH port 22222 Configurator (0.9.3), Whisper (2.1.0), Piper (1.5.0), Music Assistant (2.0.4), Frigate (Full Access) (0.13.2), openWakeWord (1.10.0)

Dashboards

dashboards 4
resources 11
views 8
mode storage

Recorder

oldest_recorder_run 31 May 2024 at 10:23
current_recorder_run 9 June 2024 at 11:27
estimated_db_size 1455.27 MiB
database_engine sqlite
database_version 3.44.2

Additional information

Searching the internet, I found reports that the device tree sets up the PCIe bus without enough MSI-X interrupts. This is more in depth than my current Linux knowledge, so advice is welcome.

zerblatt007 commented 3 months ago

I have the same problem using the Coral TPU on a Pimoroni NVMe Base Duo. There is also a "permission denied" in the kernel log, but I am not sure whether that is because MSI-X is missing or because of the access rights on the device: crw------- 1 root root 120, 0 Aug 7 14:37 /dev/apex_0
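
To compare the device node against the 0660 mode mentioned earlier in the thread, the permissions can be read programmatically. A small sketch, with `describe_node` as a hypothetical helper; a node with mode 0660 would show up as `crw-rw----`, whereas the listing above is 0600.

```python
import os
import stat


def describe_node(path):
    """Summarise the type, mode, and ownership of a device node or file."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        # True only if the owning group may both read and write.
        "group_rw": bool(st.st_mode & stat.S_IRGRP and st.st_mode & stat.S_IWGRP),
    }


if os.path.exists("/dev/apex_0"):
    print(describe_node("/dev/apex_0"))
else:
    print("/dev/apex_0 not present on this machine")
```

With owner-only access, a process running as root can still open the device, so the mode alone may not explain the failure.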

I am also using Frigate Full Access with protection mode disabled.

I am not sure how to do the device tree change on Home Assistant OS.

These are the lspci -vvv details for my TPU:

0000:03:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff)
    Subsystem: Global Unichip Corp. Coral Edge TPU
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 39
    Region 0: Memory at 1800100000 (64-bit, prefetchable) [size=16K]
    Region 2: Memory at 1800000000 (64-bit, prefetchable) [size=1M]
    Capabilities: [80] Express (v2) Endpoint, IntMsgNum 0
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26W TEE-IO-
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1
            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
             AtomicOpsCtl: ReqEn-
             IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
             10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
        LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
        Vector table: BAR=2 offset=00046800
        PBA: BAR=2 offset=00046068
    Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [f8] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
    Capabilities: [108 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [110 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
              PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
        L1SubCtl2: T_PwrOn=10us
    Capabilities: [200 v2] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Kernel driver in use: apex
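
The two interrupt capability lines are the relevant part of the dump above. A sketch for pulling them out of `lspci -vvv` text, assuming the output format shown in this comment (`msi_state` is a hypothetical helper):

```python
import re

CAP_RE = re.compile(r"Capabilities: \[\w+\] (MSI(?:-X)?): Enable([+-]) Count=(\S+)")


def msi_state(lspci_text):
    """Map 'MSI' / 'MSI-X' to their enable flag and advertised vector count."""
    state = {}
    for name, flag, count in CAP_RE.findall(lspci_text):
        state[name] = {"enabled": flag == "+", "count": count}
    return state


sample = """\
    Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
    Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
"""
print(msi_state(sample))
```

Both capabilities are advertised but show `Enable-`, meaning neither was switched on even though the apex driver is bound. That seems consistent with the interrupt initialization failure reported in the original post, though it does not by itself show why it failed.
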

srpablillo commented 3 months ago

Interested in @Zaleo80's issue. I am also trying to run Frigate with the Pineboard Dual Edge Coral TPU on an RPi 5 with HAOS.

SG87 commented 3 months ago

I have the same issue with the Pineboard Dual Edge Coral TPU on an RPi 5 with HAOS.

Zaleo80 commented 3 months ago

I found out that the HAOS drivers for the PCIe switch used on the Pineboard (ASMedia PCIe switch, Gen2 variant) cause the issue. Unfortunately, I have neither the experience nor a development setup to create a pull request that solves it.
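
The switch is visible in the `lspci` listing from the original post. The sketch below counts the PCI bridges (class 0604) per vendor in that listing. The vendor names for `1b21` and `14e4` are assumptions taken from the public PCI ID database; only `1ac1` (the Coral Edge TPU) is confirmed by output in this thread. `parse_lspci` is a hypothetical helper.

```python
from collections import Counter

# Vendor names other than 1ac1 are assumptions from the public PCI ID database.
VENDORS = {"1ac1": "Global Unichip Corp. (Coral Edge TPU)",
           "1b21": "ASMedia", "14e4": "Broadcom"}

LSPCI = """\
03:00.0 Class 0880: 1ac1:089a
02:07.0 Class 0604: 1b21:1182
00:00.0 Class 0604: 14e4:2712
02:03.0 Class 0604: 1b21:1182
01:00.0 Class 0604: 1b21:1182
00:00.0 Class 0604: 14e4:2712
04:00.0 Class 0108: 1987:5013
01:00.0 Class 0200: 1de4:0001
"""


def parse_lspci(text):
    """Yield (slot, class_code, vendor, device) from the lspci lines above."""
    for line in text.splitlines():
        slot, _, class_code, ids = line.split()
        vendor, device = ids.split(":")
        yield slot, class_code.rstrip(":"), vendor, device


# Class 0604 is a PCI-to-PCI bridge; count bridges per vendor ID.
bridges = Counter(v for _, c, v, _ in parse_lspci(LSPCI) if c == "0604")
for vendor, count in bridges.items():
    print(VENDORS.get(vendor, vendor), count)
```

The three `1b21:1182` bridges would fit one upstream and two downstream ports of a PCIe switch, with the Edge TPU at `03:00.0` behind it, but that topology is inferred from the bus numbers and is not confirmed here.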

sairon commented 3 months ago

@Zaleo80 Can you be more specific? Is it something discussed somewhere else, possibly with a known fix/workaround? Maybe I could help with preparing a test build for you then.

github-actions[bot] commented 1 day ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.