magic-blue-smoke / Dual-Edge-TPU-Adapter

Dual Edge TPU Adapter to use it on a system with single PCIe port on m.2 A/B/E/M slot

Host crashing after passthrough in proxmox #60

Open joshtwc opened 2 months ago

joshtwc commented 2 months ago

Long story short, I am setting up a Home Assistant/Frigate VM and need to pass the Dual Edge TPU through to Frigate. I have come very close: the TPU appears in Home Assistant and in Frigate, but after some time it crashes the host (an HP ProLiant DL380 G10 running Proxmox 8.2) with the following error messages in iLO:

Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000006, Status 0xBB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'36000000).
Uncorrectable PCI Express Error Detected. Slot 2 (Segment 0x0, Bus 0x36, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x4000
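
For readers trying to interpret the status value above: assuming the "Uncorrectable Error Status" reported by iLO is the raw PCIe AER Uncorrectable Error Status register (the log does not say so explicitly), the bits can be decoded with a short sketch like the one below. The bit names follow the PCI Express Base Specification; the function name is illustrative only.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register.
# Assumption: the iLO value maps directly onto this register.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}


def decode_uncorrectable_status(status: int) -> list[str]:
    """Return the names of all bits set in a 32-bit AER status value."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            # Bits not in the table are reported rather than silently dropped.
            names.append(AER_UNCORRECTABLE_BITS.get(bit, f"Unknown/reserved bit {bit}"))
    return names


if __name__ == "__main__":
    # Value taken from the iLO message in this issue.
    print(decode_uncorrectable_status(0x4000))
```

Under that assumption, 0x4000 has only bit 14 set, which would correspond to a Completion Timeout on the link at bus 0x36.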

Here is the lspci information:

37:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
38:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
38:07.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
39:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Kernel driver in use: vfio-pci
3a:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Kernel driver in use: vfio-pci
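
Since both TPUs (39:00.0 and 3a:00.0) sit behind the ASM1182e switch, one thing that may be worth checking on the host is which IOMMU group each device falls into. The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/kernel/iommu_groups; the function names are illustrative and not part of any Proxmox tooling.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # IOMMU disabled or not a Linux host: nothing to report.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_of(address, groups):
    """Return the group containing a full PCI address, or None if absent."""
    for group, devices in groups.items():
        if address in devices:
            return group
    return None


if __name__ == "__main__":
    found = iommu_groups()
    # Addresses taken from the lspci output in this issue.
    for addr in ("0000:39:00.0", "0000:3a:00.0"):
        print(addr, "-> group", group_of(addr, found))
```

If the two TPUs share a group with the switch ports or other devices, that would be relevant context for anyone reproducing the setup.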

My VM config:

agent: 1
bios: ovmf
boot: order=scsi0
cores: 12
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,size=4M
hostpci0: 0000:3a:00
hostpci1: 0000:39:00
localtime: 1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1712586677
name: #########
numa: 0
ostype: l26
protection: 1
scsi0: local-lvm:vm-100-disk-1,cache=writethrough,discard=on,size=32G,ssd=1
scsihw: virtio-scsi-pci
sockets: 2
tablet: 0
tags:  
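
To double-check which host devices a config like the one above hands to the guest, the `hostpciN` lines can be pulled out with a small parser. This is only a sketch, assuming the simple `key: value` line layout shown here; the helper names are hypothetical and not a Proxmox API.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines into a dict, skipping blanks and comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; PCI addresses contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def passthrough_devices(config):
    """Return {hostpciN: PCI address}, dropping any comma-separated options."""
    return {
        key: value.split(",")[0]
        for key, value in sorted(config.items())
        if key.startswith("hostpci")
    }


if __name__ == "__main__":
    sample = "cores: 12\nhostpci0: 0000:3a:00\nhostpci1: 0000:39:00\n"
    print(passthrough_devices(parse_vm_config(sample)))
```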

It is a dual Intel Xeon motherboard, and the adapter is plugged into a riser card at the back of the unit. I have tried the following:

It's strange that it only crashes after Frigate starts: it runs stably for a bit, then crashes suddenly with no useful logs other than those from HP Integrated Lights-Out (iLO).

magic-blue-smoke commented 1 month ago

Hi @joshtwc, sorry for the late reply. Is there any chance you could try it in a desktop PC rather than a server?

joshtwc commented 1 month ago

Hey, I did try it in a desktop and it worked no problem.

joshtwc commented 1 month ago

Hey, I did try it in a desktop and it worked fine. It's just odd that it works on my server for a few minutes and then crashes the host, every time without fail.



magic-blue-smoke commented 1 month ago

@joshtwc given that it works in a desktop PC, we can conclude it is incompatible with this particular server. I have sent you a letter in case you'd like to return the board.