Open markmghali opened 2 years ago
I'm experiencing a similar issue where the second TPU, apex_1, is visible but not responding, and the test model from the install instructions is failing. However, if I only use the first TPU, apex_0, in Frigate it works. I've opened a support ticket with the Coral team here.
Thank you I will check it out
I am having a heck of a time getting this working.
I added pci=noaer pcie_aspm=off to my Unraid OS section. It seemed to be working better, but after about an hour or so the whole server just stops responding.
So now it works for a bit, but then my whole server stops responding. I can't reach it over SSH or the web GUI, nothing. I have to hard reboot it by holding the power button. I also don't think I can see the logs, because the syslog is lost when I reboot.
I thought I was on the right path but I guess not.
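For anyone trying the same kernel parameters: it can be worth confirming after a reboot that they actually reached the kernel, since the running command line is readable from /proc/cmdline. A minimal sketch follows; has_param and check_params are hypothetical helper names made up for illustration, and CMDLINE_FILE is an assumed override that only exists so the check can be run against a saved copy.

```shell
#!/bin/sh
# Return 0 if a kernel parameter appears as a whole word in a command-line string.
# $1 = parameter (e.g. pcie_aspm=off), $2 = command-line string
has_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the two parameters mentioned in this thread.
# Reads /proc/cmdline unless CMDLINE_FILE points somewhere else.
check_params() {
    cmdline=$(cat "${CMDLINE_FILE:-/proc/cmdline}")
    for p in pci=noaer pcie_aspm=off; do
        if has_param "$p" "$cmdline"; then
            echo "$p: active"
        else
            echo "$p: missing"
        fi
    done
}
```

Running check_params on the server should print one line per parameter. If either shows as missing, the edit to the boot configuration did not take effect.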
@markmghali @tehniemer
Thanks for the feedback and diagnostics info. I'm really interested in investigating the cause of these issues to see whether there's a manufacturing flaw or a particular incompatibility issue.
Could you please contact me using the form at the bottom of the page here with your order number?
@magic-blue-smoke ok I have reached out via the contact form.
Thank you
likewise.
I was able to get both working again by removing the associated PCI devices and rescanning, however, this fix does not survive a reboot.
root@nvr:~# lspci -PP
00:00.0 Host bridge: Intel Corporation Device 3e0f (rev 08)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT1 [UHD Graphics 610]
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.5 SD Host controller: Intel Corporation Device a375 (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #5 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation H370 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1c.0/01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
00:1c.7/02:00.0 PCI bridge: ASMedia Technology Inc. Device 1182
00:1c.7/02:00.0/03:03.0 PCI bridge: ASMedia Technology Inc. Device 1182
00:1c.7/02:00.0/03:07.0 PCI bridge: ASMedia Technology Inc. Device 1182
00:1c.7/02:00.0/03:03.0/04:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
00:1c.7/02:00.0/03:07.0/05:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
00:1d.0/06:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Device 1d97 (rev 01)
echo 1 >/sys/bus/pci/devices/0000:00:1c.7/remove
echo 1 >/sys/bus/pci/rescan
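Since the fix does not survive a reboot, one option is to wrap the two commands above in a small function and call it from whatever boot-time hook the host offers. This is only a sketch under assumptions: reset_pci_bridge is a hypothetical name, SYSFS_PCI and RESCAN_DELAY are made-up overrides for dry runs, and the one-second pause before rescanning is a guess rather than a known requirement.

```shell
#!/bin/sh
# Remove a PCI device (and everything behind it), then rescan the bus.
# $1 = PCI address of the device to remove, e.g. 0000:00:1c.7 from the lspci -PP output.
# SYSFS_PCI defaults to /sys/bus/pci; RESCAN_DELAY defaults to 1 second (both assumptions).
reset_pci_bridge() {
    bridge=$1
    root=${SYSFS_PCI:-/sys/bus/pci}
    if [ ! -e "$root/devices/$bridge/remove" ]; then
        echo "no such PCI device: $bridge" >&2
        return 1
    fi
    echo 1 > "$root/devices/$bridge/remove"
    sleep "${RESCAN_DELAY:-1}"
    echo 1 > "$root/rescan"
}
```

Calling reset_pci_bridge 0000:00:1c.7 as root does the same as the manual commands. It would need to run before the container that uses the TPUs is started.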
I figured out the problem I was having wasn't really a problem after all. It turns out the host machine can't use a PCI device that has been passed into a docker container. Once I realized that, I determined everything is working properly.
What particular machine do you have? I was hoping to use this adapter with a Synology and pass the PCI coral to my docker container.
@nmajin I'll try to explain in other words what @tehniemer meant.
VMs don't have direct access to the hardware of your PC. Instead, the VM environment emulates the network card, drives, video adapter and other hardware. A Coral TPU can't be emulated and needs PCIe passthrough, a mechanism to "pull out" a particular PCIe device from the host PC and provide exclusive access to it within the VM.
Now if I get it right, the adapter made both Coral TPUs available for use. However, one TPU was configured with PCIe passthrough to a VM, while the other was not and remained available in the host system. This is expected behavior and means that TPUs can be used in a number of combinations:
@magic-blue-smoke thanks for the detail and providing more context.
So, to clarify: is it possible to have both TPUs available as passthrough (to a docker container) with this adapter and the dual Edge TPU Coral? Sorry, I just want to confirm I can in fact use both TPUs if and when I get the Coral and the adapter.
In my configuration I have both TPUs passed through to a docker container.
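To illustrate what passing both TPUs into a container can look like: the driver exposes each TPU as a /dev/apex_N node, and docker's --device flag maps a host device node into the container. The sketch below builds one flag per node it finds. apex_device_flags is a hypothetical helper name, and the optional directory argument exists only so the function can be tried without real hardware.

```shell
#!/bin/sh
# Print one docker --device flag for every apex_* node in a directory.
# $1 = directory to scan (defaults to /dev); nodes are always mapped to /dev/<name> in the container.
apex_device_flags() {
    dev_dir=${1:-/dev}
    for node in "$dev_dir"/apex_*; do
        [ -e "$node" ] || continue      # the glob matched nothing
        printf '%s %s:%s ' --device "$node" "/dev/${node##*/}"
    done
    echo
}
```

On a host where both TPUs are visible, the output could be spliced into a command such as docker run $(apex_device_flags) followed by the image name, which is left out here because it depends on the Frigate version in use. An empty output means no apex nodes were found on the host.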
Just to add an additional anecdote: I run Frigate with this dual-tpu-adapter in my Unraid server. Both TPUs are passed in and I have not had any such issues; it's been running for about 3 months now. I was sure to disable all C-States for my CPU in the BIOS, which is something I've always had to do to ensure stability with Unraid.
@magic-blue-smoke you stated you were making another revision of this adapter? I am tempted to buy it and try again though. I feel like I will have the same issues as I did before.
Hello,
I have been trying to get your PCIe adaptor to work for a few months now with no luck. I am using Unraid with the Frigate v0.10 Docker container. I can see both TPUs as apex_0 and apex_1. The symptom is that Frigate will run for a bit, then I get a PCIe error in my Unraid syslog. It will then shut down one of the TPUs and the temp goes negative. I have posted my issues in the Frigate GitHub and the Unraid forums with no luck. I have reposted my Unraid post below. Please let me know what else I can troubleshoot. Love all the work you have done for the community, hoping to get this to work properly.
I am having a similar issue to @AdvancedMobileRepairs using the Dual TPU in the Magic-Blue-smoke PCIe adapter. Prior to this I was using a single TPU with a different adapter that was working fine. I have been monitoring the Coral temperatures and they have not been going above 48 degrees. I have this error in my syslog:
Does anyone have any insight into this? I already asked in the Frigate GitHub and we troubleshot up to a point, but then they told me to ask in the Unraid forum.
Thank you
EDIT EDIT:
Per this thread:
https://forums.unraid.net/topic/103901-solved-aer-pcie-bus-errors/
I disabled ASPM on PCIe in my BIOS, restarted the server, and am running Frigate to see how long it works before the Coral shuts down.
And it failed again! That did not fix the issue. Very weird.
Temp is not the issue it seems
Any insight?
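One thing that may help narrow this down is logging the temperature of each TPU over time, so the moment one of them drops to a negative reading can be lined up with the PCIe error in the syslog. A minimal sketch, assuming the Coral PCIe driver exposes readings in millidegrees Celsius at /sys/class/apex/apex_N/temp. read_apex_temps is a hypothetical helper name and the directory argument is only there so the function can be tried without hardware.

```shell
#!/bin/sh
# Print "<device> <whole degrees C>" for every apex device found.
# $1 = sysfs class directory to scan (defaults to /sys/class/apex, an assumption about the driver).
read_apex_temps() {
    class_dir=${1:-/sys/class/apex}
    for t in "$class_dir"/apex_*/temp; do
        [ -r "$t" ] || continue         # no devices, or temp file not readable
        dev=${t%/temp}
        dev=${dev##*/}
        milli=$(cat "$t")               # reading in millidegrees Celsius
        echo "$dev $(($milli / 1000))"
    done
}
```

Calling this from a loop with a timestamp and appending to a file on persistent storage would keep a record that survives a hard reboot. A negative value for one device while the other still reads normally would match the symptom described above.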