pcengines / apu2-documentation

Documentation and scripts for building and adjusting PC Engines APU2 firmware
https://pcengines.github.io/apu2-documentation/

XEN booting is unstable #109

Open miczyg1 opened 6 years ago

miczyg1 commented 6 years ago

Sometimes booting hangs at:

(XEN) CPU1: No irq handler for vector e7 (IRQ -2147483648) 
(XEN) CPU2: No irq handler for vector e7 (IRQ -2147483648)

pietrushnic commented 4 years ago

@artur-rs is that still the case? We have some signs that newer Xen may fix the problem, and it looks like @miczyg1 added some fixes. Also, the regression results lost the link to this issue in the recent test results.

artur-rs commented 4 years ago

@pietrushnic @miczyg1 the link has been fixed. It seems that the stability problems still occur (1 in 30 boots). More reliable results will be delivered with the regression testing on apu2-5 platforms in the next release.
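
A failure rate such as 1 in 30 boots could be estimated from captured serial logs roughly as follows. This is only a sketch, not the 3mdeb regression tooling: the function names are hypothetical, the pattern is derived from the two log lines quoted in the issue description, and it assumes that the presence of that message marks a failed boot.

```python
import re

# Pattern based on the message quoted in the issue description, e.g.
# "(XEN) CPU1: No irq handler for vector e7 (IRQ -2147483648)".
IRQ_ERROR = re.compile(r"\(XEN\) CPU\d+: No irq handler for vector [0-9a-fA-F]+")


def boot_hit_irq_error(log_text: str) -> bool:
    """Return True if a single boot log contains the 'No irq handler' message."""
    return IRQ_ERROR.search(log_text) is not None


def failure_rate(boot_logs) -> float:
    """Fraction of boot logs that contain the message (0.0 for no logs)."""
    logs = list(boot_logs)
    if not logs:
        return 0.0
    failures = sum(1 for text in logs if boot_hit_irq_error(text))
    return failures / len(logs)
```

Each element passed to `failure_rate` is expected to be the full console output of one boot attempt, so a capture of many reboots would first have to be split per boot.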

miczyg1 commented 4 years ago

Log from 100x reboots from Xen staging https://cloud.3mdeb.com/index.php/s/iyzYr78KK9BEF3f

pietrushnic commented 4 years ago

@miczyg1 you mean that problem was fixed upstream?

miczyg1 commented 4 years ago

Not 100% sure yet. This log is from a debug build. In case it is a timing problem or something similar, I would also like to run a non-debug test round.

jpds commented 4 years ago

My apu4d4 with Debian testing as of this week with the following software versions:

...appears to have no trouble booting Xen whereas with Debian stable it would just crash on boot.

miczyg1 commented 4 years ago

There is a patch in coreboot that potentially could fix this issue: https://review.coreboot.org/c/coreboot/+/42434 Will be testing it soon.

HRio commented 3 years ago

I have an APU4D4 with BIOS v4.12.0.5 and IOMMU enabled (doing NIC passthrough).

I cannot see this problem with Xen 4.13.2.

miczyg1 commented 3 years ago

@HRio the newer versions of Xen seem to be working better. We are still working on infrastructure to automatically test up-to-date versions of Xen, so we are keeping this open in case anybody faces the issue.

HRio commented 3 years ago

@miczyg1 May I take this opportunity to suggest you have a look at Alpine Linux? It's a perfect fit for a device like this, and we aim to keep Xen up to date.

| Alpine Version | Xen Version |
|----------------|-------------|
| edge           | 4.14.1      |
| 3.13           | 4.14.1      |
| 3.12           | 4.13.2      |

miczyg1 commented 3 years ago

@HRio sure, we will take a look at that. Thank you.

mmaney commented 3 years ago

I've been meaning to find the time to try Xen on apu2 here, and a couple days ago I pulled the 2E4 out and swapped in a scratch 2 1/2" SSD for some clean install fun. BIOS 4.11.0.6, not quite the very latest. Current Buster minimal install (with sshd because that's smoother than using a serial link, and standard system utils). Then the testing...

Installed Xen system (4.11), omitting qemu and his many, many friends and... Just Worked. Rebooted about a dozen times, with a power cycle or two for variety. No domUs were started (or installed). Then I noticed that some possibly useful features were disabled (by default, I assume) in the BIOS, so enabled first iommu, then EHCI, with a couple reboots each time. Never failed to boot for me.

pietrushnic commented 3 years ago

@mmaney I believe the failure we see occurs when we do 100 consecutive reboots in our automated testing environment, so maybe you were lucky :) Please check our Regression Test Results.
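
How lucky a dozen clean boots would be can be estimated with a little arithmetic. This is a rough sketch, assuming that boots fail independently and that the roughly 1-in-30 failure rate reported earlier in this thread applies to the setup being tested; neither assumption is established here, and the function name is illustrative.

```python
def prob_no_failure(per_boot_failure_rate: float, boots: int) -> float:
    """Probability that `boots` independent boots all succeed."""
    return (1.0 - per_boot_failure_rate) ** boots


# About a dozen reboots at an assumed 1-in-30 failure rate.
p = prob_no_failure(1 / 30, 12)
print(f"{p:.2f}")  # prints 0.67
```

Under those assumptions, about two runs in three would show no failure across a dozen boots, so a short manual test cannot rule the problem out.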

miczyg1 commented 3 years ago

Also, our testing procedure for Xen is different. We boot from the network and use Xen 4.8 (yes, not so young), so we definitely need newer versions. When Xen 4.11 or newer was installed on a physical drive, we also didn't face the issues detected in the regression. We are keeping this issue open until we migrate to a newer Xen hypervisor in our infrastructure and ensure the problem is no longer reproducible.

mmaney commented 3 years ago

Well, this all left me with a bump of curiosity, so I hacked up a little script and called it from rc.local. It was supposed to stop after it had rebooted 100 times, but I never got back to check on it, and the test to stop rebooting after 100 times never tripped. It reported having rebooted for the 653rd time before I interrupted it (by choosing the recovery boot in grub).

I'd say it works fine with the current Debian Buster kernel and Xen version (on 2e4 hardware at least).
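
The script itself was not posted. A reboot-loop helper of this kind might look like the sketch below, which is not the script used above: the function names, the counter file location and the limit are all assumptions made for illustration.

```python
from pathlib import Path


def record_boot(counter_file: Path) -> int:
    """Increment the persistent boot counter and return the new count."""
    try:
        count = int(counter_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        count = 0  # first run, or unreadable counter: start from zero
    count += 1
    counter_file.write_text(f"{count}\n")
    return count


def should_reboot(count: int, limit: int) -> bool:
    """Keep rebooting only while the counter is below the limit."""
    return count < limit
```

A wrapper called from rc.local would run `record_boot` on a file that survives reboots (a hypothetical `/var/lib/reboot-test/count`, say), log the count, and trigger a reboot only while `should_reboot(count, 100)` is true. Keeping the stop condition in its own small function makes it easy to check before leaving the machine unattended.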

pietrushnic commented 3 years ago

@mmaney thanks for spending the time on testing it. I'm not sure whether our automated validation environment uses a system reboot or rather shutdown/poweroff followed by power-on by reconnecting the plug. As Michał said, we have to update our infrastructure and clean up this outdated bug report.

jailbird777 commented 3 years ago

I've been running Xen (currently 4.13.2) on OpenSuSE Leap 14.x for about 18 months now on both an apu2 and an apu4 without any stability issues at all. I've even had XCP-ng 8.2.0 running without any issues. So I think the Xen stability issues might be resolved :).