xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Unable to boot installer with AMD Ryzen APUs #206

Closed nagilum99 closed 3 years ago

nagilum99 commented 5 years ago

As reported on the forum: there seems to be a kernel bug that prevents several kernels from booting on AMD Ryzen APUs unless memory is limited to 2048M (maxmem:2048M). A higher value may also work, but this was the recommendation from the OPSI forums, and it worked for their Linux boot image and for XCP-ng 8.0.0 Beta + CH 8. Tested hardware: CPUs: AMD Ryzen R5 2400G, AMD Ryzen R5 2200G, AMD Athlon 240G. Mainboards: Asus Prime A320M A (BIOS 4801), Gigabyte GA-A320M-S2H V2 (BIOS F30).

It probably makes sense to report at least the problem itself upstream. Shouldn't some kernel maintainers be made aware of it as well?

I could test different (newer) images, if you want me to. I didn't test on EPYC, but I assume it's related to the APU and the shared memory for the GPU (which is usually located above 4G and thus could collide, if not properly handled).

XCP-ng 7.6 works! (It's known to work with older kernels!)

More references: https://forum.opsi.org/viewtopic.php?f=8&t=10611 (Et al.)

Solution: Either edit grub.cfg/grub64.cfg on your install media, or press "e" in the GRUB menu, change the 8192 value to 2048 and press F10 to boot.
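
The config-file edit described above can also be scripted. A minimal Python sketch, assuming the option is written as dom0_mem=max:8192M (the form quoted later in this thread); the function name lower_dom0_mem and the sample line are made up for illustration and are not copied from the real grub.cfg:

```python
import re

# Matches the option as quoted in this thread, e.g. "dom0_mem=max:8192M".
# Assumption: the value is given in MiB with an "M" suffix.
DOM0_MEM_MAX = re.compile(r"(dom0_mem=max:)\d+M")


def lower_dom0_mem(config_text: str, new_mib: int = 2048) -> str:
    """Return config_text with every dom0_mem=max:<N>M value set to new_mib."""
    return DOM0_MEM_MAX.sub(lambda m: f"{m.group(1)}{new_mib}M", config_text)


# Hypothetical sample line; the real boot entry contains more options.
sample = "xen.gz dom0_mem=max:8192M"
print(lower_dom0_mem(sample))
```

Text that does not contain the option is returned unchanged, so running it on the wrong file does no harm.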

MasterSteelblade commented 5 years ago

Able to confirm this as an issue, but it affects non-APU Ryzen as well. I haven't been able to try the maxmem settings, but all USB devices hang after some ELF messages. Using a Ryzen 1700 on a B450 platform. XCP-ng 7.6 works.

nagilum99 commented 5 years ago

It takes like... 10 seconds to apply the documented workaround.

varnav commented 5 years ago

AMD Ryzen 7 1700X server. I'm running 7.6 and was trying to upgrade to 8.0. The ISO gives some initial boot messages, but eventually gets stuck with a black screen. Those problems did not occur with the 7.6 ISO.

I don't want to play with workarounds, better to wait for this to be fixed.

stormi commented 5 years ago

I don't want to play with workarounds, better to wait for this to be fixed.

We don't know how to fix it yet, and the fix probably belongs in the kernel. Lowering the RAM values for everyone would just make other kinds of hardware fail (this is why it was set to 8 GB by Citrix in the first place, from what I know).

As @nagilum99 said: "press e during grub menu, change 8192 value to 2048 and press F10 to boot." If you need to do that on many hosts, you can extract the ISO, change grub.cfg and rebuild the ISO. See https://github.com/xcp-ng/xcp/wiki/Modifying-the-installer
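
For the many-hosts case, the grub.cfg change can be applied to an already extracted ISO tree with a small script. This is only a sketch: the helper name patch_extracted_iso and its parameters are hypothetical, it assumes the option is written as dom0_mem=max:<N>M, and it does not extract or rebuild the ISO (the wiki page above covers that part).

```python
import re
from pathlib import Path

# Assumption: the option appears as "dom0_mem=max:<N>M" in the config files.
DOM0_MEM_MAX = re.compile(rb"(dom0_mem=max:)\d+M")


def patch_extracted_iso(root, names=("grub.cfg",), new_mib=2048):
    """Rewrite dom0_mem=max:<N>M in matching config files under root.

    Only files whose name is listed in `names` are touched.
    Returns the list of files that were actually changed.
    """
    replacement = rb"\g<1>" + str(new_mib).encode("ascii") + b"M"
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.name not in names:
            continue
        # Work on bytes so line endings and encoding stay untouched.
        data = path.read_bytes()
        new_data = DOM0_MEM_MAX.sub(replacement, data)
        if new_data != data:
            path.write_bytes(new_data)
            changed.append(path)
    return changed
```

Calling patch_extracted_iso on the directory the ISO was extracted to returns the files it modified; names can be extended to cover further config files in the tree.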

pblakez commented 5 years ago

confirm issue on Gigabyte B450 AORUS M (rev. 1.0) Bios 41a AMD Ryzen R5 3400G

cflaviu commented 4 years ago

same issue on ASRock AB350 Pro4 - BIOS 5.80 AMD Ryzen 7 1700 16 GB RAM

cflaviu commented 4 years ago

I rebuilt the ISO image after changing dom0_mem=max:8192M to dom0_mem=max:2048M in grub.cfg and grub-usb.cfg, but no luck.

In the safe mode log I noticed:

ACPI sleep modes: S3
VPMU: disabled
xenoprof: Initialization failed. AMD processor family 23 is not supported
......
Hardware Dom0 halted: halting machine

In BIOS, CPU virtualization (SVM Mode) is Enabled and ACPI S3 (Suspend to RAM) is Disabled. Enabling ACPI S3 has no effect, same halting.

olivierlambert commented 4 years ago

Without rebuilding, just modifying the menu with the value doesn't work?

cflaviu commented 4 years ago

Sorry for the late reply. I couldn't trigger the edit of the GRUB menu by pressing the e key, so I switched to XCP-ng 7.6, which works fine.

nagilum99 commented 4 years ago

If you couldn't trigger it, as described, something else is wrong.

DavorSaric commented 4 years ago

AMD Ryzen 5 3600 - cd boots with v8.0.0. with mem fix 2048M

nagilum99 commented 4 years ago

That's a different problem then. The solution above works for the symptom described. If it reboots during boot, then it's something else. The usual debugging recommendations: try disabling all power-saving settings in the BIOS. Dunno what USB boot is supposed to be, but you should try booting in both UEFI and BIOS mode. Sometimes one of them is buggy.

No one's gonna refund you anything. You're using OSS for free - but maybe they'll refund you 100 € if you pay for a supported license first. People may try to help you debug the problem, but it's also on you to dig deeper and give some helpful information about it. That means: more details, as many as you can provide.

I'm too lazy to google for the 'reboot problem' (and that would be your part anyways), but I have in mind, that someone already had that problem.

rjt commented 4 years ago

Wouldn't a serial port debug session help figure this out?

Have firmware updates been applied?

Intel chips take such a performance hit after applying all MeltDown and latest spectre patches and then disabling HyperThreading. AMD does not take nearly the performance hit and has so many more lanes for NVMe.

On Thu, Nov 14, 2019 at 10:54 AM Davor Saric notifications@github.com wrote:

Hello, I am trying version 8 on a Hetzner AX41-NVMe with AMD Ryzen 5 3600 and no luck. The following solution doesn't work, the server still reboots during CD boot. Currently installing v7.6, and boot is stuck at: Xen is relinquishing VGA console, and then the server restarts. I will now try USB instead of UEFI boot. I hope I didn't waste 100 EUR down the drain.

rjt commented 4 years ago

Davor,

that is why serial ports can be used for debugging at times. Another option would be to take a picture with your cell phone. AdobeScan, Google, or the CopyFish web browser extension can do OCR to convert the bulk of it to text. Search and replace badly converted characters with Notepad.

I do not know if the list accepts pictures, but try cropping down the size and saving as PNG to make it most likely to get through the filters.

On Thu, Nov 14, 2019 at 11:18 AM Davor Saric notifications@github.com wrote:

I cannot paste you the output as I am in KVM. USB is BIOS(legacy) mode.

DavorSaric commented 4 years ago

Thanks, I have successfully installed it (BIOS mode; UEFI is not working) after fiddling with the UEFI settings: turning off all C-states, etc. XCP-ng Center also connected successfully (there were some reports that it doesn't want to connect due to a certificate/SSL issue).

nagilum99 commented 4 years ago

The certificate thing is an expected situation, as most servers use self-signed ones, but XCP-ng/XenCenter primarily expect a signed one to verify the source. Accept it and you're done.

You could experiment with the settings to find the one that finally let you install successfully. UEFI is mature by now, but it's still the cause of some flaws, and probably will be for quite a while. AFAIK it's often a problem of the implementation being faulty on certain mainboards. In most cases falling back to BIOS mode should have no drawbacks.

Edit: As you saved 100 € now - do I get 50? I'd also be fine if you donate them to Vates for further XCP-ng development - @olivierlambert is maybe open for that. ;)

DavorSaric commented 4 years ago

Hello, we are already using XO Enterprise in a business environment with a paid licence. This is my private project. I jumped to conclusions: XCP is too buggy on the AMD Ryzen 5 3600. Now it won't boot, probably because Hetzner restored the BIOS to its original state. So if you plan to rent a server and install on it, be warned. You only have limited time with the KVM, which is buggy at the start (the BIOS shows a blue screen and no settings; they confirmed it is a bug), so I don't recommend Hetzner (or at least I don't recommend the AX series).

DavorSaric commented 4 years ago

Update: the BIOS didn't change, but I cannot boot XCP-ng from the DVD anymore, neither over KVM ISO nor with the ISO directly attached via USB. Neither the netinstall nor the full ISO works, in BIOS or UEFI mode. Hetzner said other users also reported that Xen is not working on this server, so... no luck. v7.6 also doesn't boot. Luckily Hetzner offers a refund.

i7-8700 server EX52-NVMe XCP-NG installed and booted without problems.

nagilum99 commented 4 years ago

Then it's probably specific to the mainboards they use in that series. As you already said, you successfully installed XCP-ng, I wonder what happened.

Also: you can always re-request the "Lara" for another 2 or 3 hours for free, or keep it for a longer period as a paid service. I forgot what they charge then. I like their "Serverbörse"; you can get good bargains there sometimes.

DavorSaric commented 4 years ago

Could be. I wasted yesterday and most of today and gave up. The KVM is free for 3h and then it disconnects; you can always ask for another 3h session, which is also free.

valhallen282 commented 4 years ago

I came here just to say that switching to UEFI (non-CSM mode) on a Ryzen 5 2400G on an MSI A320M Pro-VD/S V2.0 allowed me to boot and reinstall my instance. EDIT: I also set the GPU memory allocation to the lowest value in the BIOS and turned off C-states, as with them on I couldn't get through first boot. But it installed fine, oddly.

stormi commented 4 years ago

@rushikeshjadhav so, from forum feedback, this installer issue is still present on XCP-ng 8.1 beta.

Could you try to investigate and find out if there are patches for that in the Linux kernel that we could backport to ours? It should not fail just because the amount of memory is 8 GB.

rushikeshjadhav commented 4 years ago

As I understand it, this issue can be solved by changing the 8192 value to 2048 during boot-up and install. Can someone confirm whether it's an installation-only issue or a general runtime issue as well?

nagilum99 commented 4 years ago

Besides the fact that you could read the thread and links carefully before experimenting with such things: yes, and installation only.

rushikeshjadhav commented 4 years ago

Quoting would have helped. There seems to be no error information from the regular or serial console about what happened?

For urgent resolution: the installer can be built with kernel 4.4 for a home lab, and you may even be able to run your XCP-ng 8.1 with kernel 4.4 without issues.

Regarding upstream fixes, can someone with access to the hardware try kernel-alt?

nagilum99 commented 4 years ago

We would have pasted any errors if we had gotten any. The systems just freeze.

And as written: it's only the installer, the final system works flawlessly. I'm very confused why you're writing that a kernel downgrade is an urgent solution, when we already found out that changing the boot option does the trick. I guess you still didn't read the thread and are instead burning time trying to find a solution for something that already has one (as long as there's no final patch to update the package/image). "XCP-ng 8.0 comes with an updated Kernel (4.19)", and a downgrade is never a good option, as you might drop support for newer hardware and also miss out on optimisations.

I'm not gonna test any alternative or older kernels on the system.

stormi commented 4 years ago

What we'll try is to provide an installation ISO that uses kernel-alt by default. kernel-alt is just the base 4.19 kernel to which we apply all kernel.org patches from the 4.19 branch, so it should be quite stable by construction, though we can't give it as much QA as the main kernel receives. If it installs well, I expect it to run well without much risk.

DavorSaric commented 4 years ago

Hi, will that fix the installation issues on the AMD Ryzen 5 3600 line of CPUs? If you want, you can test those at Hetzner; they offer free server cancellation within 14 days. I have tried before, but the installer just hangs and some panics or CPU hang errors pop up during boot. I tried a lot of things, changed BIOS/UEFI settings, etc. I managed to boot once and install XCP-ng, and it worked. But after a full format of the disks I tried to install it again and it failed. I tried all the same steps as before, but no success. It seems like XCP-ng is currently very buggy to install on Ryzen. It could be due to some combination of motherboard and CPU, I don't know. But if XCP-ng would support Ryzen, that would be awesome :)

stormi commented 4 years ago

Have you tried the workaround that consists of changing the GRUB boot command (in UEFI mode) and reducing the amount of RAM on that line before booting? It works for others.

About kernel-alt, we don't know yet if that will fix anything. One assumption is that it could be a kernel bug that makes it unable to cope with 8 GB of memory in the system at install, so we want to test with the latest upstream patches.

stormi commented 4 years ago

(Sorry, re-read the thread and I see that you've done that already)

Linh1706 commented 4 years ago

Ryzen 2700X also has the same issue. The ISO gives some initial boot messages, but eventually gets stuck with a black screen. CPU: Ryzen 2700X, MOBO: X470 TAICHI, RAM: 32 GB

pblakez commented 4 years ago

Note: I have installed on 6 Ryzen 5 systems now. Changing the memory in the GRUB menu works, but the GRUB menu disappears too quickly, and that may be why people are having problems.

Also, Gigabyte motherboards must have the latest BIOS or they may not work properly with later Ryzen CPUs.

nagilum99 commented 4 years ago

BIOS versions for CPU support are independent of XCP and thus a completely different topic. Also, we gave 2 options: either edit at runtime or edit the config file (USB installs make that easy). If you're too slow to edit it at runtime: edit the config. Especially when installing multiple systems, it makes no sense to do it the runtime way.

nagilum99 commented 4 years ago

Also @stormi: I could test if the boot-image works without the workaround, if you supply a modified image with alt-kernel.

underling3311 commented 4 years ago

Dear, @nagilum99 I just bought / built a computer last night. CPU Ryzen 2700x, motherboard Asrock B450 Pro4, 23GB of Ram inside right now, have 32GB Max.

I am having the same problem as Linh1706 is having and I have no idea to fix this even with the 2 bypasses. I don't even know what that even is or how to do that. I have read the entire post from day one and still don't understand what I should do.

"ISO gives some initial boot messages, but eventually gets stuck with black screen." One message says "AMD processor family is NOT supported"

Is VPMU supposed to be disabled, or should I enable it?

Could you give a list of directions on what to do? Like Step by Step?

Thanks,

Mr. Waste

Linh1706 commented 4 years ago

I fixed it by changing the parameter from 8192M to 2048M. I don't know which change did it, but besides editing grub.cfg I also edited isolinux.cfg. It's at boot > isolinux > isolinux.cfg

You can try editing this file first; if it's still not working, then try editing both grub.cfg files that you can find on your USB.
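
To see which of these files actually carry the memory setting before editing them, a small scan can help. A minimal Python sketch, assuming the installer files have been copied or mounted to a local directory and that the relevant configs end in .cfg; the function name find_dom0_mem is made up for this example:

```python
import re
from pathlib import Path

# Matches any dom0_mem=... option up to the next whitespace.
DOM0_MEM = re.compile(rb"dom0_mem=\S+")


def find_dom0_mem(root):
    """Yield (path, option) for each dom0_mem= occurrence in *.cfg files under root."""
    for path in sorted(Path(root).rglob("*.cfg")):
        if not path.is_file():
            continue
        for match in DOM0_MEM.finditer(path.read_bytes()):
            yield path, match.group(0).decode("ascii", errors="replace")
```

Looping over find_dom0_mem on the USB mount point and printing each pair shows every config file that still needs the lower value.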

olivierlambert commented 4 years ago

Note that a new ISO will be out soon with a menu option to boot with "compatible" memory settings for Ryzen CPUs.

underling3311 commented 4 years ago

How long do you think before that will actually happen @olivierlambert ?

underling3311 commented 4 years ago

Thank you all for your help!! It finally worked, THANK YOU AGAIN!!

stormi commented 4 years ago

XCP-ng 8.1 RC is out, with two new installer options offered at boot time. Please test both and report:

stormi commented 4 years ago

So, people, we've put a lot of work into offering two new installer options, and I really would like to get feedback before Monday from people who had issues with their Ryzen APUs.

So, could anyone please boot the latest ISO (named prefinal3 here: https://updates.xcp-ng.org/isos/8.1/) and try:

No need to perform the installation: just see if the installer starts when it gave a black screen previously.

Linh1706 commented 4 years ago

UPDATE:

  1. It works with both configs, normal boot and 2G boot. I don't know why, but this time I used a different graphics card. Last time I used a Vega 56 reference card.
  2. I got an error with the current XCP-ng 8.0: all VMs went away and I lost the connection to the server, even though I was in the server console. I recorded a video of the symptoms: https://youtu.be/vjgR59UFQWA The situation is that I installed XCP-ng 8.0 with the Vega 56 card, and have now changed to an RX580 card.

Let me try it and get you the result in 1 day. in the meantime, I'm investigating a Man-In-The-Middle attack for a client. anyone has experience in this pls let me know tho. tks

nagilum99 commented 4 years ago

Interesting: I tested the standard install (just press enter) as well as the alternate kernel. Both booted into the TUI on an AMD Ryzen 3400G with 8 GB RAM, so I didn't try the 2G version. I didn't test it on multiple machines though.

Linh1706 commented 4 years ago

So I updated my machine today, given that I had changed the graphics card and it currently doesn't recognize the VMs correctly, as stated above. I used the normal boot option; after updating and restarting, the screen is black with a blinking cursor. I'm trying to restore from backup and go with the 2G option now.

Same behavior... guess that I will make a clean install and go with it.

olivierlambert commented 4 years ago

@Linh1706 if you have no internet connection nor NTP configured, this is normal. Please wait up to 10 minutes.

Linh1706 commented 4 years ago

Oh, damn. I waited about 5 min and then did the clean install. Now I am recovering the disk and VMs.

The one thing I wish I could have done is test whether the upgrade fixes the GPU swap problem or not. Now I won't be able to test it, at least for a while.

Hmm, it may be another problem, not the GPU swap. @olivierlambert: hi, is there a limitation on the AMD platform that means I cannot pass the only GPU in the system through to a VM? I have 1 GPU (RX580) in my current system: CPU: Ryzen 2700X, MOBO: X470 TAICHI, RAM: 32 GB

But whenever I hide the PCI device from dom0, I cannot connect XCP Center to the server. I can only ssh to it, and the only thing I could do to make the system work again was to undo the "hide" step I had just done.

commandline-be commented 4 years ago

confirmed not working on AMD Ryzen 1700x.

Can this not simply be resolved with an AMD boot entry in grub ?

stormi commented 4 years ago

confirmed not working on AMD Ryzen 1700x.

Can this not simply be resolved with an AMD boot entry in grub ?

Have you tried the already available boot options I described above?

commandline-be commented 4 years ago

confirmed not working on AMD Ryzen 1700x. Can this not simply be resolved with an AMD boot entry in grub ?

Have you tried the already available boot options I described above?

Yes, the system boots then reboots, every time. I disabled global C-states in the BIOS and added boot options such as iommu=pt amd_iommu=on, to no avail. I set the system to UEFI as well as legacy+UEFI, to no avail.

MSI X370 chipset with latest bios and AMD 1700x CPU.

olivierlambert commented 4 years ago

Also it's possible you have the first batch of Ryzen with a known Linux CPU bug.