xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Unable to install XCP-ng on some hardware with i915 compatible graphics card #436

Open rushikeshjadhav opened 4 years ago

rushikeshjadhav commented 4 years ago

i915-compatible graphics have become common (for example in Intel NUCs), and users with such hardware are unable to install with either the default kernel or kernel-alt. The current ISO build process strips i915 from both kernels, which could be causing the black screen after Xen relinquishes VGA. This needs more testing from users whose only VGA device is i915-compatible.

Users can boot an already installed system as it has i915 module present.

It is not certain that having i915 module in the ISO will solve this issue.

stormi commented 3 years ago

It now has been tested that having the i915 module solves the issue. However we still don't know why on those specific computers the installer's kernel is not able to display text without the GPU driver. Adding that driver would be a workaround (and we can help users do it manually during installation), but not the complete solution.

In addition to that, the i915 module is blacklisted in dom0. We don't know the reason why yet.

vicsanca commented 2 years ago

Hello, I'm having this problem with comet lake and rocket lake processors. How can I manually solve it? Thanks

stormi commented 2 years ago

There are test installation images at https://xcp-ng.org/forum/topic/5492/xcp-ng-8-2-1-maintenance-update-ready-for-testing that are supposed to bring Rocket Lake support, but I don't think it's related to i915.

vicsanca commented 2 years ago

Same problem with 8.2.1: black screen after Xen relinquishes VGA. I'm only able to install with a PCIe GPU. Once installed, with the GPU removed, I tested lots of CentOS 7 workarounds in the GRUB command line and in i915.conf, without success. Same result with Comet Lake (i5-10400) and Rocket Lake (i5-11400).

stormi commented 2 years ago

So even by un-blacklisting i915 in /etc/modprobe.d/i915.conf on the installed system you still don't have any display after reboot? This would differ from what other users experienced in a similar situation.

We still don't know why such hardware is not able to display a simple console without using the i915 module though.

vicsanca commented 2 years ago

Yes, un-blacklisting has no effect. Maybe the problem is with the Intel B560 chipset? Which CentOS release is it based on? I'm going to try CentOS to see if it has the same problem.

Thanks

stormi commented 2 years ago

The CentOS release (7) wouldn't tell you much as we have a custom 4.19 kernel and there's also the Xen layer that might play a role.

orther commented 2 years ago

I can confirm @vicsanca's report of the black screen when Xen relinquishes VGA.

One thing of note is that the 8.2.1 release did render the GRUB install menu on my Intel 11th Gen NUC (NUC11PAHi7) which could open a path for hacking an i915 fix on those NUCs (with less hassle).

vicsanca commented 2 years ago

OK. I have done more tests with comet lake.

Updated 8.2 to 8.2.1 with yum: same problem, but lspci now shows a name for the VGA device (00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 03)), whereas 8.2 only showed the hardware identifier 9bc8.

Un-Blacklisting i915 in /etc/modprobe.d/i915.conf has no effect, lsmod shows that i915 module is NOT loaded.

Tested "options i915 force_probe=9bc8" in i915.conf...same result. It's not loading i915.
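
For anyone retracing these checks, here is a minimal sketch of the blacklist test as a small shell function. The name `is_blacklisted` is made up for illustration. It only looks for an uncommented `blacklist <module>` line, so it says nothing about whether the module will actually load.

```shell
#!/bin/sh
# Sketch only: report whether a module is blacklisted in a modprobe.d file.
# The file contents are read from stdin so the function can be tried on any text.
is_blacklisted() {
    # $1 = module name; only uncommented "blacklist <module>" lines count
    awk -v m="$1" '
        $1 == "blacklist" && $2 == m { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Typical use on an installed system (not executed here):
#   is_blacklisted i915 < /etc/modprobe.d/i915.conf && echo "i915 is blacklisted"
#   lsmod | grep -w '^i915' || echo "i915 is not loaded"
#   lspci -nn | grep -i vga    # also prints the [vendor:device] ID
```

Reading from stdin keeps the check independent of the file location, which may differ between the installed system and the installer environment.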

orther commented 2 years ago

I've read that the Intel Iris Xe Graphics require kernel version 5.4 or higher and I am seeing that reported for my specific device an 11th Gen Intel NUC i7 here: http://linux-hardware.org/?id=pci:8086-9a49-8086-3004

vicsanca commented 2 years ago

Could be a problem related to UEFI/CSM?

https://scottiestech.info/2021/04/13/why-cant-i-enable-csm-on-my-new-motherboard/

cheezgr8r commented 2 years ago

Adding that driver would be a workaround (and we can help users do it manually during installation)

@stormi , how can this be done manually during installation?

stormi commented 2 years ago

Adding that driver would be a workaround (and we can help users do it manually during installation)

@stormi , how can this be done manually during installation?

You attach a device with the driver on it, switch to a shell with ALT+RIGHT and insmod the driver.
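
As a rough sketch of those steps: the device name `/dev/sdb1`, the mount point and the helper names `run` and `load_driver_from_device` are all assumptions for illustration, and the module has to be built for the installer's kernel. Note that `insmod` does not resolve dependencies, so any modules the driver depends on would have to be loaded first.

```shell
#!/bin/sh
# Sketch only. Device, mount point and module path are hypothetical;
# adjust them to wherever the driver actually lives.

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

load_driver_from_device() {
    dev=$1              # e.g. /dev/sdb1 (assumed USB stick partition)
    module=$2           # path of the .ko file relative to the stick's root
    mnt=${3:-/tmp/drv}  # where to mount the stick

    run mkdir -p "$mnt" &&
    run mount "$dev" "$mnt" &&
    run insmod "$mnt/$module"
}

# In the installer shell (reached with ALT+RIGHT):
#   load_driver_from_device /dev/sdb1 i915.ko
# To preview the commands without running them:
#   DRY_RUN=1 load_driver_from_device /dev/sdb1 i915.ko
```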

orther commented 2 years ago

There’s a new NUC Test 2 ISO that works on my NUC 11 w/ i915. You can find out more about it on this XCP-NG community forum thread

danieltwagner commented 2 years ago

I came here because I've been trying to install XCP-ng on a Jasper Lake N5105 system which also uses i915 graphics and further has 4x i225 network cards and between the two it feels like I'm living on the bleeding edge. The "NUC Test 2" ISO linked above got me through to installation where the vanilla 8.2.1 installer failed. Is there an ETA for inclusion of the i915 driver in the official ISOs?

olivierlambert commented 2 years ago

Our objective is to produce nightly ISOs with the latest updates (and maybe even test drivers), especially for people with non-server hardware.

stormi commented 2 years ago

Note: this is not about including the i915 drivers as they are not required to get a display. The fixes are related to console display in the linux kernel.

danieltwagner commented 2 years ago

@stormi You're right, I should have used more precise language; my question was if there was a timeline to incorporate these improvements to the installer or iso build process such that the installation can complete as normal on Intel NUC and other devices using i915 graphics.

@olivierlambert That's great! I take it this isn't the case yet, or have I missed them?

olivierlambert commented 2 years ago

Not yet, as I said it's an objective, not something already done ;)

dezren39 commented 2 years ago

I am also running into this issue with a Jasper Lake N6005 CPU very similar to @danieltwagner's. Once the 1165g7 version shows up, I will let you know if it's also affected. Likely will be, based on this and the forum threads. Unfortunately there is no legacy mode in the new intel chipsets. (EDIT: Confirmed, also broken with regular and fixed by custom iso from forums.)

They are becoming popular partially because more affordable models from AliExpress (and occasionally Amazon) are showing up now and have been reviewed on YouTube by a few channels, generating at least a little interest. (Between ServeTheHome && Level1Techs combined, I'm sure there's more than one following a similar path.)

davidpesce commented 2 years ago

Any updates on this? Has it been integrated into nightly build?

rjt commented 2 years ago

XCP-ng devs, many sysadmins use desktops to test newer versions of hypervisors before rolling them out. I would think this would affect many users. Isn't there an ALT XCP-ng version that the solution below might belong in?

David,

I happened to notice this kernel mailing list post having to do with Intel Integrated Graphics and a particular patch. I would Google for part of what I quote below to get the full thread. A later posting encourages users to contact Linus himself.

Reverting the kernel patch bdd8b6c98239 fixes the problem.

BELOW IS COPIED FROM A LINUX KERNEL MAILING LIST

Update: On affected hardware, you do not need to run in a Xen PV Dom0 to see the regression caused by bdd8b6c98239.

All you need to do is run, on the bare metal, on the affected hardware, with the Linux kernel nopat boot option.

Jan mentions in his commit message the function in the i915 driver that was touched by bdd8b6c98239 and that causes this regression. That is, any Intel IGD that needs to execute the function that Jan mentions in the commit message of his proposed patch when the i915 driver is setting up the graphics engine will most likely be hardware that is affected. My Intel IGD was marketed as HD Graphics 4600, I think.

So find a system with these hardware characteristics, and try running the Linux kernel with the nopat option, with and without bdd8b6c98239. You will see the regression I am experiencing, I predict.
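
When reproducing this, it may help to confirm that the running kernel really was booted with `nopat`. The following is a small sketch; the function name `has_kernel_param` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: test whether a kernel command line contains a given bare parameter.
has_kernel_param() {
    # $1 = parameter name, e.g. nopat; the command line is read from stdin
    tr ' ' '\n' | grep -qxF -- "$1"
}

# On the machine under test (not executed here):
#   has_kernel_param nopat < /proc/cmdline && echo "booted with nopat"
```

The whole-word match means a parameter such as `nopatx` would not be mistaken for `nopat`.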


staticfrost commented 2 years ago

Hey, are the changes from the NUC Test 2 ISO going to be merged in any time soon?

stormi commented 2 years ago

So, regarding the graphics themselves, the next ISOs we build will have the fix:

  • upcoming 8.3 Beta (probably end of 2022)
  • refreshed 8.2 ISOs

Regarding refreshed 8.2 ISOs, the display issue is not the only issue that needs fixing before we release any: we also need backported network drivers. This is where we're not advancing right now. The initial plan, devised back in May IIRC, was that @andrew64k would contribute pull requests to the XCP-ng project so that we can include these drivers. I also offered help to accompany the process. But I think both sides have been busy and nothing was done.

Another solution is I can do the packaging work myself. I just need some input from Andrew, as we already discussed it on the forum, in the dedicated forum thread: the upstream code that was extracted from the linux kernel + patches that were necessary to make them work on our older 4.19 kernel, and explanations of the rationale behind the changes because we need traceability.

The last resort would be doing it all by myself without @andrew64k, but this doesn't look like a good solution in my eyes, and is likely to happen later than the other solutions.

andrew64k commented 2 years ago

I actually have paid contracting work this summer so I have been busy.

The network coding/testing is not a problem. I have real hardware for testing so I can debug issues. It's git where I'm not an expert and it should be quick and easy, but it's not (at least to do it correctly).

I have updated the 8125 driver and it seems to work ok (where it works) and better than the current XCP included version. The i225 has been so solid I bought myself a new mini machine as a home XCP server that uses the i225 interface (and dual M.2, it's crazy fast and small)!

I'll put in an effort to get a PR for the network drivers so they can be added to testing and 8.3. The three issues (EFI, i225, 8125) are very common now for most new machines.

GRUB should also be updated (2.03) or patched to better support 64-bit EFI FB, but an update is not directly "required" for XCP as Xen sends the right 64-bit FB data to Dom0 (as tested).

I know the 8125 and its driver are having big problems in the FreeBSD world. TrueNAS pulled out the 8125 driver because of iSCSI data corruption.


andrew64k commented 2 years ago

PR submitted for new IGC and r8125 drivers.

It should be plug and play for the XCP 8.2 build (and 8.3, I think).


danielbayley80 commented 1 year ago

So, regarding the graphics themselves, the next ISOs we build will have the fix:

  • upcoming 8.3 Beta (probably end of 2022)
  • refreshed 8.2 ISOs

I have been trying this with the Alpha. It installs and boots cleanly (unlike the previous version which had the relinquish issue). I am still having issues in one of my VMs. I think it is when the Guest OS (Windows 10) probes graphics.

I raised this for 8.2.1 on my N5105. It works fine bare metal. More recently I tried a 12th Gen i7-1260P with the same issue. I will go back and test my N5105 but I strongly suspect I will see the same issue.

https://github.com/xcp-ng/xcp/issues/565

exetico commented 1 year ago

8.2.1 gave me a black screen after the "relinquishing vga console" message. 8.3a2 booted the installation with no problem. I'm on an ODROID-H3+.

olivierlambert commented 1 year ago

Yes, it's somehow expected :) Thanks for your feedback confirming it's the case :+1:

mrnaz commented 11 months ago

I just purchased a bunch of N5105 based units intending to install xcp-ng on them for light duty stuff. I SHOULD have bought one and tested first, but I didn't. Now I can't install XCP-ng on any of them, and this is the issue that I'm bumping into.

Is there a solution for this yet? Or are the various hacks in this thread still the only workaround?

andrew64k commented 11 months ago

Try installing 8.3 beta or the new 8.2 ISO. There are official updates now to support some newer hardware.

Check the XCP forums for additional information.

mrnaz commented 11 months ago

Is the new 8.2 ISO just from the regular download link? If so, I downloaded 8.2.1 this morning and compared the hashes with one I had from about 6 months ago and they are the same. Both give me the same "relinquishing" bug.
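
For reference, a comparison like that can be sketched as follows. The function name `same_sha256` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: succeed when two files have the same SHA-256 digest.
same_sha256() {
    # Reading via stdin keeps the file name out of sha256sum's output,
    # so the two results can be compared directly.
    a=$(sha256sum < "$1") || return 2
    b=$(sha256sum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (hypothetical file names, not executed here):
#   same_sha256 old-8.2.1.iso new-8.2.1.iso && echo "same image"
```

Identical digests mean the download link still serves the same image, which is consistent with the refreshed ISO not having been published there yet.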

Is 8.3 beta safe(ish) to use in a production environment? Or am I asking for trouble?

mrnaz commented 11 months ago

I confirm that 8.3b installs fine.

olivierlambert commented 11 months ago

We'll soon © have a new updated 8.2 ISO, so you can also install 8.2 directly with it. However, 8.3 already works pretty well, which is OK-ish for this kind of hardware anyway :)

mrnaz commented 11 months ago

It's good to know, I would much rather use 8.2 because while this is low power hardware, the use case is production. These boxes are going to be used to run pfsense as the core router, virtualized so that we can also have a few other lightweight functions happening on the same hardware. It will be backed up, so if it goes down it's not the end of the world, but nonetheless I'd sleep better with LTS rather than a beta release.

lethedata commented 8 months ago

It now has been tested that having the i915 module solves the issue. However we still don't know why on those specific computers the installer's kernel is not able to display text without the GPU driver. Adding that driver would be a workaround (and we can help users do it manually during installation), but not the complete solution.

In addition to that, the i915 module is blacklisted in dom0. We don't know the reason why yet.

@stormi I think the reason this module was/is blacklisted is that Xen and dom0 use the kernel-based VGA settings, which allows a default expected resolution to be set at all times. When i915 is enabled, the module is able to take over and adjust the resolution; however, it sometimes adjusts to something unsupported, leading to an "Input Signal Out Of Range" error on the attached monitor.

I've run into this issue, and normally setting the resolution in the kernel parameters manually (required during install) or permanently via Citrix Article CTX226191 fixes it. With 8.3 Beta, however, that didn't work. It took me a while to pinpoint that the i915 module was loading, which required modprobe.blacklist=i915 to be added to the module2 kernel parameters as well. I was lucky to have another monitor on hand, as remoting in wasn't an option and the problem was presenting itself during the install process.

From my understanding of things, servers aren't really expected to need GPU drivers in the first place, so all module-based issues can be avoided by simply disabling the module. It's much easier to troubleshoot and recognize when someone's hardware requires the module than to figure out the opposite due to some obscure module/monitor issue. That's not even mentioning the frustration of trying to get information off a system when the customer can't get any output in the first place, especially as it only presents itself when trying to access things locally.
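
To illustrate where that parameter goes, here is a sketch that appends it to the `module2` line that loads the dom0 kernel. It assumes a GRUB entry in which that line contains `vmlinuz`, and the function name `add_dom0_param` is made up. At install time the same edit would be made by hand in the GRUB editor.

```shell
#!/bin/sh
# Sketch only: append a parameter to the "module2 ... vmlinuz ..." line(s)
# of a GRUB config read from stdin, and write the result to stdout.
# Lines that already carry the parameter are left unchanged.
add_dom0_param() {
    awk -v p="$1" '
        /^[ \t]*module2[ \t].*vmlinuz/ && index($0 " ", " " p " ") == 0 {
            $0 = $0 " " p
        }
        { print }
    '
}

# Example (not executed here; review the output before replacing any file):
#   add_dom0_param modprobe.blacklist=i915 < grub.cfg > grub.cfg.new
```

The `module2` line that loads the initrd does not contain `vmlinuz`, so it passes through untouched.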

I'm planning a PR to document the fix but for now opened https://github.com/xcp-ng/xcp-ng-org/issues/255 .

stormi commented 7 months ago

Hi! I must admit I'm not sure what we should do with these issues, but I'm following your efforts in documenting a way to work around them! Thanks!

If at some point you believe you have found something we should change in XCP-ng's default configuration, feel free to suggest it too. I'm not sure why i915 is not blacklisted anymore from XenServer 8 (which is what XCP-ng 8.3 inherited this change from).

stormi commented 7 months ago

Oh, also, I think we should open a separate issue, because yours doesn't seem to have the same cause as the initial issue reported here. Do you agree?

stormi commented 7 months ago

Oh, by the way, i915 is blacklisted in XCP-ng 8.3:

# cat /etc/modprobe.d/i915.conf 
blacklist i915
options i915 enable_gvt=1

lethedata commented 7 months ago

Oh, also, I think we should open a separate issue, because yours doesn't seem to have the same cause as the initial issue reported here. Do you agree?

I don't think an issue is needed. It seems to be more of an edge case, so the documented workaround should be good enough. Anything in there could be set as the default if needed, but the trade-off is that you'll be forcing a particular resolution rather than letting the system handle it automatically. For consistency, forcing the resolution on UEFI will match the BIOS config.

Oh, by the way, i915 is blacklisted in XCP-ng 8.3:

I think my system was ignoring that blacklist. I've seen it happen in the past where blacklisting in modprobe.d just didn't work for whatever reason, requiring the blacklist to be set in the kernel parameters directly. The system is on beta2 now (fresh install) and isn't loading it with the modprobe.d file only, as it should.

stormi commented 7 months ago

Could it be blacklisted in the running system, but not in the initrd?

lethedata commented 7 months ago

Doesn't look like it. I decompressed both beta1 and beta2 and the blacklist file is there.

"This alone will not prevent a module being loaded if it is a required or an optional dependency of another module. Some kernel modules will attempt to load optional modules on demand...." - Red Hat KB41278

stormi commented 7 months ago

Right, so maybe another module requires it in your case :thinking:
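
One way to check that hypothesis while i915 is loaded is to read the "Used by" column of `lsmod`. The following is a small sketch; the function name `users_of_module` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the modules listed in the "Used by" column of
# `lsmod` output (read from stdin) for a given module, one per line.
users_of_module() {
    # $1 = module name; lsmod columns are: name, size, use count, used-by list
    awk -v m="$1" '
        $1 == m && NF >= 4 {
            n = split($4, u, ",")
            for (i = 1; i <= n; i++) print u[i]
        }
    '
}

# On the affected system (not executed here):
#   lsmod | users_of_module i915
```

An empty result would suggest that no loaded module currently depends on i915, in which case something else is triggering the load.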

rjt commented 7 months ago

“How do I prevent a kernel module from loading automatically?” https://access.redhat.com/solutions/41278#rhel6only

Samuel, thanks for the link to a great article. For xcp-ng 8, is it safe to assume that following the #rhel6only https://access.redhat.com/solutions/41278#rhel6only instructions is enough?

Helpful article, not just for XCP-ng but for any Linux use case. Will try to commit it to memory. It is surprising how much blocking a module changed among RHEL versions, and that blacklisting in a single file is not nearly enough; I never would have thought about kdump, for example.

Need to make sure it gets added to archive.org before IBM puts it behind a paywall.


lethedata commented 7 months ago

@rjt No, not fully anyway. Those steps are written for RHEL and do not match how dom0 is built in XCP-ng; the GRUB config being different is the big one.

Due to what dom0 is and how integrated it is with Xen, I'd say it's a better idea to blacklist things via kernel parameters using xen-cmdline. This would allow one to easily remove any changes at boot time without having to fight with a broken system and a broken initrd. It also allows upgrades without breaking things unexpectedly, and it "wins" over files.
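
A minimal sketch of that approach, assuming the tool lives at `/opt/xensource/libexec/xen-cmdline` and accepts `--set-dom0` and `--delete-dom0` as it does on XCP-ng 8.x. The helper names `run`, `blacklist_in_dom0` and `unblacklist_in_dom0` are made up for illustration.

```shell
#!/bin/sh
# Sketch only. The path below is an assumption; override XEN_CMDLINE if it differs.
XEN_CMDLINE=${XEN_CMDLINE:-/opt/xensource/libexec/xen-cmdline}

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Add modprobe.blacklist=<module> to the dom0 kernel command line.
blacklist_in_dom0() {
    run "$XEN_CMDLINE" --set-dom0 "modprobe.blacklist=$1"
}

# Remove the modprobe.blacklist parameter again.
unblacklist_in_dom0() {
    run "$XEN_CMDLINE" --delete-dom0 modprobe.blacklist
}

# Preview first, then run for real and reboot (not executed here):
#   DRY_RUN=1 blacklist_in_dom0 i915
#   blacklist_in_dom0 i915
```

Setting the parameter this way would presumably replace any existing `modprobe.blacklist` value, so a list of several modules would have to be passed as one comma-separated value.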

Basically, any deviations from the standard dom0 should be done in a non-permanent easily removable way.

rjt commented 7 months ago

Oh yes, used xen-cmdline to have dom0 reserve more Ram for itself and prevent Mellanox infiniband from loading. So works well for the virtual machines and of course dom0 is a vm. Glad you brought that to attention because xcp-Ng is such an amalgam and RedHat is so different among the different versions.

But xen-cmdline would not help with making a new installation ISO, right?

I guess once you know what modules to exclude, add those exclusions on the initial installation boot as ISO kernel parameters. Then during very first bootup, excluding via XEN-cmdline should cover further bootups.
