Open nagilum99 opened 5 years ago
Please use code block markdown syntax for your logs, otherwise it's very hard to read/scroll here.
Done. I used "Code" before, but it looked unreadable. Had to google how code blocks work. This editor is not super user-friendly! >:(
Adding some dmidecode regarding Firmware Version:
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.
# SMBIOS implementations newer than version 3.0 are not
# fully supported by this version of dmidecode.
Table at 0x5CF12000.
Handle 0x0002, DMI type 0, 26 bytes
BIOS Information
Vendor: HPE
Version: U43
Release Date: 04/04/2019
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16384 kB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
EDD is supported
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 1.22
Firmware Revision: 1.40
Handle 0x001E, DMI type 1, 27 bytes
System Information
Manufacturer: HPE
Product Name: ProLiant DL20 Gen10
Version: Not Specified
I think it would be better to open a thread on the community forum (https://xcp-ng.org) so people could help there. I would keep the bug report for issues that are precisely identified.
This way you'll have assistance from the whole community without adding issues that are difficult to diagnose "as is" on this repo.
Note: the forum is also using Markdown, so keep the same syntax ;)
As commented in the forum: The problem does not occur with XCP-ng 7.6. Something on the install media must have been changed.
Also I watched the boot process more carefully and found a
[DEPEND] Dependency failed for XCP-ng installer.
I could see before that udev is waiting for something, but for whatever reason it doesn't say what it is waiting for:
Seems there are also no logfiles containing anything useful.
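One way to narrow down what the boot is waiting for is to look at the systemd job queue from a second console. The sketch below is only an illustration: it assumes `systemctl list-jobs` is usable in the installer environment (not confirmed in this thread), and the unit names in the sample are hypothetical. It filters the command's text output for jobs that are still waiting or running.

```python
def pending_jobs(list_jobs_output):
    """Return (unit, state) pairs for jobs that have not finished yet."""
    pending = []
    for line in list_jobs_output.splitlines():
        fields = line.split()
        # Data rows look like: "<job id> <unit> <type> <state>"
        if len(fields) == 4 and fields[0].isdigit():
            unit, state = fields[1], fields[3]
            if state in ("waiting", "running"):
                pending.append((unit, state))
    return pending


# Hypothetical sample; paste the real output of `systemctl list-jobs` instead.
JOBS_SAMPLE = """\
JOB UNIT                        TYPE  STATE
 12 example-installer.service   start waiting
  7 systemd-udev-settle.service start running

2 jobs listed.
"""

for unit, state in pending_jobs(JOBS_SAMPLE):
    print(f"{unit}: {state}")
```

A job stuck in `running` for minutes is usually the one everything else is queued behind, so that is the unit to inspect first.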
Adding here: Forum link: https://xcp-ng.org/forum/topic/2014/installation-on-hpe-dl20-gen10-stucks-on-boot Solution: https://xcp-ng.org/forum/topic/2014/installation-on-hpe-dl20-gen10-stucks-on-boot/15?_=1573751770221&lang=de
Also: Problem didn't occur on HPE DL20 Gen 9 (using Xeon E3 v5), only on Gen 10.
Does it work better with XCP-ng 8.1's installer?
Has anyone confirmed if 8.1 works with Gen 10 and if the bug is still being resolved?
Confirmed 8.1 installer not working on my HP DL20 as well
With what error messages exactly? Have you tried other boot options (alternate kernel, safe mode...) ? Legacy BIOS vs UEFI?
Sorry, I already moved on to installing 7.6, which works but doesn't detect the disks, so I'm working on tracking down that issue now.
Can confirm that 8.1.0 installer does not work on HPE DL20. Server stops at cdrom: Uniform CD-ROM driver Revision: 3:20
While trying to install over iLO. Fans start to ramp up after a while and it just won't go any further.
Going to try the older version and will update if any difference.
Hey guys, hope this helps someone else with their HP DL20:
Possibly unrelated to this install bug, but I discovered the solution to the fan issues. The RAID controller in mine is Windows only, which is why the RAID drives don't show up. When disabling the RAID controller in BIOS, they show up and I can then software RAID it but the problem is the fans go crazy non-stop.
The solution for me was to install Hyper-V, load the RAID driver during install, add a bunch of hacks to get it working on a non-AD network. Now everything runs smoothly and it's super quiet.
Nice :+1:
Are those machines still on the market? I might try to ping someone at HPE to see if we can investigate. It's not in the Citrix HCL, so there's maybe a reason for that :factory_worker:
What kind of "RAID controller" are you talking about? I had the E208 (IIRC), i.e. a plugged-in daughterboard, inside the problematic servers.
@olivierlambert: They have been for years and I doubt they'll drop that line soon.
I DO have these servers running under XCP-ng 8.0 and they still do fine: no problems with fans after successful installation. Only the installer was stuck, which means that the solution probably is to find the difference between the installer kernel/modules and the ones of the installed system.
We currently have a customer with those machines and they are running well (no issue during 8.1 install). So I suppose there's a difference somewhere between some models causing this.
As he said RAID controller: Any chance to check for that? Probably the only real relevant part that may be different everywhere, as I doubt different CPUs matter for that.
And all that firmware involved.
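To compare firmware between a machine that installs fine and one that hangs, the BIOS section of `dmidecode` can be reduced to a few fields and diffed. This is a rough sketch: the first sample uses the values posted earlier in this thread, while the second machine and its values are made up for illustration.

```python
FIELDS = ("Vendor", "Version", "Release Date", "BIOS Revision", "Firmware Revision")


def bios_fields(dmidecode_output):
    """Extract selected fields from the 'BIOS Information' section."""
    result = {}
    in_bios = False
    for raw in dmidecode_output.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            in_bios = False  # a new DMI structure begins
        elif line == "BIOS Information":
            in_bios = True
        elif in_bios and ": " in line:
            key, value = line.split(": ", 1)
            if key in FIELDS:
                result[key] = value
    return result


def differing_fields(a, b):
    """Map each field that differs to its (a, b) pair of values."""
    return {k: (a.get(k), b.get(k)) for k in FIELDS if a.get(k) != b.get(k)}


# Values as posted in this thread (shortened).
FAILING = """\
Handle 0x0002, DMI type 0, 26 bytes
BIOS Information
Vendor: HPE
Version: U43
Release Date: 04/04/2019
BIOS Revision: 1.22
Firmware Revision: 1.40
Handle 0x001E, DMI type 1, 27 bytes
System Information
Version: Not Specified
"""

# Hypothetical second machine, invented for the example.
WORKING = FAILING.replace("04/04/2019", "01/01/2020").replace(
    "BIOS Revision: 1.22", "BIOS Revision: 2.00"
)

print(differing_fields(bios_fields(FAILING), bios_fields(WORKING)))
```

The same approach would extend to other DMI sections, but RAID controller firmware is not part of this output and would have to be collected separately.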
Hmm, might be a point, too. So maybe the easiest is indeed to have a diff between the installer and OS kernel. Theoretically there can't be different behavior if you have the exact same stuff running.
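A first pass at such a diff could simply compare the loaded kernel modules. The sketch below assumes you can capture `lsmod` output both in the installer (from an alternate console) and on the installed system; the module lists shown are invented placeholders, not data from these servers.

```python
def module_names(lsmod_output):
    """Return the set of module names from `lsmod` text output."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the header row
            names.add(fields[0])
    return names


def compare_modules(installer_lsmod, installed_lsmod):
    """Return (only_in_installer, only_in_installed) as sorted lists."""
    a = module_names(installer_lsmod)
    b = module_names(installed_lsmod)
    return sorted(a - b), sorted(b - a)


# Placeholder samples; replace with real `lsmod` captures.
INSTALLER_LSMOD = """\
Module                  Size  Used by
hpsa                  106496  0
ahci                   40960  2
"""
INSTALLED_LSMOD = """\
Module                  Size  Used by
ahci                   40960  2
xhci_pci               16384  0
"""

only_installer, only_installed = compare_modules(INSTALLER_LSMOD, INSTALLED_LSMOD)
print("only in installer:", only_installer)
print("only in installed system:", only_installed)
```

Module names alone will not show version or parameter differences, so comparing `uname -r` and `modinfo` for any suspicious driver would be a sensible next step.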
Update for installation on DL20 GEN10. Hardware:
Installation of 8.x fails. Installation of 7.6 successful with no issues.
Upgrading to 8.x
Was unable to perform an upgrade by CD/ISO or by Alternate method: remote upgrade
Installer never starts and the system just hangs waiting indefinitely.
Switching to an alternate terminal (/opt/xensource/installer/preinit console
I performed this using the alternative upgrade method which did not require me to enter anything at the console. Do note that the output on the console was a little hard to view with jumbled text over dialogs, but worked.
The system rebooted and appears to be operating fine. I will make note that the system does hang for a while during the boot process likely waiting for some detection process to timeout. Takes about 4min to boot from power on. That being said it is on 8.1.0-2 and all seems to be working fine at this point.
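To find out which unit is eating those minutes, the output of `systemd-analyze blame` can be sorted and filtered. This is a sketch under the assumption that `systemd-analyze` is available on the installed system; the unit names and timings in the sample are hypothetical.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, "us": 0.000001}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|us|s)")


def parse_blame(blame_output):
    """Return [(seconds, unit_name)] sorted slowest first."""
    rows = []
    for line in blame_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        *duration, unit = fields
        seconds = 0.0
        for token in duration:
            match = _TOKEN.fullmatch(token)
            if match is None:
                break  # not a duration token, skip this line
            seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        else:
            rows.append((seconds, unit))
    return sorted(rows, reverse=True)


# Hypothetical sample; paste the real `systemd-analyze blame` output instead.
BLAME_SAMPLE = """\
  3min 2.500s example-slow.service
       4.120s network.service
        532ms tmp.mount
"""

for seconds, unit in parse_blame(BLAME_SAMPLE):
    if seconds > 60:
        print(f"{unit} took {seconds:.1f}s")
```

Anything above a minute is a candidate for the detection timeout mentioned above.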
Hope that helps anyone with a DL20 attempting to do an installation to 8.1.x
To be transparent, the DL20 does not appear to be listed, in any generation or for any XenServer version, on the HCL on the Citrix website. It is also a bit of a forgotten stepchild on HPE's list for running as a hypervisor. However, it offers a great price/feature point for small simple networks on a budget and has always performed well with Xen in the past. Hopefully, we can help figure out what exactly is stopping the process. @olivierlambert if you have a specific log you would like us to pull off the system, we can attempt to do so after it's installed for the client.
It wasn't stormi, it was me spending quite some time on that (and in the end I got hints from XenServer devs via their bugtracker). ;-)
The "hanging" XCP boot seems rather common: It's not only the DL20, I have a Ryzen Pro 4650G on a Gigabyte B550 mainboard (cut that combo works with ECC) and this also displays an almost empty console after the boot-splash. After a few minutes xsconsole appears and everything looks good.
Didn't try to find out what it does, meanwhile.
I agree: They are a good solution for SMB with small budget and needs, also it's compact and rather low on energy usage (compared to bigger Dual socket Xeon systems).
Confirmed 8.1 installer not working on my HP DL20 as well
Can confirm, doesn't work with 8.1 either. Hangs on hpsa dependency "board not ready, timed out", it seems.
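To make reports like this easier to compare, the kernel log can be filtered for a single driver. The sketch below assumes the `dmesg` output has been saved as text; the timestamps and PCI address in the sample are invented, and only the "board not ready, timed out" wording comes from this thread.

```python
import re


def driver_messages(dmesg_output, driver="hpsa"):
    """Return the log lines that mention the given driver name."""
    pattern = re.compile(r"\b" + re.escape(driver) + r"\b")
    return [line for line in dmesg_output.splitlines() if pattern.search(line)]


def timeouts(lines):
    """Keep only the lines that report a timeout."""
    return [line for line in lines if "timed out" in line.lower()]


# Invented sample lines; replace with real `dmesg` output.
DMESG_SAMPLE = """\
[    3.100000] ahci 0000:00:17.0: AHCI controller detected
[   12.300000] hpsa 0000:00:17.0: board not ready, timed out.
[   12.400000] cdrom: Uniform CD-ROM driver Revision: 3.20
"""

for line in timeouts(driver_messages(DMESG_SAMPLE)):
    print(line)
```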
I have an HPE DL380 Gen10 with Xeon Silver 4210R and a P408i-a disk controller with 6 SAS HDs in RAID 6. The XCP-ng 8.2.0 install got stuck; the last message was:
started update utmp about system runlevel changes
Have you tried the alt kernel @thiagoras ?
@olivierlambert
The menu with the kernel-alt option (that screen with the rocket and the default install option) does not appear during boot. But I tested the same media on an old DL380p Gen 8 server with Xeon E5, and the installation of XCP-ng 8.2.0 works in both default mode and kernel-alt. Maybe the problem is something in the RAID configuration; I will test without RAID to check.
[RESOLVED] I've tested it without RAID but it didn't work. After that I recreated the RAID, forcing the rebuild, which took more than 6 hours of intense activity on all HDs, but the installation of XCP-ng didn't work anyway.
But after that I realized that the hard drives had no partition table. So I created the partition table and a partition using the entire logical drive, but XCP-ng didn't recognize it either. Almost giving up, I installed Ubuntu on the server, created some partitions and ran disk benchmarks. After that I tested the XCP-ng installation again in standard mode and it worked. So I suppose it is necessary to have at least one partition created and used for the installation to work, at least in my case.
Thank you @olivierlambert
Wow. That's weird. Thank you @thiagoras for the feedback. @stormi and @Fohdeesha what do you think about this?
Soon I can do some other testing, I can try to create some specific partition table, if you can guide me I can give feedback.
I have been monitoring this thread with the same problem. I just tried to install again after doing a clean Debian 11 install with disk partitioning first and installer still gets stuck. We have no problem installing on the DL20 G9 servers. As soon as this is resolved we will be moving to the gen10.
I think I finally cracked this nut. The solution was simple actually, but doesn't make much sense to me. Even though I have hardware RAID card installed, the solution is to put the embedded software raid card into SW RAID mode. Once that is done the installer runs fine and it will see the hardware RAID volume and other disks.
Procedure:
It's known that it seems to hang on hpsa... I just couldn't dig down further (where exactly it has problems). But that would at least fit with hpsa being the troublemaker. Edit: If a RAID card but no optical device is installed, it could help to just disable the embedded controller, as it's good for nothing in that case. It's basically the usual chipset fakeraid thing, just HPE-named.
I agree, seems pointless and only increases boot times.
I am now facing the problem "[ OK ] Started Update UTMP about System Runlevel Changes". The solution was the REVERSE change: switch "SmartArray SW RAID Support" to "SATA AHCI Support". After installation, you can change it back.
I downloaded the current (Version 8) ISO and moved it onto a USB-stick with Rufus (Unetbootin can't handle UEFI install, it seems).
It (finally) gets stuck at:
[ OK ] Started Update UTMP about System Runlevel Changes.
It still reacts to Ctrl+Alt+Del to reboot, though, and I'm able to switch to, and use, other consoles.
I called top and it shows no real activity on the system.
dmesg says:
Any ideas? How can I debug that further? Can I try to start the installer manually? (Does it make sense for finding the problem?) I have 2 of these machines. I'd like to have at least one working rather soon, and could continue debugging with the 2nd, almost identical system (just less RAM).