xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Failed installation on top of previous soft RAID Linux install #543

Open AtaxyaNetwork opened 2 years ago

AtaxyaNetwork commented 2 years ago

Hello !

I regularly reinstall machines that previously ran Linux (mostly Debian 10) with soft RAID 1, replacing it with XCP-ng. Since 8.2 (and I think the problem is older than that), when I recreate the soft RAID 1 from the installer, the installation finishes correctly, but I end up in grub rescue on reboot. My guess is that the XCP-ng installer doesn't delete the old soft RAID correctly, and grub gets confused. I tried to boot via grub rescue, but with no success. My workaround is to boot a live Debian, open a shell, and run this for each disk I want to use in the soft RAID:

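DISK=sdx    # placeholder: replace with the target disk, e.g. sda
LBAS=$(cat /sys/block/$DISK/size)    # disk size in 512-byte sectors
# Zero the first 1024 sectors (partition table and boot code)
dd if=/dev/zero of=/dev/$DISK bs=512 count=1024
# Zero the last 1024 sectors (backup GPT, end-of-disk RAID metadata)
dd if=/dev/zero of=/dev/$DISK bs=512 seek=$(($LBAS-1024)) count=1024
# Clear any md superblock that is still detected
mdadm --zero-superblock /dev/$DISK
sync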

Then I can relaunch the installer, and XCP-ng installs successfully!
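For anyone who wants to try the wipe without touching a real disk first, here is a minimal sketch of the same head-and-tail zeroing wrapped in a function. The name `wipe_ends` is hypothetical, and the regular-file branch is an addition for testing on a disk image; it is not part of the workaround above. The `mdadm --zero-superblock` step is left out because it only makes sense on a real device.

```shell
#!/bin/sh
# Sketch only: zero the first and last 1024 sectors (512 bytes each) of a
# target, mirroring the two dd commands from the workaround.
# wipe_ends is a hypothetical helper name, not part of any XCP-ng tooling.
wipe_ends() {
    target=$1
    sectors=1024
    if [ -b "$target" ]; then
        # Whole-disk block device: sysfs reports the size in 512-byte sectors.
        total=$(cat "/sys/block/$(basename "$target")/size")
    else
        # Regular file (disk image): derive the sector count from the byte size.
        total=$(( $(wc -c < "$target") / 512 ))
    fi
    if [ "$total" -lt $((2 * sectors)) ]; then
        echo "wipe_ends: $target is too small" >&2
        return 1
    fi
    # conv=notrunc keeps dd from truncating a regular file; it is harmless
    # on a block device.
    dd if=/dev/zero of="$target" bs=512 count="$sectors" conv=notrunc 2>/dev/null
    dd if=/dev/zero of="$target" bs=512 seek=$((total - sectors)) \
        count="$sectors" conv=notrunc 2>/dev/null
}
```

On a real machine this would be called as `wipe_ends /dev/sdX` for each disk, followed by `mdadm --zero-superblock /dev/sdX` and `sync`, exactly as in the commands above. It destroys data, so double-check the device name.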

Let me know if I can help !

Cécile

stormi commented 2 years ago

Thanks for the report. It was known that creating a soft RAID may fail on previously used disks due to stale metadata, but not that it may succeed and then fail only at grub install stage.

Do you see what the error is in the installer logs (/tmp/install-log from the installer before rebooting, or /var/log/installer/install-log from the installed system that doesn't boot)?

stormi commented 2 years ago

Related to https://github.com/xcp-ng/xcp/issues/107

AtaxyaNetwork commented 2 years ago

Hello !

Unfortunately, I didn't keep the logs, since I needed the machine urgently. I will try to set up a test machine to reproduce this bug ASAP :)

AtaxyaNetwork commented 2 years ago

Hello !

I found the time to test installing XCP-ng on top of a Debian (11.3) soft RAID 1. I tried the process on one of my lab machines (a Dell R610 with two 146G HDDs) and on a VM with two 80G disks. I see the same behavior on both machines. I attach the log from the VM: installer.log

I did this to test the soft RAID:

I think the best fix would be to let the installer delete the old soft RAID, using the commands I provided in my first message.

I can give you access to my lab machine and/or the VM I use for testing, if you want to dig in directly.

Thanks again for looking into that, and sorry for the delay !