xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Host restore not working on XCP-ng 8.1 RC - diskfilter writes not supported #357

Open ghost opened 4 years ago

ghost commented 4 years ago

On several occasions I have tried to do a host restore using the XCP-ng 8.1 RC ISO and it is failing. In most cases it leaves a damaged/corrupt filesystem and the host does not boot. I can give more details but wanted to share so the devs can test before final release.

Scenario: I installed XCP-ng 8.1 RC on both a production system and a test system. The production system uses two 64GB SATADOM disks and the test system (a VM) uses two 64GB virtual disks. After installing and configuring the systems, I do a host-backup and pool-dump-database. Then I do a host-restore, reboot with the ISO and try to restore using the ISO installer (shown below).

[Screenshot: restore-xcp]

Then, I try the restore, which fails with the following error:

[Screenshot: restore-error]

After trying the reboot, the system fails to boot and looks like this:

[Screenshot: reboot-error]

I have had this happen on both VM simulated installs and real production hardware servers, and on both BIOS and UEFI based systems. It was happening on 8.0 as well, but I did not have time to document it.

If you need more info let me know, but I wanted to get this out there because if it were not for my Clonezilla backups I would have been in bad shape for restorations.
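For readers trying to reproduce the report, the backup and restore sequence described above can be sketched roughly as follows. This is only an illustration: the helper name `print_backup_steps`, the backup directory, and the file names are hypothetical, and the function prints the `xe` commands instead of running them so the sequence can be reviewed first.

```shell
# Hypothetical helper: print the xe commands for the sequence in the report.
# Arguments: $1 = host name-label, $2 = directory for the backup files.
# The file names below are illustrative, not taken from the report.
print_backup_steps() {
    host="$1"
    dir="$2"
    # 1. Archive the host's control domain filesystem.
    echo "xe host-backup host=${host} file-name=${dir}/host-backup.xbk"
    # 2. Dump the pool metadata database.
    echo "xe pool-dump-database file-name=${dir}/pool-database.dump"
    # 3. Unpack the archive on the host; the restore itself is then
    #    completed by booting the installer ISO and choosing the restore option.
    echo "xe host-restore host=${host} file-name=${dir}/host-backup.xbk"
}
```

Calling `print_backup_steps myhost /mnt/backup` prints the three commands in order; the reported failure occurs afterwards, in the installer's restore step.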

olivierlambert commented 4 years ago

Can you make a quick test on Citrix Hypervisor 8.1?

ghost commented 4 years ago

No I have not. That would not be an apples to apples comparison because CH does not support software raid.

stormi commented 4 years ago

That would still be valuable information to help move forward a diagnosis.

ghost commented 4 years ago

I'm sorry I do not have access to CH 8.1 ISO install media and do not want to create an account on their site. Therefore, if you want to close this then I will just continue to use third party means of backup and restore.

stormi commented 4 years ago

I think the report is a valid one, so we won't close it just because you can't test. However, this would have saved us some time and maybe allowed us to reach a diagnosis faster.

If you still have a way to do tests, I'd be interested in knowing if restore works better for you without using software raid.

ghost commented 4 years ago

That's a good idea; I can certainly spin up another test VM and try it without software raid and let you know if it has problems with that setup.

ghost commented 4 years ago

@stormi, Okay so I spun up a new VM with a single 100GB disk (sda). Went through the install, rebooted, ran the host-backup, and then host-restore. Rebooted into the ISO installer and restored the system just fine. Therefore, without any other testing, it looks like it might be an issue with the software raid on the restore.

HTH

ghost commented 4 years ago

I'm going to take a stab at the problem here. Based on the error message, "diskfilter writes are not supported", it looks like the installer is running into a longstanding grub2 bug that has affected Ubuntu installers for a long time. This Ubuntu bug post discusses some of these problems, which I believe have recently been showing up in CentOS as well. Just thought I would point you in some direction even though I can't be of much help on the testing side.

Good luck!

dpcinc commented 3 years ago

Had this exact error on an 8.0 -> 8.2 upgrade and then rollback today. Looks like the system is trashed.

dpcinc commented 3 years ago

Was able to recover the install and get it back on 8.0. This is a disk label issue in grub.cfg as well as /etc/fstab. blkid shows the label is wiped from the root partition (md127p1) and also from swap (md127p6); the log partition label remains intact. The fix is to put the label back with e2label: boot the installer, press F2 for advanced, type shell and run e2label. You can see your old labels in /etc/fstab. The fstab can also be updated to the new labels that you'll see in blkid for logs.
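A rough sketch of how the missing labels described above could be spotted from the installer shell, assuming the old fstab is readable there. The function names are hypothetical, and the label `root-abcdef` in the comments is a placeholder, not a value from this thread. Note that e2label only handles ext2/3/4 filesystems, so the swap partition would probably need a different tool such as swaplabel; whether that tool is available in the installer shell is an assumption.

```shell
# Hypothetical helpers for comparing fstab labels with labels present on disk.

# List the LABEL= values an fstab file refers to, one per line,
# ignoring commented-out lines.
fstab_labels() {
    grep -v '^[[:space:]]*#' "$1" | grep -o 'LABEL=[^[:space:]]*' | cut -d= -f2-
}

# Print every fstab label that is absent from a file listing the labels
# currently present on disk (one label per line).
# Arguments: $1 = fstab path, $2 = file with the present labels.
missing_labels() {
    fstab_labels "$1" | while read -r label; do
        grep -qxF -- "$label" "$2" || echo "$label"
    done
}

# Possible usage (not run here):
#   blkid -s LABEL -o value > /tmp/present    # labels blkid can still see
#   missing_labels /etc/fstab /tmp/present    # labels that were wiped
#   e2label /dev/md127p1 root-abcdef          # placeholder label; use the one from fstab
```

Any label printed by `missing_labels` is one that fstab still expects but that no partition currently carries, which matches the symptom reported here for the root and swap partitions.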

stormi commented 3 years ago

Thanks for the feedback. This label issue is a very good lead.

olivierlambert commented 3 years ago

Indeed, this is precious feedback for us to be able to solve it faster :+1:

dpcinc commented 3 years ago

Welcome guys! I'm pretty sure restore would be broken for anyone using a software raid install at this point; might want to add that to the docs?

olivierlambert commented 3 years ago

We'll try to reproduce it in our lab (and maybe at some point in our auto-tests)

nagilum99 commented 3 years ago

At least I can verify that "restore" didn't work on 8.2 for me. It didn't get very far and stopped with some error, IIRC. Afterwards it ended up pretty weird, offering a bunch of previous installs while there should be only one backup.