lethedata opened this issue 9 months ago
Hi! Thanks for the report.
@ydirson is it related to the PR we made a while ago that Citrix/XS never wanted to merge?
@olivierlambert Could be. This failing command is part of our RAID-creation code that XS will not merge; more specifically it comes from https://github.com/xcp-ng/host-installer/pull/7, which added the `sgdisk --zap-all` call.
It could be that https://github.com/xenserver/host-installer/pull/38 would help, notably by stopping the OS from auto-assembling pre-existing RAID volumes.
@lethedata I'm interested in the logs for this "failing successfully" behavior if you still have them; maybe we can improve things here.
@ydirson Unfortunately I didn't think to grab any output until after I fixed things. For future reference, does the installer ISO keep logs somewhere, and where do they end up when the install doesn't complete?
I messed around trying to reproduce the error but was only able to get sgdisk to perform a GPT restore once. The problem is that I don't know what was written in the original backup GPT sectors that kept sgdisk detecting an MBR after the restore. This process also doesn't seem to affect mdadm through the installer. Here's what I tried:
```
fdisk /dev/DISK
(Options: o, n, p, default, default, w)
dd if=/dev/DISK of=/PATH/MBR.backup bs=512 count=1
fdisk /dev/DISK
(Options: g, w)
dd if=/dev/zero of=/dev/DISK bs=512 count=34
dd if=/PATH/MBR.backup of=/dev/DISK bs=512 count=1
sgdisk --zap-all /dev/DISK
```
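To see what the tooling detects between steps, gdisk's read-only listing is enough; its partition table scan reports whether it sees an MBR, a GPT, or both (the device path is just a placeholder):

```
# Show gdisk's partition table scan for the disk; the "MBR:" / "GPT:" lines
# reveal whether a (possibly damaged) GPT or a bare MBR is being detected.
gdisk -l /dev/DISK
```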
My hunch is that whatever was written at the end of the drive was a "perfect sequence" that led sgdisk not to wipe it and mdadm to fail, but that's just a guess; no matter what I tried I couldn't reproduce it. Now I know that when it comes to odd disk issues it's probably a good idea to at least back up the partition tables before wiping things.
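For anyone else hitting this: sgdisk itself can save and restore the GPT structures, which is what I wish I had done before wiping (backup path and device are just examples):

```
# Save the protective MBR, GPT header and partition entries to a file...
sgdisk --backup=/root/sdX-gpt.backup /dev/sdX
# ...and restore them later if the wipe goes sideways.
sgdisk --load-backup=/root/sdX-gpt.backup /dev/sdX
```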
> For future reference, does the installer ISO keep logs somewhere, and where do they end up when the install doesn't complete?
During the installation it logs essentially into `/tmp/install-log`. Then you'll find the installer logs on the installed host in `/var/log/installer/`.
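If the install never completes, you can still copy that file off from the installer shell while the environment is up, for example like this (assuming networking is configured and scp is available in the installer environment; the host and target path are just an illustration):

```
# From the installer's shell (e.g. on another virtual console),
# copy the in-progress log to another machine for inspection.
scp /tmp/install-log user@192.0.2.10:/tmp/xcpng-install-log
```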
During RAID creation in the installer there were no failure messages, but the RAID device never appeared and the selected disks still showed up as individual disks. Dropping to a console I was able to create the RAID manually (roughly as sketched below), but wiping it, rebooting, and letting the installer handle it still failed.
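The manual creation was just plain mdadm, something along these lines (device names and RAID level are illustrative, not my exact setup):

```
# Create the mirror by hand out of the disks the installer was given.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX /dev/sdY
# Watch the array come up / resync.
cat /proc/mdstat
```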
Looking at the host-installer, specifically `create_raid` in `diskutil.py`, I manually ran each command. Doing this I was able to see that `sgdisk --zap-all` was "failing successfully": the program seemed to complete without error, yet it didn't properly wipe the GPT tables, so the system kept auto-recovering them. I'm not exactly clear why this was impacting the rest of the RAID creation, but after manually wiping with gdisk's zap I had no other issues.

Looking online I found Ubuntu gdisk Bug 1303903, which mentions that this might be caused by how sgdisk handles MBR disks, incorrectly assuming the disk is MBR. Following their workaround, adding the `--mbrtogpt --clear` flags might prevent this from happening. I was unable to reproduce the issue after my gdisk wipe, so I can't verify it.
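If someone wants to try that workaround in the installer, it should amount to just the extra flags on the existing call, along these lines (untested on my side, and the exact invocation in `diskutil.py` may differ):

```
# Workaround from the Ubuntu bug report: tell sgdisk to treat the disk as
# GPT (--mbrtogpt) and clear the table (--clear) so the zap doesn't end up
# "restoring" a stale backup GPT found at the end of the disk.
sgdisk --mbrtogpt --clear --zap-all /dev/DISK
```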