xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

Software RAID install on previously used mdadm disks #107

Open olivierlambert opened 5 years ago

olivierlambert commented 5 years ago

IIRC, we already use mdadm --zero-superblock /dev/sdX to clean previous mdadm superblocks from all the selected disks.

However, if mdadm was used at the partition level (e.g. sda2), our command won't clean it, and the install will fail.

Ideally, we should loop over every partition and remove the superblock. Maybe there is a better way (superblock detection?) to find and remove it only where a superblock was actually stored.
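A minimal sketch of that per-partition loop (assuming lsblk and mdadm are available in the install environment; mdadm --examine serves as the superblock detection):

for DEV in $(lsblk -ln -o NAME,TYPE | awk '$2 == "disk" || $2 == "part" {print "/dev/"$1}'); do
    # Only touch devices that actually carry an mdadm superblock
    if mdadm --examine "$DEV" >/dev/null 2>&1; then
        mdadm --zero-superblock "$DEV"
    fi
done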

randadinata commented 5 years ago

If we already have user consent for destroying data, can't we just nuke the first and last 2 MiB with dd, followed by partprobe? 🤣 Everything in between doesn't matter anymore.
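A rough sketch of that idea, per device (the device name is a placeholder; 2 MiB is 4096 sectors of 512 bytes):

DEV=/dev/sdX                                    # placeholder: whole disk or partition
SECTORS=$(blockdev --getsz "$DEV")              # device size in 512-byte sectors
dd if=/dev/zero of="$DEV" bs=512 count=4096     # nuke the first 2 MiB
dd if=/dev/zero of="$DEV" bs=512 seek=$(($SECTORS - 4096)) count=4096   # ...and the last 2 MiB
partprobe "$DEV"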

olivierlambert commented 5 years ago

That's an option (but it's the same idea: it has to be done on each partition), so it doesn't simplify the equation much (we'd still need to loop over each partition).

oallart commented 5 years ago

We have a similar approach: a script that nukes the md devices and other bits. It can be passed as <script stage="installation-start" type="url"> from a remote server in an unattended install.

The script runs

mdadm --zero-superblock
wipefs --all
sgdisk -Z

on all drives and/or partitions

olivierlambert commented 5 years ago

Can you describe each step in more detail, and in which order they run? Then we could maybe do that instead of just zeroing the whole disk (and missing the partitions).

oallart commented 5 years ago

Yes, I'm working on refining that right now; it's not quite world-ready yet. It works well via PXE on a rescue boot, but isn't quite there yet as an integrated step with the XS answer-file method.

Basically it does

  1. identify, activate and destroy LVM
  2. identify, activate and destroy md's
  3. wipefs (erase filesystem, raid or partition-table signatures)
  4. zap the GPT and MBR data structures
  5. dd zero some parts just in case

Some of these are probably redundant, but they work well. We use the script to zero drives for reinstall much faster than DBAN or a full dd zero can. I'll get something a bit cleaner together and share it.
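A hedged sketch of what step 1 (the LVM teardown) could look like, since the script posted below doesn't show that part (assuming the LVM tools are present; every volume group and physical volume found is destroyed):

# Deactivate every volume group, then remove the VGs and their physical volumes
vgchange -an
for VG in $(vgs --noheadings -o vg_name); do
    vgremove -f "$VG"
done
for PV in $(pvs --noheadings -o pv_name); do
    pvremove -ff "$PV"
done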

oallart commented 5 years ago

OK, so here's something I tested a bit; it does work when supplied from an answer file as <script stage="installation-start" type="url">.

Still a bit crude, but it works well. Output is redirected to /tmp/prescript.log. I also have a bit in there to prevent the package-installation delay caused by the md resync.

#!/bin/sh
# O. Allart - 2018/12
# to be executed at the very first stage of install of a fresh xenserver
# - dbanlite style wipe
# - disable md resync
{
# identify partitions, md devices
# map partitions to md devices 
echo "md devices found:"
cat /proc/mdstat | grep ^md  
if [ $? -ne 0 ]
then 
    echo "No software RAID md device found in /proc/mdstat, no MD to destroy"
else

    for DEVICE in $(cat /proc/mdstat | sed -n 's/\(md[0-9]\+\).*\(sd[a-f][1-9]\?\).*\(sd[a-f][1-9]\?\).*/\1:\2:\3/p'); do
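        # NOTE: this regex assumes each array has exactly two sd[a-f] member devices;
        # adjust it for other device names or differently sized arrays.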
        # Extract md device and associated devices
        MD=$(echo $DEVICE | cut -d: -f1)
        DEV1=$(echo $DEVICE | cut -d: -f2)
        DEV2=$(echo $DEVICE | cut -d: -f3)

        # test these are valid
        mdadm --detail /dev/$MD | head -5
        if [ $? -ne 0 ]; then
            echo "Reported device /dev/$MD invalid"
            exit 6
        fi

        mdadm -E /dev/$DEV1 | head -10
        if [ $? -ne 0 ]; then
            echo "Reported partition /dev/$DEV1 invalid"
            exit 7
        fi
        mdadm -E /dev/$DEV2 | head -10
        if [ $? -ne 0 ]; then
            echo "Reported partition /dev/$DEV2 invalid"
            exit 7
        fi

        echo "Stopping device"
        mdadm --stop /dev/$MD
        if [[ $? -ne 0 ]]; then echo "Device $MD could not be stopped" && exit 8; fi

        echo "Zeroing superblock on /dev/$DEV1"
        mdadm --zero-superblock /dev/$DEV1
        if [ $? -ne 0 ]; then
            mdadm --zero-superblock /dev/$DEV1
            if [ $? -ne 0 ]; then echo "CRITICAL: Partition /dev/$DEV1 could not be zeroed - Drive is NOT ready for reuse" && exit 9; fi
        fi

        echo "Zeroing superblock on /dev/$DEV2"
        mdadm --zero-superblock /dev/$DEV2
        if [ $? -ne 0 ]; then
            mdadm --zero-superblock /dev/$DEV2
            if [ $? -ne 0 ]; then echo "CRITICAL: Partition /dev/$DEV2 could not be zeroed - Drive is NOT ready for reuse" && exit 9; fi
        fi
        echo "-------------------------------------------------------------"

    done
fi

# Finishing touch: wipe FS signatures, zap partition tables.
for DRIVE in $(cat /proc/partitions | grep -o "sd[a-z]$")
do
        echo Finishing $DRIVE
        wipefs --all /dev/$DRIVE
        sgdisk -Z /dev/$DRIVE
done

# delays resync to speed up install in raid1 md configs
echo 0 > /proc/sys/dev/raid/speed_limit_max
echo 0 > /proc/sys/dev/raid/speed_limit_min
} > /tmp/prescript.log 2>&1

olivierlambert commented 5 years ago

Pinging @nraynaud, who did the software RAID work, with this info for potential inclusion directly in the installer :+1:

gdelafond commented 5 years ago

@olivierlambert @nraynaud if you include it in the installer, beware that disk names will not always match sd[a-z]$; as far as I know, Linux uses several disk naming schemes.

Some rules are defined in /lib/udev/rules.d/60-persistent-storage.rules

Maybe we should not take the information from /proc/partitions but from something like: lsblk | awk '$6 == "disk" {print $1}'?
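A slightly more robust variant of that, selecting the columns explicitly instead of relying on lsblk's default layout (a sketch; exact behavior depends on the lsblk version shipped in the installer):

lsblk -d -n -o NAME,TYPE | awk '$2 == "disk" {print $1}'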

gdelafond commented 5 years ago

Instead of erasing all available drives, maybe the installer should ask which disks have to be erased, or only erase the disks that have been chosen for the XCP-ng installation.

olivierlambert commented 5 years ago

Yes, that's already what we do (only the selected disks get their magic blocks zeroed). What we're missing is doing that on all of their partitions as well.

oallart commented 5 years ago

Good points. As said earlier, it is a bit crude and rather specific to our use, but I'm glad to see the ball rolling and hope the feature gets included someday. It's nice that XCP-ng has the tools available to perform the various tasks (sgdisk, wipefs, etc.). We work a lot with answer files (see my posts on upgrading too), so we can build the logic around drives in there. Until the feature is built in, this is an avenue for people to use it externally. Those script stage entries are incredibly useful.

olivierlambert commented 5 years ago

@oallart feel free to create a dedicated entry in the Wiki with a "how to", this could be useful for all XCP-ng users :+1:

oallart commented 5 years ago

@olivierlambert yep I have already started and taken over some sections :smile:

gdelafond commented 5 years ago

> 5. dd zero some parts just in case

You can wipe the filesystem/RAID metadata at the start and end of the disk with:

DISK=sda
LBAS=$(cat /sys/block/$DISK/size)
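# /sys/block/$DISK/size is expressed in 512-byte sectors, so with bs=512 the two commands below wipe the first and last 512 KiB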
dd if=/dev/zero of=/dev/$DISK bs=512 count=1024
dd if=/dev/zero of=/dev/$DISK bs=512 seek=$(($LBAS-1024)) count=1024

nraynaud commented 5 years ago

Hi all, I am working on the issue. The UI side of things is a bit complicated.

I worked with the installer yesterday. 1) Here is what I have:

2) I am thinking of various UI solutions:

3) As for the partitions (e.g. /dev/sda2), should we keep the partitions as they exist, or destroy them and use full disks all the time?

olivierlambert commented 5 years ago

IMHO, when the user selects their disks, the installer should destroy everything on them, with no other option. XCP-ng is a kind of "Xen Appliance", not a "normal" Linux distro: partitioning is done by XCP-ng, not by the user.

stormi commented 5 years ago

For those willing to test the pull request or even help develop the feature, here's a guide that explains how to build an ISO image with a modified installer: https://github.com/xcp-ng/xcp/wiki/Modifying-the-installer

klou commented 5 years ago

I'm not in a position to try this, but we upgraded from an XS 7.0 setup (RAID 1 on individual partitions) to XCP 7.5 (RAID 1 on individual disks) a few months ago, and I'm trying to figure out why my IO sucks.

Anyway, the output below is from dmesg, in case it helps as an additional example of leftovers from a 7.5 conversion.

[    3.011900] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.011903] GPT:1465148799 != 1465149167
[    3.011905] GPT:Alternate GPT header not at the end of the disk.
[    3.011906] GPT:1465148799 != 1465149167
[    3.011907] GPT: Use GNU Parted to correct GPT errors.
[    3.011921]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6
[    3.012711] sd 2:0:0:0: [sdb] Attached SCSI disk
[    3.014863] GPT:Primary header thinks Alt. header is not at the end of the disk.
[    3.014865] GPT:1465148799 != 1465149167
[    3.014867] GPT:Alternate GPT header not at the end of the disk.
[    3.014868] GPT:1465148799 != 1465149167
[    3.014870] GPT: Use GNU Parted to correct GPT errors.
[    3.014882]  sda: sda1 sda2 sda3 sda4 sda5 sda6
[    3.015562] sd 0:0:0:0: [sda] Attached SCSI disk
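For what it's worth, if the existing partition table were meant to be kept, sgdisk can move the backup GPT header back to the real end of the disk (a hedged aside; don't run it on a disk whose GPT actually belongs to an md device layered on top of it):

sgdisk -e /dev/sdb    # -e / --move-second-header: relocate the backup GPT header to the end of the disk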

ydirson commented 1 year ago

Let's describe the problem differently: we're installing an appliance, not a general-purpose OS... so we should not care at all about whatever RAID/LVM setup was on the disks we're going to overwrite anyway. The problem is that, when booting the ISO, some udev rules react to the presence of software-RAID signatures on some disks/partitions and assemble the arrays... which is exactly what we don't want. In fact, the udev rules file from CentOS (/lib/udev/rules.d/65-md-incremental.rules) already has a special case to neutralize itself when the Anaconda installer is running.
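A minimal sketch of what neutralizing that auto-assembly could look like in the install environment (masking the rule is an assumption for illustration, not necessarily how the installer will do it):

# Override the packaged rule with an empty one, then reload udev so arrays stop being auto-assembled
ln -sf /dev/null /etc/udev/rules.d/65-md-incremental.rules
udevadm control --reload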

So we're left with a few actions to take:

ydirson commented 1 year ago

A test image is now available here. Please let us know if it works for you! It is based on the 8.3-alpha2 install image, with installer changes detailed here.