rear / rear

Relax-and-Recover - Linux bare metal disaster recovery and system migration solution (cfr. mksysb, ignite)
http://relax-and-recover.org/
GNU General Public License v3.0
943 stars 255 forks source link

Rear Recover finishes but system will not boot. SLES12 SP3 with BRTFS/ppc64le #2094

Closed kkoehle closed 4 years ago

kkoehle commented 5 years ago

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue (quick response is not guaranteed with free support):

*OS version: SUSE Linux Enterprise Server 12 SP3

rear.txt

schabrolles commented 5 years ago

@kkoehle,

I tried rear from the git repo on a sles12 without any issue (on KVM not PowerVM)

I notice that you don't have POST_RECOVERY_SCRIPT in your rear local.conf file.

Here is the part I usually add to my SLES12 servers.

## SLES12
#BACKUP_OPTIONS="nfsvers=4,nolock"
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" snapper chattr lsattr )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )

for subvol in $(findmnt -n -r -t btrfs | cut -d ' ' -f 1 | grep -v '^/$' | egrep -v 'snapshots|crash') ; do
        BACKUP_PROG_INCLUDE=( "${BACKUP_PROG_INCLUDE[@]}" "$subvol" )
done

POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )
kkoehle commented 5 years ago

@schabrolles

I added the POST_RECOVERY_SCRIPT and no change. I don't imagine it is a KVM vs PowerVM thing that determines if a disk shows bootable. Does your SLES12 install use LVM and BTRFS for the boot disk?:

 PowerPC Firmware
 Version FW860.11 ### (SV860_063)
 SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
 Current Boot Sequence
 1.   Device is not bootable or removed.
 2.    None
 3.    None
 4.    None
 5.    None

What can I do to make PowerVM think it is bootable? Why is there no error message saying the bootloader didn't work? Is there anything I should check for?

schabrolles commented 5 years ago

@kkoehle,

agree, I don't think my KVM test could explain the difference. I think there is something wrong with grub2 but I never had this on my rear on Power test.

you can try to run rear -d recover to have a debug log and check if something wrong happens during the grub installation. usr/share/rear/finalize/Linux-ppc64le/620_install_grub2.sh

I won't be available from today to end of next week.

@jsmeix do you think it could be related to #2093

jsmeix commented 5 years ago

@kkoehle use rear -d -D recover (i.e. with -D) to get a full debug log, cf. "Debugging issues with Relax-and-Recover" at https://en.opensuse.org/SDB:Disaster_Recovery

jsmeix commented 5 years ago

@schabrolles https://github.com/rear/rear/issues/2093 looks totally unrelated because there 1:1 restore for all fs and disks works fine.

But what happens in this issue here is not a recovery on 100% compatible hardware because in https://github.com/rear/rear/files/2993248/rear.txt there is (excerpts)

Switching to manual disk layout configuration
Original disk /dev/mapper/36005076802818158a400000000000495 does not exist (with same size) in the target system
...
User confirmed disk layout file
Partition primary on /dev/mapper/36005076802818158a4000000000004c0: size reduced to fit on disk.
Doing SLES12-SP1 (and later) btrfs subvolumes setup because the default subvolume path contains '@/.snapshots/'
No code has been generated to recreate pv:/dev/mapper/36005076802818158a4000000000004a2 (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a4000000000004a2 (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
1
No code has been generated to recreate pv:/dev/mapper/36005076802818158a40000000000049f (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a40000000000049f (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
4
No code has been generated to recreate pv:/dev/mapper/36005076802818158a4000000000004a3 (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a4000000000004a3 (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
4
Confirm or edit the disk recreation script
1) Confirm disk recreation script and continue 'rear recover'
2) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
3) View disk recreation script (/var/lib/rear/layout/diskrestore.sh)
4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
5) Use Relax-and-Recover shell and return back to here
6) Abort 'rear recover'
(default '1' timeout 300 seconds)
2
***** At this point I change the line: 
    lvm lvcreate -L 64424509440b -n root system <<<y
****** to: 
    lvm lvcreate -L 55g -n root system <<<y
***** I do this because I am restoring a disk that was 100 GB to a 60 GB disk, and the 64424509440b is too big (especially since swap still needs 2 GB)

So the recovery hapens on a smaller disk and that is a migration that will for sure not just work.

I was wondering why that

Partition primary on /dev/mapper/36005076802818158a4000000000004c0: size reduced to fit on disk.

happened because that is a far too late "desperate" mesage from the create_partitions() function when it works aready with wrong data in disklayout.conf

I would have expected to see messages from layout/prepare/default/420_autoresize_last_partitions.sh because of

Device mapper!36005076802818158a400000000000495 does not exist (manual configuration needed)
Switching to manual disk layout configuration

we are in MIGRATION_MODE where in particular scripts like layout/prepare/default/420_autoresize_last_partitions.sh should run (regardless that this script alone does not help here because in case of LVM manual adaptions are needed anyway, cf. the Resizing partitions in MIGRATION_MODE during "rear recover" section in default.conf: https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L370

kkoehle commented 5 years ago

Here is the log file: I couldn't see anything that stood out. rear-hana-n3.log

jsmeix commented 5 years ago

I think I know why layout/prepare/default/420_autoresize_last_partitions.sh doesn't actually do something in this particular case: https://github.com/rear/rear/files/2998736/rear-hana-n3.log contains

+ source /usr/share/rear/layout/prepare/default/420_autoresize_last_partitions.sh
...
++ cp /var/lib/rear/layout/disklayout.conf /var/lib/rear/layout/disklayout.conf.resized_last_partition
...
+++ grep '^disk ' /var/lib/rear/layout/disklayout.conf
++ mv /var/lib/rear/layout/disklayout.conf.resized_last_partition /var/lib/rear/layout/disklayout.conf

i.e. grep '^disk ' /var/lib/rear/layout/disklayout.conf does not find anything.

@kkoehle we also need at least your var/lib/rear/layout/disklayout.conf plus some more additional files in the /var/lib/rear/ directory and in its sub-directories in the ReaR recovery system, cf. "Debugging issues with Relax-and-Recover" at https://en.opensuse.org/SDB:Disaster_Recovery

Be careful when attaching files here to not make possibly confidential internal information public here. I.e. you may have to obfuscate some values in those files before you upload those files here. On the other hand you should not obfuscate too much so that it would become impossible for us to see what actually goes on on your particular system.

Additionally describe as exact as you can how your replacement system where you run "rear recover" differs from your original system where you had run "rear mkbackup/mkrescue".

kkoehle commented 5 years ago

@schabrolles @jsmeix, I found the problem: the boot flag never gets set on the correct partition:

hana-n4:~ # parted /dev/mapper/36005076802818158a4000000000004c0
(parted) print                                                            
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  8389kB  7340kB  primary               prep, type=41
 2      8389kB  64.4GB  64.4GB  primary               lvm, type=8e
(parted) set 1 boot on
 1      1049kB  8389kB  7340kB  primary               boot, prep, type=41
 2      8389kB  64.4GB  64.4GB  primary               lvm, type=8e
jsmeix commented 5 years ago

According to https://github.com/rear/rear/issues/2094#issuecomment-487184999 this is a bootloader issue in MIGRATION_MODE so I think the general issue is https://github.com/rear/rear/issues/1437

github-actions[bot] commented 4 years ago

Stale issue message