rear / rear

Relax-and-Recover - Linux bare metal disaster recovery and system migration solution (cfr. mksysb, ignite)
http://relax-and-recover.org/
GNU General Public License v3.0
942 stars 255 forks source link

IBM Power8 BareMetal rescue USB boot kernel panic: VFS Unable to mount root fs on unknown-block(1,0) #2243

Closed leonardottl closed 4 years ago

leonardottl commented 5 years ago

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue (quick response is not guaranteed with free support):

image

here are the output from the mkrescue :

Relax-and-Recover 2.5-git.3428.dbdfb5f.master.changed / 2019-09-17
Running rear mkrescue (PID 131354)
Using log file: /var/log/rear/rear-koza.log
Using autodetected kernel '/boot/vmlinux-4.4.0-137-generic' as kernel in the recovery system
Creating disk layout
Docker is running, skipping filesystems mounted below Docker Root Dir /var/lib/docker
Using guessed bootloader 'EFI' (found in first bytes on /dev/sda)
Verifying that the entries in /var/lib/rear/layout/disklayout.conf are correct ...
Creating root filesystem layout
Skipping 'docker0': not bound to any physical interface.
Cannot include default keyboard mapping (no KEYMAPS_DEFAULT_DIRECTORY specified)
Cannot include keyboard mappings (neither KEYMAPS_DEFAULT_DIRECTORY nor KEYMAPS_DIRECTORIES specified)
Copying logfile /var/log/rear/rear-koza.log into initramfs as '/tmp/rear-koza-partial-2019-09-30T15:14:15-07:00.log'
Copying files and directories
Copying binaries and libraries
Copying all kernel modules in /lib/modules/4.4.0-137-generic (MODULES contains 'all_modules')
Copying all files in /lib*/firmware/
Symlink '/lib/modules/4.4.0-137-generic/build' -> '/usr/src/linux-headers-4.4.0-137-generic' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/usr/src/linux-headers-4.4.0-137-generic' via the 'COPY_AS_IS' configuration variable.
Symlink '/etc/pki/nssdb' -> '/var/lib/nssdb' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/var/lib/nssdb' via the 'COPY_AS_IS' configuration variable.
Symlink '/usr/share/misc/magic' -> '/usr/share/file/magic' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/usr/share/file/magic' via the 'COPY_AS_IS' configuration variable.
Testing that the recovery system in /tmp/rear.Xhun18bUFlRMlIR/rootfs contains a usable system
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (238003711 bytes) in 28 seconds
Exiting rear mkrescue (PID 131354) and its descendant processes ...
Running exit tasks

Here is the log file: rear-koza.log

jsmeix commented 5 years ago

@leonardottl I am not a IBM POWER user so that I cannot help with POWER specific issues. @schabrolles is our IBM POWER expert but currently he is busy with client on-site requests cf. https://github.com/rear/rear/pull/2232#issuecomment-531280327

I notice that in your https://github.com/rear/rear/files/3673761/rear-koza.log there is (near the end) only

2019-09-30 15:15:01.120713440 Including output/default/950_copy_result_files.sh
2019-09-30 15:15:01.123360621 Including output/default/950_email_result_files.sh

without the expected log messsage

Copying resulting files to usb location

that should normally come from usr/share/rear/output/default/950_copy_result_files.sh https://github.com/rear/rear/blob/master/usr/share/rear/output/default/950_copy_result_files.sh unless things have gone wrong before so that this script does nothing.

I am also missing

... Including output/USB/Linux-i386/850_make_USB_bootable.sh
... Writing MBR of type ...

lines in your log file when running https://github.com/rear/rear/blob/master/usr/share/rear/output/USB/Linux-i386/850_make_USB_bootable.sh

So I was wondering if before you did run rear mkrescue, you had formatted your USB disk especially for use with ReaR, cf. http://relax-and-recover.org/documentation/getting-started

But the log messages about REAR-000 like

2019-09-30 15:14:10.319559019 Including prep/NETFS/default/060_mount_NETFS_path.sh
mkdir: created directory '/tmp/rear.Xhun18bUFlRMlIR/outputfs'
2019-09-30 15:14:10.326117923 Mounting with 'mount -v -o rw,noatime /dev/disk/by-label/REAR-000 /tmp/rear.Xhun18bUFlRMlIR/outputfs'
mount: /dev/sdk1 mounted on /tmp/rear.Xhun18bUFlRMlIR/outputfs.

indicate that you had formatted your USB device for use with ReaR.

Therefore I think the root cause is that scripts like

usr/share/rear/output/USB/Linux-i386/850_make_USB_bootable.sh

have the system architecture Linux-i386 directory in their path so that such scripts are not run on POWER architecture (POWER architecture specific scripts would have to have a Linux-ppc64 or Linux-ppc64le directory in their path) which means in the end that current ReaR seems to not support OUTPUT=USB on POWER architecture.

@schabrolles could you please have a look here and verify whether or not current ReaR supports OUTPUT=USB on POWER architecture?

schabrolles commented 5 years ago

@jsmeix I currently never done any changes or test on OUTPUT=USB on Power. SO I think you point where the problem is ;-) Unfortunately, as you said, it is difficult for me to work on this point for the moment.

leonardottl commented 5 years ago

Thank you both, @jsmeix and @schabrolles , I think you have identified the issue, thanks. Would you recommend I try to build the ISO output and try with that?

leonardottl commented 5 years ago

Let me update that using the ISO OUTPUT allowed us to make bootable media and we are now performing the backup, I will update if all goes well. So til now it seems that the issue was the fact that for Power USB OUTPUT seems to not be supported.

schabrolles commented 5 years ago

@leonardottl, just for a test, if you really want to boot on USB, you can try to copy the iso into a USB device with dd

dd if=<path_to_rear.iso> of=/dev/<USB_block_device> bs=1M

WARNING, it will destroy all the data you have on the USB device.

Tell le if it works

jsmeix commented 5 years ago

@leonardottl

in general regarding OUTPUT=USB versus OUTPUT=ISO and BACKUP_URL=usb:... versus BACKUP_URL=nfs:...:

The USB specific parts in ReaR behave somewhat different compared to the usual way via ISO and backup on NFS.

In particular when you have your backup on USB things are different compared to backup on NFS, see in particular USB_SUFFIX in default.conf https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L754

For some examples of USB specific issues that I found right now see https://github.com/rear/rear/issues/2171#issuecomment-509143476 and https://github.com/rear/rear/issues/1738#issuecomment-370394149 and https://github.com/rear/rear/issues/1571#issuecomment-343461088 and https://github.com/rear/rear/issues/1158 and several more...

But the most dangerous generic issue when using a USB disk originated in https://github.com/rear/rear/issues/1271 which should nowadays be mostly avoided, see https://github.com/rear/rear/issues/1271#issuecomment-346633365 but that is not yet fully sufficient because there is still a generic problem when a ReaR recovery disk is used, see https://github.com/rear/rear/issues/1854

By the way: Regarding "Creating bootable USB from ReaR ISO" you may have a look at https://github.com/rear/rear/issues/2210

If you like to try out if OUTPUT=USB could be made working on POWER with relatively little effort:

  1. Create a new usr/share/rear/output/USB/Linux-ppc64le/ directory. For ppc64le versus ppc64el see https://github.com/rear/rear/pull/2195#issuecomment-517648246

  2. Copy the scripts in usr/share/rear/output/USB/Linux-i386/ into the new usr/share/rear/output/USB/Linux-ppc64le/ directory.

  3. Run rear -D mkrescue in full debug mode and inspect the log file if things look o.k. when running those scripts. If not you have to adapt those scripts as needed for POWER. If things look o.k. try to boot the ReaR recovery system and if it starts up well try if rear -D recover works. If not see in particular "Debugging issues with Relax-and-Recover" in https://en.opensuse.org/SDB:Disaster_Recovery

leonardottl commented 5 years ago

Thank you @jsmeix for the detailed response, I think making the USB directly may be doable then. However, we decided to generate the ISO and burn it on the USB directly using dd as you also suggested, this worked fine!

We have now made the backup successfully, but have one last question. Is it normal for mkbackup to not consider some files or directories, for instance we noticed that the home directory was not included in the tar backup file, is this correct? can we add a flag to also include it and others? Thank you for your time and help.

jsmeix commented 5 years ago

ReaR runs the tar backup with the --one-file-system option, see https://github.com/rear/rear/blob/master/usr/share/rear/backup/NETFS/default/500_make_backup.sh#L136

When you use btrfs subvolumes they behave like separated filesystems, see https://github.com/rear/rear/blob/master/usr/share/rear/conf/examples/SLE12-SP2-btrfs-example.conf#L54

To specify explicitly what and only what is included in the backup you may use BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE, see https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L996

Another hint regarding architecture specific scripts: Run e.g. rear -s mkrescue (-s is simulation mode) to see what scripts would be sourced and grep for architecture specific scripts. For example on my Ubuntu x86 system with OUTPUT=USB I get

# usr/sbin/rear -s mkrescue | grep 'i386'

Source conf/Linux-i386.conf
Source prep/USB/Linux-i386/340_find_mbr_bin.sh
Source prep/USB/Linux-i386/350_check_usb_disk.sh
Source prep/USB/Linux-i386/350_find_syslinux_modules.sh
Source prep/USB/Linux-i386/400_check_extlinux.sh
Source build/Debian/i386/600_fix_debian_stuff.sh
Source output/USB/Linux-i386/100_create_efiboot.sh
Source output/USB/Linux-i386/300_create_extlinux.sh
Source output/USB/Linux-i386/830_copy_kernel_initrd.sh
Source output/USB/Linux-i386/850_make_USB_bootable.sh

For OUTPUT=USB on POWER you need equivalent /USB/ scripts.

leonardottl commented 5 years ago

Dear @jsmeix and @schabrolles , we are still unable to get rear to backup the entire system, if you can comment on our issue it would be most appreciated, believe us that we are working on our side to solve it but just cannot find what we are doing wrong.

To define which directories to include we are using: BACKUP_PROG_INCLUDE=( '/*' ) BACKUP_ONLY_INCLUDE=y

should these options force rear to backup the entire file system, or is something else required?

leonardottl commented 5 years ago

Here is one example where we are using the EXCLUDE option instead, but still not getting all the directories, such as /home and others.

I attach the local.conf and the log file from rear -Dv mkbackup rear-koza.log rear-koza.log

OUTPUT=ISO BACKUP=NETFS BACKUP_URL="file:///media/externo"

BACKUP_PROG_INCLUDE=( '/' )

BACKUP_PROG_EXCLUDE=( '/media' '/var/lib/rear' '/tmp' '/proc' '/sys/' '/dev' 'lost+found')

BACKUP_ONLY_INCLUDE="true"

BACKUP_ONLY_EXCLUDE=y

jsmeix commented 5 years ago

@leonardottl before I try to dig around in your logs (as time permits) I would like to know what output you get for the following commands:

lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT

and

findmnt -a

to get an overview of your block devices and filesystems structure plus your disklayout.conf file that "rear mkrescue/mkbackup" created.

In general when you use BACKUP_ONLY_INCLUDE=y you need to explicitly specify in the BACKUP_PROG_INCLUDE array all filesystem-like thingies (usually the mountpoints of filesystems and - if you have - btrfs subvolumes) you want to get in your backup.

leonardottl commented 5 years ago

First, thanks again or the continued support. Here are the outputs and the disklayout file.

findmnt.txt lsblk.txt

leonardottl commented 5 years ago

Sorry, disklayout was missing

# Disk /dev/sda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/sda 960197124096 gpt
# Partitions on /dev/sda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sda 7340032 1048576 rear-noname prep /dev/sda1
part /dev/sda 255852544 8388608 rear-noname none /dev/sda2
part /dev/sda 959932530688 264241152 rear-noname lvm /dev/sda3
# Disk /dev/sdb
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdb 960197124096 msdos
# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdb 960196075520 1048576 extended none /dev/sdb1
#part /dev/sdb 960195026944 2097152 logical none /dev/sdb5
# Disk /dev/sdc
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdc 0 
# Partitions on /dev/sdc
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdd
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdd 0 
# Partitions on /dev/sdd
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sde
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sde 0 
# Partitions on /dev/sde
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdf
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdf 0 
# Partitions on /dev/sdf
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdg
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdg 0 
# Partitions on /dev/sdg
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdh
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdh 0 
# Partitions on /dev/sdh
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdi
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdi 0 
# Partitions on /dev/sdi
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdj
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdj 0 
# Partitions on /dev/sdj
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdk
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdk 2000398934016 msdos
# Partitions on /dev/sdk
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdk 2000397795328 1048576 primary none /dev/sdk1
# Format for LVM PVs
# lvmdev <volume_group> <device> [<uuid>] [<size(bytes)>]
lvmdev /dev/koza-vg /dev/sda3 Jrj4gZ-jB4Q-MptE-WqNU-Hb6p-bFbX-WukoVT 1874868224
# Format for LVM VGs
# lvmgrp <volume_group> <extentsize> [<size(extents)>] [<size(bytes)>]
lvmgrp /dev/koza-vg 4096 228865 937431040
# Format for LVM LVs
# lvmvol <volume_group> <name> <size(bytes)> <layout> [key:value ...]
lvmvol /dev/koza-vg root 956825600000b linear 
lvmvol /dev/koza-vg swap_1 3070230528b linear 
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/mapper/koza--vg-root / ext4 uuid=ba64eb10-3b19-4d13-ad53-038f08c8d6cf label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16383 default_mount_options=user_xattr,acl options=rw,relatime,errors=remount-ro,data=ordered
fs /dev/sda2 /boot ext2 uuid=4680ee82-6a53-486c-a7ce-c1989b4986c5 label= blocksize=1024 reserved_blocks=4% max_mounts=-1 check_interval=0d bytes_per_inode=4093 default_mount_options=user_xattr,acl options=rw,relatime,block_validity,barrier,user_xattr,acl
#fs /dev/sdk1 /media/externo ext4 uuid=24945c87-1ef0-4ba9-8a0d-e0b199c85416 label=externo blocksize=4096 reserved_blocks=4% max_mounts=-1 check_interval=0d bytes_per_inode=16383 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/mapper/koza--vg-swap_1 uuid=61858b20-22b9-43e7-a7c0-06ac2bea4614 label=
jsmeix commented 5 years ago

Ugh - looks somewhat complicated what is mounted there, in particular I know nothing at all about container stuff like kubernetes and kubelet or docker.

I see nothing about home at all so it seems the /home/ directory belongs to the root filesystem so that it should normally just be automatically included in the tar backup.

Currently I have no idea why it is not included in the backup in this particular case here.

Is perhaps what appears as home not a regular directory of the root filesystem but some kind of indirectly provided stuff via some container thingy like kubernetes, docker and whatnot?

leonardottl commented 5 years ago

In fact home is just an example of a directory that is missing, in fact home is basically empty in our setup, because most of the work is done on containers, and stored on a data disk, so home is basically configuration stuff for each user, not much else. But other directories are also missing. So yes, home is in the root (which it should not be we know), but it is not being included in the bakup, in general the backup is only a fraction of the size of the whole system .

Some other points to consider:

jsmeix commented 4 years ago

@leonardottl I think it will not provide a real solution for your issue so mainly FYI you may have a look at a similar issue https://github.com/rear/rear/issues/2344

Perhaps it might even help you a bit nevertheless.

Regarding huge files on a ISO i.e. with OUTPUT=ISO on PPC64le (in particular having a backup.tar.gz that is bigger than 4GiB in the ISO) see in particular https://github.com/rear/rear/issues/2344#issuecomment-602507147 and subsequent comments and the links therein that are about using the -iso-level 3 option also on PPC64le, cf. https://github.com/rear/rear/issues/2344#issuecomment-601655379

With current ReaR GitHub master code the -iso-level 3 option is now also used on PPC64le, cf. https://github.com/rear/rear/pull/2346

If you like to try that out see the section "Testing current ReaR upstream GitHub master code" https://en.opensuse.org/SDB:Disaster_Recovery

jsmeix commented 4 years ago

@leonardottl right now I did https://github.com/rear/rear/issues/2348

Please have a look there if I described the issue correctly. I am not a PPC user so I may misunderstand things.

leonardottl commented 4 years ago

@jsmeix Thanks for all the tips. I did not update, but our issue was resolved, we updated all of our system, including the OS and everything else. REAR is now working as it should, and backs up all of our docker system. Let me get back to you with the currently installed versions and other details.

pcahyna commented 4 years ago

It is strange though because according to #2344 a PreP partition is missing, so the system should not even start booting, while here it starts booting and can not mount root.

schabrolles commented 4 years ago

@pcahyna, the reason why the system can boot is because it is using petitboot (https://github.com/open-power/petitboot) used on Power BareMetal system (like LC models). In a nuteshell petitboot is micro-linux in firmware which load and scan disk and network to find grub configuration and aggregate then into a single menu (for disk, SAN, network, dvd, usb). ... So, in that case, PreP partition is not needed.

pcahyna commented 4 years ago

@schabrolles thanks for the explanation. So it looks that this and #2344 are two different issues.

schabrolles commented 4 years ago

@pcahyna, yes you are right... #2344 is PowerVM LPAR, in that case, Prep is needed.

github-actions[bot] commented 4 years ago

Stale issue message