rear / rear

Relax-and-Recover - Linux bare metal disaster recovery and system migration solution (cfr. mksysb, ignite)
http://relax-and-recover.org/
GNU General Public License v3.0

Rear multipath issue #1899

Closed: samurdhi closed this issue 6 years ago

samurdhi commented 6 years ago


This is the first time I'm going to use ReaR as a backup system in my company. I have tested it with servers that have only internal disks, and ReaR works well with those. But this time I'm going to create a recovery ISO for the following server, which has both internal disks (sda) and SAN multipath disks (mpatha, mpathb, mpathc).

mpatha has been used to create a volume group (sdcsit26-openvvg); mpathb and mpathc have been used to create a volume group (openvvg).

Those multipath LVM VGs are used to create LVs that are mounted on the server. I want to know whether ReaR supports recovery of multipath LVMs. I also believe I don't need to take a backup of those LVMs because they are created on SAN LUN disks; is that true?

[root@srv ~]# pvs
  PV                 VG               Fmt  Attr PSize    PFree   
  /dev/mapper/mpatha sdcsit26-openvvg lvm2 a--    <1.40t       0 
  /dev/mapper/mpathb openvvg          lvm2 a--  <500.00g       0 
  /dev/mapper/mpathc openvvg          lvm2 a--  <500.00g <100.00g
  /dev/sda2          rootvg           lvm2 a--  <557.88g  470.88g

[root@srv ~]# vgs
  VG               #PV #LV #SN Attr   VSize    VFree   
  openvvg            2   2   1 wz--n-  999.99g <100.00g
  rootvg             1   5   0 wz--n- <557.88g  470.88g
  sdcsit26-openvvg   1   1   0 wz--n-   <1.40t       0 
[root@srv ~]# 

[root@srv ~]# lvs
  LV               VG               Attr       LSize    Pool Origin  Data%  Meta%  Move Log Cpy%Sync Convert
  openv_bak        openvvg          swi-a-s---  400.00g      openvlv 35.99                                  
  openvlv          openvvg          owi-aos--- <500.00g                                                     
  homelv           rootvg           -wi-ao----    5.00g                                                     
  rootlv           rootvg           -wi-ao----   23.99g                                                     
  swaplv           rootvg           -wi-ao----   32.00g                                                     
  tmplv            rootvg           -wi-ao----   10.00g                                                     
  varlv            rootvg           -wi-ao----   16.00g                                                     
  sdcsit26-openvlv sdcsit26-openvvg -wi-ao----   <1.40t    

[root@srv ~]# df -kh
Filesystem                                       Size  Used Avail Use% Mounted on
/dev/mapper/rootvg-rootlv                         24G  1.4G   23G   6% /
devtmpfs                                          63G     0   63G   0% /dev
tmpfs                                             63G     0   63G   0% /dev/shm
tmpfs                                             63G   67M   63G   1% /run
tmpfs                                             63G     0   63G   0% /sys/fs/cgroup
/dev/sda1                                       1016M  149M  867M  15% /boot
/dev/mapper/sdcsit26--openvvg-sdcsit26--openvlv  1.4T   33M  1.4T   1% /usr/openv_sync
/dev/mapper/rootvg-homelv                        5.0G   55M  5.0G   2% /home
/dev/mapper/rootvg-tmplv                          10G   44M   10G   1% /tmp
/dev/mapper/rootvg-varlv                          16G  547M   16G   4% /var
/dev/mapper/openvvg-openvlv                      500G   99G  401G  20% /usr/openv
tmpfs                                             13G     0   13G   0% /run/user/36
tmpfs                                             13G     0   13G   0% /run/user/9697 
jsmeix commented 6 years ago

@samurdhi you use Relax-and-Recover 2.00, which is by now a bit old, in particular because there have been many enhancements and fixes regarding multipath since ReaR 2.00 was released in January 2017, see https://raw.githubusercontent.com/rear/rear/master/doc/rear-release-notes.txt

Could you try out our current ReaR upstream GitHub master code, so that you and we at ReaR upstream use the same code? That makes debugging easier for us when there are issues.

To use our current ReaR upstream GitHub master code do the following:

Basically "git clone" it into a separated directory and then configure and run ReaR from within that directory like:

# git clone https://github.com/rear/rear.git

# mv rear rear.github.master

# cd rear.github.master

# vi etc/rear/local.conf

# usr/sbin/rear -D mkbackup

Note the relative paths "etc/rear/" and "usr/sbin/".
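
For a first test the local.conf edited in the step above could look roughly like this minimal sketch (the backup method and the NFS location are only placeholder assumptions here, adapt them to your environment):

# Minimal etc/rear/local.conf sketch; 'your-nfs-server' and the export path are placeholders
OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL=nfs://your-nfs-server/export/rear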

I recommend using our current ReaR upstream GitHub master code because that is the only place where we at ReaR upstream fix bugs, i.e. bugs in older ReaR versions are not fixed by us at ReaR upstream.

I am not a multipath user and I have no personal experience with it, so I cannot really help with multipath issues.

Nevertheless, I will try to explain, as far as I can guess, how things work:

When your server has both internal disks and SAN multipath disks then, as far as I can see, multipath disks are excluded by default via AUTOEXCLUDE_MULTIPATH=y in usr/share/rear/conf/default.conf, because in general SAN storage devices are normally not meant to be recovered by ReaR, see https://raw.githubusercontent.com/rear/rear/master/doc/user-guide/06-layout-configuration.adoc
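
If you want to verify what actually gets excluded in your particular case, one way (only a sketch, using the GitHub master checkout and relative paths from above) is to let ReaR just save the disk layout and then look at the resulting layout file, where excluded components should show up commented out:

# usr/sbin/rear -D savelayout

# grep -i mpath var/lib/rear/layout/disklayout.conf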

I think you normally have a separate recovery tool for your SAN storage, e.g. a specific recovery tool from the manufacturer or vendor of your particular SAN storage. I think things might even go terribly wrong on your SAN storage if "rear recover", which runs locally on one single machine, did partitioning, created filesystems, and did similar things on your SAN storage devices.
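
If additionally the filesystems on those SAN LUNs should be left out of the file backup, a sketch for etc/rear/local.conf could be the following (the mount points are taken from your 'df -kh' output above; whether this is needed at all depends on whether those filesystems still end up in your backup with the default settings):

# Keep the default AUTOEXCLUDE_MULTIPATH=y so the multipath disks are not recreated by ReaR
# and additionally exclude the SAN filesystems from the file backup:
BACKUP_PROG_EXCLUDE+=( '/usr/openv/*' '/usr/openv_sync/*' )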

Perhaps @schabrolles who is a multipath user could have a look here because likely he can answer your question much better than I can do.

samurdhi commented 6 years ago

@jsmeix Thank you for the reply, it's very helpful. Actually, what I need is this: I don't want to back up those multipath disks, but after recovering the server I want them to be present in the server.

jsmeix commented 6 years ago

I have no personal experience with SAN and multipath, so I cannot really help with such issues. Nevertheless, I will try to describe, as far as I know, how things should work:

During "rear recover" the backup of the files of the original system gets restored onto the replacement system so that in particular all config files of the original system get restored and afterwards the initrd gets recreated and the bootloader gets reinstalled.

Accordingly, after "rear recover" has finished and the recreated replacement system is rebooted, it should boot and start up in the same way as the original system did, so in particular the recreated replacement system should be able to access the SAN storage in the same way as the original system did - provided the replacement system is fully compatible with the original system, cf. the section "Fully compatible replacement hardware is needed" in https://en.opensuse.org/SDB:Disaster_Recovery

In practice things are more complicated because certain kinds of hardware-specific IDs are used on a system, like filesystem UUIDs or MAC addresses, so the restored config files of the original system could contain old hardware-specific IDs of the original system that do not match the new hardware-specific IDs of the replacement hardware.

Therefore, during the "finalize" stage of "rear recover", ReaR adapts some known kinds of hardware-specific IDs in some known config files, so that usually things should "just work" after "rear recover" on the recreated replacement system in the same way as on the original system.

For some details on how that usually works but could fail under special conditions, in particular regarding the bootloader, see the initial description in https://github.com/rear/rear/issues/1854, and in particular regarding config files like etc/fstab see https://github.com/rear/rear/pull/1758

Accordingly there is no guarantee that everything will always "just work" after "rear recover", cf. the section "Inappropriate expectations" and its sub-sections "Disaster recovery does not just work" and "No disaster recovery without testing and continuous validation" in https://en.opensuse.org/SDB:Disaster_Recovery
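
As a simple sanity check after "rear recover" and the reboot of the recreated system, one could verify with generic commands (nothing ReaR-specific) that the multipath devices, their LVM volume groups, and the mount points from the restored etc/fstab are visible and mounted again:

# multipath -ll

# pvs ; vgs ; lvs

# mount -a && df -kh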

samurdhi commented 6 years ago

Should I set AUTOEXCLUDE_MULTIPATH=n ?

jsmeix commented 6 years ago

In general I would start with the default settings (because one can assume the defaults are usually right) and see how things work by default, unless I already know in advance that in my particular case the defaults are not right.

@samurdhi in the end you need to test how it works with your particular original system in your particular environment on your replacement hardware, see https://en.opensuse.org/SDB:Disaster_Recovery
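
A quick way to see the shipped default and whether your local.conf overrides it is for example:

# grep AUTOEXCLUDE_MULTIPATH usr/share/rear/conf/default.conf etc/rear/local.conf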

jsmeix commented 6 years ago

@schabrolles in usr/share/rear/conf/examples/RHEL7-PPC64LE-Mulitpath-PXE-GRUB.conf there is (excerpt)

AUTOEXCLUDE_MULTIPATH=n
BOOT_OVER_SAN=y

Could you explain why AUTOEXCLUDE_MULTIPATH=n (and BOOT_OVER_SAN=y) must be used in this particular example, in contrast to AUTOEXCLUDE_MULTIPATH=y in default.conf? The matching issue https://github.com/rear/rear/pull/1339 mentions neither SAN nor multipath but is only about PXE booting based on GRUB2 on PPC64/PPC64LE.

I guess BOOT_OVER_SAN=y requires AUTOEXCLUDE_MULTIPATH=n because, when the system should boot from a SAN device with multipath, the "SAN via multipath" devices should not be autoexcluded, but I wonder what happens when SAN is used without multipath?

I wonder why and how PXE boot is interconnected with BOOT_OVER_SAN and multipath, or is the connection in this example config file only by chance?

I guess that also a system with a local hard disk (i.e. no SAN) could be booted via PXE (because I think PXE boot is not only for thin clients), or do I misunderstand something here?

schabrolles commented 6 years ago

@jsmeix, you are right, there is no direct link between Multipath/Boot_on_SAN and PXE (even on POWER). I use them in my example because PowerVM LPARs usually need multipath (90% of the cases from my point of view):

PowerVM LPAR with single VIOS (not common) => nothing special
PowerVM LPAR with dual VIOS (HA configuration) => needs multipath for dual disk access (vscsi over 2 VIOS) or NPIV (Fibre Channel virtualization)
Note: NPIV (which is quite popular) also needs BOOT_ON_SAN
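
So for such a dual VIOS or NPIV LPAR the relevant part of etc/rear/local.conf would essentially be what the example file discussed above uses (a sketch, to be combined with the usual OUTPUT/BACKUP settings for your environment):

# Do not autoexclude the multipath devices and enable booting from SAN,
# as in usr/share/rear/conf/examples/RHEL7-PPC64LE-Mulitpath-PXE-GRUB.conf:
AUTOEXCLUDE_MULTIPATH=n
BOOT_OVER_SAN=y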

jsmeix commented 6 years ago

@schabrolles thanks for your explanation! Accordingly I added some more explanatory comments to https://github.com/rear/rear/blob/master/usr/share/rear/conf/examples/RHEL7-PPC64LE-Mulitpath-PXE-GRUB.conf via https://github.com/rear/rear/commit/e75b86a7d0cc5d043e0712f0d9b6846797167525

gdha commented 6 years ago

@samurdhi is this solved?

samurdhi commented 6 years ago

@gdha Yes, Gratien, it's resolved. Thank you so much for the support.

jsmeix commented 7 months ago

There is a typo 'BOOT_ON_SAN' in https://github.com/rear/rear/issues/1899#issuecomment-413777499 which I didn't notice, so I copied and pasted it into my https://github.com/rear/rear/commit/e75b86a7d0cc5d043e0712f0d9b6846797167525, but that will now get fixed via https://github.com/rear/rear/pull/3211