oamg / leapp-repository

Leapp repositories containing actors for the Leapp framework (https://github.com/oamg/leapp). Currently provides leapp repositories for in-place upgrades of RHEL systems.
Apache License 2.0

mount /usr: Implement try-sleep loop (SAN + FC) #1218

Closed pirat89 closed 3 months ago

pirat89 commented 3 months ago

This problem is typical for SAN + FC setups, where the storage sometimes needs more time to initialise. This PR implements a try-sleep loop: the storage activation and the /usr mount are retried after 15s. The loop can repeat up to 10 times, so the total time currently allowed for the activation is 150s.
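For illustration, the try-sleep loop described above could be sketched as follows. This is a minimal sketch and not the code from this PR: the names `try_sleep_loop`, `MAX_TRIES` and `DELAY` are hypothetical, and the command passed in stands for whatever activates the storage and mounts /usr.

```shell
#!/bin/sh
# Hypothetical sketch of a try-sleep loop; names are illustrative only.
# Defaults follow the PR description: 10 tries, 15 seconds apart.
MAX_TRIES="${MAX_TRIES:-10}"
DELAY="${DELAY:-15}"

# Run the given command until it succeeds or MAX_TRIES attempts have failed.
# Sleeps DELAY seconds after every failed attempt, so the worst case is
# MAX_TRIES * DELAY seconds of waiting (150s with the defaults).
try_sleep_loop() {
    attempt=1
    while [ "$attempt" -le "$MAX_TRIES" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$MAX_TRIES failed, sleeping ${DELAY}s" >&2
        sleep "$DELAY"
        attempt=$((attempt + 1))
    done
    return 1
}
```

A caller would wrap the activation and mount step in a single function (assumed here, for example `activate_and_mount_usr`) and run `try_sleep_loop activate_and_mount_usr`, treating a non-zero return as a failed mount.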

Note that this is not a proper solution for the storage initialisation; however, we have discovered obstacles in the boot process that prevent us from doing it the way we would like. Given the limited time, we are going to deliver this solution, which should improve the experience and should be safe, causing no regressions in functionality that already works. We expect to provide a better solution for newer upgrade paths in the future (IPU 8->9 and newer).

jira: https://issues.redhat.com/browse/RHEL-3344

Note: the related PR in which we will try to address this problem properly will most likely be #1202.

github-actions[bot] commented 3 months ago

Thank you for contributing to the Leapp project!

Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable. If you want to request a review or rebuild a package in copr, you can use the following commands as a comment:

Packit will automatically schedule regression tests for this PR's build and the latest upstream leapp build. If you need a different version of leapp, e.g. from PR#42, use /packit test oamg/leapp#42. Note that first-time contributors cannot run tests automatically; they will be started by a reviewer.

It is possible to schedule specific on-demand tests as well. Currently two test sets are supported, beaker-minimal and kernel-rt; both can be run on all upgrade paths or just a couple of specific ones. To launch on-demand tests with packit:

See other labels for particular jobs defined in the .packit.yaml file.

Please open a ticket in case you experience a technical problem with the CI. (RH internal only)

Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please contact leapp-infra.

pirat89 commented 3 months ago

Manual testing on my LVM VMs:

lsblk output from the tested machines (multiple VGs used):

[root@localhost ~]# lsblk
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda            252:0    0   30G  0 disk 
├─vda1         252:1    0  200M  0 part /boot/efi
├─vda2         252:2    0    1G  0 part /boot
├─vda3         252:3    0    7G  0 part 
│ └─rhel00-var 253:3    0    7G  0 lvm  /var
├─vda4         252:4    0    4G  0 part 
│ ├─rhel-root  253:0    0    2G  0 lvm  /
│ └─rhel-swap  253:1    0    2G  0 lvm  [SWAP]
└─vda5         252:5    0    6G  0 part 
  └─rhel01-usr 253:2    0    6G  0 lvm  /usr
[root@localhost ~]# lsblk
NAME           MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda            252:0    0  30G  0 disk 
├─vda1         252:1    0   1G  0 part /boot
├─vda2         252:2    0   8G  0 part 
│ └─rhel00-var 253:3    0   7G  0 lvm  /var
├─vda3         252:3    0   7G  0 part 
│ └─rhel01-usr 253:2    0   6G  0 lvm  /usr
├─vda4         252:4    0   1K  0 part 
└─vda5         252:5    0   6G  0 part 
  ├─rhel-root  253:0    0   3G  0 lvm  /
  └─rhel-swap  253:1    0   2G  0 lvm  [SWAP]
vdb            252:16   0  20G  0 disk 
[root@localhost ~]# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda           252:0    0   20G  0 disk 
├─vda1        252:1    0  500M  0 part /boot/efi
├─vda2        252:2    0  500M  0 part /boot
└─vda3        252:3    0   19G  0 part 
  ├─rhel-root 253:0    0    4G  0 lvm  /
  ├─rhel-swap 253:1    0    2G  0 lvm  [SWAP]
  ├─rhel-usr  253:2    0    8G  0 lvm  /usr
  └─rhel-var  253:3    0    5G  0 lvm  /var

rmetrich commented 3 months ago

Sorry, I don't like this proposal much, because the sleep xxx turns out to help in many cases other than a dedicated /usr. In fact, customer cases showed that it helps as soon as more than a few LVM LVs are present. Hence I would suggest sleeping for at least one round (15 seconds) even if there is no /usr to wait for.
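As a rough illustration of this suggestion (a sketch with hypothetical names, not code from either PR), the wait would happen unconditionally, and only the mount step would depend on a dedicated /usr being present:

```shell
#!/bin/sh
# Hypothetical sketch: always wait one round, then mount /usr only if needed.
DELAY="${DELAY:-15}"

# $1: "yes" if a dedicated /usr exists, anything else otherwise.
# Remaining arguments: the command that mounts /usr (assumed to exist).
wait_then_mount() {
    has_usr="$1"
    shift
    # Unconditional wait, so systems with many LVs also get time to settle.
    sleep "$DELAY"
    if [ "$has_usr" = "yes" ]; then
        "$@"
    fi
}
```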

pirat89 commented 3 months ago

@rmetrich thanks for the feedback. Shouldn't that be covered by the LVM activation, which is checked and happens before mount_usr is called? I hoped that it would catch it. Here is PR #1219, which always applies the sleep.