Closed MichalHe closed 4 months ago
Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable. If you want to request a review or rebuild a package in copr, you can use following commands as a comment:
Packit will automatically schedule regression tests for this PR's build and latest upstream leapp build. If you need a different version of leapp, e.g. from PR#42, use /packit test oamg/leapp#42
Note that first time contributors cannot run tests automatically - they will be started by a reviewer.
It is possible to schedule specific on-demand tests as well. Currently 2 test sets are supported, beaker-minimal
and kernel-rt
, both can be used to be run on all upgrade paths or just a couple of specific ones.
To launch on-demand tests with packit:
kernel-rt
tests set for all upgrade pathskernel-rt
and beaker-minimal
test sets for 8.10->9.4 upgrade pathSee other labels for particular jobs defined in the .packit.yaml
file.
Please open ticket in case you experience technical problem with the CI. (RH internal only)
Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please contact leapp-infra.
Why parted instead of fdisk? Smaller rpm (but both seem to be preinstalled).
@MichalHe parted
does not have to be installed on the system and can be uninstalled. however, fdisk is part of util-linux and util-linux package must be always installed, so fdisk
is always present. From this point, fdisk is better option as we do not increase the surface of dependencies - and in this project, we want to limit introduction of new dependencies unless they are necessary. However, I am ok to be less strict in this case as parted is really by default present - at least on some systems.
# dnf remove parted
Updating Subscription Management repositories.
Dependencies resolved.
==============================================================================================================================================================================================================================================
Package Architecture Version Repository Size
==============================================================================================================================================================================================================================================
Removing:
parted x86_64 3.2-39.el8 @System 2.0 M
Transaction Summary
==============================================================================================================================================================================================================================================
Remove 1 Package
Freed space: 2.0 M
Is this ok [y/N]: n
Operation aborted.
[root@localhost ~]# rpm -qf `which fdisk`
util-linux-2.32.1-44.el8_9.1.x86_64
[root@localhost ~]# dnf remove util-linux-2.32.1-44.el8_9.1.x86_64
Updating Subscription Management repositories.
Error:
Problem: The operation would result in removing the following protected packages: systemd
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
@pirat89 I have rewritten the code to use fdisk
instead. I am against introducing needless dependencies and taking shortcuts.
@pirat89 I've manually tested the patch on RHEL7. On a VM with a disk formatted in a way so that the offset of the first partition is 63 sectors (the old way causing problems) the upgrade is inhibited due to the offset of the first partition being too small. With our vagrant box the upgrade was not inhibited (the offset of the first partition is 2048sectors (=1M)).
@oamg/developers This code is ready for a review, imho.
The solution works, tested on my upgraded 6 -> 7 system:
============================================================
REPORT OVERVIEW
============================================================
Upgrade has been inhibited due to the following problems:
...
3. Found GRUB devices with too little space reserved before the first partition
...
...
### report:
Risk Factor: high (inhibitor)
Title: Found GRUB devices with too little space reserved before the first partition
Summary: GRUB devices with too little space reserved before the first partition were found. Such devices can present a problem when updating the boot loader. Problematic devices:
- /dev/vda
Remediation: [hint] Reformat the devices so that there is at least 1024 kiB space before the first partition.
Key: 98eb0d6d72263cb21ac172fc6a612e27e48a9e91
Seems good:
Risk Factor: high (inhibitor)
Title: Found GRUB devices with too little space reserved before the first partition
Summary: On the system booting by using BIOS, the in-place upgrade fails when upgrading the GRUB2 bootloader if the boot disk's embedding area does not contain enough space for the core image installation. This results in a broken system, and can occur when the disk has been partitioned manually, for example using the RHEL 6 fdisk utility.
The list of devices with small embedding area:
- /dev/vda.
Remediation: [hint] We recommend to perform a fresh installation of the RHEL 8 system instead of performing the in-place upgrade.
Another possibility is to reformat the devices so that there is at least 1024 kiB space before the first partition. Note that this operation is not supported and does not have to be always possible.
Key: 98eb0d6d72263cb21ac172fc6a612e27e48a9e91
I am going to execute this yet on IBM Z and efi (as I have them by hand now) and merge it if passing. And possibly after tests are finished. Thanks!!!
@MichalHe I have bad karma I guess. It seems that CI caught FP inhibitor on AWS. we should check that. Checking logs, it seems to me that everything should ok, but I see the report.
The last one? :) Actually, the GPT (and the prepboot in general) is really the trouble maker in the end.
2024-04-23 20:01:03.901 INFO PID: 11755 leapp.workflow.Checks: Executing actor check_first_partition_offset
Process Process-386:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkfirstpartitionoffset/actor.py", line 24, in process
check_first_partition_offset.check_first_partition_offset()
File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkfirstpartitionoffset/libraries/check_first_partition_offset.py", line 19, in check_first_partition_offset
first_partition = min(grub_dev.partitions, key=lambda partition: partition.start_offset)
ValueError: min() arg is an empty sequence
2024-04-23 20:01:03.952 ERROR PID: 11755 leapp.workflow.Checks: Actor check_first_partition_offset has crashed: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkfirstpartitionoffset/actor.py", line 24, in process
check_first_partition_offset.check_first_partition_offset()
File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkfirstpartitionoffset/libraries/check_first_partition_offset.py", line 19, in check_first_partition_offset
first_partition = min(grub_dev.partitions, key=lambda partition: partition.start_offset)
ValueError: min() arg is an empty sequence
The msg has no partitions at all:
----------------------------------------------------------------------
Stamp: 2024-04-23T20:00:59.709122Z
Actor: scan_grub_device_partition_layout
Phase: FactsCollection
Type: GRUBDevicePartitionLayout
Message_data:
{
"device": "/dev/xvda",
"partitions": []
}
----------------------------------------------------------------------
In the actor we can see that it's caused by using GPT (and prep boot partition) instead of MBR for BIOS boot.
2024-04-23 20:00:59.682 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: External command has started: ['fdisk', '-l', '-u=sectors', u'/dev/xvda']
2024-04-23 20:00:59.689 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
2024-04-23 20:00:59.691 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout:
2024-04-23 20:00:59.692 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: Disk /dev/xvda: 268.4 GB, 268435456000 bytes, 524288000 sectors
2024-04-23 20:00:59.693 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: Units = sectors of 1 * 512 = 512 bytes
2024-04-23 20:00:59.695 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: Sector size (logical/physical): 512 bytes / 512 bytes
2024-04-23 20:00:59.696 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: I/O size (minimum/optimal): 512 bytes / 512 bytes
2024-04-23 20:00:59.697 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: Disk label type: gpt
2024-04-23 20:00:59.699 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: Disk identifier: 2D3B54F1-2833-4F75-9BB7-C3A39B53EE7C
2024-04-23 20:00:59.699 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout:
2024-04-23 20:00:59.700 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout:
2024-04-23 20:00:59.701 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: # Start End Size Type Name
2024-04-23 20:00:59.703 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: 1 2048 4095 1M BIOS boot
2024-04-23 20:00:59.704 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: 2 4096 524287965 250G Linux filesyste
2024-04-23 20:00:59.707 DEBUG PID: 15396 leapp.workflow.FactsCollection.scan_grub_device_partition_layout: External command has finished: ['fdisk', '-l', '-u=sectors', u'/dev/xvda']
As it is affecting even the CI, let's fix it before the merge too.
Created separate commit to handle the previousl described issue. The solution has 2 parts:
As described here:
Btw, disks with GPT could be also reported in some cases as "dos" instead:
I have also exteded a little bit original test cases, but the current new solution is not completely covered by unit-tests. I am testing this manually on my setups right now + CI set. So I believe we could call it good enough if all tests will pass. The early deliver allow us to discover possible issues sooner, so I am in favor of merging this under the current circumstances when we do not expect so much to extend the solution in future unless we find a good reason for it.
My manual testing results after the last changes:
Check that the first partition starts at least at 1MiB (2048 cylinders), as too small first-partition offsets lead to failures when doing grub2-install. The limit (1MiB) has been chosen as it is a common value set by the disk formatting tools nowadays.
Tips to reproduce
fdisk -c=dos -u=cylinders </dev/drive>
to force fdisk to allow you to format the target disk in a way that the first partition starts at an offset <= 1MiBjira: https://issues.redhat.com/browse/RHEL-3341