plesk / centos2alma

CentOS 7 to AlmaLinux 8 conversion tool
Apache License 2.0

Upgrade failure with centos2alma-1.4.3 #388

Open gokhand opened 1 week ago

gokhand commented 1 week ago

Describe the bug: I fixed some of the issues reported by the pre-upgrade checks, after which the tool started making changes, but it failed and I had to revert, because otherwise the server was unresponsive.

Here are the last lines before the process exited. I have also attached the feedback file.

2024-11-23 14:02:55,582 - INFO - stdout: Debug output written to /var/log/leapp/leapp-preupgrade.log
2024-11-23 14:02:55,583 - ERROR - Command ['/usr/bin/leapp', 'preupgrade'] failed with return code 1
2024-11-23 14:02:55,583 - ERROR - Failed: doing the conversion. The reason: Command '['/usr/bin/leapp', 'preupgrade']' returned non-zero exit status 1.

Please attach a feedback archive to the bug report.

centos2alma_feedback.zip

nmnsa commented 1 week ago

Same here.

2024-11-24 01:25:55,693 - INFO - stderr: File "/usr/lib64/python2.7/multiprocessing/managers.py", line 625, in _finalize_manager
2024-11-24 01:25:55,693 - INFO - stderr: process.terminate()
2024-11-24 01:25:55,882 - INFO - stderr: File "/usr/lib64/python2.7/multiprocessing/process.py", line 137, in terminate
2024-11-24 01:25:55,882 - INFO - stderr: self._popen.terminate()
2024-11-24 01:25:55,882 - INFO - stderr: File "/usr/lib64/python2.7/multiprocessing/forking.py", line 171, in terminate
2024-11-24 01:25:55,882 - INFO - stderr: os.kill(self.pid, signal.SIGTERM)
2024-11-24 01:25:56,206 - INFO - stdout: Creates the upgrade initramfs
2024-11-24 01:25:56,206 - INFO - stderr: OSError: [Errno 3] No such process
2024-11-24 01:25:56,207 - ERROR - Command ['/usr/bin/leapp', 'upgrade'] failed with return code 1
2024-11-24 01:25:56,207 - ERROR - Failed: doing the conversion. The reason: Command '['/usr/bin/leapp', 'upgrade']' returned non-zero exit status 1.

SandakovMM commented 5 days ago

Hello @gokhand, I cannot find a specific reason why the preupgrade step failed. Is it possible that there was not enough available RAM and the process was terminated by the OOM killer, for example? Could you please run leapp preupgrade manually and share the results with me? Please include the return code.
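One way to capture the return code, and to tell an ordinary failure apart from a process killed by a signal such as the OOM killer's SIGKILL, is to run the command from a small Python 3 wrapper. This is a hypothetical helper written for illustration, not part of centos2alma or leapp. It relies on the `subprocess` convention of reporting a child terminated by signal N as return code -N, so it only sees signals delivered to the top-level `leapp` process, not to its worker processes.

```python
import signal
import subprocess


def describe_returncode(rc):
    """Turn a subprocess return code into a short human-readable explanation."""
    if rc == 0:
        return "success"
    if rc < 0:
        # subprocess reports "terminated by signal N" as a return code of -N.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = "signal {}".format(-rc)
        # SIGKILL is what the kernel OOM killer sends, but other senders exist.
        hint = " (possibly the OOM killer)" if name == "SIGKILL" else ""
        return "killed by {}{}".format(name, hint)
    return "exited with code {}".format(rc)


def run_preupgrade():
    """Run the same command centos2alma invokes and report how it ended.

    Requires root and an installed leapp; stdout/stderr go to the terminal.
    """
    result = subprocess.run(["/usr/bin/leapp", "preupgrade"])
    print("leapp preupgrade:", describe_returncode(result.returncode))
    return result.returncode
```

Calling `run_preupgrade()` as root runs the check and then prints a summary line such as `leapp preupgrade: exited with code 1`.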

gokhand commented 5 days ago

Hi @SandakovMM, there is plenty of RAM. As you asked, I ran leapp preupgrade, and it exits with an error saying: No such file or directory: '/etc/named-user-options.conf'

You can see the details below; some HIGH and MEDIUM severity issues were also reported. Any suggestions would be helpful. Thanks.

Process Process-534:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 75, in _do_run
    actor_instance.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 296, in run
    self.process(*args)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/actor.py", line 32, in process
    facts = iscmodel.get_facts('/etc/named.conf')
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/libraries/iscmodel.py", line 76, in get_facts
    parser = isccfg.IscConfigParser(path)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 409, in __init__
    self.load_config(config)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 954, in load_config
    self.load_included_files()
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 940, in load_included_files
    self.on_include_error(ConfigParseError(e, include))
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 919, in on_include_error
    raise e
ConfigParseError: Cannot open the configuration file "/etc/named-user-options.conf": [Errno 2] No such file or directory: '/etc/named-user-options.conf'; included from "/etc/named-user-options.conf"
2024-11-25 10:03:03.450 ERROR    PID: 16200 leapp.workflow.Checks: Actor check_bind has crashed: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 75, in _do_run
    actor_instance.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 296, in run
    self.process(*args)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/actor.py", line 32, in process
    facts = iscmodel.get_facts('/etc/named.conf')
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/libraries/iscmodel.py", line 76, in get_facts
    parser = isccfg.IscConfigParser(path)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 409, in __init__
    self.load_config(config)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 954, in load_config
    self.load_included_files()
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 940, in load_included_files
    self.on_include_error(ConfigParseError(e, include))
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 919, in on_include_error
    raise e
ConfigParseError: Cannot open the configuration file "/etc/named-user-options.conf": [Errno 2] No such file or directory: '/etc/named-user-options.conf'; included from "/etc/named-user-options.conf"

===========================================================================================
Actor check_bind unexpectedly terminated with exit code: 1 - Please check the above details
===========================================================================================

Debug output written to /var/log/leapp/leapp-preupgrade.log

============================================================
                      REPORT OVERVIEW                       
============================================================

Upgrade has been inhibited due to the following problems:
    1. Detected incorrect order of entries or duplicate entries in /etc/fstab, preventing a successful in-place upgrade.

HIGH and MEDIUM severity reports:
    1. Difference in Python versions and support in RHEL 8
    2. Detected customized configuration for dynamic linker.
    3. Detected custom leapp actors or files.
    4. Packages not signed by Red Hat found on the system
    5. GRUB2 core will be automatically updated during the upgrade
    6. spamc no longer allows specifying the TLS version and no longer supports SSLv3
    7. spamd no longer allows specifying the TLS version and no longer supports SSLv3
    8. The type of the spamassassin systemd service has changed
    9. Module pam_pkcs11 will be removed from PAM configuration

Reports summary:
    Errors:                      1
    Inhibitors:                  1
    HIGH severity reports:       5
    MEDIUM severity reports:     4
    LOW severity reports:        5
    INFO severity reports:       2

Before continuing, review the full report below for details about discovered problems and possible remediation instructions:
    A report has been generated at /var/log/leapp/leapp-report.txt
    A report has been generated at /var/log/leapp/leapp-report.json

============================================================
                   END OF REPORT OVERVIEW                   
============================================================

SandakovMM commented 5 days ago
  1. Detected incorrect order of entries or duplicate entries in /etc/fstab, preventing a successful in-place upgrade.

There appears to be an issue with your /etc/fstab configuration that is preventing conversion. Could you please share the file? I will add a pre-checker to verify the /etc/fstab file on our end. However, I believe we will not be able to automatically fix these types of problems (nor should we, as it might be unexpected for users).

gokhand commented 5 days ago

Fixing it automatically may not be easy, but a suggested fix would be helpful.

Here are the fstab contents:

proc /proc proc defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs defaults 0 0
sysfs /sys sysfs defaults 0 0
/dev/md/0 none swap sw 0 0
/dev/md/1 /boot ext3 defaults 0 0
/dev/md/2 / ext4 defaults 0 0

SandakovMM commented 5 days ago

The issue is related to the order of entries. Leapp prevents incorrect ordering that might cause overshadowing. In your case, you should move /dev/md/2 / ext4 defaults 0 0 from the end of the configuration file to the start.
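To see why moving the root entry to the top helps, here is a minimal Python 3 sketch of an ordering and duplicate check in the spirit of this inhibitor. It is an approximation written for this thread, not leapp's actual code, and the function names are hypothetical. An entry is flagged when a mount point that contains it (such as `/` for `/boot`) appears later in the file, because mounting the parent afterwards would hide it; swap entries are skipped because they have no mount point.

```python
# The fstab posted in this thread, one entry per line.
FSTAB_FROM_THREAD = """\
proc /proc proc defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs defaults 0 0
sysfs /sys sysfs defaults 0 0
/dev/md/0 none swap sw 0 0
/dev/md/1 /boot ext3 defaults 0 0
/dev/md/2 / ext4 defaults 0 0
"""


def parse_fstab(text):
    """Return (device, mount_point) pairs, skipping comments, blanks and swap."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        if mount_point in ("none", "swap"):
            continue  # swap has no mount point, so it cannot shadow anything
        entries.append((device, mount_point))
    return entries


def is_parent(parent, child):
    """True if mount point `child` lies strictly below mount point `parent`."""
    if parent == child:
        return False
    return child.startswith(parent.rstrip("/") + "/")


def check_mount_order(entries):
    """List duplicate mount points and entries listed before their parent."""
    problems = []
    seen = set()
    for _, mount_point in entries:
        if mount_point in seen:
            problems.append("duplicate mount point " + mount_point)
        seen.add(mount_point)
    for i, (_, earlier) in enumerate(entries):
        for _, later in entries[i + 1:]:
            if is_parent(later, earlier):
                problems.append(
                    "{} is listed before its parent {}".format(earlier, later))
    return problems
```

With the file as posted, `check_mount_order(parse_fstab(FSTAB_FROM_THREAD))` flags `/proc`, `/dev/pts`, `/dev/shm`, `/sys` and `/boot`, all of which appear before `/`; moving the `/dev/md/2 /` line to the top clears every warning.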

gokhand commented 5 days ago

Thanks @SandakovMM, the leapp preupgrade output changed in one respect: it no longer reports this:

Upgrade has been inhibited due to the following problems:

1. Detected incorrect order of entries or duplicate entries in /etc/fstab, preventing a successful in-place upgrade.

but it still exits with code 1. This is the latest output:

`====> check_bind
        Actor parsing BIND configuration and checking for known issues in it.
Process Process-538:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 75, in _do_run
    actor_instance.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 296, in run
    self.process(*args)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/actor.py", line 32, in process
    facts = iscmodel.get_facts('/etc/named.conf')
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/libraries/iscmodel.py", line 76, in get_facts
    parser = isccfg.IscConfigParser(path)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 409, in __init__
    self.load_config(config)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 954, in load_config
    self.load_included_files()
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 940, in load_included_files
    self.on_include_error(ConfigParseError(e, include))
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 919, in on_include_error
    raise e
ConfigParseError: Cannot open the configuration file "/etc/named-user-options.conf": [Errno 2] No such file or directory: '/etc/named-user-options.conf'; included from "/etc/named-user-options.conf"
2024-11-25 13:43:24.632 ERROR    PID: 8865 leapp.workflow.Checks: Actor check_bind has crashed: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 75, in _do_run
    actor_instance.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 296, in run
    self.process(*args)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/actor.py", line 32, in process
    facts = iscmodel.get_facts('/etc/named.conf')
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/checkbind/libraries/iscmodel.py", line 76, in get_facts
    parser = isccfg.IscConfigParser(path)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 409, in __init__
    self.load_config(config)
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 954, in load_config
    self.load_included_files()
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 940, in load_included_files
    self.on_include_error(ConfigParseError(e, include))
  File "/usr/share/leapp-repository/repositories/system_upgrade/el7toel8/libraries/isccfg.py", line 919, in on_include_error
    raise e
ConfigParseError: Cannot open the configuration file "/etc/named-user-options.conf": [Errno 2] No such file or directory: '/etc/named-user-options.conf'; included from "/etc/named-user-options.conf"

===========================================================================================
Actor check_bind unexpectedly terminated with exit code: 1 - Please check the above details
===========================================================================================

Debug output written to /var/log/leapp/leapp-preupgrade.log

============================================================
                      REPORT OVERVIEW
============================================================

HIGH and MEDIUM severity reports:
    1. Difference in Python versions and support in RHEL 8
    2. Detected customized configuration for dynamic linker.
    3. Detected custom leapp actors or files.
    4. Packages not signed by Red Hat found on the system
    5. GRUB2 core will be automatically updated during the upgrade
    6. spamc no longer allows specifying the TLS version and no longer supports SSLv3
    7. spamd no longer allows specifying the TLS version and no longer supports SSLv3
    8. The type of the spamassassin systemd service has changed
    9. Module pam_pkcs11 will be removed from PAM configuration

Reports summary:
    Errors:                      1
    Inhibitors:                  0
    HIGH severity reports:       5
    MEDIUM severity reports:     4
    LOW severity reports:        5
    INFO severity reports:       2

Before continuing, review the full report below for details about discovered problems and possible remediation instructions:
    A report has been generated at /var/log/leapp/leapp-report.txt
    A report has been generated at /var/log/leapp/leapp-report.json

============================================================
                   END OF REPORT OVERVIEW
============================================================`

SandakovMM commented 5 days ago

No such file or directory: '/etc/named-user-options.conf'; included from "/etc/named-user-options.conf"

This problem is usually handled by centos2alma. Therefore, I suggest you now try using centos2alma itself to perform the conversion.
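For anyone who runs `leapp preupgrade` by hand, the crash above comes from the `check_bind` actor failing to open a file pulled in by an `include` statement while parsing `/etc/named.conf`. A quick Python 3 sketch for listing missing includes beforehand is below. It is hypothetical and not what centos2alma does internally: it only recognizes the plain `include "path";` form, strips comments naively (a `#` or `//` inside a quoted path would confuse it), and does not follow nested includes.

```python
import os
import re

# Matches the simple form: include "/some/path";
INCLUDE_RE = re.compile(r'\binclude\s+"([^"]+)"\s*;')


def strip_comments(text):
    """Naively remove /* ... */ blocks, then // and # line comments."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    return re.sub(r"(//|#)[^\n]*", "", text)


def missing_includes(conf_text, exists=os.path.exists):
    """Return included paths from named.conf text that do not exist.

    `exists` is injectable so the check can be tried without touching disk.
    """
    paths = INCLUDE_RE.findall(strip_comments(conf_text))
    return [p for p in paths if not exists(p)]


def check_named_conf(path="/etc/named.conf"):
    """Read a BIND configuration file and list its missing includes."""
    with open(path) as f:
        return missing_includes(f.read())
```

An empty list from `check_named_conf()` means every top-level include resolves on the local filesystem.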

gokhand commented 5 days ago

I proceeded with centos2alma. This time it didn't stop as before: it got past that point and reached the phase where it reboots the server. Unfortunately, the server is not remotely accessible now. I will try to access it via KVM, but before that, do you have any idea what the issue could be? Or do I just need to wait until the script's process finishes? (It has now been more than 30 minutes.)

Before the reboot, this was on the screen:

`** WARNING ***
The conversion is ready to begin. The server will be rebooted in 45 seconds. The conversion process will take approximately 25 minutes. If you wish to prevent the reboot, simply terminate the centos2alma process. Please note that Plesk functionality is currently unavailable.


( stage Pause before reboot / action pause before reboot ) 23:53 / 30:45


The dist-upgrade process needs to reboot the server. It will be rebooted in several seconds. The process will resume automatically after the reboot. Current server time: 16:13:06. To monitor the disupgrade status use one of the following commands: /root/centos2alma --status or /root/centos2alma --monitor **`

gokhand commented 5 days ago

An update: I connected via KVM, and it turns out that mounting failed. The server was in the dracut emergency shell.

Warning: /dev/md/2 does not exist
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
dracut:/#

When I checked, I saw that instead of md/2, the main RAID array now exists as md125 (there are also md126 and md127 on the system). I don't know whether centos2alma made these changes. I temporarily mounted md125 and saw that everything was there, but I don't know why the md device names changed. In the mounted system, /etc/fstab still referenced md/2. I changed it to md125 and rebooted, but the issue still persists.

This is a really bad situation for me right now, and I am trying to find a solution. If you have any idea how we can resolve it, that would be awesome.

SandakovMM commented 4 days ago

The issue seems to occur because the new kernel version changes the names of the RAID devices in the system.

We haven't encountered this issue previously, so I suggest manually setting the names of the RAID devices via the kernel command line. Alternatively, you could use the UUID of the RAID devices in the /etc/fstab file, which I believe is more reliable (as explained by Jeff Geerling, for example). Here’s how you can do it:

  1. Find out the UUID of the RAID device using the command blkid /dev/md/125.
  2. Replace /dev/md/125 with the UUID in the /etc/fstab file.
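The two steps above could be scripted roughly as follows. This is a Python 3 sketch under assumptions, not a tested recovery procedure: `blkid -s UUID -o value DEVICE` is a standard way to print just a device's UUID, and `UUID=...` is the usual fstab syntax for referring to a filesystem by UUID, but the helper names and the device-to-UUID mapping are hypothetical. Rewritten lines have their whitespace collapsed to single spaces; keep a backup and review the result before replacing /etc/fstab.

```python
import subprocess


def device_uuid(device):
    """Ask blkid for the filesystem UUID of `device` (usually needs root)."""
    out = subprocess.run(
        ["blkid", "-s", "UUID", "-o", "value", device],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return out.strip()


def rewrite_fstab(text, uuids):
    """Replace device paths in the first fstab column with UUID=... specs.

    `uuids` maps a device path (e.g. "/dev/md/2") to its UUID. Comments,
    blank lines and entries whose device is not in the mapping are kept as-is.
    """
    out_lines = []
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#") and fields[0] in uuids:
            fields[0] = "UUID=" + uuids[fields[0]]
            line = " ".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

Before a conversion, something like `rewrite_fstab(text, {"/dev/md/2": device_uuid("/dev/md/2")})` would produce a root entry that no longer depends on the md device name.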

It might also be a good idea to revert to a system snapshot taken before the conversion and modify the /etc/fstab before starting the conversion process.

gokhand commented 2 days ago

An update: in the end, I tried many different solutions, but every time it ended with md/2 not being found, even though I had changed it in /etc/fstab. By the way, to change /etc/fstab I first mounted md125 from the dracut shell and edited the file there. After a reboot, it asked for md/2 again. I think mounting first was the best approach because, as you can imagine, the system's /etc/fstab is not available in the dracut shell.

I am not an expert on these mounting issues and don't want to lose data from my server by accident. My KVM time is limited (1 hour of access left), and the server has been down all this time. So, unfortunately, I decided to add a new server with AlmaLinux preinstalled, boot the old one into rescue mode, and move all the data. In my situation, it was the cleanest and fastest approach.

I have kept the old server as is. With 1 hour of KVM access left, maybe I can try some fixes on it, but I don't think I can manage within that time frame.

SandakovMM commented 2 days ago

I am grateful for your insights on the RAID issue. The problem seems significant, as it is very difficult to fix once it occurs. I intend to implement a pre-checker to prevent the conversion when /etc/fstab is configured this way. It may also be helpful to inform the developers of the Elevate tools about this issue; I will contact them as well. Thank you again for the information.
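A pre-checker of the kind described above could start as simply as the Python 3 sketch below. It is hypothetical and not taken from centos2alma: it reports every non-comment /etc/fstab entry whose device starts with `/dev/md`, on the reasoning from this thread that such names can change after the kernel update, whereas `UUID=` and `LABEL=` references do not depend on device naming.

```python
def risky_md_entries(fstab_text):
    """Return fstab lines that refer to an md array by its /dev/md* name."""
    risky = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if fields[0].startswith("/dev/md"):
            risky.append(line.strip())
    return risky


def precheck(path="/etc/fstab"):
    """Return True if conversion may proceed, printing any blocking entries."""
    with open(path) as f:
        risky = risky_md_entries(f.read())
    for line in risky:
        print("Blocking: fstab refers to an md array by device name:", line)
    if risky:
        print("Consider switching these entries to UUID=... before converting.")
    return not risky
```

Swap entries on md arrays are flagged too, since a renamed swap device would fail to activate in the same way.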