pop-os / upgrade

Utility for upgrading Pop!_OS and its recovery partition to new releases.
GNU General Public License v3.0

Upgrade from 20.04 to 22.04 failed and removed applications #332

Open thomas-zimmerman opened 2 months ago

thomas-zimmerman commented 2 months ago

Customer report on a failed upgrade from 20.04 to 22.04

Yesterday, I finally took the long-deferred step of trying to get on the latest LTS release of Pop!_OS. It was an awful, very stressful experience, and I'm wondering if you can help ascertain what caused things to go so awry. I'm happy to provide logs and diagnostic data if that would help.

The impetus for upgrading was that I had been experiencing some instability with 20.04. Notably, a few times in the last few weeks, my desktop environment has suddenly crashed without warning. The occasional freeze I'm used to, but this was different: the GUI disappeared and was replaced by a blinking cursor, then the login screen appeared.

So I decided to switch to 22.04. I made a backup and moved ahead. I first attempted the upgrade from the GUI, but it got stuck "downloading" with no discernible progress after more than an hour. So I switched over to the command line using Konsole. Although I didn't think that anything had changed in my system, I soon discovered that the repos had already been updated to 22.04, which struck me as odd. The documentation gives the impression that nothing changes until you click upgrade - to me, "download" doesn't mean "start upgrade" - but perhaps I drew the wrong conclusion.

Well, I ran the commands in the article here: https://support.system76.com/articles/upgrade-pop/

Then the real trouble began. The upgrade kept getting interrupted, with dpkg reporting errors, especially dependency problems with libglib2.0-0:i386. I kept trying to fix the broken packages and restarting the upgrade with sudo apt full-upgrade. That worked for a while - I had to keep starting over, though.

Then, all of a sudden, the desktop crashed and the system went into emergency mode. I opened the logs to see what might be the matter, but it wasn't clear. So I rebooted. I decrypted my NVMe drives as usual, but then the system went into emergency mode again, without letting me decrypt my auxiliary SSD, which mainly holds ISOs for virtualization.

I couldn't do much of anything in emergency mode and seemed to be quite stuck. So I fetched a Cruzer USB stick with Pop!_OS 22.04 on it and got that operational. I decrypted my drives, chrooted into the system, fixed the remaining broken packages, and completed the upgrade.

However, the system still would not boot up normally. So I went back to the live USB and examined the logs closely. There were messages saying that ext4-fs vfs can't find the ext4 filesystem on /dev/sda1. That's the auxiliary SSD in the system, a 2.5-inch SSD.

I thought this was strange. A fsck showed no errors, though I accepted an offer to optimize some extent trees.

With the system still not booting, I opted to disable /dev/sda1 in crypttab and fstab by commenting it out. This worked, and finally, I got to the login screen.

Upon logging in, I discovered my speakers weren't working and VMware Workstation wouldn't start. So then I had to spend significant time diagnosing those problems. The speakers started working again after I switched the 3.5mm cable to a different jack on the back of the Thelio. VMware Workstation worked again after I installed gcc-10 and used kernelstub to activate an older kernel.

Ridiculously, the upgrade also wiped out a lot of packages that I wanted upgraded rather than removed. It nuked a lot of my KDE packages (I prefer KDE to the default Pop interface) and also removed KDE Plasma itself. I had to reinstall. Other packages, like VueScan and Zoho Notebook, were also forcibly removed.

I can't understand why this happened.

And I still don't understand why the upgrade failed so badly at a critical juncture. Why should the system get so tripped up on a storage medium that is set to auto-mount? Is that just a symptom of a deeper undiagnosed problem? I'm too exhausted at this point to dig into it. But I wanted to alert you to my experience while the memories - and the anxieties - were still fresh.

I don't think I've ever had an upgrade go this badly before. I know it's not the experience you want your users to have. This is your hardware and your software - the idea is that it's supposed to just work. If I lacked the skill to troubleshoot the problems, I'd have a totally bricked workstation right now. Scary thought.


The major applications that were removed during this upgrade were: digikam, KRename, Zoom, VueScan, Zoho Notebook, 1Password, and more.

pop-upgrade journal output for this: pop-upgrade.zip

jacobgkau commented 2 months ago

Ok, so for trying to reproduce this, we'd want to set up:

This is almost a list of everything that would complicate an upgrade. It would still be good if we can reproduce and fix the interruptions, though.

Please pass at least the following info back to the customer:

As of https://github.com/pop-os/upgrade/pull/293, orphaned packages (a.k.a. packages that are installed but do not have a configured apt repo that would update them) are intentionally removed because they can break dependency resolution for the upgrade (or could prevent the system from working after the upgrade), and there is no way to update them automatically. Reinstalling them from the third-party location (with an up-to-date version supporting the new OS version, where applicable) is a necessary manual part of upgrading.
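
To make the "orphaned" definition concrete for the customer, here is a minimal Python sketch of the idea. It is not the code from the PR above. The function name `find_orphaned`, the repo name, and the package names are hypothetical and only illustrate "installed, but not provided by any configured repo".

```python
def find_orphaned(installed, repo_index):
    """Return installed package names that no configured repo provides.

    installed:  set of installed package names
    repo_index: dict mapping a repo name to the set of package names it offers
    """
    available = set()
    for packages in repo_index.values():
        available.update(packages)
    # A package is "orphaned" if nothing configured could ever update it.
    return sorted(name for name in installed if name not in available)


# Hypothetical data, for illustration only.
installed = {"bash", "coreutils", "example-scanner-app"}
repo_index = {"release": {"bash", "coreutils"}}

print(find_orphaned(installed, repo_index))
```

A package installed from a downloaded .deb file, with no matching repo configured, would show up in this list the same way `example-scanner-app` does here.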

Third-party apt repos are also removed for the same reason (they can break dependency resolution during the upgrade, and we can't reliably update them or know if they're able to be updated to a newer version string since we don't maintain those repos). We comment them out so they can be re-enabled and updated manually in repoman (the sources config dialog accessible through the Pop!_Shop) after the upgrade.
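
As a rough illustration of "commenting out" rather than deleting, the following Python sketch disables third-party entries in a classic one-line sources list. It is an assumption-laden sketch, not the upgrade tool's actual logic: the function name, the `first_party_hosts` parameter, and the example hosts are all hypothetical, and it does not handle deb822-style `.sources` files.

```python
def comment_out_third_party(sources_text, first_party_hosts):
    """Prefix third-party 'deb'/'deb-src' lines with '# ' so they can be re-enabled later.

    first_party_hosts is an assumed allow-list; a line is treated as first-party
    if any of these host strings appears in it (simple substring match).
    """
    out = []
    for line in sources_text.splitlines():
        stripped = line.strip()
        is_entry = stripped.startswith(("deb ", "deb-src "))
        is_first_party = any(host in stripped for host in first_party_hosts)
        if is_entry and not is_first_party:
            out.append("# " + line)  # disabled, but still present in the file
        else:
            out.append(line)  # first-party entries, comments, and blanks are untouched
    return "\n".join(out)


# Hypothetical input, for illustration only.
sources = (
    "deb http://apt.pop-os.org/release jammy main\n"
    "deb https://repo.example.com/apt stable main"
)
print(comment_out_third_party(sources, ["apt.pop-os.org"]))
```

Because the line is only commented out, removing the leading `# ` (or re-enabling the source in repoman) restores it once the third party supports the new release.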

We do not touch files in the home directory, so user data and configuration information (and even cache) are not removed. Apps should pick the relevant data back up after they're re-installed. As a general best practice, the work of re-setting things up after a major upgrade can be minimized by using flatpaks and/or default apt repo packages instead of adding third-party apt repos, where possible.

Also, if you could get a copy of their /etc/fstab to see how the extra drive was configured, that may help to recreate that emergency mode issue.
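
Once we have the file, one thing worth checking is whether the extra drive's entry lacks the `nofail` option, since systemd treats such mounts as required at boot and can drop to emergency mode when they fail. This is only a guess at the cause until we see the customer's actual fstab. A small Python sketch of that check follows; the function name and the sample entries are hypothetical.

```python
def risky_fstab_entries(fstab_text):
    """Return (device, mount_point) pairs that are required at boot.

    An entry is flagged when it is not the root filesystem, not swap,
    and has neither 'nofail' nor 'noauto' in its options field.
    """
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mount_point, fs_type, options = fields[:4]
        if mount_point == "/" or fs_type == "swap":
            continue
        opts = options.split(",")
        if "nofail" not in opts and "noauto" not in opts:
            risky.append((device, mount_point))
    return risky


# Hypothetical fstab contents, for illustration only.
sample = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=aaaa / ext4 noatime,errors=remount-ro 0 1
/dev/mapper/aux /mnt/aux ext4 defaults 0 2
/dev/mapper/isos /mnt/isos ext4 defaults,nofail 0 2
"""
print(risky_fstab_entries(sample))
```

Entries such as /boot/efi would also be flagged by this check, so the output is a list of things to look at, not a list of mistakes.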