ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

Silverblue 31: `finalize-staged` taking over 15 minutes to complete #1924

Open returntrip opened 5 years ago

returntrip commented 5 years ago

@jlebon As discussed yesterday. Thanks for your help.

OS: Fedora Silverblue 31

Description

I have noticed that after I run an rpm-ostree operation that alters the deployment and reboot to apply the changes, the system waits for the ostree-finalize-staged job to complete. Unfortunately, this takes longer than the configured timeout, so the job gets killed.

rpm-ostree would then complain with:

Warning: failed to finalize previous deployment
         check `journalctl -b -1 -u ostree-finalize-staged.service`

Running journalctl -b -1 -u ostree-finalize-staged.service shows no specific errors.

When I run ostree admin finalize-staged from the shell, this takes over 15 minutes to complete on a system with SSD drives.

Could it be a problem with probing of the other OSs (Windows 10 and Antergos) I have installed?

Logs

jlebon commented 5 years ago

OK, so the timestamp jump appears here:

Sep 19 08:34:19 rauros.figura.io returntrip[26622]: 50mounted-tests: debug: running subtest /usr/libexec/os-probes/mounted/90linux-distro
Sep 19 08:52:45 rauros.figura.io returntrip[26622]: 90linux-distro: result: /dev/nvme0n1p5:Antergos Linux:Antergos:linux

Which led me to https://savannah.gnu.org/bugs/?50702. So yeah,

Could it be a problem with probing of the other OSs (Windows 10 and Antergos) I have installed?

sounds accurate. Or IOW, I think this is os-prober doing who-knows-what trying to figure out what Linux distro is installed there.

Not sure there's much we can do here. You could try opening a bug against GRUB2?

Note in FCOS at least, we've moved away from grub2-mkconfig entirely and rely on GRUB2 reading the BLS config entries directly. I know there's a push in the rest of Fedora as well towards BLS, though I'm not sure how that intersects with dual-booting machines like yours.
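The culprit above was found by spotting an 18-minute jump between two consecutive journal lines. In a long log the same search can be automated. The following is a minimal Python sketch, not part of ostree or systemd; it assumes journalctl's default short output format (each entry starting with a timestamp like `Sep 19 08:34:19`) and supplies an arbitrary year, because that format omits it.

```python
from datetime import datetime


def largest_gap(lines, year=2000):
    """Find the biggest time jump between consecutive journal lines.

    Assumes journalctl's default "short" format, where each entry starts with a
    15-character timestamp such as "Sep 19 08:34:19". That format has no year,
    so an arbitrary leap year is supplied (so "Feb 29" still parses); logs that
    span New Year are not handled.
    Returns (gap_in_seconds, earlier_line, later_line), or None if fewer than
    two timestamped lines were seen.
    """
    best = None
    prev_time = prev_line = None
    for line in lines:
        line = line.rstrip("\n")
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp, e.g. "-- Logs begin at ..."
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best
```

For example, save the log with `journalctl -b -1 -u ostree-finalize-staged.service > finalize.log` and call `largest_gap(open("finalize.log"))`. On the two 90linux-distro lines quoted above, it reports a 1106-second (18 min 26 s) gap.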

returntrip commented 5 years ago

Thanks for your analysis! One odd thing is that the issue started after rebasing to F31; with F30 there was no issue.

p1u3o commented 5 years ago

I have this issue too. I run a Windows install with BitLocker on another drive, and I had the problem under Fedora 30 as well. I thought Fedora moved to BLS in 30? I can see the ostree BLS entries in /boot/loader/entries.

I use the UEFI boot screen to boot between Fedora and Windows, so I don't need mkconfig.

returntrip commented 5 years ago

@jlebon I no longer use Antergos, so I will remove it soon and see if that makes a difference.

returntrip commented 5 years ago

Got rid of Antergos from /dev/nvme0n1p5 and removed the Antergos GRUB files from the Windows EFI partition.

The system is now executing finalize-staged in a few seconds and I do not see any OS probing messages in the log file.

The only minor problem is that Windows 10 is now missing from the GRUB menu, although I can still see it in the UEFI boot menu.

Is there any reason why Windows is no longer being probed?

Update: Windows 10 is now showing in the GRUB menu as well. I guess the menu was updated on shutdown by the finalize-staged service or something?

zerodogg commented 4 years ago

I'm experiencing what I assume is this issue on Silverblue 31. I ran ostree admin finalize-staged and it just hung indefinitely (I gave up after two hours). I can't find anything of relevance in the journal. I don't have any other OSes installed. Is there anything I can do to help debug this?

jlebon commented 4 years ago

What kind of setup do you have (e.g. what storage device)? Do other disk operations take a long time to finish? Or is it isolated to OSTree finalization? Is it reproducible?

zerodogg commented 4 years ago

It's an SSD. No other operations take a long time.

I've done a bit of digging. It doesn't appear to be ostree itself that's hanging, it appears to be ostree launching something to generate grub, and it, in turn, launching os-prober. os-prober then hangs indefinitely waiting for blkid. So, at least for me, this issue is invalid as far as ostree is concerned. I'll submit an issue on os-prober/blkid (blkid on its own also hangs).

As a workaround, removing /etc/grub.d/30_os-prober to disable os-prober makes finalize-staged take a few seconds and successfully complete.
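Running blkid on its own is what showed that the hang sits below os-prober. A small wrapper makes that check repeatable without leaving a stuck shell. The Python sketch below is an illustration, not an ostree or util-linux tool: it runs a command with a deadline. As root, `finishes_within(["blkid"], 60)` would indicate whether blkid alone hangs; the 60-second limit is an arbitrary assumption.

```python
import subprocess


def finishes_within(cmd, seconds):
    """Return True if cmd exits within `seconds`, False if it times out.

    subprocess.run() kills the child once the timeout expires, so a hung
    probe does not block the caller forever. Caveat: a process stuck in
    uninterruptible disk I/O cannot be killed until that I/O returns, so in
    that case this call can still block.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=seconds,
            check=False,  # a non-zero exit status still counts as "finished"
        )
    except subprocess.TimeoutExpired:
        return False
    return True
```

Output is discarded on purpose: the only question being asked is whether the command returns at all.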

gitbobbed commented 4 years ago

Sorry for resurrecting this, but I've just run into this issue on Fedora Silverblue 32. While investigating, I realized that if I manually mount the root partition of one of the other distros I run (namely Arch, on an ext4-formatted NVMe drive), os-prober finishes in a couple of seconds and all is well.

Arch Linux runs on a separate disk, if that's of any importance.

I also noticed that this partition had been set up with 'automatic options' in gnome-disk-utility, but it wasn't being mounted. I then set it to be mounted at startup, but that didn't work either. When I checked, I realized the fstab entry used /mnt/[uuid], not /var/mnt/[uuid]. Once I made this change, the partition mounted normally on boot and os-prober worked again.

My guess is that os-prober tries to do the same thing gnome-disk-utility did (i.e. mount my Arch partition under /mnt), and the standard /mnt symlink that Silverblue sets up isn't working for it.

FTR, openSUSE is installed on yet another hard drive, and I didn't need to do any manual mounting for os-prober to pick it up.
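On Silverblue, /mnt is a symlink to var/mnt, and the fix described above was to point the fstab entry at the real /var/mnt path. A minimal Python sketch for spotting entries that still use /mnt follows. The function name and the sample UUIDs in the test data are hypothetical, and the sketch only reads fstab text; it does not edit /etc/fstab.

```python
def mnt_entries_to_fix(fstab_text):
    """List fstab mount points under /mnt together with their /var/mnt form.

    Each non-comment fstab line has whitespace-separated fields; the second
    field is the mount point (spaces inside it are escaped as \\040, so a
    plain split() is safe).
    """
    fixes = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line without a mount point field
        mount_point = fields[1]
        if mount_point == "/mnt" or mount_point.startswith("/mnt/"):
            fixes.append((mount_point, "/var" + mount_point))
    return fixes
```

Usage would be along the lines of `mnt_entries_to_fix(open("/etc/fstab").read())`, followed by editing any reported entries by hand.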

SampsonF commented 3 years ago

Silverblue 34 still has this problem:

ostree://fedora:fedora/34/x86_64/silverblue
    Version: 34.20210423.n.0 (2021-04-23T08:10:25Z)
    BaseCommit: 7b99463136830fd9b18f8daf5f7973f3e15eaa3532f8dbcbb4f7eb9673170461
    GPGSignature: Valid signature by 8C5BA6990BDB26E19F2A1A801161AE6945719A39
    LayeredPackages: nc terminator wl-clipboard

On a simple install it might not time out, but os-prober still takes a long time to run.

zephyros-dev commented 2 months ago

Quoting @zerodogg's workaround from above: "As a workaround, removing /etc/grub.d/30_os-prober to disable os-prober makes finalize-staged take a few seconds and successfully complete."

Thank you for this pointer. I was having trouble with ostree-finalize-staged because it runs os-prober, which scans all the disks (I have a few large HDDs), so os-prober always times out. Removing the file stops os-prober from running, and it does not seem to affect ostree's ability to create its boot entries in GRUB.