ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

filesystem space checks for /boot/ #1648

Open dustymabe opened 6 years ago

dustymabe commented 6 years ago

I hit a case today where I ran out of disk space on my /boot/ partition. This was mainly because I had pinned some deployments that I wanted to keep around, but I still ended up with a failure:

[dustymabe@dhcp137-98 logs]$ rpm-ostree upgrade
==== AUTHENTICATING FOR org.projectatomic.rpmostree1.upgrade ====
Authentication is required to update software
Authenticating as: Dusty Mabe (dustymabe)
Password:
==== AUTHENTICATION COMPLETE ====
6 delta parts, 4 loose fetched; 112481 KiB transferred in 31 seconds
Checking out tree fbed0e2... done
Updating metadata for 'fedora': [=============] 100%
rpm-md repo 'fedora'; generated: 2018-04-25 04:27:32
Updating metadata for 'updates': [=============] 100%
rpm-md repo 'updates'; generated: 2018-06-25 10:46:00
Importing metadata [=============] 100%
Resolving dependencies... done
Will download: 7 packages (15.2 MB)
  Downloading from updates: [=============] 100%
Importing (7/7) [=============] 100%
Checking out packages (287/287) [=============] 100%
Running pre scripts... 0 done
Running post scripts... 7 done
Writing rpmdb... done
Writing OSTree commit... done
Copying /etc changes: 20 modified, 1 removed, 48 added
error: Installing kernel: regfile copy: No space left on device
[dustymabe@dhcp137-98 logs]$ sudo df -kh /boot/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       283M  267M     0 100% /boot

Here is the status after the failure:

[dustymabe@dhcp137-98 logs]$ rpm-ostree status
State: idle; auto updates enabled (check; last run 15h ago)
Deployments:
● ostree://unifiedrepo:fedora/28/x86_64/atomic-host
                   Version: 28.20180613.0 (2018-06-13 13:52:10)
                BaseCommit: c51100f14cf12b25c16562cede7455191e536c0534e3b2ef87e66be9e12899ae
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1
           LayeredPackages: aria2 git git-annex mosh pciutils tig vim

  ostree://unifiedrepo:fedora/28/x86_64/updates/atomic-host
                   Version: 28.20180527.0 (2018-05-27 19:05:29)
                BaseCommit: 291ea90da29bc5abe757b5a50813b3de1396b08412939a89b3b671aba9856093
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1
           LayeredPackages: aria2 git git-annex mosh pciutils tig vim

  ostree://unifiedrepo:fedora/28/x86_64/atomic-host
                   Version: 28.20180515.1 (2018-05-15 16:32:35)
                BaseCommit: a29367c58417c28e2bd8306c1f438b934df79eba13706e078fe8564d9e0eb32b
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1
           LayeredPackages: aria2 git git-annex mosh pciutils tig vim weechat
                    Pinned: yes

  ostree://fedora-atomic-27:fedora/27/x86_64/atomic-host
                   Version: 27.122 (2018-04-18 23:34:24)
                BaseCommit: 931ebb3941fc49af706ac5a90ad3b5a493be4ae35e85721dabbfd966b1ecbf99
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: aria2 git git-annex mosh pciutils tig vim weechat
                    Pinned: yes

Available update:
           Diff: 7 upgraded
[dustymabe@dhcp137-98 logs]$ rpm -q ostree rpm-ostree
ostree-2018.5-1.fc28.x86_64
rpm-ostree-2018.5-1.fc28.x86_64

I unpinned a deployment and ran rpm-ostree cleanup -r, so I'm unblocked. This is something to consider, though.

rfairley commented 6 years ago

Any update on this? I can look into this if it'd be handy to have. I imagine there would be some way to query the disk space available, then give an error if it is less than the size of the packages to download. @cgwalters what do you think of the complexity of this?
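
A minimal sketch of the kind of check described above, written in Python purely for illustration (a real check would live in the C code). The names free_bytes, has_room and the margin parameter are hypothetical; os.statvfs is the only real API used.

```python
import os


def free_bytes(path):
    """Return the bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    # f_bavail excludes blocks reserved for root, which matches the
    # "Avail" column that df reports.
    return st.f_bavail * st.f_frsize


def has_room(path, needed_bytes, margin=0):
    """Return True if the filesystem holding path can take needed_bytes plus margin."""
    return free_bytes(path) >= needed_bytes + margin
```

With something like this, the upgrade could fail early with a clear message when has_room("/boot", needed) is false, instead of failing partway through with "No space left on device". What to pass as needed is the open question: the failure in the log above happens while installing the kernel, so the package download size alone may not be the right figure.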

dustymabe commented 5 years ago

> I can look into this if it'd be handy to have.

Thanks, Robert. Funnily enough, I actually hit this again today. @cgwalters I know we just discussed that labeling the difficulty of tasks is arbitrary, but I figure I'll ask: do you think this is something @rfairley could pick up with some guidance?

rfairley commented 5 years ago

Had a look into reproducing this. When pinning deployments with different BaseCommits (i.e. after upgrading), I can see /boot go up to > 200MB used. I just need to figure out how to shrink the virtual machine's boot partition so that I can fill /boot and reproduce the issue (right now, when I spin up an F28AH Vagrant box, the partition where /boot is mounted defaults to a size of 1GB).
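
To see which deployments account for the space, one could total the files under each subdirectory of /boot/ostree. This is a rough sketch that assumes each kernel/initramfs pair lives in its own directory there; the function name dir_usage is hypothetical, and it sums apparent file sizes, which only approximates what df reports.

```python
import os


def dir_usage(root):
    """Map each immediate subdirectory of root to the total bytes of regular files under it."""
    usage = {}
    with os.scandir(root) as entries:
        for entry in entries:
            # Only real directories count; skip plain files and symlinks at the top level.
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    fpath = os.path.join(dirpath, name)
                    if os.path.islink(fpath):
                        continue  # do not count symlink targets
                    total += os.path.getsize(fpath)
            usage[entry.name] = total
    return usage
```

Calling dir_usage("/boot/ostree") before and after pinning a deployment would show how much each additional kernel/initramfs pair adds.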

rfairley commented 5 years ago

Reproduced this after:

  1. vagrant init fedora/27-atomic-host && vagrant up && vagrant ssh
  2. follow instructions to shrink the boot partition to 128M
  3. # ostree admin pin 0
  4. try to upgrade to F28AH: # ostree remote add --set=gpgkeypath=/etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-28-primary fedora-atomic-28 https://kojipkgs.fedoraproject.org/atomic/repo/ && rpm-ostree rebase fedora-atomic-28:fedora/28/x86_64/atomic-host

Output is:

# rpm-ostree rebase fedora-atomic-28:fedora/28/x86_64/atomic-host

2956 metadata, 15046 content objects fetched; 426106 KiB transferred in 295 seconds              
Copying /etc changes: 21 modified, 0 removed, 57 added
error: Installing kernel: regfile copy: No space left on device

Next I'll look into adding a check. I think both cases hit the error in the postprocessing code, so I'll start from there:

rpm-ostree-2018.9/src/libpriv/rpmostree-postprocess.c:1422:        return glnx_throw_errno_prefix (error, "regfile copy");
rpm-ostree-2018.9/src/libpriv/rpmostree-postprocess.c:1779:        return glnx_throw_errno_prefix (error, "regfile copy");
libglnx/glnx-fdio.c:942:    return glnx_throw_errno_prefix (error, "regfile copy");

cgwalters commented 5 years ago

The rpm-ostree code isn't involved here, except when rpm-ostree initramfs --enable is in use. This is a libostree issue.

Probably the best approach to fixing this would be to add the size of the kernel/initramfs as metadata on the commit object (along with the bootcsum).
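
As a rough illustration of that idea (in Python rather than libostree's C), the check could compare sizes recorded in the commit metadata against the free space on /boot before any files are copied. Everything here is hypothetical: the metadata is a plain dict, the "example.*" keys are made-up names and not real ostree metadata keys, and the sketch assumes that a kernel/initramfs pair whose bootcsum is already present in /boot needs no extra space.

```python
import errno


def required_boot_bytes(metadata, present_bootcsums):
    """Return the bytes that deploying a commit would add to /boot.

    metadata is a dict standing in for commit metadata; its keys are
    hypothetical placeholders.
    """
    if metadata["example.bootcsum"] in present_bootcsums:
        # Assumption: an identical kernel/initramfs pair is shared, not copied again.
        return 0
    return metadata["example.kernel-size"] + metadata["example.initramfs-size"]


def check_boot_space(metadata, present_bootcsums, free_bytes):
    """Raise OSError(ENOSPC) up front if /boot cannot hold the new kernel and initramfs."""
    needed = required_boot_bytes(metadata, present_bootcsums)
    if needed > free_bytes:
        raise OSError(
            errno.ENOSPC,
            f"/boot needs {needed} bytes but only {free_bytes} are free",
        )
    return needed
```

The point of recording the sizes on the commit is that the check can run before deployment starts, so the failure is reported up front rather than in the middle of installing the kernel.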

rfairley commented 5 years ago

Ah, thanks! Makes sense, now looking at src/libostree/ostree-sysroot-deploy.c. Adding to the metadata of the ostree commit (and checking the size of /boot against the kernel/initramfs size in the metadata) sounds good.

Later I can see whether a check is also needed for rpm-ostree initramfs --enable.

dustymabe commented 1 year ago

Hmm. I wonder if https://github.com/ostreedev/ostree/pull/2847 means we can close this, or at least modify the scope?