harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[DOC/FEATURE] Clean unused images after upgrade #2132

Closed. bk201 closed this issue 1 year ago

bk201 commented 2 years ago

Is your feature request related to a problem? Please describe.

After a user upgrades Harvester to a newer version, many of the old container images are never used again. We could provide a method to remove those images and free up disk space.

Describe the solution you'd like

Describe alternatives you've considered

Do nothing and let kubelet remove unused images when disk pressure is high.

Additional context

Martin-Weiss commented 2 years ago

This also needs to be tied to the documentation that states ">25 GB free" is required in /usr/local, and to the minimum and recommended disk sizes for Harvester in general.
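A minimal sketch of how the free-space requirement mentioned above could be checked on a node before an upgrade. The function names `available_kb` and `has_free_gb` are hypothetical, and the sketch assumes "GB" is read as GiB; only the 25 GB figure and the /usr/local path come from this thread.

```shell
#!/usr/bin/env sh
# Sketch only: helper names are hypothetical, not part of Harvester.

# available_kb PATH: print the available space on PATH's filesystem in 1 KiB blocks.
available_kb() {
    # -P forces the portable one-line-per-filesystem format, -k uses 1 KiB units.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_free_gb AVAILABLE_KB REQUIRED_GB: succeed if more than REQUIRED_GB is free.
# Assumption: 1 GB is treated as 1024 * 1024 KiB.
has_free_gb() {
    [ "$1" -gt $(( $2 * 1024 * 1024 )) ]
}

# Usage:
#   has_free_gb "$(available_kb /usr/local)" 25 || echo "25 GB or less free in /usr/local" >&2
```

Splitting the measurement from the comparison keeps the threshold logic easy to check without touching a real disk.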

starbops commented 1 year ago

For clusters upgraded from v1.1.0/v1.1.1 to v1.1.2, here's a simple script that lists the images that are no longer needed on stdout. Save the script on each node and run it there with root privileges.

$ cat <<"EOF" > images-to-cleanup.sh
#!/usr/bin/env sh
# List the images required by the previous Harvester version but not by the current one.
set -e

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <previous-version> <current-version>" >&2
    exit 1
fi

prev_ver=$1
cur_ver=$2

tmp_dir=$(mktemp -d)

# Remove the temporary directory when the script exits.
cleanup() {
    rm -rf "$tmp_dir"
}
trap cleanup EXIT

mkdir "$tmp_dir/$prev_ver"
mkdir "$tmp_dir/$cur_ver"

# Download and unpack the published image lists for both versions.
for ver in "$prev_ver" "$cur_ver"; do
    curl -sfL "https://releases.rancher.com/harvester/$ver/image-lists.tar.gz" -o "$tmp_dir/$ver/image-lists.tar.gz"
    tar -xf "$tmp_dir/$ver/image-lists.tar.gz" -C "$tmp_dir/$ver/"
done

prev_image_list=$tmp_dir/prev_image_list.txt
cur_image_list=$tmp_dir/cur_image_list.txt

# comm needs sorted input; sort -u also drops duplicate entries.
cat "$tmp_dir/$prev_ver"/image-lists/*.txt | sort -u > "$prev_image_list"
cat "$tmp_dir/$cur_ver"/image-lists/*.txt | sort -u > "$cur_image_list"

# Print only the images that appear in the previous list but not in the current one.
comm -23 "$prev_image_list" "$cur_image_list"
EOF

For example, if you have successfully upgraded from v1.1.1 to v1.1.2, you may want to remove the images that only v1.1.1 requires to save disk space. The following command lists the images that can be removed:

$ sh images-to-cleanup.sh v1.1.1 v1.1.2
docker.io/rancher/fleet-agent:v0.4.0
docker.io/rancher/fleet:v0.4.0
docker.io/rancher/hardened-calico:v3.24.1-build20221011
docker.io/rancher/hardened-flannel:v0.19.1-build20221011
docker.io/rancher/hardened-k8s-metrics-server:v0.6.1-build20221011
docker.io/rancher/hardened-kubernetes:v1.24.7-rke2r1-build20221013
docker.io/rancher/hardened-multus-cni:v3.8-build20221011
docker.io/rancher/harvester-cluster-repo:v1.1.1
docker.io/rancher/harvester-load-balancer:v0.1.2
docker.io/rancher/harvester-network-controller:v0.3.1
docker.io/rancher/harvester-network-helper:v0.3.1
docker.io/rancher/harvester-network-webhook:v0.3.1
docker.io/rancher/harvester-node-disk-manager:v0.4.8
docker.io/rancher/harvester-node-manager:v0.1.3
docker.io/rancher/harvester-pcidevices:v0.2.3
docker.io/rancher/harvester-upgrade:v1.1.1
docker.io/rancher/harvester:v1.1.1
docker.io/rancher/harvester-vm-import-controller:v0.1.2
docker.io/rancher/harvester-webhook:v1.1.1
docker.io/rancher/klipper-helm:v0.7.3-build20220613
docker.io/rancher/klipper-lb:v0.3.5
docker.io/rancher/nginx-ingress-controller:nginx-1.2.1-hardened9
docker.io/rancher/rancher:v2.6.9
docker.io/rancher/rancher-webhook:v0.2.7
docker.io/rancher/rke2-cloud-provider:v1.25.3-build20221017
docker.io/rancher/rke2-runtime:v1.24.7-rke2r1
docker.io/rancher/shell:v0.1.18
docker.io/rancher/support-bundle-kit:v0.0.12
docker.io/rancher/system-agent-installer-rancher:v2.6.9
docker.io/rancher/system-agent-installer-rke2:v1.24.7-rke2r1
ghcr.io/k8snetworkplumbingwg/whereabouts:v0.5.4-amd64
ghcr.io/kube-vip/kube-vip:v0.4.4
registry.suse.com/bci/bci-base:15.3
registry.suse.com/suse/sles/15.4/libguestfs-tools:0.54.0-150400.3.7.1
registry.suse.com/suse/sles/15.4/virt-api:0.54.0-150400.3.7.1
registry.suse.com/suse/sles/15.4/virt-controller:0.54.0-150400.3.7.1
registry.suse.com/suse/sles/15.4/virt-handler:0.54.0-150400.3.7.1
registry.suse.com/suse/sles/15.4/virt-launcher:0.54.0-150400.3.7.1
registry.suse.com/suse/sles/15.4/virt-operator:0.54.0-150400.3.7.1

It's usually safe to pass the script's output directly to crictl rmi to remove the unused images:

$ crictl rmi $(sh images-to-cleanup.sh v1.1.1 v1.1.2)
ERRO[0000] no such image docker.io/rancher/harvester-cluster-repo:v1.1.1
Deleted: docker.io/rancher/klipper-lb:v0.3.5
Deleted: docker.io/rancher/rke2-cloud-provider:v1.25.3-build20221017
Deleted: docker.io/rancher/system-agent-installer-rke2:v1.24.7-rke2r1
Deleted: ghcr.io/k8snetworkplumbingwg/whereabouts:v0.5.4-amd64
Deleted: registry.suse.com/suse/sles/15.4/virt-handler:0.54.0-150400.3.7.1
Deleted: docker.io/rancher/fleet:v0.4.0
Deleted: docker.io/rancher/hardened-multus-cni:v3.8-build20221011
Deleted: docker.io/rancher/harvester-webhook:v1.1.1
Deleted: ghcr.io/kube-vip/kube-vip:v0.4.4
Deleted: registry.suse.com/bci/bci-base:15.3
Deleted: registry.suse.com/suse/sles/15.4/virt-api:0.54.0-150400.3.7.1
Deleted: docker.io/rancher/klipper-helm:v0.7.3-build20220613
Deleted: docker.io/rancher/rancher-webhook:v0.2.7
Deleted: docker.io/rancher/harvester-network-helper:v0.3.1
Deleted: docker.io/rancher/harvester-pcidevices:v0.2.3
Deleted: docker.io/rancher/harvester-vm-import-controller:v0.1.2
Deleted: docker.io/rancher/harvester:v1.1.1
Deleted: docker.io/rancher/shell:v0.1.18
Deleted: registry.suse.com/suse/sles/15.4/libguestfs-tools:0.54.0-150400.3.7.1
Deleted: docker.io/rancher/hardened-kubernetes:v1.24.7-rke2r1-build20221013
Deleted: docker.io/rancher/harvester-load-balancer:v0.1.2
Deleted: docker.io/rancher/harvester-network-controller:v0.3.1
Deleted: docker.io/rancher/harvester-network-webhook:v0.3.1
Deleted: docker.io/rancher/rancher:v2.6.9
Deleted: registry.suse.com/suse/sles/15.4/virt-controller:0.54.0-150400.3.7.1
Deleted: docker.io/rancher/hardened-flannel:v0.19.1-build20221011
Deleted: docker.io/rancher/harvester-node-disk-manager:v0.4.8
Deleted: docker.io/rancher/harvester-upgrade:v1.1.1
Deleted: docker.io/rancher/support-bundle-kit:v0.0.12
Deleted: registry.suse.com/suse/sles/15.4/virt-operator:0.54.0-150400.3.7.1
Deleted: docker.io/rancher/hardened-calico:v3.24.1-build20221011
Deleted: docker.io/rancher/rke2-runtime:v1.24.7-rke2r1
Deleted: docker.io/rancher/system-agent-installer-rancher:v2.6.9
Deleted: registry.suse.com/suse/sles/15.4/virt-launcher:0.54.0-150400.3.7.1
Deleted: docker.io/rancher/fleet-agent:v0.4.0
Deleted: docker.io/rancher/hardened-k8s-metrics-server:v0.6.1-build20221011
Deleted: docker.io/rancher/harvester-node-manager:v0.1.3
Deleted: docker.io/rancher/nginx-ingress-controller:nginx-1.2.1-hardened9
FATA[0007] unable to remove the image(s)

If the command reports errors saying that an image could not be found (such as the "no such image" line above), the image garbage collection mechanism has most likely already purged that image.
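One way to avoid the "no such image" errors is to drop entries that are no longer on the node before calling crictl rmi. The following is a rough sketch, not part of the original script: `present_images` is a hypothetical helper, and it assumes that `crictl inspecti` exits non-zero when the image does not exist locally.

```shell
#!/usr/bin/env sh
# Sketch only: present_images is a hypothetical helper.

# present_images: read image references on stdin (one per line) and print
# only those that the container runtime still has locally.
present_images() {
    while IFS= read -r image; do
        [ -n "$image" ] || continue
        # Assumption: `crictl inspecti` fails for an image that is not present.
        # stdin is redirected so the command cannot consume the piped list.
        if crictl inspecti "$image" >/dev/null 2>&1 </dev/null; then
            printf '%s\n' "$image"
        fi
    done
}

# Usage (on a node, as root):
#   crictl rmi $(sh images-to-cleanup.sh v1.1.1 v1.1.2 | present_images)
```

If every listed image has already been purged, the filtered list is empty, so it is worth checking the output before handing it to crictl rmi.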

P.S. If the cluster was upgraded from a version before v1.1.0, e.g., v1.0.3, the above script will fail because the image list for that version is not published on the Internet. In that case, please download the corresponding ISO image, mount it on a path, and extract the image list manually.
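Once the image lists have been extracted by hand, the comparison can be run against local directories instead of downloaded tarballs. The sketch below assumes each directory holds one or more *.txt files with one image reference per line, as the script above expects; the helper names and any directory paths are hypothetical, and where the lists live inside the ISO is not stated in this thread.

```shell
#!/usr/bin/env sh
# Sketch only: helper names are hypothetical.

# merge_lists DIR: print the sorted, de-duplicated contents of every *.txt file in DIR.
merge_lists() {
    cat "$1"/*.txt | sort -u
}

# removable_images PREV_DIR CUR_DIR: print the images listed for the previous
# version that the current version no longer lists.
removable_images() {
    prev_list=$(mktemp)
    cur_list=$(mktemp)
    merge_lists "$1" > "$prev_list"
    merge_lists "$2" > "$cur_list"
    # -2 hides lines unique to the current list, -3 hides lines common to both.
    comm -23 "$prev_list" "$cur_list"
    rm -f "$prev_list" "$cur_list"
}

# Usage (paths are placeholders for wherever the lists were extracted):
#   removable_images /path/to/previous/image-lists /path/to/current/image-lists
```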

bk201 commented 1 year ago

@starbops Please help add the script to https://github.com/harvester/upgrade-helpers. It would also be nice to prompt for confirmation before removing the images.
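A confirmation prompt along these lines could look like the sketch below. It is only an illustration of the idea, not the script that ended up in upgrade-helpers; `confirm` is a hypothetical helper that treats anything other than an explicit yes as a refusal.

```shell
#!/usr/bin/env sh
# Sketch only: confirm is a hypothetical helper.

# confirm PROMPT: print PROMPT on stderr, read one line from stdin, and
# succeed only if the answer is y or yes. End of input counts as "no".
confirm() {
    printf '%s [y/N] ' "$1" >&2
    IFS= read -r answer || return 1
    case "$answer" in
        y|Y|yes|Yes|YES) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (on a node, as root):
#   images=$(sh images-to-cleanup.sh v1.1.1 v1.1.2)
#   printf '%s\n' "$images"
#   confirm "Remove the images listed above?" && crictl rmi $images
```

Defaulting to "no" means an accidental Enter or a closed stdin never deletes anything.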

harvesterhci-io-github-bot commented 1 year ago

Pre Ready-For-Testing Checklist

starbops commented 1 year ago

Now that we have the cleanup script and documented instructions for using it, we're closing this issue.

Automatic cleanup is tracked separately in #4425.