gaetansnl opened 5 years ago
Thanks for the kind words!
I don't know of a way to do this right now. Most heavy users of Tilt use live_update (https://docs.tilt.dev/live_update_tutorial.html). It copies files and runs build commands in-place, so you don't run out of disk space copying around lots of short-lived images.
I'm sure a garbage collector would help others. But I'm not totally sure what it might look like. Maybe something that deletes all image tags with the "tilt-" prefix, then runs the registry garbage collector?
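A sketch of what that idea might look like, assuming a registry v2 HTTP API with deletion enabled. The port, the jq dependency, and the ctlptl-registry container name are all assumptions for illustration, not anything tilt ships:

```shell
#!/bin/sh
# Sketch only: delete every "tilt-"-prefixed tag via the registry v2 API,
# then run the registry's offline garbage collector to reclaim blobs.
# Assumes the registry was started with REGISTRY_STORAGE_DELETE_ENABLED=true.

REGISTRY="${REGISTRY:-localhost:5000}"

# Pure helper: does this tag carry the tilt- prefix?
is_tilt_tag() {
  case "$1" in tilt-*) return 0 ;; *) return 1 ;; esac
}

delete_tilt_tags() {
  for repo in $(curl -s "http://$REGISTRY/v2/_catalog" | jq -r '.repositories[]'); do
    for tag in $(curl -s "http://$REGISTRY/v2/$repo/tags/list" | jq -r '.tags[]?'); do
      is_tilt_tag "$tag" || continue
      # The DELETE endpoint takes a digest, not a tag, so resolve it first.
      digest=$(curl -sI -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
        "http://$REGISTRY/v2/$repo/manifests/$tag" \
        | tr -d '\r' | awk 'tolower($1)=="docker-content-digest:"{print $2}')
      [ -n "$digest" ] && curl -s -X DELETE "http://$REGISTRY/v2/$repo/manifests/$digest"
    done
  done
  # Unlinking manifests frees nothing by itself; the GC reclaims the blobs.
  docker exec ctlptl-registry registry garbage-collect /etc/docker/registry/config.yml
}

# Run only when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = run ]; then delete_tilt_tags; fi
```

As the next comment notes, deleting by prefix like this could surprise other developers sharing the same registry.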
I use live_update, but a new image is created when I restart tilt, or when I change package.json, for example. I know skaffold prunes images, but I'm not sure how, because I still have a lot of skaffold images left over.
I was thinking it would be nice to have, at any time, only one tag per docker_build(), and not remove it when tilt is closed. Then for the next start: 1) the cache is still there, 2) old tags are removed.
Removing by the "tilt-" prefix will work locally, but could maybe cause issues if multiple developers are using the same registry?
To add to this, for me at the very least, I found that using live_update wasn't worth it with Rust, as the performance difference is negligible and it would require me to have a completely different build pipeline for development. Because of that, I always rebuild images on each change and have been hit with disk pressure quite a lot.
Hi all, sorry for the late follow-up, but do you happen to know what k8s cluster you were running on when you hit these errors? @dotellie @gaetansnl
@maiamcc No worries! For me that would be minikube, I assume that's what you're asking? I have essentially default settings, running on Linux with KVM, if that helps.
👋 hey folks! Tilt now has a built-in docker pruner.
By default, the docker pruner runs once after startup (as soon as all of your pending builds are done) and once every hour thereafter, and prunes…
This is out in the latest release; hopefully it'll help with @dotellie's issue -- tho @gaetansnl it seems like yours is potentially different and won't just be solved by docker image prune, is that correct?
Hello. Thank you for this feature 🎉 , I'll try it as soon as I can. If I understand correctly, images inside the registry are not removed, so it will only partially solve the problem, but it will at least avoid disk issues locally.
This becomes a bigger issue when you're using a local registry set up by ctlptl, because those images never get cleaned up :\
Not entirely sure if this is a recent development or not, but it seems that using a local registry is recommended/default for most setups (kind, k3d, minikube) — see examples in ctlptl README. And indeed, stuff in the local registry isn't being cleaned up: it isn't even possible to delete stuff from the local registry in its default configuration, and even if it was, a manual blob garbage collector invocation would be necessary afterwards. No wonder tilt doesn't attempt to clean up anything in the local registry.
On the other hand, tilt's FAQ claims that images are built in-cluster if possible. This certainly isn't the case with kind and a local registry, but the FAQ entry mentions Minikube as well. https://docs.tilt.dev/choosing_clusters.html#minikube says local registry is supported there, so I'm wondering if perhaps the FAQ entry is outdated? Are there any benefits of using a local registry instead of building in-cluster that I'm missing? Is this a speed vs space tradeoff?
Anyway, assuming deleting and blob GC in the registry itself is enabled (https://github.com/tilt-dev/ctlptl/issues/247), I think what tilt could do is this:
re: "I'm wondering if perhaps the FAQ entry is outdated" - the minikube team completely broke insecure registries in v1.26 - https://github.com/kubernetes/minikube/issues/14480 . This is very recent. I've been telling people to use v1.25 until it's fixed upstream.
Hey all, bumping this as it's been over a year since the last comment on this thread.
I'm using ctlptl to create a registry and cluster.
apiVersion: ctlptl.dev/v1alpha1
kind: Registry
name: neosync-dev-registry
---
apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
product: kind
registry: neosync-dev-registry
name: kind-neosync-dev # must have kind- prefix
But roughly every few days or so (give or take, depending on how much active development I'm doing) I have to nuke my cluster and remove all of the images in docker to free up space.
Once everything is shut down, I run this command to remove all images:
docker rmi -f $(docker images -aq)
Then I go into docker desktop and reclaim space using the "Disk usage" extension to further clean up my drive.
Seems @liskin is dealing with the same issue that I'm having and I'm wondering if there is any updates on how to get this rectified? I love using tilt and have been using it for some time, but this is probably my biggest pain with it as of today.
If using podman, this will remove all images that have tags beginning with tilt-:
podman rmi -f $(podman image ls --format '{{.ID}}' '*:tilt-*')
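For those on docker rather than podman, an assumed equivalent using docker's reference filter (a sketch, not tested against every docker version; the glob may not match registry-prefixed repository names):

```shell
#!/bin/sh
# Assumed docker counterpart of the podman one-liner above. docker's
# reference filter takes a repository:tag glob, so '*:tilt-*' selects
# only images whose tag starts with "tilt-".
remove_tilt_images() {
  ids=$(docker images -q --filter=reference='*:tilt-*' | sort -u)
  if [ -n "$ids" ]; then
    docker rmi -f $ids
  fi
}

# Pure helper with the same selection logic, usable without a docker
# daemon: reads "repo:tag" lines on stdin and keeps only tilt- tags.
filter_tilt_refs() {
  awk -F: '$NF ~ /^tilt-/'
}

if [ "${1:-}" = run ]; then remove_tilt_images; fi
```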
I just wanted to add how this issue affects our team. I've been monitoring my Docker Desktop (on mac) environment over time from "fresh" starts to try to figure out why disk space gets sucked up. We use Tilt with minikube and the docker registry (as mentioned in a previous comment) so Docker Desktop is running a container for the registry and one for tilt.
I've noticed that over time, even if I'm regularly doing various forms of prune:
docker system prune -a --volumes
docker volume prune -a
tilt docker-prune
That eventually all the disk allocated to docker desktop is consumed to the point where I can't rebuild a container on change without getting a "no space" error. (Note that the last two don't really do anything if I do the first one, but I include them for completeness.)

My usual procedure once I start getting frequent "no space" issues despite pruning is to just restart it all: shut down tilt, stop and remove its container, stop and remove the local docker registry container, and then run all the prunes. Then I start up the whole thing again, let it rebuild our application images, and go back to whatever I was doing. Unsurprisingly this starts with plenty of free space (even after the rebuild of our application images).
I decided not to restart the entire setup today when I was getting no space issues. Instead, I first did a normal prune, which reclaimed a fair bit of space but not all the way up to how much is available after a "fresh" start. Then I exec-ed into the tilt container and ran a bunch of du -sh * commands on various directories to suss out where the tilt container was using up disk. Eventually I realized it was /var/lib/containerd/, and specifically /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs.

So I ran a containerd prune (crictl rmi --prune) from within the tilt container. This deleted a lot of stuff, but our application containers were still running fine, and it got the available disk space (as reported by Docker Desktop) back up to where it starts after a "fresh" start of our Tilt setup. However, it also deleted a lot of stuff I wouldn't want it to: all the intermediate layers for the current "active" images, so the next rebuild of an application image missed cache. But at least I didn't have to restart everything or lose the images stored in the local docker registry (which is mostly the base images downloaded from external places).

So I think tilt is keeping intermediate layers for image versions that are no longer being used, and they add up over time. Many of the layers were many days old, so I don't think it's just that our use of the default docker_prune_settings was keeping around all the layers for the 2 most recent builds and within 6 hours. I especially don't think it's just prune settings, since pruning over many days of running tilt reclaims less and less. Nonetheless, I am setting docker_prune_settings max_age_mins so it only keeps around an hour's worth vs 6 hours by default.
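For reference, here's the kind of Tiltfile fragment being described (a sketch; the parameter names follow Tilt's docker_prune_settings API, and the values are just examples):

```python
# Tiltfile fragment: tighten the built-in docker pruner.
# Tilt's defaults are roughly max_age_mins=360 (6h) and keep_recent=2.
docker_prune_settings(
    max_age_mins=60,   # prune layers older than ~1 hour instead of 6
    num_builds=0,      # 0 = prune on the hourly interval, not every N builds
    keep_recent=2,     # still keep the 2 most recent tags per image
)
```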
Running out of space has been an issue for our team as well.
We've settled on periodically deleting everything in the registry and pruning everything, so we just run something like the following:
tilt down
docker exec -it -u root ctlptl-registry sh -c 'rm -r /var/lib/registry/docker/registry/v2/repositories/*; registry garbage-collect /etc/docker/registry/config.yml'
docker restart ctlptl-registry
docker exec -it kind-control-plane crictl rmi --prune
docker system prune -af
tilt up
Depending on your setup, this may not be safe. Not sure if we're making some bad assumptions, but it seems like there are copies of the data in Docker, kind, and the local registry, so this tries to nuke the images in all of them (and CAUTION, potentially more in Docker).
re: "I'm wondering if perhaps the FAQ entry is outdated" - the minikube team completely broke insecure registries in v1.26 - kubernetes/minikube#14480 . This is very recent. I've been telling people to use v1.25 until it's fixed upstream.
This seems like a misunderstanding, I wasn't concerned about insecure registries at all, but rather about building in cluster. If I understand it correctly, a local registry isn't needed at all if images are built in cluster, as they're immediately available for k8s to run and don't need to be copied all around (push from docker to registry, then pull from registry to k8s).
The https://docs.tilt.dev/faq.html#q-all-the-tilt-examples-store-the-image-at-gcrio-isnt-it-really-slow-to-push-images-up-to-googles-remote-repository-for-local-development FAQ entry implies that building in cluster is desirable (faster!) and is the norm, but https://docs.tilt.dev/choosing_clusters confusingly talks a lot about local registries, making it seem that somehow not having to push/pull over the internet and only copying several gigabytes of data locally is all we can hope for.
So yeah I suppose "outdated" wasn't worded well. Should have gone for "confusing".
Anyway, now that almost 2 years have somehow managed to pass by while I wasn't looking (and I mean that quite literally, life happened…), the entry happens to be outdated indeed:
ImageNeverPull is not a thing any more – https://github.com/tilt-dev/tilt/pull/6277

And yeah, now that I got back to this and dug a bit deeper to be able to answer, I realised that we (the company I work for, where we happen to use tilt) don't configure things right, so despite using Colima, images aren't built in cluster but instead get copied to and from a local registry, and therefore we run into this very problem discussed in this issue (https://github.com/tilt-dev/tilt/issues/2102#issuecomment-739984779). Perhaps for no good reason, because if we just used colima --kubernetes, everything would be fine? (Well, I've been on Linux the whole time, just running kind in rootless docker, so it wouldn't be fine for me, but it could be fine for everyone else.)
https://docs.tilt.dev/choosing_clusters could use an entry for Colima btw… And for Orbstack :-)
Seems @liskin is dealing with the same issue that I'm having and I'm wondering if there is any updates on how to get this rectified? I love using tilt and have been using it for some time, but this is probably my biggest pain with it as of today.
Apologies for the late reply, but there hasn't been much news so perhaps I might have something useful still.
First, as I realised halfway through writing the above reply to Nick, one thing is that we're all probably using Tilt wrong. Apparently it can be used without a registry, building images directly in the k8s cluster, which saves space and time and also possibly makes this whole garbage collection problem go away. So if one doesn't need a registry for some other reason (and I don't know what that reason might be but I'm somewhat certain we don't have one), then configuring things so images aren't pushed/pulled to/from a registry would be the best way to avoid this problem.
If you, however, do run a local registry (like I do, because I happen to use ctlptl with kind on Linux, mostly because of being completely clueless about k8s and wanting to keep it in an isolated box as much as possible, so letting it use my host's docker is certainly not an option), then… well then you could do all sorts of weird shenanigans like we resorted to doing because we had no idea we could have avoided it.
Like… (and I'm not making this up, we really do all this)

- docker exec into the kind container and crictl ps to find all the images that are currently in use, and crictl rmi those that aren't (which k8s should do itself but maybe doesn't, or doesn't enough, or whatever, I don't know)
- docker exec crictl ps … so we know what needs to stay in the local registry
- list-tags and skopeo delete to remove everything that shouldn't stay
- docker exec ctlptl-registry registry garbage-collect /etc/docker/registry/config.yml because there's no API for actually removing the data from disk (of course, why would there be, docker registries are meant to run with infinite storage, why would you assume otherwise?)
- docker image ls and docker rmi because while tilt does try to garbage collect the images it builds, it fails to do it correctly (https://github.com/tilt-dev/tilt/issues/4596), so images pile up anyway

So yeah, it's dumb and I'm maybe a bit bitter about it. Now let me go and find out how to switch to building in cluster so I can forget about all this. :-)
Hi @liskin, did you get anywhere with your quest? I'm in the same boat - kind's registry eating up disk space until there's none left, and the only guaranteed solution seems to be to nuke then recreate the cluster.
Not sure if it's me doing something wrong though; I would assume a system like Tilt would be better at managing its resources.
Hi @liskin, did you get anywhere with your quest? I'm in the same boat - kind's registry eating up disk space until there's none left, and the only guaranteed solution seems to be to nuke then recreate the cluster.
It's been on the back burner for a while but finally I have enough info to share:
It doesn't seem to be possible/easy to build images in-cluster with kind, but it's possible and well-supported with Minikube and OrbStack and perhaps a few others. And funnily enough, it seems to "just work" - you don't actually need to configure tilt in any way, it just automatically detects and builds images using the dockerd running inside of Minikube or OrbStack. Or perhaps it doesn't, if things aren't set up just right, and it doesn't really tell you anything… 😆
So, in reality, to actually make it work both I (trying Minikube) and my colleague (trying OrbStack) had to resort to reading the source code to figure out the correct alignment of stars for tilt's autodetection to work. But we did, and it just works. Anyway, here's the relevant bits:
https://github.com/tilt-dev/tilt/blob/37be1ded69d09e97791335ef6cf2bacd2bbf1ebb/internal/docker/env.go#L184-L284 https://github.com/tilt-dev/tilt/blob/37be1ded69d09e97791335ef6cf2bacd2bbf1ebb/internal/docker/env.go#L346-L421
We haven't tried to make this work using Colima (which needs to be told to run k8s) but it appears to be supported as well.
Thank you for the update @liskin!
I suppose I either misread the docs, or they're a bit confusing about the role of the registry; somehow I got the (wrong) impression from the docs that the --registry=ctlptl-registry option is effectively required if one wants live updates -- while it really isn't required, at least not for Minikube.
Funnily enough, I only tried launching a cluster without that option once I had exhausted all ideas on how to make a Minikube cluster with the Docker container runtime (which I needed because I wanted to use an NVIDIA GPU, which requires nvidia-container-toolkit, which only works with the Docker runtime) work with a ctlptl registry.
Oh well, live and learn. Thanks again!
I suppose I either misread the docs, or they're a bit confusing about the role of the registry
I'd tend to agree with the latter. They're also not up to date—there's no mention of either Colima or OrbStack (they only say "If you’re using Docker for Desktop, there’s no registry at all. You build directly into the container runtime."), and the section about Minikube talks about local registry instead of saying "Built images are immediately available in-cluster. No pushing and pulling from image registries." as it should.
I guess it's a bit more complicated with Minikube because you can configure a different container runtime than Docker and then you can't build in cluster, but the default configuration is Docker driver and Docker runtime, and stuff just works without a registry then.
but the default configuration is Docker driver and Docker runtime
For me, the default for Minikube is somehow Docker driver but Containerd runtime. I haven't tried this default configuration without registry because I wanted Docker runtime due to NVIDIA CTK support.
I agree with your point that the docs are slightly out of date and don't cover all possible permutations. Can't blame them though, there are simply too many.
fwiw, that guide originally recommended people should use the embedded builder in the container runtime when using minikube.
The result was that people screwed up their minikube configs a lot and tilt-team was spending a ton of time helping folks debug their busted minikube configs. (There are a lot of minikube flags that are incompatible with an embedded builder.)
I expect there will always be a place for using a separate registry; it's a more portable architecture in general.
(Tho we should keep this issue focused on pruning images from the registry, rather than bushwhacking a tangent about alternative clusters.)
Hello! First I wanted to thank you for your work on tilt. I tried multiple tools before, but most of them don't support custom scripts and such advanced sync capabilities... I can now rebuild dependencies on each server without a full server rebuild 😄

I build images locally and push them to a registry on another server, and I noticed that they are not removed. It seems they are also kept locally, so I'm afraid of exploding the server's storage capacity, even though most of the time the images reuse layers from previous images. I just got "Kubelet has disk pressure".

Is there a suggested approach to remove some of them? Or should I write a script manually?

Thank you