openedx / wg-devops

Issue repository for the DevOps Working Group

Add `docker prune` guidance to the troubleshooting guide #24

Open kdmccormick opened 1 year ago

kdmccormick commented 1 year ago

Context

After using Tutor for a while (especially after using different versions of it and/or using it alongside Devstack), it is common for one's Docker storage (images, containers and build cache; on Linux, /var/lib/docker) to become very large. Like, tens of gigabytes. Eventually this can fill up the developer's disk, causing a variety of strange system-wide problems.

There is a very simple solution to this problem: pruning. In particular, it is good to run the following every month or so:

tutor ... start        # start Tutor so that the images you need are considered "in use"
docker system prune -a # deletes stopped containers, unused networks, every image not used by a container, and the build cache

This can free up dozens of gigabytes of disk space.
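A minimal sketch of that monthly routine as a script, assuming Tutor has already been started with the appropriate `tutor ... start` command. It wraps the prune with `docker system df` so you can see how much space was reclaimed, and passes `--force` to skip Docker's confirmation prompt. The `run` helper, the `DRY_RUN` variable and the `monthly_prune` name are illustrative additions, not part of Docker or Tutor.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the monthly prune.
# Assumes Tutor is already running (tutor ... start), so the images it needs
# are attached to containers and survive the prune.
monthly_prune() {
  run docker system df                     # disk usage before
  run docker system prune --all --force    # --all: also unused images; --force: no prompt
  run docker system df                     # disk usage after
}
```

Source the file, then call `DRY_RUN=1 monthly_prune` to preview the commands, or `monthly_prune` to actually prune. Volumes are not removed, because `docker system prune` only deletes them when `--volumes` is passed.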

Acceptance Criteria

Add this information to the troubleshooting guide in the official docs: https://docs.tutor.overhang.io/troubleshooting.html

regisb commented 1 year ago

I suggest clearing only the oldest images:

docker image prune --all --force --filter until=72h

ARMBouhali commented 1 year ago

@regisb @kdmccormick it appears Docker uses the image creation time for the until filter rather than the last-used time; I think we need an LRU (least recently used) cache eviction policy.

I found this interesting tool which handles the LRU part, so I'm adding it to the discussion: https://github.com/stepchowfun/docuum
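To illustrate the difference from the `until` filter, here is a minimal sketch of LRU eviction under a size threshold, the kind of policy docuum provides. It is not docuum's implementation: the input format (one image per line as `LAST_USED_EPOCH SIZE_BYTES IMAGE_ID`) and the `lru_evict_candidates` name are hypothetical, and gathering real last-used times is left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# lru_evict_candidates THRESHOLD_BYTES  (hypothetical helper)
# Reads "LAST_USED_EPOCH SIZE_BYTES IMAGE_ID" lines on stdin (assumed format)
# and prints the IDs to evict, least recently used first, until the total size
# of the remaining images is at or below THRESHOLD_BYTES.
lru_evict_candidates() {
  local threshold="$1"
  # Oldest last-used timestamp first.
  sort -n -k1,1 | awk -v limit="$threshold" '
    { size[NR] = $2; id[NR] = $3; total += $2 }
    END {
      # Evict from the least recently used end while still over the limit.
      for (i = 1; i <= NR && total > limit; i++) {
        print id[i]
        total -= size[i]
      }
    }'
}
```

Because eviction order depends on last use rather than creation, an old image you ran yesterday is kept, while a recently built image nobody has touched since can go first.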

regisb commented 1 year ago

@ARMBouhali I followed your suggestion and started using docuum on my CI server a couple of weeks ago. So far it's working great! Here's my docker-compose.yml:

services:
  docuum:
    image: docker.io/stephanmisc/docuum:latest
    init: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./data/docuum:/root/.local/share/docuum
    command: "--threshold=100GB"
    restart: unless-stopped

This would make a great contrib plugin ;)

ARMBouhali commented 1 year ago

Thanks, @regis. That's a very elegant solution! It never crossed my mind to use a Docker container to handle Docker's own problems.

With this discovery, I consider the issue solved. But maybe there is more to it.

One of my struggles with Docker is that when the build cache accumulates, there's no easy way to prune selectively, and `docker image prune` gives no clear feedback. A tool (not necessarily a plugin) that reads Tutor's build configuration might be the way to achieve ideal selective pruning. That's more work, and I'm not sure it's worth it in the long run.
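A rough sketch of what such selective pruning could look like: keep every image whose reference appears on a keep list and print the rest as candidates, without deleting anything. The `prune_candidates` name and file-based interface are hypothetical. Tutor's default image settings include a `docker.io/` registry prefix that `docker image ls` omits for Docker Hub images, so the sketch strips that prefix from both lists; other differences (such as `docker.io/library/` images) are not handled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# prune_candidates KEEP_FILE  (hypothetical helper)
# Reads image references (repository:tag) on stdin, one per line, and prints
# those not listed in KEEP_FILE. Nothing is deleted; review the output before
# passing it to `docker image rm`.
prune_candidates() {
  local keep_file="$1"
  # Normalize both sides by dropping a leading "docker.io/".
  # grep -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns file.
  # "|| true": grep exits 1 when every image is on the keep list.
  sed 's#^docker\.io/##' \
    | grep -Fxv -f <(sed 's#^docker\.io/##' "$keep_file") || true
}
```

Assuming Tutor is installed, the keep list could be built with `tutor config printvalue DOCKER_IMAGE_OPENEDX > keep.txt` (plus any other image settings you rely on), and the candidates listed with `docker image ls --format '{{.Repository}}:{{.Tag}}' | prune_candidates keep.txt`.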