synpse-hq / synpse

Synpse is an all-in-one solution to manage your servers and IoT devices providing declarative app deployment, SSH access and TCP tunnels
https://synpse.com
Apache License 2.0

AGENT_IMAGE_GC_AGE not enforced. #32

Open hrfuller opened 1 year ago

hrfuller commented 1 year ago

Based on some conversations on Discord, I have set up the following environment variables on the synpse-agent service on a host.

Environment=AGENT_IMAGE_GC_AGE="48h"
Environment=AGENT_IMAGE_GC_FORCE="true"

But I see images that are much older than 48 hours on the host. This is a bit of a pain point because, as we deploy new images, we have to manually prune the Docker system on our hosts. Is there something obvious I'm missing about how to set up the image garbage collection?

The agent version on the host is 0.21.18. The Docker version info is:

Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.2
 Git commit:        20.10.12-0ubuntu2~20.04.1
 Built:             Wed Apr  6 02:16:12 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.2
  Git commit:       20.10.12-0ubuntu2~20.04.1
  Built:            Thu Feb 10 15:03:35 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu1~20.04.4
  GitCommit:
 nvidia:
  Version:          1.1.0-0ubuntu1~20.04.1
  GitCommit:        629a689
 docker-init:
  Version:          0.19.0
  GitCommit:

cc @oezdemir @emersonknapp

mjudeikis commented 1 year ago

Hey @hrfuller. I was under the impression we fixed this. Could you run the following on the affected node and share the output?

uname -a
uptime
docker inspect <image_no_purged>

Are your devices being restarted in that 48h window, or are they powered on all the time?

hrfuller commented 1 year ago

> I was under the impression we fixed this.

The fix does seem to work on test hosts that remain on during the 48h window. But most of our hosts are edge devices that are powered on and off frequently. It seems like that would explain the lack of enforcement.

> Are your devices being restarted in that 48h window?

Yes, they are. Is there any way you can use the image age information from Docker to do the GC?

mjudeikis commented 1 year ago

So this is a slightly different use case. I have an idea for how we can try to mitigate this. For now, Docker does not record a timestamp for when an image was created on the host. The only metadata is the "image inception date", not when the image arrived on the host.

We could inject metadata into labels and try pruning based on that.

Bear with me for a few days until I can try this and ship something for testing.
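
To illustrate the general idea of tracking when an image arrived on the host rather than when it was built, here is a minimal sketch. It is not the agent's actual code. It swaps in a JSON state file for the label approach mentioned above, since image labels are normally set at build time, and it persists a first-seen timestamp per image ID so the age survives reboots. The function names `record_first_seen` and `expired_images` and the file layout are hypothetical.

```python
import json
from pathlib import Path


def record_first_seen(state_file: Path, image_ids: list[str], now: float) -> dict[str, float]:
    """Load persisted first-seen times, stamp new images, drop vanished ones, save."""
    seen = json.loads(state_file.read_text()) if state_file.exists() else {}
    # Known images keep their original timestamp; new ones are stamped with `now`.
    seen = {image_id: seen.get(image_id, now) for image_id in image_ids}
    state_file.write_text(json.dumps(seen))
    return seen


def expired_images(seen: dict[str, float], now: float, max_age_s: float) -> list[str]:
    """Return the IDs whose first-seen time is at least max_age_s seconds old."""
    return sorted(i for i, first in seen.items() if now - first >= max_age_s)
```

Because the timestamps live on disk, a device that is powered off and on again would pick up the same first-seen values instead of restarting the clock, assuming the state file is stored somewhere that persists across reboots.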

hrfuller commented 1 year ago

Thanks! It seems like the Docker daemon knows how old the images are, based on something, when you run docker images, but I suspect that is the inception date you're talking about. Any solution would be very welcome.
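
For reference, the age that Docker reports comes from the image's `Created` field, which reflects when the image was built, not when it was pulled onto the host. A minimal sketch of an age check against that field follows. It assumes the value is a UTC timestamp ending in `Z`, as `docker inspect` typically prints it; the helper names `parse_docker_time` and `older_than` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_docker_time(value: str) -> datetime:
    """Parse a UTC timestamp such as the `Created` field of an image."""
    # Drop the trailing "Z" and any fractional seconds; sub-second
    # precision is not needed for an age check measured in hours.
    base = value.rstrip("Z").split(".")[0]
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def older_than(created: str, max_age: timedelta, now: datetime) -> bool:
    """True if the image build time is more than max_age before now."""
    return now - parse_docker_time(created) > max_age
```

A check like this would flag an image built long ago even if it was pulled minutes earlier, which is why build time alone is a poor proxy for time spent on the host.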

mjudeikis commented 1 year ago

Yes, you should see dates that are unrealistic. Just a question: does setting something like

Environment=AGENT_IMAGE_GC_AGE="2h"
Environment=AGENT_IMAGE_GC_FORCE="true"

where it would purge unused images every 2 hours, not work?

hrfuller commented 1 year ago

> Where it would purge unused images each 2 hours does not work?

I believe it does work but I will try it out.

mjudeikis commented 1 year ago

Let me know. I suspect the code might be very tricky, so solving it with something like this would be easier.

hrfuller commented 10 months ago

Following up, @mjudeikis. I tried using this:

ExecStart=/usr/local/bin/synpse-agent run
Environment=AGENT_IMAGE_GC_AGE="30m"
Environment=AGENT_IMAGE_GC_FORCE="true"

Doesn't seem to work. The machine definitely stays on longer than 30 minutes at a time. Any ideas what to try next?