offen / docker-volume-backup

Backup Docker volumes locally or to any S3, WebDAV, Azure Blob Storage, Dropbox or SSH compatible storage
https://offen.github.io/docker-volume-backup/
Mozilla Public License 2.0
2.05k stars 86 forks source link

Consider using a dedicated cron package #268

Closed m90 closed 9 months ago

m90 commented 1 year ago

The limitations of BusyBox's cron implementation when it comes to defining schedules crop up again and again e.g. #267 #185 #117

There seems to be a standalone apk package for cron https://pkgs.alpinelinux.org/package/edge/main/x86/apk-cron which seems lightweight enough to install it in the image if:

Since this the core feature of this image, possible incompatibilities should be scrutinized closely before actually implementing.

m90 commented 1 year ago

That apk-cron package doesn't do what I thought it'd do, it seems to install a cronjob, not a cron implementation. In any case, maybe there is a way to get another cron that works in Alpine as well.

m90 commented 1 year ago

This https://github.com/dubiousjim/dcron seems to exist and is closer to what's needed. It can be added using apk add dcron.

m90 commented 1 year ago

Also https://github.com/aptible/supercronic, but this seems way to heavy.

m90 commented 1 year ago

FWIW these are all packages in the apk index containing the string cron https://pkgs.alpinelinux.org/packages?page=1&name=%2acron%2a&branch=edge

MaxJa4 commented 1 year ago

What about adding an additional command like ./backup schedule and use that as docker entrypoint? E.g. with: https://github.com/go-co-op/gocron Cron syntax can be used, we stay in the Go ecosystem without needing external tools, it can still block and wait for the cron job, it should be able to handle SIGINT calls so it stops properly (unlike the current cron daemon), complex scheduling can be implemented by ourselves in the future if we like, ...

m90 commented 1 year ago

What about adding an additional command like ./backup schedule and use that as docker entrypoint?

I'm not entirely sure how this would work when people use conf.d and we have multiple schedules. Would this then spawn multiple processes off backup schedule?

E.g. with: https://github.com/go-co-op/gocron

I couldn't find anything in the docs: do you know if this supports @daily, @weekly (and so on) syntax?

MaxJa4 commented 1 year ago

when people use conf.d and we have multiple schedules

Run the Cron() func once for each schedule so they run in one instance or spawn multiple processes of backup schedule. Having only one instance could avoid race conditions if schedules overlap (e.g. the daily schedule overlaps with the hourly one at midnight).

@daily, @weekly (and so on) syntax?

Dug a little in the source code. They use robfig/cron for parsing which indeed supports this (docs).

m90 commented 1 year ago

Run the Cron() func once for each schedule so they run in one instance or spawn multiple processes of backup schedule.

Trouble is each conf file will have it's own BACKUP_CRON_EXPRESSION so it won't be possible to munge them together (or at least easily).

Dug a little in the source code. They use robfig/cron for parsing which indeed supports this (docs).

Sounds like an option then. I'd still like to have a look if we can get a dedicated cron into the image as this would be the easiest and would keep surface area around here smaller. I wonder if it's possible to compile "Vixie" cron in Alpine and use it. Then again, there's probably a reason it does not exist on the apk index.

complex scheduling can be implemented by ourselves in the future if we like

Do you already have something in mind that could be done that's not possible with cron?

MaxJa4 commented 1 year ago

Trouble is each conf file will have it's own BACKUP_CRON_EXPRESSION so it won't be possible to munge them together (or at least easily).

Why wouldn't it be easy? Right now, entrypoint.sh just reads all config files and adds a cron job for each (afaik). With go-cron it would be the same, but in Go instead and using Cron() instead of crontab -l .... Haven't used that feature yet, maybe I'm missing something.

Sounds like an option then. I'd still like to have a look if we can get a dedicated cron into the image as this would be the easiest and would keep surface area around here smaller.

I understand, a simple drop-in replacement for the cron daemon would be the most efficient route.

Do you already have something in mind that could be done that's not possible with cron?

Nothing specific, just thought about #93 and the general idea of being able to add additional rules if they can't be done with plain cron syntax.

m90 commented 1 year ago

270 would probably be another good argument for a non-cron/roll your own scheduling solution

pixxon commented 9 months ago

The above mentioned robfig/cron package seems to be a good fit for this task. From what I've seen out of the box it would support the already supported syntax, but it also allows us to create our own parser if needed. Only issue is that the project seems to be a abandoned. However it is still imported by many other packages and has significant number of stars on Github. Maybe it is mature enough to be used for our cases.

Another issue is related to the manual call to the backup script. It would be possible to introduce a flag, --server to the application, when it is missing the behaviour would be as before, we just parse the env vars and execute a one time backup.

m90 commented 9 months ago

@pixxon If you would like to start working on this, please feel free and go ahead.

I think robfig/cron might be a good start, even if it's abandoned. Considering cron is an "almost standardized" interface, we might consider swapping halfway through still. It's only important we always support what the busybox cron supports.

Adding subcommands is probably needed then, but it should be pretty straightforward using the default Go flags package.

Let me know if you need any further input.

m90 commented 9 months ago

One more thing I just thought of is: Now that there soon shall be no need to run crond anymore, this also allows for the option of running the binary in the container as a non-root user. Users who don't interact with the Docker daemon definitely don't need it anymore and users that do use the Docker features might not need root anymore when they use rootless Docker outside of the container.

There are quite a few issues about this, e.g. #36 and #270

pixxon commented 9 months ago

Could also make the whole tool more lightweight by using minimal docker image:

FROM golang:1.21.6-alpine as builder

ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT=""

ENV CGO_ENABLED=0 \
    GOOS=${TARGETOS} \
    GOARCH=${TARGETARCH} \
    GOARM=${TARGETVARIANT}
WORKDIR /build
COPY . .
RUN go build -ldflags "-w -extldflags '-static' " -o backup /build/cmd/backup

FROM gcr.io/distroless/static:nonroot
COPY --from=builder --chown=nonroot:nonroot /build/backup /backup
ENTRYPOINT [ "/backup", "-foreground" ]
m90 commented 9 months ago

Could also make the whole tool more lightweight by using minimal docker image

Do you have numbers on how much smaller this will be? I have to admit I'm not a big fan of distroless images as they really suck when you need to debug things going wrong. In such case it's nice to have a proper userland as supplied by busybox.

pixxon commented 9 months ago

I used my local branch with the cron modifications, there the original one is 32.1MB while the distroless is 19.5MB. ( Seems like the binary size got a bit chunkier with those changes, since on the dockerhub the image is only 15MB. Or I am doing something wrong with the builds. )

m90 commented 9 months ago

Docker Hub shows gzipped sizes. The recent changes seem to have a negligible effect on image size only image

m90 commented 9 months ago

One thing that just occured to me is that using a distroless image would be a breaking change because:

Maybe it's better to stick to alpine for now.