sbrudenell / btrfs2s3

maintains a tree of differential backups in object storage
MIT License

utilize a config.yaml instead of args #12

Closed · sbrudenell closed this 2 months ago

sbrudenell commented 5 months ago

the cli args are already annoying to manage and I'm planning to add more. we should move to a config file.

yaml seems like a good choice since I expect the config to be maintained by hand.

I don't want to maintain parity between a config file and cli args, so this should just be a one-way move to config.

and I can picture some hierarchical structure, with global defaults and local overrides like those used in btrbk.conf.
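
for illustration, a minimal sketch of what "global defaults with local overrides" could mean, assuming pyyaml, a hypothetical top-level defaults section, and a simple shallow merge (none of this is settled):

import yaml

config = yaml.safe_load("""
defaults:            # hypothetical global section
  preserve: 1y 1m
sources:
- path: /a
- path: /b
  preserve: 6m       # local override wins
""")

def effective(source: dict, defaults: dict) -> dict:
    # shallow merge: per-source keys override the global defaults
    return {**defaults, **source}

for source in config["sources"]:
    print(effective(source, config.get("defaults", {})))
# {'preserve': '1y 1m', 'path': '/a'}
# {'preserve': '6m', 'path': '/b'}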

sbrudenell commented 3 months ago

proposed example config:

timezone: America/Los_Angeles # required
sources: 
- path: /source # required
  snapshots: /snapshots # required
  preserve: 1y 1m # required
remotes: 
- id: aws # required
  s3: # required for now
    bucket: my-backups # required
    endpoint:
      # optional, includes region_name etc
    transfer:
      # optional transfer config
pipe_through:
- [gzip]
- [gpg, -r, me@example.com]

by default, all sources are backed up to all remotes.

the config for remotes is kept separate, so operations like restore can refer to a remote by just its id.
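
for example, a restore subcommand could resolve everything it needs from the id alone. a sketch, assuming pyyaml and the proposed schema above (find_remote and the --remote flag are made up, not settled api):

import yaml

def find_remote(config: dict, remote_id: str) -> dict:
    for remote in config["remotes"]:
        if remote["id"] == remote_id:
            return remote
    raise KeyError(f"no remote with id {remote_id!r}")

# e.g. something like "btrfs2s3 restore --remote aws" would do:
with open("config.yaml") as f:
    config = yaml.safe_load(f)
bucket = find_remote(config, "aws")["s3"]["bucket"]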

sbrudenell commented 3 months ago

I'm having a hard time landing on a decision about the config schema. I'll try to talk it out here. this will be tedious and irrelevant for others, but it might be helpful for me.

for the case of updating backups, there are a lot of possible contingencies to configure (e.g. overriding pipe_through for a given source and remote).

so, the simplest implementation is to have configuration for each contingency. repeated elements could be dealt with via yaml anchors. something like:

sources:
- path: /a
  snapshots: /a/.snapshots
  remotes: &remotes
  - s3:
      bucket: my_bucket
      endpoint:
        endpoint_url: https://example.com
        profile_name: some_profile
    pipe_through:
    - pigz
    - gpg
  - s3:
      bucket: aws_bucket
    pipe_through:
    - pigz
    - gpg
- path: /b
  snapshots: /b/.snapshots
  remotes: *remotes
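
as a sanity check on the anchor trick: yaml loaders expand *remotes into the anchored sequence, so /a and /b end up with identical remote lists (pyyaml even reuses the same object). a trimmed-down example:

import yaml

config = yaml.safe_load("""
sources:
- path: /a
  remotes: &remotes
  - s3: {bucket: my_bucket}
- path: /b
  remotes: *remotes
""")

a, b = config["sources"]
assert a["remotes"] is b["remotes"]  # one shared list in pyyaml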

but we should also be able to refer to an endpoint easily from the cli, for list and restore. how do we do that in the above scenario? we'd need to refer to the same config via multiple paths of contingency.
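
concretely, with remotes inlined under each source there's nothing stable for the cli to name; the best a lookup can do is scan every source's remotes for an ad-hoc match. a sketch of the problem, matching on bucket name:

def find_remote_inline(config: dict, bucket: str) -> dict:
    # walk every path of contingency looking for a match
    matches = [
        remote
        for source in config["sources"]
        for remote in source["remotes"]
        if remote["s3"]["bucket"] == bucket
    ]
    if not matches:
        raise KeyError(f"no remote with bucket {bucket!r}")
    return matches[0]  # ambiguous: the "same" remote may differ per source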

I'm sure this is a common kind of problem, but I tend to work by repeating best practices, and I learn those either by listening to others describe a rule or by watching one consistently applied. I don't know of any best practices here; it seems one just writes a config schema by intuition alone.

I need to pick some rules and go for it.

I'll start with this:

so in my example above, we've done these steps for the update use case: the config only defines the top-level noun, sources. next we repeat the process for the list case.

we get something like

sources:
- path: /a
  snapshots: /a/.snapshots
  remotes: &remotes
  - id: aws  # corresponds to top level section
    pipe_through:
    - pigz
    - gpg
  - id: other
    pipe_through:
    - pigz
    - gpg
- path: /b
  snapshots: /b/.snapshots
  remotes: *remotes

remotes:
- id: aws
  s3:
    bucket: aws_bucket
- id: other
  s3:
    bucket: my_bucket
    endpoint:
      endpoint_url: https://example.com
      profile_name: some_profile
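
with this split, both use cases resolve cleanly: list/restore look up a remote by id in the top-level section, and update joins each source's remote references to it by the same id. a sketch (backup_plan is a made-up name):

def find_remote(config: dict, remote_id: str) -> dict:
    for remote in config["remotes"]:
        if remote["id"] == remote_id:
            return remote
    raise KeyError(f"no remote with id {remote_id!r}")

def backup_plan(config: dict):
    # yield (source, per-source remote settings like pipe_through,
    # resolved top-level remote) for every backup to perform
    for source in config["sources"]:
        for ref in source["remotes"]:
            yield source, ref, find_remote(config, ref["id"])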