shieldproject / shield-boshrelease

BOSH Release for shield
MIT License

Alert operator on duplicate jobs, schedules, retention and targets #73

Closed. axelfauvel closed this issue 6 years ago.

axelfauvel commented 7 years ago

Hi guys,

We've noticed that these parameters are meant to be unique across multiple shield agents, but they can easily be overridden. Let me clarify :)

I have 2 shield agents in 2 separate deployments that communicate with the same shield daemon. We'll call them node_1 and node_2. Each shield agent is configured this way:

node_1

properties:
  shield:
    agent:
      autoprovision: https://xxxx/
    provisioning_key: xxxxx
    targets:
      node1-data-dir:
        plugin: fs
        config:
          base_dir: /var/vcap/data
    schedules:
      daily: daily 3am
    retention-policies:
      weekly: 7d
    jobs:
      backup_data_dir:
        schedule:  daily
        retention: weekly
        target:    node1-data-dir
        store:     shield-local-storage

node_2

properties:
  shield:
    agent:
      autoprovision: https://xxxx/
    provisioning_key: xxxxx
    targets:
      node2-data-dir:
        plugin: fs
        config:
          base_dir: /var/vcap/data
    schedules:
      daily: daily 4am
    retention-policies:
      weekly: 7d
    jobs:
      backup_data_dir:
        schedule:  daily
        retention: weekly
        target:    node2-data-dir
        store:     shield-local-storage

When I deploy both nodes, the daemon ends up with only 1 job and 1 schedule, and both deployments succeed. Since both manifests register a schedule named "daily" and a job named "backup_data_dir", whichever agent registers last silently overwrites the other's resources. I understand that each key is meant to be unique, and that is proper behaviour in itself. The problem is that one shield agent can override another's configuration, which can result in canceled backups and data loss.
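
To make the failure mode concrete, here is a sketch (not actual daemon output) of the effective state after both post-start scripts have run, assuming node_2 registered last:

# Effective resources on the shield daemon (illustrative sketch only)
schedules:
  daily: daily 4am            # node_2 replaced node_1's "daily 3am"
jobs:
  backup_data_dir:
    schedule:  daily
    retention: weekly
    target:    node2-data-dir # node_1's data dir is no longer backed up
    store:     shield-local-storage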

What do you guys think?

best

/CC @nvekemans-ext-orange

geofffranks commented 7 years ago

After a lengthy discussion in #shield, we concluded there isn't an easy way to determine, when running the post-start script, whether the deployment should a) bail out because the resource's name already exists from a different deployment, or b) update the resource with any changes that were supposed to be applied.

However, to make it easier to port configurations across deployments/environments, and to make non-unique resource names less likely, we can provide an optional templating mechanism for resource names, which we would render in the post-start script prior to updating the resources. Something like:

properties:
  shield:
    targets:
      ${deployment}-${job}-mysql: {stuff}

Things we'll want to template off of: deployment name (spec.deployment), job name (spec.name), and job ID (spec.id).

(spec.id can be relied upon to not change upon VM recreation, unless the VM is forced to relocate across AZs)
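
Applied to the two deployments in this issue, both manifests could then share one snippet along these lines (a sketch only; the ${...} variables follow the proposal above and aren't implemented yet, and the rendered names are hypothetical):

properties:
  shield:
    targets:
      ${deployment}-data-dir:        # renders to e.g. "node-1-deployment-data-dir"
        plugin: fs
        config:
          base_dir: /var/vcap/data
    jobs:
      ${deployment}-backup_data_dir:  # unique per deployment, so no collisions
        schedule:  daily
        retention: weekly
        target:    ${deployment}-data-dir
        store:     shield-local-storage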

alexanelli commented 7 years ago

@axelfauvel just checking in to see if you've looked into this recently, thanks 👍

jhunt commented 7 years ago

If @geofffranks' suggestion above is acceptable, we can repurpose this ticket to implement the templating language.

axelfauvel commented 7 years ago

Hi all,

I'm ok with @geofffranks' idea

drnic commented 7 years ago

@axelfauvel the shield CLI now supports an --update-if-exists flag on most create-* commands; is this helpful for this problem? (Going from my interpretation of the title; I might not have fully followed the description.)

jhunt commented 6 years ago

SHIELD v8 features an idempotent import errand that uses buckler import to import these resources. It handles this use case quite well.