saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Using scheduling system to run a highstate in batch mode #30214

Open sebastian-cb opened 8 years ago

sebastian-cb commented 8 years ago

I would like to schedule a highstate to run every night, but not in parallel. Is there a way to add a batch option to the scheduling system?

For example:

schedule:
  highstate:
    enabled: True
    function: state.highstate
    maxrunning: 1
    when: 3:00am
    kwargs:
      batch: 1

Ideally I would also like to randomize the time when it runs. I guess I could schedule it to run every two hours within a one-hour time range so it will run only once:

schedule:
  highstate:
    enabled: True
    function: state.highstate
    maxrunning: 1
    range:
      start: 3:00am
      end: 4:00am
    hours: 2
    kwargs:
      batch: 1

jfindlay commented 8 years ago

@sebastian-cb, I'm not sure what you're asking with your first question. There is a maxrunning scheduler parameter, which defaults to 1, meaning that only one copy of a job can run concurrently. For your second question, there is a splay parameter which randomizes the start time to somewhere between the scheduled time and the scheduled time plus the splay value. I'm not sure if this can be found on the documentation site, but there is scheduler documentation in the reference doc of that module.
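
A minimal sketch of a schedule entry using splay, with illustrative values (not taken from this issue):

schedule:
  highstate:
    enabled: True
    function: state.highstate
    maxrunning: 1
    when: 3:00am
    splay: 600

With an integer splay, the scheduler delays each run by a random number of seconds between 0 and the given value.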

sebastian-cb commented 8 years ago

@jfindlay, I just double-checked this, and it looks like the maxrunning scheduler parameter ensures that no more than N copies of a particular routine are running. However, I'm asking about batch size, so that I could schedule a highstate to be executed on only a specified number of minions at a time. I could do that with a system cron job, but I was wondering if this can be done with the SaltStack scheduling system using keyword arguments.

With regards to my second question, I think splay only works with time intervals (minutes, hours, etc.), but my requirement is to run only once within a specified time range (3am-4am; day 1: 3:15am, day 2: 3:25am, etc.).

jfindlay commented 8 years ago

@sebastian-cb, that makes more sense to me. I am not sure what the answers are. @garethgreenaway, do you have any comments on @sebastian-cb's questions?

garethgreenaway commented 8 years ago

@sebastian-cb I'm not clear on your concern about the highstates running in parallel.

jfindlay commented 8 years ago

I can see a scalability concern on the master, similar to the reason a manually triggered highstate can be batched.
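
For reference, this is how a manually triggered highstate is batched from the CLI (target and batch size here are illustrative):

salt '*' state.highstate --batch-size 10%

The --batch-size (-b) option makes the master run the job on at most that many minions (or that percentage of the targeted minions) at a time.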

sebastian-cb commented 8 years ago

@garethgreenaway I've just noticed high CPU usage caused by pkg.installed (repoquery), so I'm concerned that if this runs on multiple VMs on the same host it may cause high load.

garethgreenaway commented 8 years ago

@sebastian-cb Ahh, that's a fair concern. A combination of seconds, range, and splay should work. Have the job run every 3600 seconds (one hour) and use a range of 3am-4am so it only runs in that time span: it will try to run at 3am, but the range will prevent it from running again at 4am. Then use splay to offset the run time across minions.
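
A sketch of the schedule entry being described, using the interval and range from this comment (the splay window is illustrative):

schedule:
  highstate:
    enabled: True
    function: state.highstate
    seconds: 3600
    range:
      start: 3:00am
      end: 4:00am
    maxrunning: 1
    splay:
      start: 300
      end: 1800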

sebastian-cb commented 8 years ago

@garethgreenaway Ahh, so splay actually randomizes the execution time on the minions, and not the job/state invocation time on the master?

So there is no way to use a batch size with the scheduling system to make sure the highstate is not overlapping across minions?

sebastian-cb commented 8 years ago

@garethgreenaway so I tested your recommended solution, and it doesn't work as expected:

State:

test:
  cmd.run:
    - name: date >> /tmp/test.txt

Pillar:

schedule:
  highstate_test:
    enabled: True
    function: state.highstate
    hours: 1
    range:
      start: 2:00am
      end: 3:00am
    maxrunning: 1
    splay:
      start: 300
      end: 1800

Although splay was added, the command was executed at the same time each day:

# cat /tmp/test.txt 
Thu Jan 14 02:00:06 GMT 2016
Fri Jan 15 02:00:05 GMT 2016

2016-01-14 02:00:00,780 [salt.utils.schedule                      ][DEBUG   ][2199] schedule.handle_func: Adding splay of 1559 seconds to next run.
2016-01-14 02:00:00,781 [salt.utils.schedule                      ][INFO    ][2199] Running scheduled job: highstate_test
...
2016-01-15 02:00:00,380 [salt.utils.schedule                      ][DEBUG   ][2199] schedule.handle_func: Adding splay of 996 seconds to next run.
2016-01-15 02:00:00,381 [salt.utils.schedule                      ][INFO    ][2199] Running scheduled job: highstate_test

sebastian-cb commented 8 years ago

@jfindlay can we actually convert this question to a feature request? I'm finding more situations where a batch size in the scheduling system would be very useful, for instance any operations on HA clusters where only one service/node can be taken down at a time.

jfindlay commented 8 years ago

Sure, batch seems like a good feature to have.

rickh563 commented 8 years ago

ZD-964

marcocaberletti commented 6 years ago

As a (nasty) workaround, I use Unix cron syntax to schedule the highstate every hour, but at a different minute on each host. The minute is computed from the IP address of the machine: I take the IP address from grains, strip off the subnet, convert it to an integer, and then calculate % 60.

Pillar

{%- set ipaddr = grains['ip4_interfaces']['eth0'] -%}
schedule:
  highstate:
    enabled: True
    function: state.highstate
    cron: '{{ ipaddr|first|replace("172.31.","")|replace(".","")|int % 60 }} * * * *'
    maxrunning: 1

Result:

$ salt '*' pillar.get schedule:highstate:cron
xxxx:
    18 * * * *
xxxx:
    14 * * * *
xxxx:
    55 * * * *
xxxx:
    10 * * * *
xxxx:
    43 * * * *
...etc...

It's not an elegant solution, but it works.
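
For what it's worth, a similar per-host offset can be computed without assuming a fixed subnet by hashing the minion id instead. This is a sketch that assumes Salt's sha256 Jinja filter is available (shipped in recent Salt releases); the slice length is illustrative:

schedule:
  highstate:
    enabled: True
    function: state.highstate
    cron: '{{ (grains["id"] | sha256)[0:6] | int(0, 16) % 60 }} * * * *'
    maxrunning: 1

The first six hex characters of the hash are parsed as a base-16 integer and reduced modulo 60 to pick a minute.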

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale[bot] commented 5 years ago

Thank you for updating this issue. It is no longer marked as stale.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

sagetherage commented 4 years ago

@garethgreenaway is this something we want to look at for future releases?

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.

sagetherage commented 4 years ago

@doesitblend the ZD ticket is archived so I removed the label; feel free to correct me if I am wrong here.