saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

[BUG] Schedule jobs deleted or cleaned up on multimaster failover #66378

Open amalaguti opened 7 months ago

amalaguti commented 7 months ago

Description

When multimaster is set to failover mode and the minion fails over to the second master, the jobs in the Salt scheduler (conf\minion.d\_schedule.conf) are gone because they have been cleaned up. As a result, the minion does not run the expected scheduled jobs while it is connected to the failover master.

Here is a sequence on a minion configured for multimaster failover with two masters, .10 and .11. A state is executed to add a job to the scheduler, and the job's presence in the schedule file is confirmed. Then the minion fails over to the other master, and the job is gone.
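The SLS file run below (utils/set_salt_schedule_reboot) is not included in the report. The following is a hypothetical reconstruction, inferred by hand from the schedule.list output in the session. The parameters the reporter actually used are an assumption. It uses the documented `schedule.present` state arguments (`function`, `job_args`, `job_kwargs`, `when`, `splay`, `maxrunning`, `return_job`, `persist`). With `persist: true` (the default), the job is written to minion.d\_schedule.conf, which is the file that loses the entry after failover.

```yaml
# Hypothetical reconstruction of utils/set_salt_schedule_reboot.sls,
# derived from the schedule.list output; NOT the reporter's actual file.
schedule_new_task:
  schedule.present:
    - function: state.sls
    - job_args:
      - utils.reboot_system_module
    - job_kwargs:
        saltenv: base
        queue: true
    # One-shot run time copied from the transcript; the real SLS may compute it.
    - when:
      - '2024-04-16 12:50:42'
    - splay: 10
    - maxrunning: 1
    - return_job: false
    # Default behaviour, shown explicitly: save the job to minion.d\_schedule.conf
    - persist: true
```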

PS C:\Users\adrian> salt-call status.master master=172.21.0.10
local:
    True
PS C:\Users\adrian> salt-call status.master master=172.21.0.11
local:
    False
PS C:\Users\adrian> Get-Content 'C:\ProgramData\Salt Project\Salt\conf\minion.d\_schedule.conf'
schedule:
  __master_alive_172.21.0.10:
    enabled: true
    function: status.master
    jid_include: true
    kwargs: {connected: true, master: 172.21.0.10}
    maxrunning: 1
    return_job: false
    seconds: 30
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false, run_on_start: true}

PS C:\Users\adrian> salt-call state.sls utils/set_salt_schedule_reboot -l quiet
local:
----------
          ID: schedule_new_task
    Function: schedule.present
      Result: True
     Comment: Adding new job schedule_new_task to schedule
     Started: 12:45:42.712122
    Duration: 62.501 ms
     Changes:
              ----------
              schedule_new_task:
                  added

Summary for local
------------
Succeeded: 1 (changed=1)
Failed:    0
------------
Total states run:     1
Total run time:  62.501 ms

PS C:\Users\adrian> salt-call schedule.list
local:
    schedule:
      schedule_new_task:
        args:
        - utils.reboot_system_module
        enabled: true
        function: state.sls
        kwargs:
          queue: true
          saltenv: base
        maxrunning: 1
        name: schedule_new_task
        return_job: false
        saved: true
        splay: 10
        when:
        - '2024-04-16 12:50:42'

PS C:\Users\adrian> Get-Content 'C:\ProgramData\Salt Project\Salt\conf\minion.d\_schedule.conf'
schedule:
  __master_alive_172.21.0.10:
    enabled: true
    function: status.master
    jid_include: true
    kwargs: {connected: true, master: 172.21.0.10}
    maxrunning: 1
    name: __master_alive_172.21.0.10
    return_job: false
    run: true
    seconds: 30
    splay: null
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, name: __mine_interval, return_job: false, run: true, run_on_start: true,
    splay: null}
  schedule_new_task:
    args: [utils.reboot_system_module]
    enabled: true
    function: state.sls
    kwargs:
      queue: true
      saltenv: base
    maxrunning: 1
    name: schedule_new_task
    return_job: false
    splay: 10
    when: ['2024-04-16 12:50:42']

# Master .10 is disconnected and the minion has failed over to the .11 master
PS C:\Users\adrian> salt-call status.master master=172.21.0.10
12:48:11,637 [salt.minion                                                              :187 ][WARNING ][1984] Master ip address changed from 172.21.0.10 to 172.21.0.11
local:
    False
PS C:\Users\adrian> salt-call status.master master=172.21.0.11
12:48:24,072 [salt.minion                                                              :187 ][WARNING ][4280] Master ip address changed from 172.21.0.10 to 172.21.0.11
local:
    True

# schedule_new_task is gone; only the internal __master_alive and __mine_interval jobs remain
PS C:\Users\adrian> Get-Content 'C:\ProgramData\Salt Project\Salt\conf\minion.d\_schedule.conf'
schedule:
  __master_alive_172.21.0.11:
    function: status.master
    jid_include: true
    kwargs: {connected: true, master: 172.21.0.11}
    maxrunning: 1
    return_job: false
    seconds: 30
  __mine_interval: {enabled: true, function: mine.update, jid_include: true, maxrunning: 2,
    minutes: 60, return_job: false, run_on_start: true}

PS C:\Users\adrian> salt-call schedule.list
12:48:51,702 [salt.minion                                                              :187 ][WARNING ][4740] Master ip address changed from 172.21.0.10 to 172.21.0.11
local:
    schedule: {}

Setup

Salt 3006.1 minion configured for multimaster failover:

master:
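The minion configuration in the Setup section is cut off after `master:`. The following is a minimal failover configuration consistent with the transcript. The two master addresses come from the session above. `master_alive_interval: 30` is an assumption, inferred from the `seconds: 30` interval of the `__master_alive_<ip>` jobs in _schedule.conf. The file name is hypothetical.

```yaml
# Hypothetical file: C:\ProgramData\Salt Project\Salt\conf\minion.d\multimaster.conf
# Master IPs are taken from the transcript; other values are assumptions.
master:
  - 172.21.0.10
  - 172.21.0.11
# Connect to one master at a time and move to the next one when it is unreachable.
master_type: failover
# Assumed: matches the 30-second interval of the __master_alive_<ip> schedule jobs.
master_alive_interval: 30
```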

Steps to Reproduce the behavior

See the sequence in the description above.

Expected behavior

I would expect the scheduled jobs to be retained after the minion fails over to the second master.


Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.) ```yaml PASTE HERE ```


dwoz commented 6 months ago

@amalaguti Please provide us a versions report.