saltstack-formulas / salt-formula

Yes, Salt can Salt itself!
http://docs.saltstack.com/en/latest/topics/development/conventions/formulas.html
Other
197 stars 423 forks source link

fix(service): service salt-master and salt-minion to restart last #482

Closed remichristiaan closed 3 years ago

remichristiaan commented 3 years ago

When running a high-state on the salt-master to deploy itself, the run fails with an Authentication error occurred because the master restarts half way though.

PR progress checklist (to be filled in by reviewers)


What type of PR is this?

Primary type

Secondary type

Does this PR introduce a BREAKING CHANGE?

No.

Related issues and/or pull requests

Describe the changes you're proposing

Pillar / config required to test the proposed changes

Debug log showing how the proposed changes work

Documentation checklist

Testing checklist

Additional context

pull-assistant[bot] commented 3 years ago
Score: 1.00

Best reviewed: commit by commit


Optimal code review plan

     fix(service): service salt-master and salt-minion to restart last

Powered by Pull Assistant. Last update d71cf0c ... d71cf0c. Read the comment docs.

baby-gnu commented 3 years ago

Hello @remichristiaan.

Do you have debug outputs of the problems and debug outputs with the fixes?

I'm wondering if the tests could trigger a master/minion restart to check this…

n-rodriguez commented 3 years ago

I'm wondering if the tests could trigger a master/minion restart to check this…

I think you could by using Inspec remote command module (https://docs.chef.io/inspec/resources/command/) ... but it would be very tricky...

baby-gnu commented 3 years ago

I think you could by using Inspec remote command module ... but it would be very tricky...

I was more thinking about adding some pillars for the formula to do a restart of the service but the configuration files are copied using a file.recurse so instead the kitchen should create/copy some configuration (master.d / minion.d) in the file_roots for the formula to process the service restart.

remichristiaan commented 3 years ago

I do not have debug outputs unfortunately ... I ran into this problem on the go and tried a fix that seems to help in 8 out of 10 runs which probably gets caused by a requisite declaration that overrides the order option. So this needs a bit more digging, but then again it might be an edge case that I ran into.

Similar problem with the openvpn-formula ... if the vpn restarts before the states could all finish, it stops in mid-way.

myii commented 3 years ago

Thanks for this PR, @remichristiaan. We discussed it briefly in the Formulas' working group meeting yesterday and @baby-gnu mentioned that he will check what can be done with Kitchen first.

baby-gnu commented 3 years ago

Sorry, I could only do manual testing since I don't find a way to run the formula twice in a kitchen run.

  1. run ./bin/kitchen converge v3001-py3-debian-10-3001-py3
  2. modify kitchen.yaml

    --- kitchen.yml.orig 2020-07-31 12:12:45.572817927 +0200
    +++ kitchen.yml  2020-07-31 12:14:45.640441305 +0200
    @@ -148,6 +148,9 @@
      salt_copy_filter:
        - .kitchen
        - .git
    +  init_environment: |
    +    echo 'log_level: debug' > /tmp/kitchen/srv/salt/salt/files/master.d/log-level.conf
    +    echo 'log_level: debug' > /tmp/kitchen/srv/salt/salt/files/minion.d/log-level.conf
    
    verifier:
      # https://www.inspec.io/
  3. run again ./bin/kitchen converge v3001-py3-debian-10-3001-py3

Without this patch, this results in the following:

local:
  Name: deb http://repo.saltstack.com/py3/debian/10/amd64/3001 buster main - Function: pkgrepo.managed - Result: Clean Started: - 10:17:02.405550 Duration: 33.394 ms
  Name: salt-master - Function: pkg.installed - Result: Clean Started: - 10:17:03.390722 Duration: 23.288 ms
----------
          ID: salt-master
    Function: file.recurse
        Name: /etc/salt/master.d
      Result: True
     Comment: Recursively updated /etc/salt/master.d
     Started: 10:17:03.418373
    Duration: 263.417 ms
     Changes:   
       ----------
       /etc/salt/master.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/master.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:17:03.695305 Duration: 1.246 ms
----------
          ID: salt-master
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 10:17:03.731467
    Duration: 1510.131 ms
     Changes:   
       ----------
       salt-master:
           True
  Name: salt-minion - Function: pkg.installed - Result: Clean Started: - 10:17:05.244984 Duration: 10.765 ms
----------
          ID: salt-minion
    Function: file.recurse
        Name: /etc/salt/minion.d
      Result: True
     Comment: Recursively updated /etc/salt/minion.d
     Started: 10:17:05.255942
    Duration: 234.212 ms
     Changes:   
       ----------
       /etc/salt/minion.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/minion.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:17:05.503972 Duration: 0.826 ms
  Name: salt-minion - Function: service.running - Result: Clean Started: - 10:17:05.505057 Duration: 31.264 ms
----------
          ID: salt-minion
    Function: cmd.run
        Name: salt-call --local service.restart salt-minion --out-file /dev/null
      Result: True
     Comment: Command "salt-call --local service.restart salt-minion --out-file /dev/null" run
     Started: 10:17:05.539735
    Duration: 5.567 ms
     Changes:   
       ----------
       pid:
           3256
       retcode:
           None
       stderr:
       stdout:

Summary for local
-------------
Succeeded: 10 (changed=4)
Failed:     0
-------------
Total states run:     10
Total run time:    2.114 s
Downloading files from <v3001-py3-debian-10-3001-py3>
Finished converging <v3001-py3-debian-10-3001-py3> (0m9.78s).

So, the master is restarted just after it's configuration is modified and the minion states are done after.

With the patch, the services are restarted after all configuration are done:

local:
  Name: deb http://repo.saltstack.com/py3/debian/10/amd64/3001 buster main - Function: pkgrepo.managed - Result: Clean Started: - 10:38:25.283063 Duration: 32.998 ms
  Name: salt-master - Function: pkg.installed - Result: Clean Started: - 10:38:26.270636 Duration: 23.125 ms
----------
          ID: salt-master
    Function: file.recurse
        Name: /etc/salt/master.d
      Result: True
     Comment: Recursively updated /etc/salt/master.d
     Started: 10:38:26.298228
    Duration: 265.613 ms
     Changes:   
       ----------
       /etc/salt/master.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/master.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:38:26.566110 Duration: 0.72 ms
  Name: salt-minion - Function: pkg.installed - Result: Clean Started: - 10:38:26.572119 Duration: 819.962 ms
----------
          ID: salt-minion
    Function: file.recurse
        Name: /etc/salt/minion.d
      Result: True
     Comment: Recursively updated /etc/salt/minion.d
     Started: 10:38:27.392369
    Duration: 217.93 ms
     Changes:   
       ----------
       /etc/salt/minion.d/log-level.conf:
           ----------
           diff:
               New file
           mode:
               0644
  Name: /etc/salt/minion.d/_defaults.conf - Function: file.absent - Result: Clean Started: - 10:38:27.614029 Duration: 0.742 ms
----------
          ID: salt-minion
    Function: cmd.run
        Name: salt-call --local service.restart salt-minion --out-file /dev/null
      Result: True
     Comment: Command "salt-call --local service.restart salt-minion --out-file /dev/null" run
     Started: 10:38:27.615051
    Duration: 7.44 ms
     Changes:   
       ----------
       pid:
           2361
       retcode:
           None
       stderr:
       stdout:
----------
          ID: salt-master
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 10:38:27.673946
    Duration: 2606.813 ms
     Changes:   
       ----------
       salt-master:
           True
----------
          ID: salt-minion
    Function: service.running
      Result: True
     Comment: Service salt-minion is already enabled, and is running
     Started: 10:38:30.281587
    Duration: 241.509 ms
     Changes:   
       ----------
       salt-minion:
           True

Summary for local
-------------
Succeeded: 10 (changed=5)
Failed:     0
-------------
Total states run:     10
Total run time:    4.217 s
Downloading files from <v3001-py3-debian-10-3001-py3>
Finished converging <v3001-py3-debian-10-3001-py3> (0m11.84s).
baby-gnu commented 3 years ago

So, I can't find a way to automate the testing. Sorry.

n-rodriguez commented 3 years ago

Sorry, I could only do manual testing since I don't find a way to run the formula twice in a kitchen run.

That why I thought it could be done with Inspec.

daks commented 3 years ago

I don't think it's possible to run something (highstate or formula depending of the specific config) twice if it succeeds. If it fails, it's possible using max_retries.

myii commented 3 years ago

Thanks for the patience, @remichristiaan -- merged based on approving reviews (x2).

saltstack-formulas-travis commented 3 years ago

:tada: This PR is included in version 1.5.1 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket: