saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.11k stars 5.47k forks source link

salt-minion not thread safe [BUG] #61830

Open tsaridas opened 2 years ago

tsaridas commented 2 years ago

Description When configuration option multiprocessing is set to False the minion may return an error code which is wrong when multiple commands are being ran. I assume this is because __context__ is being overwritten by another running command.

The reason multiprocessing is useful in windows is due to the fact that the agent tries to authenticate every time it returns the result.

Setup configure minion with multiprocessing: False

Please be as specific as possible and give set-up details.

Steps to Reproduce the behavior Run the below commands multiple times on a salt master (10 times (5-5) should be enough to reproduce quickly)

# The idea here is to send a command that will fail 100% and watch for the return code to be positive 
while true; do if salt -L ThreadedMinion cmd.run 'dir blah'; then echo "Reproduced"; break; fi; sleep 0.$RANDOM; done
# The idea here is to send a command that will always succeed and watch if the return code is negative
while true; do ! if salt -L ThreadedMinion cmd.run 'echo blah'; then echo "Reproduced"; break; fi; sleep 0.$RANDOM; done

The return of the looks should look like this when it fails

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.

Reproduced
[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

ERROR: Minions returned with non-zero exit code

Reproduced

The minion log looks normal and has the actual results of the commands being sent.

2022-03-22 17:20:56,902 [salt.minion      :2165][INFO    ][11840] Returning information for job: 20220322172056719786
2022-03-22 17:20:58,355 [salt.minion      :1686][INFO    ][11840] User root Executing command cmd.run with jid 20220322172058317569
2022-03-22 17:20:58,448 [salt.minion      :1882][INFO    ][11840] Starting a new job 20220322172058317569 with PID 11840
2022-03-22 17:20:58,466 [salt.loaded.int.module.cmdmod:415 ][INFO    ][11840] Executing command 'echo' in directory 'C:\Windows\system32\config\systemprofile'
2022-03-22 17:20:58,488 [salt.minion      :2165][INFO    ][11840] Returning information for job: 20220322172058317569
2022-03-22 17:21:00,632 [salt.minion      :1686][INFO    ][11840] User root Executing command cmd.run with jid 20220322172100597047
2022-03-22 17:21:00,742 [salt.minion      :1882][INFO    ][11840] Starting a new job 20220322172100597047 with PID 11840
2022-03-22 17:21:00,765 [salt.loaded.int.module.cmdmod:415 ][INFO    ][11840] Executing command 'dir' in directory 'C:\Windows\system32\config\systemprofile'
2022-03-22 17:21:00,794 [salt.loaded.int.module.cmdmod:856 ][ERROR   ][11840] Command 'dir' failed with return code: 2
2022-03-22 17:21:00,794 [salt.loaded.int.module.cmdmod:858 ][ERROR   ][11840] stdout: The system cannot find the file specified.
2022-03-22 17:21:00,794 [salt.loaded.int.module.cmdmod:862 ][ERROR   ][11840] retcode: 2
2022-03-22 17:21:00,795 [salt.loaded.int.module.cmdmod:1253][ERROR   ][11840] Command 'dir' failed with return code: 2
2022-03-22 17:21:00,795 [salt.loaded.int.module.cmdmod:1258][ERROR   ][11840] output: The system cannot find the file specified.

Expected behavior The minion should return the same error code as the command that was ran.

Screenshots

Versions Report

Salt Version:
          Salt: 3003.3.1
Dependency Versions:
          cffi: 1.14.5
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.5
     gitpython: 3.1.13
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: 1.1.4
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
        Python: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
  python-gnupg: 0.4.6
        PyYAML: 5.4.1
         PyZMQ: 18.0.1
         smmap: 3.0.4
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.1
System Versions:
          dist:
        locale: cp1250
       machine: AMD64
       release: 2019Server
        system: Windows
       version: 2019Server

highest version tested is 3003.3 lowest version tested is 3000.9

Additional context I believe the multiprocessing config option should be removed until this issue is fixed.

OrangeDog commented 2 years ago

Can you provide a full reproduction and version report as requested by the issue template?

tsaridas commented 2 years ago

Can you provide a full reproduction and version report as requested by the issue template?

updated although not necessary imo.

OrangeDog commented 2 years ago

What is the error it returns? Is there some logging from the minion?

tsaridas commented 2 years ago

What is the error it returns? Is there some logging from the minion?

There is no errors (exceptions) on the minion logs as far as I can see except the command that failed and returns an error code of 2.

I believe you might not have understood the issue here.

The command that is supposed to fail returns as successful (or via versa) on the salt master and its probably because the __context__ on the minion is being overwritten by one of the commands that was successful.

Please try to understand what the two while loops are doing and try to run then multiple times ( different windows ) on the same master.

waynew commented 2 years ago

So... I just spent a bunch of time digging into this on today's Test Clinic and I have way more questions than answers :sweat_smile:

The answers that I have:

The questions I have:

There are probably some other ones that I've forgotten by now, but... yeah something is quite strange here :grimacing: