saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.1k stars 5.47k forks source link

Salt Master Performance #43872

Closed ssoto2 closed 4 years ago

ssoto2 commented 6 years ago

Description of Issue/Question

Recently we have noticed that the salt-master has becoming very slow to the point that it can take 10 to 15 minutes for small jobs to (complete couple lines in a state file and about 20-30 minions) . After reviewing the box and the troubleshooting documentation we could find any errors. We have spun up a fresh build of the master and time for one state to be executed is taking 1 minute to come back. If i did more the one box at a time will time will escalate up to 10 to 15 minutes.

The only error i can find is when running the debug the master continuously claims that a return code is missing from client

This happens when attempting against any box in the fleet IE CentOS6 or CentOS7

Please note the node id does say docker but that is name of the host the state is not trying to be committed to a docker container.

Any help would be grateful as salt and effectively slowed down to a crawl and not reliable any more

Setup

(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)

mitigate.cve.2017.1000253:
  sysctl.present:
    - name: vm.legacy_va_layout
    - value: 1

Please note this is just a script i am testing with as i need to apply this, but it is not just this script

Steps to Reproduce Issue

time salt -v 'AMZ.ENG_Docker_02.i-598f95c9' state.sls ts.deploy.telesto.CVE-2017-1000253 -l debug test=True
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Including configuration from '/etc/salt/master.d/ext_pillars.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/ext_pillars.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/external_auth.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/external_auth.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/file_server.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/file_server.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/saltapi.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/saltapi.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: salt.dn.com
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] Configuration file path: /etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Including configuration from '/etc/salt/master.d/ext_pillars.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/ext_pillars.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/external_auth.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/external_auth.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/file_server.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/file_server.conf
[DEBUG   ] Including configuration from '/etc/salt/master.d/saltapi.conf'
[DEBUG   ] Reading configuration from /etc/salt/master.d/saltapi.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: salt.dn.com
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
Executing job with jid 20171002173559837505
-------------------------------------------

[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/a3/1686498e4e962dd4b6d5e81ae4285a6263dda4d125bcdaba5a971b87b5e1ce/.minions.p
[DEBUG   ] get_iter_returns for jid 20171002173559837505 sent to set(['AMZ.ENG_Docker_02.i-598f95c9']) will timeout at 17:36:04.845852
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171002173559837505 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'salt.dn.com_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] jid 20171002173559837505 return from AMZ.ENG_Docker_02.i-598f95c9
[DEBUG   ] return event: {'AMZ.ENG_Docker_02.i-598f95c9': {'jid': '20171002173559837505', 'retcode': 0, 'ret': {'sysctl_|-mitigate.cve.2017.1000253_|-vm.legacy_va_layout_|-present': {'comment': 'Sysctl option vm.legacy_va_layout might be changed, we failed to check config file at /etc/sysctl.d/99-salt.conf. The file is either unreadable, or missing.', 'name': 'vm.legacy_va_layout', 'start_time': '17:37:11.461367', 'result': None, 'duration': 35.159, '__run_num__': 0, '__sls__': 'ts.deploy.telesto.CVE-2017-1000253', 'changes': {}, '__id__': 'mitigate.cve.2017.1000253'}}, 'out': 'highstate'}}
[DEBUG   ] LazyLoaded highstate.output
AMZ.ENG_Docker_02.i-598f95c9:
----------
          ID: mitigate.cve.2017.1000253
    Function: sysctl.present
        Name: vm.legacy_va_layout
      Result: None
     Comment: Sysctl option vm.legacy_va_layout might be changed, we failed to check config file at /etc/sysctl.d/99-salt.conf. The file is either unreadable, or missing.
     Started: 17:37:11.461367
    Duration: 35.159 ms
     Changes:   

Summary for AMZ.ENG_Docker_02.i-598f95c9
------------
Succeeded: 1 (unchanged=1)
Failed:    0
------------
Total states run:     1
Total run time:  35.159 ms
[DEBUG   ] jid 20171002173559837505 found all minions set(['AMZ.ENG_Docker_02.i-598f95c9'])

real    1m15.015s
user    0m1.480s
sys 0m0.115s

Versions Report

SALT MASTER

Salt Version:
           Salt: 2017.7.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.2.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

SALT MINION

Salt Version:
           Salt: 2017.7.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 1.5
      docker-py: 1.8.1
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.21.1
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.2.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core
ssoto2 commented 6 years ago

3 minions

real    2m19.597s
user    0m1.920s
sys 0m0.142s
Ch3LL commented 6 years ago

What about on the minion side? What do the debug logs show there?

Can you verify outside of salt that there wasn't any network slowdown possibly in your environment?

Ch3LL commented 6 years ago

Also what does the CPU/Memory Usage look like on the master when this occurs?

ssoto2 commented 6 years ago

Regarding the the CPU/MEM running salt now pings the processors and memory will go from ~600 megs to around 1.74GB when running this statement.. See the picture included (This is while running the state listed above)

selection_003

Additional Information: This server is only running the salt master it is not being used for anything else.

ssoto2 commented 6 years ago

@Ch3LL the network performance has never been a problem. The vast majority of the instances are in AWS in the same region and VPC. All the instances in the VPC have high to 10GB network capability on them. We have been using Salt in amazon for some time and no issues till about 1.5 month ago.. here is a iperf for documentation

MINION

iperf -c 10.0.8.18 -p 55555 -t 30 -P 8
------------------------------------------------------------
Client connecting to 10.0.8.18, TCP port 55555
TCP window size:  325 KByte (default)
------------------------------------------------------------
[ 10] local 10.0.2.83 port 57256 connected with 10.0.8.18 port 55555
[  4] local 10.0.2.83 port 57242 connected with 10.0.8.18 port 55555
[  5] local 10.0.2.83 port 57246 connected with 10.0.8.18 port 55555
[  3] local 10.0.2.83 port 57244 connected with 10.0.8.18 port 55555
[  7] local 10.0.2.83 port 57250 connected with 10.0.8.18 port 55555
[  6] local 10.0.2.83 port 57248 connected with 10.0.8.18 port 55555
[  8] local 10.0.2.83 port 57252 connected with 10.0.8.18 port 55555
[  9] local 10.0.2.83 port 57254 connected with 10.0.8.18 port 55555
[ ID] Interval       Transfer     Bandwidth
[ 10]  0.0-30.0 sec   167 MBytes  46.6 Mbits/sec
[  8]  0.0-30.0 sec   202 MBytes  56.3 Mbits/sec
[  4]  0.0-30.0 sec   187 MBytes  52.2 Mbits/sec
[  3]  0.0-30.0 sec   274 MBytes  76.6 Mbits/sec
[  7]  0.0-30.0 sec   212 MBytes  59.2 Mbits/sec
[  9]  0.0-30.0 sec   308 MBytes  86.0 Mbits/sec
[  6]  0.0-30.0 sec   196 MBytes  54.6 Mbits/sec
[  5]  0.0-30.0 sec   203 MBytes  56.6 Mbits/sec
[SUM]  0.0-30.0 sec  1.71 GBytes   488 Mbits/sec

MASTER

iperf -s -p 55555
------------------------------------------------------------
Server listening on TCP port 55555
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57244
[  5] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57246
[  7] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57248
[  6] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57242
[  8] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57250
[  9] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57252
[ 10] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57254
[ 11] local 10.0.8.18 port 55555 connected with 10.0.2.83 port 57256
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-30.1 sec   274 MBytes  76.4 Mbits/sec
[  9]  0.0-30.1 sec   202 MBytes  56.2 Mbits/sec
[  5]  0.0-30.1 sec   203 MBytes  56.5 Mbits/sec
[  6]  0.0-30.1 sec   187 MBytes  52.0 Mbits/sec
[ 11]  0.0-30.1 sec   167 MBytes  46.4 Mbits/sec
[  8]  0.0-30.1 sec   212 MBytes  59.0 Mbits/sec
[ 10]  0.0-30.1 sec   308 MBytes  85.7 Mbits/sec
[  7]  0.0-30.2 sec   196 MBytes  54.4 Mbits/sec
[SUM]  0.0-30.2 sec  1.71 GBytes   486 Mbits/sec

In Regards to your other question about the minion debugging. Here is the Debug logs from the minion. Here is the log from the minion listed in the above in debug mode:

2017-10-05 08:51:40,503 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command state.sls with jid 20171005085143518712
2017-10-05 08:51:40,503 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}
2017-10-05 08:51:40,511 [salt.minion      ][INFO    ][3092] Starting a new job with PID 3092
2017-10-05 08:51:40,517 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded state.sls
2017-10-05 08:51:40,519 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded direct_call.get
2017-10-05 08:51:40,528 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded saltutil.is_running
2017-10-05 08:51:40,529 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded grains.get
2017-10-05 08:51:40,535 [salt.transport.zeromq][DEBUG   ][3092] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:51:40,535 [salt.crypt       ][DEBUG   ][3092] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:51:45,520 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085148532424
2017-10-05 08:51:45,521 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085148532424', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:51:45,528 [salt.minion      ][INFO    ][3098] Starting a new job with PID 3098
2017-10-05 08:51:45,541 [salt.utils.lazy  ][DEBUG   ][3098] LazyLoaded saltutil.find_job
2017-10-05 08:51:45,543 [salt.utils.lazy  ][DEBUG   ][3098] LazyLoaded direct_call.get
2017-10-05 08:51:45,544 [salt.minion      ][DEBUG   ][3098] Minion return retry timer set to 5 seconds (randomized)
2017-10-05 08:51:45,544 [salt.minion      ][INFO    ][3098] Returning information for job: 20171005085148532424
2017-10-05 08:51:45,545 [salt.transport.zeromq][DEBUG   ][3098] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:51:45,545 [salt.crypt       ][DEBUG   ][3098] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:51:45,554 [salt.minion      ][DEBUG   ][3098] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085148532424', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:51:55,615 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085158620608
2017-10-05 08:51:55,615 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085158620608', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:51:55,623 [salt.minion      ][INFO    ][3103] Starting a new job with PID 3103
2017-10-05 08:51:55,637 [salt.utils.lazy  ][DEBUG   ][3103] LazyLoaded saltutil.find_job
2017-10-05 08:51:55,638 [salt.utils.lazy  ][DEBUG   ][3103] LazyLoaded direct_call.get
2017-10-05 08:51:55,639 [salt.minion      ][DEBUG   ][3103] Minion return retry timer set to 9 seconds (randomized)
2017-10-05 08:51:55,639 [salt.minion      ][INFO    ][3103] Returning information for job: 20171005085158620608
2017-10-05 08:51:55,640 [salt.transport.zeromq][DEBUG   ][3103] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:51:55,640 [salt.crypt       ][DEBUG   ][3103] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:51:55,652 [salt.minion      ][DEBUG   ][3103] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085158620608', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:05,678 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085208682450
2017-10-05 08:52:05,679 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085208682450', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:52:05,687 [salt.minion      ][INFO    ][3111] Starting a new job with PID 3111
2017-10-05 08:52:05,700 [salt.utils.lazy  ][DEBUG   ][3111] LazyLoaded saltutil.find_job
2017-10-05 08:52:05,701 [salt.utils.lazy  ][DEBUG   ][3111] LazyLoaded direct_call.get
2017-10-05 08:52:05,702 [salt.minion      ][DEBUG   ][3111] Minion return retry timer set to 9 seconds (randomized)
2017-10-05 08:52:05,702 [salt.minion      ][INFO    ][3111] Returning information for job: 20171005085208682450
2017-10-05 08:52:05,703 [salt.transport.zeromq][DEBUG   ][3111] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:05,703 [salt.crypt       ][DEBUG   ][3111] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:52:05,718 [salt.minion      ][DEBUG   ][3111] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085208682450', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:15,794 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085218806107
2017-10-05 08:52:15,795 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085218806107', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:52:15,802 [salt.minion      ][INFO    ][3116] Starting a new job with PID 3116
2017-10-05 08:52:15,816 [salt.utils.lazy  ][DEBUG   ][3116] LazyLoaded saltutil.find_job
2017-10-05 08:52:15,817 [salt.utils.lazy  ][DEBUG   ][3116] LazyLoaded direct_call.get
2017-10-05 08:52:15,818 [salt.minion      ][DEBUG   ][3116] Minion return retry timer set to 6 seconds (randomized)
2017-10-05 08:52:15,818 [salt.minion      ][INFO    ][3116] Returning information for job: 20171005085218806107
2017-10-05 08:52:15,819 [salt.transport.zeromq][DEBUG   ][3116] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:15,819 [salt.crypt       ][DEBUG   ][3116] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:52:15,828 [salt.minion      ][DEBUG   ][3116] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085218806107', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:25,904 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085228916540
2017-10-05 08:52:25,904 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085228916540', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:52:25,912 [salt.minion      ][INFO    ][3121] Starting a new job with PID 3121
2017-10-05 08:52:25,925 [salt.utils.lazy  ][DEBUG   ][3121] LazyLoaded saltutil.find_job
2017-10-05 08:52:25,927 [salt.utils.lazy  ][DEBUG   ][3121] LazyLoaded direct_call.get
2017-10-05 08:52:25,928 [salt.minion      ][DEBUG   ][3121] Minion return retry timer set to 5 seconds (randomized)
2017-10-05 08:52:25,928 [salt.minion      ][INFO    ][3121] Returning information for job: 20171005085228916540
2017-10-05 08:52:25,928 [salt.transport.zeromq][DEBUG   ][3121] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:25,929 [salt.crypt       ][DEBUG   ][3121] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:52:25,938 [salt.minion      ][DEBUG   ][3121] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085228916540', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:35,968 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085238980257
2017-10-05 08:52:35,968 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085238980257', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:52:35,976 [salt.minion      ][INFO    ][3129] Starting a new job with PID 3129
2017-10-05 08:52:35,990 [salt.utils.lazy  ][DEBUG   ][3129] LazyLoaded saltutil.find_job
2017-10-05 08:52:35,991 [salt.utils.lazy  ][DEBUG   ][3129] LazyLoaded direct_call.get
2017-10-05 08:52:35,992 [salt.minion      ][DEBUG   ][3129] Minion return retry timer set to 9 seconds (randomized)
2017-10-05 08:52:35,993 [salt.minion      ][INFO    ][3129] Returning information for job: 20171005085238980257
2017-10-05 08:52:35,993 [salt.transport.zeromq][DEBUG   ][3129] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:35,994 [salt.crypt       ][DEBUG   ][3129] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:52:36,002 [salt.minion      ][DEBUG   ][3129] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085238980257', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:46,090 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085249103205
2017-10-05 08:52:46,091 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085249103205', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:52:46,099 [salt.minion      ][INFO    ][3136] Starting a new job with PID 3136
2017-10-05 08:52:46,112 [salt.utils.lazy  ][DEBUG   ][3136] LazyLoaded saltutil.find_job
2017-10-05 08:52:46,113 [salt.utils.lazy  ][DEBUG   ][3136] LazyLoaded direct_call.get
2017-10-05 08:52:46,114 [salt.minion      ][DEBUG   ][3136] Minion return retry timer set to 7 seconds (randomized)
2017-10-05 08:52:46,114 [salt.minion      ][INFO    ][3136] Returning information for job: 20171005085249103205
2017-10-05 08:52:46,115 [salt.transport.zeromq][DEBUG   ][3136] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:46,115 [salt.crypt       ][DEBUG   ][3136] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:52:46,124 [salt.minion      ][DEBUG   ][3136] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085249103205', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:56,312 [salt.minion      ][INFO    ][2945] User sudo_ssoto Executing command saltutil.find_job with jid 20171005085259324891
2017-10-05 08:52:56,312 [salt.minion      ][DEBUG   ][2945] Command details {'tgt_type': 'list', 'jid': '20171005085259324891', 'tgt': ['AMZ.ENG_Docker_02.i-598f95c9'], 'ret': '', 'user': 'sudo_ssoto', 'arg': ['20171005085143518712'], 'fun': 'saltutil.find_job'}
2017-10-05 08:52:56,320 [salt.minion      ][INFO    ][3141] Starting a new job with PID 3141
2017-10-05 08:52:56,333 [salt.utils.lazy  ][DEBUG   ][3141] LazyLoaded saltutil.find_job
2017-10-05 08:52:56,335 [salt.utils.lazy  ][DEBUG   ][3141] LazyLoaded direct_call.get
2017-10-05 08:52:56,335 [salt.minion      ][DEBUG   ][3141] Minion return retry timer set to 9 seconds (randomized)
2017-10-05 08:52:56,336 [salt.minion      ][INFO    ][3141] Returning information for job: 20171005085259324891
2017-10-05 08:52:56,336 [salt.transport.zeromq][DEBUG   ][3141] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:56,337 [salt.crypt       ][DEBUG   ][3141] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:52:56,346 [salt.minion      ][DEBUG   ][3141] minion return: {'fun_args': ['20171005085143518712'], 'jid': '20171005085259324891', 'return': {'tgt_type': 'glob', 'jid': '20171005085143518712', 'tgt': 'AMZ.ENG_Docker_02.i-598f95c9', 'pid': 3092, 'ret': '', 'user': 'sudo_ssoto', 'arg': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True, '__kwarg__': True}], 'fun': 'state.sls'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
2017-10-05 08:52:59,976 [salt.pillar      ][DEBUG   ][3092] Determining pillar cache
2017-10-05 08:52:59,976 [salt.transport.zeromq][DEBUG   ][3092] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:52:59,976 [salt.crypt       ][DEBUG   ][3092] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:53:00,165 [salt.crypt       ][DEBUG   ][3092] Loaded minion key: /etc/salt/pki/minion/minion.pem
2017-10-05 08:53:00,169 [salt.state       ][INFO    ][3092] Loading fresh modules for state activity
2017-10-05 08:53:00,194 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded jinja.render
2017-10-05 08:53:00,195 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded yaml.render
2017-10-05 08:53:00,760 [salt.fileclient  ][DEBUG   ][3092] In saltenv 'base', looking at rel_path 'ts/deploy/telesto/CVE-2017-1000253.sls' to resolve 'salt://ts/deploy/telesto/CVE-2017-1000253.sls'
2017-10-05 08:53:00,761 [salt.fileclient  ][DEBUG   ][3092] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/ts/deploy/telesto/CVE-2017-1000253.sls' to resolve 'salt://ts/deploy/telesto/CVE-2017-1000253.sls'
2017-10-05 08:53:00,761 [salt.template    ][DEBUG   ][3092] compile template: /var/cache/salt/minion/files/base/ts/deploy/telesto/CVE-2017-1000253.sls
2017-10-05 08:53:00,762 [salt.utils.jinja ][DEBUG   ][3092] Jinja search path: ['/var/cache/salt/minion/files/base']
2017-10-05 08:53:00,765 [salt.template    ][PROFILE ][3092] Time (in seconds) to render '/var/cache/salt/minion/files/base/ts/deploy/telesto/CVE-2017-1000253.sls' using 'jinja' renderer: 0.00362491607666
2017-10-05 08:53:00,765 [salt.template    ][DEBUG   ][3092] Rendered data from file: /var/cache/salt/minion/files/base/ts/deploy/telesto/CVE-2017-1000253.sls:
mitigate.cve.2017.1000253:
  sysctl.present:
    - name: vm.legacy_va_layout
    - value: 1

2017-10-05 08:53:00,768 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded config.get
2017-10-05 08:53:00,768 [salt.loaded.int.render.yaml][DEBUG   ][3092] Results of YAML rendering: 
OrderedDict([('mitigate.cve.2017.1000253', OrderedDict([('sysctl.present', [OrderedDict([('name', 'vm.legacy_va_layout')]), OrderedDict([('value', 1)])])]))])
2017-10-05 08:53:00,768 [salt.template    ][PROFILE ][3092] Time (in seconds) to render '/var/cache/salt/minion/files/base/ts/deploy/telesto/CVE-2017-1000253.sls' using 'yaml' renderer: 0.00246500968933
2017-10-05 08:53:00,770 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded sysctl.show
2017-10-05 08:53:00,770 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded sysctl.present
2017-10-05 08:53:00,770 [salt.state       ][INFO    ][3092] Running state [vm.legacy_va_layout] at time 08:53:00.770680
2017-10-05 08:53:00,770 [salt.state       ][INFO    ][3092] Executing state sysctl.present for [vm.legacy_va_layout]
2017-10-05 08:53:00,778 [salt.utils.lazy  ][DEBUG   ][3092] LazyLoaded cmd.run_stdout
2017-10-05 08:53:00,779 [salt.loaded.int.module.cmdmod][INFO    ][3092] Executing command 'sysctl -a' in directory '/root'
2017-10-05 08:53:00,856 [salt.state       ][INFO    ][3092] Sysctl option vm.legacy_va_layout might be changed, we failed to check config file at /etc/sysctl.d/99-salt.conf. The file is either unreadable, or missing.
2017-10-05 08:53:00,856 [salt.state       ][INFO    ][3092] Completed state [vm.legacy_va_layout] at time 08:53:00.856562 duration_in_ms=85.882
2017-10-05 08:53:00,856 [salt.state       ][DEBUG   ][3092] File /var/cache/salt/minion/accumulator/41143440 does not exist, no need to cleanup.
2017-10-05 08:53:00,858 [salt.minion      ][DEBUG   ][3092] Minion return retry timer set to 8 seconds (randomized)
2017-10-05 08:53:00,858 [salt.minion      ][INFO    ][3092] Returning information for job: 20171005085143518712
2017-10-05 08:53:00,859 [salt.transport.zeromq][DEBUG   ][3092] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506', 'aes')
2017-10-05 08:53:00,859 [salt.crypt       ][DEBUG   ][3092] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'AMZ.ENG_Docker_02.i-598f95c9', 'tcp://10.0.8.18:4506')
2017-10-05 08:53:00,867 [salt.minion      ][DEBUG   ][3092] minion return: {'fun_args': ['ts.deploy.telesto.CVE-2017-1000253', {'test': True}], 'jid': '20171005085143518712', 'return': {'sysctl_|-mitigate.cve.2017.1000253_|-vm.legacy_va_layout_|-present': {'comment': 'Sysctl option vm.legacy_va_layout might be changed, we failed to check config file at /etc/sysctl.d/99-salt.conf. The file is either unreadable, or missing.', 'name': 'vm.legacy_va_layout', 'start_time': '08:53:00.770680', 'result': None, 'duration': 85.882, '__run_num__': 0, '__sls__': u'ts.deploy.telesto.CVE-2017-1000253', 'changes': {}, '__id__': 'mitigate.cve.2017.1000253'}}, 'retcode': 0, 'success': True, 'fun': 'state.sls'}
johnpantini commented 6 years ago

I am experiencing the same problem. This has something to do with state.sls|state.apply (salt.modules.state) or saltutil.find_job.

Specifically, sudo salt 'minion' cmd.run 'echo 1' returns immediately, while sudo salt 'minion' state.apply some-state takes a while, where some-state.sls is:

echo 1:
  cmd.run:
    - shell: /bin/bash
    - runas: root

state.single is slow as well.

Master and minion stats are identical:

Salt Version:
           Salt: 2017.7.2

Dependency Versions:
           cffi: Not Installed
       cherrypy: unknown
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: 1.2.5
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug  4 2017, 00:39:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.4.1708 Core
         locale: UTF-8
        machine: x86_64
        release: 3.10.0-693.5.2.el7.x86_64
         system: Linux
        version: CentOS Linux 7.4.1708 Core

Here's the complete log for the above state.apply

[INFO    ] User sudo_rcrrad Executing command state.apply with jid 20171115193457614955
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20171115193457614955', 'tgt': 'minion', 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['some-state', {'cache': True, '__kwarg__': True}], 'fun': 'state.apply'}
[INFO    ] Starting a new job with PID 1909
[DEBUG   ] LazyLoaded state.apply
[DEBUG   ] LazyLoaded direct_call.get
[DEBUG   ] LazyLoaded saltutil.is_running
[DEBUG   ] LazyLoaded grains.get
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506')
[INFO    ] User sudo_rcrrad Executing command saltutil.find_job with jid 20171115193502678204
[DEBUG   ] Command details {'tgt_type': 'list', 'jid': '20171115193502678204', 'tgt': ['minion'], 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['20171115193457614955'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 1915
[DEBUG   ] LazyLoaded saltutil.find_job
[DEBUG   ] LazyLoaded direct_call.get
[DEBUG   ] Minion return retry timer set to 5 seconds (randomized)
[INFO    ] Returning information for job: 20171115193502678204
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506')
[DEBUG   ] minion return: {'fun_args': ['20171115193457614955'], 'jid': '20171115193502678204', 'return': {'tgt_type': 'glob', 'jid': '20171115193457614955', 'tgt': 'minion', 'pid': 1909, 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['some-state', {'cache': True, '__kwarg__': True}], 'fun': 'state.apply'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
[INFO    ] User sudo_rcrrad Executing command saltutil.find_job with jid 20171115193512883122
[DEBUG   ] Command details {'tgt_type': 'list', 'jid': '20171115193512883122', 'tgt': ['minion'], 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['20171115193457614955'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 1921
[DEBUG   ] LazyLoaded saltutil.find_job
[DEBUG   ] LazyLoaded direct_call.get
[DEBUG   ] Minion return retry timer set to 7 seconds (randomized)
[INFO    ] Returning information for job: 20171115193512883122
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506')
[DEBUG   ] minion return: {'fun_args': ['20171115193457614955'], 'jid': '20171115193512883122', 'return': {'tgt_type': 'glob', 'jid': '20171115193457614955', 'tgt': 'minion', 'pid': 1909, 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['some-state', {'cache': True, '__kwarg__': True}], 'fun': 'state.apply'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
[DEBUG   ] Determining pillar cache
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506')
[INFO    ] User sudo_rcrrad Executing command saltutil.find_job with jid 20171115193523095218
[DEBUG   ] Command details {'tgt_type': 'list', 'jid': '20171115193523095218', 'tgt': ['minion'], 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['20171115193457614955'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 1929
[DEBUG   ] LazyLoaded saltutil.find_job
[DEBUG   ] LazyLoaded direct_call.get
[DEBUG   ] Minion return retry timer set to 8 seconds (randomized)
[INFO    ] Returning information for job: 20171115193523095218
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506')
[DEBUG   ] minion return: {'fun_args': ['20171115193457614955'], 'jid': '20171115193523095218', 'return': {'tgt_type': 'glob', 'jid': '20171115193457614955', 'tgt': 'minion', 'pid': 1909, 'ret': '', 'user': 'sudo_rcrrad', 'arg': ['some-state', {'cache': True, '__kwarg__': True}], 'fun': 'state.apply'}, 'retcode': 0, 'success': True, 'fun': 'saltutil.find_job'}
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] Loading fresh modules for state activity
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] LazyLoaded config.option
[DEBUG   ] LazyLoaded cmd.run
[INFO    ] Running state [echo 1] at time 19:35:24.427859
[INFO    ] Executing state cmd.run for [echo 1]
[DEBUG   ] LazyLoaded cmd.run_all
[INFO    ] Executing command 'echo 1' as user 'root' in directory '/root'
[DEBUG   ] stdout: 1
[INFO    ] {'pid': 1944, 'retcode': 0, 'stderr': '', 'stdout': '1'}
[INFO    ] Completed state [echo 1] at time 19:35:24.466936 duration_in_ms=39.078
[DEBUG   ] File /var/cache/salt/minion/accumulator/39849552 does not exist, no need to cleanup.
[DEBUG   ] Minion return retry timer set to 10 seconds (randomized)
[INFO    ] Returning information for job: 20171115193457614955
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'minion', 'tcp://10.156.0.5:4506')
[DEBUG   ] minion return: {'fun_args': ['some-state', {'cache': True}], 'jid': '20171115193457614955', 'return': {'cmd_|-echo 1_|-echo 1_|-run': {'comment': 'Command "echo 1" run', 'name': 'echo 1', 'start_time': '19:35:24.427858', 'result': True, 'duration': 39.078, '__run_num__': 0, '__sls__': 'some-state', 'changes': {'pid': 1944, 'retcode': 0, 'stderr': '', 'stdout': '1'}, '__id__': 'echo 1'}}, 'retcode': 0, 'success': True, 'fun': 'state.apply'}

Master:

sudo salt 'minion' state.apply some-state -l debug
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: gcp-eu-west3-a-console-1.c.innovative-solution.internal
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] Configuration file path: /etc/salt/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: gcp-eu-west3-a-console-1.c.innovative-solution.internal
[DEBUG   ] Missing configuration file: /root/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/0b/26f993f05f13f5ea1bd44e7bd1d06041c109dc05a38e13b786f501f868d43b/.minions.p
[DEBUG   ] get_iter_returns for jid 20171115193457614955 sent to set(['minion']) will timeout at 19:35:02.629067
[DEBUG   ] Checking whether jid 20171115193457614955 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171115193457614955 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] Checking whether jid 20171115193457614955 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/master', 'master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] retcode missing from client return
[DEBUG   ] jid 20171115193457614955 return from minion
[DEBUG   ] return event: {'minion': {'jid': '20171115193457614955', 'retcode': 0, 'ret': {'cmd_|-echo 1_|-echo 1_|-run': {'comment': 'Command "echo 1" run', 'name': 'echo 1', 'start_time': '19:35:24.427858', 'result': True, 'duration': 39.078, '__run_num__': 0, '__sls__': 'some-state', 'changes': {'pid': 1944, 'retcode': 0, 'stderr': '', 'stdout': '1'}, '__id__': 'echo 1'}}, 'out': 'highstate'}}
[DEBUG   ] LazyLoaded highstate.output
[DEBUG   ] LazyLoaded nested.output
minion:
----------
          ID: echo 1
    Function: cmd.run
      Result: True
     Comment: Command "echo 1" run
     Started: 19:35:24.427858
    Duration: 39.078 ms
     Changes:
              ----------
              pid:
                  1944
              retcode:
                  0
              stderr:
              stdout:
                  1

Summary for minion
------------
Succeeded: 1 (changed=1)
Failed:    0
------------
Total states run:     1
Total run time:  39.078 ms
[DEBUG   ] jid 20171115193457614955 found all minions set(['minion'])
aabouzaid commented 6 years ago

The same issue even with a small group of minions (~65 minions).

Salt Version:
           Salt: 2017.7.2

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.3
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.12 (default, Nov 19 2016, 06:48:10)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: Ubuntu 16.04 xenial
         locale: UTF-8
        machine: x86_64
        release: 4.8.0-32-generic
         system: Linux
        version: Ubuntu 16.04 xenial
dayyeung commented 6 years ago

How many files does your setup have for all your saltenvs combined? (Files in each saltenv are counted independently even if they point to the same directories) We found out the hard way it has huge impact on state performance.

aabouzaid commented 6 years ago

The point is, there were no major changes neither in config nor in files (not even in number of minions) ... and the performance dragging happened sometime between 2017.7.2 and 2017.7.0 or maybe starting of 2017.7.0.

Ch3LL commented 6 years ago

I'm having a hard time replicating this. In fact I noticed 2017.7.2 is faster then 2016.11.8 when testing with state.sls and the cmd.run state @vryabinin provided.

Can you either of you share a sanitized version of your master/minion config file?

Also is there anything else unique about your setup?

johnpantini commented 6 years ago

Confirming this as of 2018.3.0 (Oxygen)

Can you either of you share a sanitized version of your master/minion config file?

I use the default configuration.

Also is there anything else unique about your setup?

Nothing unique, but the issue seems to affect CentOS 6/7 machines. There is another discussion here https://github.com/saltstack/salt/issues/38538 mentitoning CentOS boxes

johnpantini commented 6 years ago

Turning off pillarstack external pillar helped to solve the issue

Ch3LL commented 6 years ago

anyone else here using external pillar and can confirm the same thing?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.