saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0
14.19k stars 5.48k forks source link

Salt-minion hangs forever #18729

Closed alfredopalhares closed 5 years ago

alfredopalhares commented 9 years ago

Hello,

All my minions started hanging out forever, (on any state whatsoever), I have no Idea why.

Captured an example on a minion, the minion just returns a new job_id over and over. Output done with salt-minion -l debug 2>&1 | tee salt-minion.log:

[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_defaults.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: salt-master.mydomain
[DEBUG   ] Configuration file path: /etc/salt/minion
[INFO    ] Setting up the Salt Minion "salt-master.mydomain"
[DEBUG   ] Created pidfile: /var/run/salt-minion.pid
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_defaults.conf
[DEBUG   ] Attempting to authenticate with the Salt Master at 172.31.13.85
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] Authentication with master at 172.31.13.85 successful!
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_defaults.conf
[DEBUG   ] schedule.clean_proc_dir: checking job {'tgt_type': 'glob', 'jid': '20141204171923172779', 'tgt': 'salt-master*', 'pid': 27441, 'ret': '', 'user': 'sudo_alfredo', 'arg': [], 'fun': 'state.highstate'} for process existence
[INFO    ] Added mine.update to schedular
[INFO    ] Added new job __mine_interval to scheduler
[DEBUG   ] I am salt-master.mydomain and I am not supposed to start any proxies. (Likely not a problem)
[INFO    ] Minion is starting as user 'root'
[DEBUG   ] Minion 'salt-master.mydomain' trying to tune in
[DEBUG   ] Minion PUB socket URI: ipc:///var/run/salt/minion/minion_event_8a1e39f66c_pub.ipc
[DEBUG   ] Minion PULL socket URI: ipc:///var/run/salt/minion/minion_event_8a1e39f66c_pull.ipc
[INFO    ] Starting pub socket on ipc:///var/run/salt/minion/minion_event_8a1e39f66c_pub.ipc
[INFO    ] Starting pull socket on ipc:///var/run/salt/minion/minion_event_8a1e39f66c_pull.ipc
[DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (6052)
[DEBUG   ] Setting zmq_reconnect_ivl to '6052ms'
[DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] Minion is ready to receive requests!
[INFO    ] Running scheduled job: __mine_interval
[DEBUG   ] schedule: This job was scheduled with jid_include, adding to cache (jid_include defaults to True)
[DEBUG   ] schedule: This job was scheduled with a max number of 2
[DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'mine.update', 'jid': '20141204172102717063', 'pid': 28564, 'id': 'salt-master.mydomain', 'schedule': '__mine_interval'}
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command state.highstate with jid 20141204172115614337
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172115614337', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': [], 'fun': 'state.highstate'}
[INFO    ] Starting a new job with PID 28698
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Reading configuration from /etc/salt/minion
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172120642325
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172120642325', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_defaults.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_defaults.conf
[INFO    ] Starting a new job with PID 28755
[INFO    ] Returning information for job: 20141204172120642325
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] Loading fresh modules for state activity
[DEBUG   ] Fetching file from saltenv 'base', ** attempting ** 'salt://top.sls'
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172125662596
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172125662596', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 28819
[INFO    ] Returning information for job: 20141204172125662596
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172130720684
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172130720684', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 28875
[INFO    ] Returning information for job: 20141204172130720684
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172135811672
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172135811672', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 28931
[INFO    ] Returning information for job: 20141204172135811672
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172140927247
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172140927247', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 28993
[INFO    ] Returning information for job: 20141204172140927247
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172146022139
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172146022139', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29047
[INFO    ] Returning information for job: 20141204172146022139
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172151074278
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172151074278', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29113
[INFO    ] Returning information for job: 20141204172151074278
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172156130835
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172156130835', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29180
[INFO    ] Returning information for job: 20141204172156130835
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172201181574
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172201181574', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29236
[INFO    ] Returning information for job: 20141204172201181574
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172206307251
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172206307251', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29278
[INFO    ] Returning information for job: 20141204172206307251
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172211375105
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172211375105', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29342
[INFO    ] Returning information for job: 20141204172211375105
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172216427101
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172216427101', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29401
[INFO    ] Returning information for job: 20141204172216427101
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172221515162
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172221515162', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29480
[INFO    ] Returning information for job: 20141204172221515162
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172226599135
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172226599135', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29536
[INFO    ] Returning information for job: 20141204172226599135
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172231687247
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172231687247', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29604
[INFO    ] Returning information for job: 20141204172231687247
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172236767218
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172236767218', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29660
[INFO    ] Returning information for job: 20141204172236767218
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172241851246
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172241851246', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29722
[INFO    ] Returning information for job: 20141204172241851246
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172246914514
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172246914514', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29777
[INFO    ] Returning information for job: 20141204172246914514
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172251982140
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172251982140', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29847
[INFO    ] Returning information for job: 20141204172251982140
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172257034026
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172257034026', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29908
[INFO    ] Returning information for job: 20141204172257034026
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172302106339
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172302106339', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 29961
[INFO    ] Returning information for job: 20141204172302106339
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172307154171
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172307154171', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 30009
[INFO    ] Returning information for job: 20141204172307154171
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172312256352
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172312256352', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 30067
[INFO    ] Returning information for job: 20141204172312256352
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172317313808
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172317313808', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 30120
[INFO    ] Returning information for job: 20141204172317313808
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172322375032
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172322375032', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 30177
[INFO    ] Returning information for job: 20141204172322375032
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172327524371
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172327524371', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 30243
[INFO    ] Returning information for job: 20141204172327524371
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[INFO    ] User sudo_alfredo Executing command saltutil.find_job with jid 20141204172332619571
[DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20141204172332619571', 'tgt': 'salt-master*', 'ret': '', 'user': 'sudo_alfredo', 'arg': ['20141204172115614337'], 'fun': 'saltutil.find_job'}
[INFO    ] Starting a new job with PID 30309
[INFO    ] Returning information for job: 20141204172332619571
[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem

salt version: salt-minion 2014.7.0 (Helium)

rallytime commented 9 years ago

Hrm interesting. @masterkorp - can you give us some more information about your setup? What OS are your running your master and minions on? Can you give the output of salt --versions-report for you master? Is this a new setup, or did you upgrade? If you upgraded, do you remember which version you were running before the upgrade?

Also, is this only a problem state.sls/state.highstate runs or are other commands affected as well?

alfredopalhares commented 9 years ago

Hello @rallytime,

All machines are running Ubuntu 14.04

# salt --versions-report
           Salt: 2014.7.0
         Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
 msgpack-python: 0.3.0
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 14.0.1
           RAET: Not Installed
            ZMQ: 4.0.4
           Mako: 0.9.1

I did not upgrade. And only state.sls seems to be broken. grains.get and so on seem to be fine. Is there any more information I can provide ?

rallytime commented 9 years ago

Hrm. Ok - can you post an example state that is failing?

alfredopalhares commented 9 years ago

Any state fails. The following fails.

testing-output:
  cmd:
    - run
    - name: echo "Hello"

Further on, I've been analyzing some system calls There are a lot of clone, here is a small example:

1208052 15:48:19.745828639 0 python (643) < clone res=1877(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=68237 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1208055 15:48:19.746338236 0 python (1877) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1877(python) pid=1877(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1208083 15:48:19.748727700 0 python (1877) > clone 
1208097 15:48:19.766017096 0 python (1877) < clone res=1878(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1877(python) pid=1877(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1211817 15:48:24.794713887 0 python (643) > clone 
1211823 15:48:24.804249026 0 python (1884) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1884(python) pid=1884(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1211857 15:48:24.814213943 0 python (643) < clone res=1884(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=68722 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1211864 15:48:24.815239607 0 python (1884) > clone 
1211870 15:48:24.831880739 0 python (1884) < clone res=1885(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1884(python) pid=1884(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1221660 15:48:29.869514792 0 python (643) > clone 
1221670 15:48:29.887130749 0 python (643) < clone res=1897(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=69208 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1221673 15:48:29.887615017 0 python (1897) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1897(python) pid=1897(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1221701 15:48:29.889926818 0 python (1897) > clone 
1221710 15:48:29.907209599 0 python (1897) < clone res=1898(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1897(python) pid=1897(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1221713 15:48:29.908972570 0 python (1898) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1898(python) pid=1898(python) ptid=1897(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1221722 15:48:29.909974098 0 python (1898) > clone 
1225726 15:48:34.905344696 0 python (643) > clone 
1225814 15:48:34.914869495 0 python (643) < clone res=1904(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=69695 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1225817 15:48:34.915359133 0 python (1904) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1904(python) pid=1904(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1225849 15:48:34.917746478 0 python (1904) > clone 
1225860 15:48:34.942587326 0 python (1904) < clone res=1905(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1904(python) pid=1904(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1237110 15:48:39.946976805 0 python (643) > clone 
1237119 15:48:39.956307843 0 python (643) < clone res=1919(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=70181 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1237122 15:48:39.956773068 0 python (1919) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1919(python) pid=1919(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1237150 15:48:39.959268256 0 python (1919) > clone 
1237159 15:48:39.975611127 0 python (1919) < clone res=1920(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1919(python) pid=1919(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1237162 15:48:39.977052534 0 python (1920) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1920(python) pid=1920(python) ptid=1919(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1241155 15:48:44.991001954 0 python (643) > clone 
1241245 15:48:45.008729979 0 python (643) < clone res=1926(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=70665 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1241248 15:48:45.009236501 0 python (1926) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1926(python) pid=1926(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1241276 15:48:45.011562266 0 python (1926) > clone 
1241303 15:48:45.029257884 0 python (1926) < clone res=1927(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1926(python) pid=1926(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1251465 15:48:50.048728451 0 python (643) > clone 
1251468 15:48:50.058073416 0 python (643) < clone res=1939(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=71151 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1251471 15:48:50.058525109 0 python (1939) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1939(python) pid=1939(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1251508 15:48:50.073255617 0 python (1939) > clone 
1251512 15:48:50.081472857 0 python (1939) < clone res=1940(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1939(python) pid=1939(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1255444 15:48:55.099475861 0 python (643) > clone 
1255546 15:48:55.109276133 0 python (643) < clone res=1946(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=71634 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1255549 15:48:55.109776958 0 python (1946) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1946(python) pid=1946(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1255577 15:48:55.124323446 0 python (1946) > clone 
1255585 15:48:55.133768516 0 python (1946) < clone res=1947(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1946(python) pid=1946(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1255590 15:48:55.147593352 0 python (1947) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1947(python) pid=1947(python) ptid=1946(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1255661 15:48:55.148973077 0 python (1947) > clone 
1269613 15:49:00.151300624 0 python (643) > clone 
1269678 15:49:00.160867668 0 python (643) < clone res=1966(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=72120 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1269684 15:49:00.161376184 0 python (1966) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1966(python) pid=1966(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1269717 15:49:00.172002706 0 python (1966) > clone 
1269808 15:49:00.190290833 0 python (1966) < clone res=1967(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1966(python) pid=1966(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1273897 15:49:05.205366652 0 python (643) > clone 
1273900 15:49:05.226932050 0 python (643) < clone res=1973(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=72606 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1273903 15:49:05.227432885 0 python (1973) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1973(python) pid=1973(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1273935 15:49:05.229832167 0 python (1973) > clone 
1273950 15:49:05.251389130 0 python (1973) < clone res=1974(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1973(python) pid=1973(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1273953 15:49:05.252978677 0 python (1974) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1974(python) pid=1974(python) ptid=1973(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1273962 15:49:05.254077228 0 python (1974) > clone 
1289740 15:49:10.293190979 0 python (643) > clone 
1289741 15:49:10.310751656 0 python (643) < clone res=1999(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=73091 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1289744 15:49:10.311222121 0 python (1999) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1999(python) pid=1999(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1289776 15:49:10.313613559 0 python (1999) > clone 
1289782 15:49:10.334824568 0 python (1999) < clone res=2000(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1999(python) pid=1999(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1289785 15:49:10.336966717 0 python (2000) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2000(python) pid=2000(python) ptid=1999(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1289794 15:49:10.337921246 0 python (2000) > clone 
1293814 15:49:15.331547536 0 python (643) > clone 
1293964 15:49:15.345741005 0 python (643) < clone res=2006(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=73575 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1293967 15:49:15.346217333 0 python (2006) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2006(python) pid=2006(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1293997 15:49:15.348755764 0 python (2006) > clone 
1294010 15:49:15.370292051 0 python (2006) < clone res=2007(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2006(python) pid=2006(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1304003 15:49:20.379624072 0 python (643) > clone 
1304004 15:49:20.397498959 0 python (643) < clone res=2019(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=74058 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1304007 15:49:20.398037267 0 python (2019) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2019(python) pid=2019(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1304035 15:49:20.400464372 0 python (2019) > clone 
1304044 15:49:20.426620247 0 python (2019) < clone res=2020(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2019(python) pid=2019(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=448 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1304047 15:49:20.428977334 0 python (2020) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2020(python) pid=2020(python) ptid=2019(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1304056 15:49:20.429980837 0 python (2020) > clone 
1308187 15:49:25.463616681 0 python (643) > clone 
1308190 15:49:25.489279298 0 python (643) < clone res=2026(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=74544 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1308193 15:49:25.489771482 0 python (2026) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2026(python) pid=2026(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1308221 15:49:25.492092550 0 python (2026) > clone 
1308232 15:49:25.509216823 0 python (2026) < clone res=2027(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2026(python) pid=2026(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1318572 15:49:30.532766923 0 python (643) > clone 
1318575 15:49:30.542291708 0 python (643) < clone res=2039(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=75030 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1318578 15:49:30.542841195 0 python (2039) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2039(python) pid=2039(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1318612 15:49:30.553300925 0 python (2039) > clone 
1318618 15:49:30.569874997 0 python (2039) < clone res=2040(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2039(python) pid=2039(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1322359 15:49:35.592167083 0 python (643) > clone 
1322446 15:49:35.601924963 0 python (643) < clone res=2046(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=75516 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1322448 15:49:35.610353385 0 python (2046) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2046(python) pid=2046(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1322478 15:49:35.612919812 0 python (2046) > clone 
1322489 15:49:35.630182796 0 python (2046) < clone res=2047(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2046(python) pid=2046(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=447 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1335482 15:49:40.656789662 0 python (643) > clone 
1335487 15:49:40.666463883 0 python (643) < clone res=2064(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=643(python) pid=643(python) ptid=30880(bash) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=76000 vm_size=509928 vm_rss=41848 vm_swap=0 flags=0 uid=0 gid=0 
1335490 15:49:40.666953525 0 python (2064) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2064(python) pid=2064(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1335522 15:49:40.669339757 0 python (2064) > clone 
1335528 15:49:40.691081224 0 python (2064) < clone res=2065(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2064(python) pid=2064(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1335531 15:49:40.692970715 0 python (2065) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2065(python) pid=2065(python) ptid=2064(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1335540 15:49:40.694016125 0 python (2065) > clone 

And they go on ana on, most of them just do nothing (as far as I can understand).

1335490 15:49:40.666953525 0 python (2064) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2064(python) pid=2064(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
1335491 15:49:40.666991622 0 python (2064) > set_robust_list 
1335492 15:49:40.666992230 0 python (2064) < set_robust_list 
1335493 15:49:40.668242926 0 python (2064) > stat 
1335494 15:49:40.668249218 0 python (2064) < stat res=-2(ENOENT) path=/usr/lib/python2.7/multiprocessing/random 
1335495 15:49:40.668281311 0 python (2064) > open 
1335496 15:49:40.668293459 0 python (2064) > switch next=8 pgft_maj=0 pgft_min=280 vm_size=509928 vm_rss=35428 vm_swap=0 
1335499 15:49:40.668307417 0 python (2064) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.x86_64-linux-gnu.so flags=1(O_RDONLY) mode=0 
1335500 15:49:40.668311116 0 python (2064) > open 
1335501 15:49:40.668314842 0 python (2064) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.so flags=1(O_RDONLY) mode=0 
1335502 15:49:40.668316736 0 python (2064) > open 
1335503 15:49:40.668319344 0 python (2064) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/randommodule.so flags=1(O_RDONLY) mode=0 
1335504 15:49:40.668324778 0 python (2064) > open 
1335505 15:49:40.668327547 0 python (2064) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.py flags=1(O_RDONLY) mode=0 
1335506 15:49:40.668329281 0 python (2064) > open 
1335507 15:49:40.668331893 0 python (2064) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.pyc flags=1(O_RDONLY) mode=0 
1335508 15:49:40.668525839 0 python (2064) > open 
1335509 15:49:40.668535828 0 python (2064) < open fd=21(<f>/dev/urandom) name=/dev/urandom flags=1(O_RDONLY) mode=0 
1335510 15:49:40.668537698 0 python (2064) > read fd=21(<f>/dev/urandom) size=32 
1335511 15:49:40.668546195 0 python (2064) < read res=32 data=H.%...q...G.V.........\\<..c.l%.. 
1335512 15:49:40.668548000 0 python (2064) > close fd=21(<f>/dev/urandom) 
1335513 15:49:40.668549579 0 python (2064) < close res=0 
1335514 15:49:40.668888740 0 python (2064) > open 
1335515 15:49:40.668894315 0 python (2064) < open fd=21(<f>/dev/null) name=/dev/null flags=1(O_RDONLY) mode=0 
1335516 15:49:40.668907186 0 python (2064) > fstat fd=21(<f>/dev/null) 
1335517 15:49:40.668908837 0 python (2064) < fstat res=0 
1335518 15:49:40.668939619 0 python (2064) > switch next=7 pgft_maj=0 pgft_min=382 vm_size=509928 vm_rss=35604 vm_swap=0 
1335522 15:49:40.669339757 0 python (2064) > clone 
1335523 15:49:40.672958482 0 python (2064) > switch next=7 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 
1335526 15:49:40.673144647 0 python (2064) > switch next=595(sysdig) pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 
1335528 15:49:40.691081224 0 python (2064) < clone res=2065(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2064(python) pid=2064(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=446 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
1335529 15:49:40.692092729 0 python (2064) > exit_group 
1335530 15:49:40.692959269 0 python (2064) > switch next=2065(python) pgft_maj=0 pgft_min=609 vm_size=0 vm_rss=0 vm_swap=0 
1335531 15:49:40.692970715 0 python (2065) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=2065(python) pid=2065(python) ptid=2064(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34940 vm_swap=0 flags=0 uid=0 gid=0 
1335532 15:49:40.693003316 0 python (2065) > set_robust_list 
1335533 15:49:40.693004043 0 python (2065) < set_robust_list 
1335534 15:49:40.693941854 0 python (2065) > chdir 
1335535 15:49:40.693947926 0 python (2065) < chdir res=0 path=/ 
1335536 15:49:40.693982061 0 python (2065) > setsid 
1335537 15:49:40.693992726 0 python (2065) < setsid 
1335538 15:49:40.694006508 0 python (2065) > umask 
1335539 15:49:40.694007079 0 python (2065) < umask 
1335540 15:49:40.694016125 0 python (2065) > clone 
1335541 15:49:40.705867360 0 python (2065) > switch next=595(sysdig) pgft_maj=0 pgft_min=261 vm_size=509928 vm_rss=35324 vm_swap=0 
1335543 15:49:40.706868786 0 python (2064) > procexit 

Any ideas on how to proceed further on ?

alfredopalhares commented 9 years ago

More on the threads, they only seem to be doing this over and over.

$ sysdig -r salt_hangup.scap "proc.apid=15008 and thread.tid=993" 
615892 15:42:14.697883948 0 python (993) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=993(python) pid=993(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34932 vm_swap=0 flags=0 uid=0 gid=0 
615893 15:42:14.697921748 0 python (993) > set_robust_list 
615894 15:42:14.697922395 0 python (993) < set_robust_list 
615895 15:42:14.699213285 0 python (993) > stat 
615896 15:42:14.699220966 0 python (993) < stat res=-2(ENOENT) path=/usr/lib/python2.7/multiprocessing/random 
615897 15:42:14.699253375 0 python (993) > open 
615898 15:42:14.699266324 0 python (993) > switch next=8 pgft_maj=0 pgft_min=281 vm_size=509928 vm_rss=35420 vm_swap=0 
615901 15:42:14.699280368 0 python (993) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.x86_64-linux-gnu.so flags=1(O_RDONLY) mode=0 
615902 15:42:14.699284217 0 python (993) > open 
615903 15:42:14.699288004 0 python (993) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.so flags=1(O_RDONLY) mode=0 
615904 15:42:14.699289782 0 python (993) > open 
615905 15:42:14.699292394 0 python (993) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/randommodule.so flags=1(O_RDONLY) mode=0 
615906 15:42:14.699297871 0 python (993) > open 
615907 15:42:14.699300725 0 python (993) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.py flags=1(O_RDONLY) mode=0 
615908 15:42:14.699302402 0 python (993) > open 
615909 15:42:14.699305053 0 python (993) < open fd=-2(ENOENT) name=/usr/lib/python2.7/multiprocessing/random.pyc flags=1(O_RDONLY) mode=0 
615910 15:42:14.699462722 0 python (993) > open 
615911 15:42:14.699472828 0 python (993) < open fd=21(<f>/dev/urandom) name=/dev/urandom flags=1(O_RDONLY) mode=0 
615912 15:42:14.699474726 0 python (993) > read fd=21(<f>/dev/urandom) size=32 
615913 15:42:14.699483843 0 python (993) < read res=32 data=.Q.P.y...?.. ....nH..Q..e......G 
615914 15:42:14.699485652 0 python (993) > close fd=21(<f>/dev/urandom) 
615915 15:42:14.699487414 0 python (993) < close res=0 
615916 15:42:14.699864057 0 python (993) > open 
615917 15:42:14.699869758 0 python (993) < open fd=21(<f>/dev/null) name=/dev/null flags=1(O_RDONLY) mode=0 
615918 15:42:14.699882785 0 python (993) > fstat fd=21(<f>/dev/null) 
615919 15:42:14.699884411 0 python (993) < fstat res=0 
615920 15:42:14.700302110 0 python (993) > clone 
615921 15:42:14.700591387 0 python (993) > switch next=595(sysdig) pgft_maj=0 pgft_min=445 vm_size=509928 vm_rss=35664 vm_swap=0 
615923 15:42:14.700958673 0 python (993) > switch next=7 pgft_maj=0 pgft_min=445 vm_size=509928 vm_rss=35664 vm_swap=0 
615927 15:42:14.711010728 0 python (993) > switch next=7 pgft_maj=0 pgft_min=445 vm_size=509928 vm_rss=35664 vm_swap=0 
615930 15:42:14.718108714 0 python (993) < clone res=994(python) exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=993(python) pid=993(python) ptid=643(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=445 vm_size=509928 vm_rss=35664 vm_swap=0 flags=0 uid=0 gid=0 
615931 15:42:14.727179536 0 python (993) > switch next=32530(nginx) pgft_maj=0 pgft_min=587 vm_size=509928 vm_rss=35744 vm_swap=0 
615950 15:42:14.743487146 0 python (993) > exit_group 
615951 15:42:14.745500217 0 python (993) > procexit 
sysdig -r salt_hangup.scap "proc.apid=15008 and thread.tid=1232"
777101 15:43:56.675675547 0 python (1232) < clone res=0 exe=/usr/bin/python args=/usr/bin/salt-minion.-l.debug. tid=1232(python) pid=1232(python) ptid=1231(python) cwd=/root fdlimit=1024 pgft_maj=0 pgft_min=1 vm_size=509928 vm_rss=34936 vm_swap=0 flags=0 uid=0 gid=0 
777102 15:43:56.675702984 0 python (1232) > set_robust_list 
777103 15:43:56.675703490 0 python (1232) < set_robust_list 
777104 15:43:56.676429135 0 python (1232) > chdir 
777105 15:43:56.676432901 0 python (1232) < chdir res=0 path=/ 
777106 15:43:56.676465235 0 python (1232) > setsid 
777107 15:43:56.676473703 0 python (1232) < setsid 
777108 15:43:56.676487034 0 python (1232) > umask 
777109 15:43:56.676487493 0 python (1232) < umask 
777110 15:43:56.676495687 0 python (1232) > clone 
777111 15:43:56.678009669 0 python (1232) > switch next=595(sysdig) pgft_maj=0 pgft_min=262 vm_size=509928 vm_rss=35320 vm_swap=0 
777113 15:43:56.684535258 0 python (1232) > switch next=1231(python) pgft_maj=0 pgft_min=262 vm_size=509928 vm_rss=35320 vm_swap=0
rallytime commented 9 years ago

Interesting. Something is definitely amiss here. How did you install salt?

alfredopalhares commented 9 years ago

@rallytime Using salt-bootstrap which use the ubuntu repositories to install salt.

basepi commented 9 years ago

So I can explain all of the repeated returns and cloned processes easily enough.

Basically, when a salt master creates a job, it publishes it to all minions. Each minion decides if it is supposed to run the job based on the target data. If it is supposed to run the job, it forks a new minion process to actually do the job. If it's a long running job and the minion doesn't return before the timeout (which is 10 by default if I remember correctly), the master pubs a new job, saltutil.find_job, to ask the minions if they're still running the job. The minion will fork a new process to handle this job, just like any other job. The minion reports that it's still running the job, and then the master goes back to waiting for timeout seconds again, then does another saltutil.find_job.

So that's where all the repeated returns and forks are coming from, the master just continuing to check if the job is still running using saltutil.find_job.

It looks like the initial job may be hanging trying to fetch a file from the master:

[DEBUG   ] Fetching file from saltenv 'base', ** attempting ** 'salt://top.sls'

Can you show us your master config so we know if you're doing anything interesting with your file_roots setting or fileserver_backends setting?

basepi commented 9 years ago

Also, if you run the job with a timeout of, say, 1000, that will get rid of all the forks and returns in the logs/strace. It will only do a saltutil.find_job every 17 minutes then.

alfredopalhares commented 9 years ago

Ok here it goes the salt master config

# This file managed by Salt, do not edit by hand!!
#  Based on salt version 0.17.4 default config
##### Primary configuration settings #####
##########################################
# This configuration file is used to manage the behavior of the Salt Master
# Values that are commented out but have no space after the comment are
# defaults that need not be set in the config. If there is a space after the
# comment that the value is presented as an example and is not the default.

# Per default, the master will automatically include all config files
# from master.d/*.conf (master.d is a directory in the same directory
# as the main master config file)
#default_include: master.d/*.conf

# The address of the interface to bind to
#interface: 0.0.0.0

# Whether the master should listen for IPv6 connections. If this is set to True,
# the interface option must be adjusted too (for example: "interface: '::'")
#ipv6: False

# The tcp port used by the publisher
#publish_port: 4505

# The user to run the salt-master as. Salt will update all permissions to
# allow the specified user to run the master. If the modified files cause
# conflicts set verify_env to False.
#user: root

# Max open files
# Each minion connecting to the master uses AT LEAST one file descriptor, the
# master subscription connection. If enough minions connect you might start
# seeing on the console(and then salt-master crashes):
#   Too many open files (tcp_listener.cpp:335)
#   Aborted (core dumped)
#
# By default this value will be the one of `ulimit -Hn`, ie, the hard limit for
# max open files.
#
# If you wish to set a different value than the default one, uncomment and
# configure this setting. Remember that this value CANNOT be higher than the
# hard limit. Raising the hard limit depends on your OS and/or distribution,
# a good way to find the limit is to search the internet for(for example):
#   raise max open files hard limit debian
#
#max_open_files: 100000

# The number of worker threads to start, these threads are used to manage
# return calls made from minions to the master, if the master seems to be
# running slowly, increase the number of threads
#worker_threads: 5

# The port used by the communication interface. The ret (return) port is the
# interface used for the file server, authentication, job returnes, etc.
#ret_port: 4506

# Specify the location of the daemon process ID file
#pidfile: /var/run/salt-master.pid

# The root directory prepended to these options: pki_dir, cachedir,
# sock_dir, log_file, autosign_file, autoreject_file, extension_modules,
# key_logfile, pidfile.
#root_dir: /srv/saltstack

# Directory used to store public key data
#pki_dir: /etc/salt/pki/master

# Directory to store job and cache data
#cachedir: /var/cache/salt/master

# Verify and set permissions on configuration directories at startup
#verify_env: True

# Set the number of hours to keep old job information in the job cache
#keep_jobs: 24

# Set the default timeout for the salt command and api, the default is 5
# seconds
#timeout: 5

# The loop_interval option controls the seconds for the master's maintenance
# process check cycle. This process updates file server backends, cleans the
# job cache and executes the scheduler.
#loop_interval: 60

# Set the default outputter used by the salt command. The default is "nested"
#output: nested

# By default output is colored, to disable colored output set the color value
# to False
#color: True

# Set the directory used to hold unix sockets
#sock_dir: /var/run/salt/master

# The master can take a while to start up when lspci and/or dmidecode is used
# to populate the grains for the master. Enable if you want to see GPU hardware
# data for your master.
#
#enable_gpu_grains: False

# The master maintains a job cache, while this is a great addition it can be
# a burden on the master for larger deployments (over 5000 minions).
# Disabling the job cache will make previously executed jobs unavailable to
# the jobs system and is not generally recommended.
#
#job_cache: True

# Cache minion grains and pillar data in the cachedir.
#minion_data_cache: True

# The master can include configuration from other files. To enable this,
# pass a list of paths to this option. The paths can be either relative or
# absolute; if relative, they are considered to be relative to the directory
# the main master configuration file lives in (this file). Paths can make use
# of shell-style globbing. If no files are matched by a path passed to this
# option then the master will log a warning message.
#
#
# Include a config file from some other path:
#include: /etc/salt/extra_config
#
# Include config from several files and directories:
#include:
#  - /etc/salt/extra_config
#include: []

#####        Security settings       #####
##########################################
# Enable "open mode", this mode still maintains encryption, but turns off
# authentication, this is only intended for highly secure environments or for
# the situation where your keys end up in a bad state. If you run in open mode
# you do so at your own risk!
#open_mode: False

# Enable auto_accept, this setting will automatically accept all incoming
# public keys from the minions. Note that this is insecure.
#auto_accept: False

# If the autosign_file is specified, incoming keys specified in the
# autosign_file will be automatically accepted. This is insecure.  Regular
# expressions as well as globing lines are supported.
#autosign_file: /etc/salt/autosign.conf

# Works like autosign_file, but instead allows you to specify minion IDs for
# which keys will automatically be rejected. Will override both membership in
# the autosign_file and the auto_accept setting.
#autoreject_file: /etc/salt/autosign.conf

# Enable permissive access to the salt keys.  This allows you to run the
# master or minion as root, but have a non-root group be given access to
# your pki_dir.  To make the access explicit, root must belong to the group
# you've given access to.  This is potentially quite insecure.
# If an autosign_file is specified, enabling permissive_pki_access will allow group access
# to that specific file.
#permissive_pki_access: False

# Allow users on the master access to execute specific commands on minions.
# This setting should be treated with care since it opens up execution
# capabilities to non root users. By default this capability is completely
# disabled.
#
#client_acl:
#  larry:
#    - test.ping
#    - network.*
#
#client_acl: {}

# Blacklist any of the following users or modules
#
# This example would blacklist all non sudo users, including root from
# running any commands. It would also blacklist any use of the "cmd"
# module.
# This is completely disabled by default.
#
#client_acl_blacklist:
#  users:
#    - root
#    - '^(?!sudo_).*$'   #  all non sudo users
#  modules:
#    - cmd

# The external auth system uses the Salt auth modules to authenticate and
# validate users to access areas of the Salt system.
#
#external_auth:
#  pam:
#    fred:
#      - test.*
#
#external_auth: {}

# Time (in seconds) for a newly generated token to live. Default: 12 hours
#token_expire: 43200

# Allow minions to push files to the master. This is disabled by default, for
# security purposes.
#file_recv: False

# Set a hard-limit on the size of the files that can be pushed to the master.
# It will be interpreted as megabytes.
# Default: 100
#file_recv_max_size: 100

# Signature verification on messages published from the master.
# This causes the master to cryptographically sign all messages published to its event
# bus, and minions then verify that signature before acting on the message.
#
# This is False by default.
#
# Note that to facilitate interoperability with masters and minions that are different
# versions, if sign_pub_messages is True but a message is received by a minion with
# no signature, it will still be accepted, and a warning message will be logged.
# Conversely, if sign_pub_messages is False, but a minion receives a signed
# message it will be accepted, the signature will not be checked, and a warning message
# will be logged.  This behavior will go away in Salt 0.17.6 (or Hydrogen RC1, whichever
# comes first) and these two situations will cause minion to throw an exception and
# drop the message.
#
#sign_pub_message: False

#####    Master Module Management    #####
##########################################
# Manage how master side modules are loaded

# Add any additional locations to look for master runners
#runner_dirs: []

# Enable Cython for master side modules
#cython_enable: False

#####      State System settings     #####
##########################################
# The state system uses a "top" file to tell the minions what environment to
# use and what modules to use. The state_top file is defined relative to the
# root of the base environment as defined in "File Server settings" below.
#state_top: top.sls

# The master_tops option replaces the external_nodes option by creating
# a plugable system for the generation of external top data. The external_nodes
# option is deprecated by the master_tops option.
# To gain the capabilities of the classic external_nodes system, use the
# following configuration:
# master_tops:
#   ext_nodes: <Shell command which returns yaml>
#
#master_tops: {}

# The external_nodes option allows Salt to gather data that would normally be
# placed in a top file. The external_nodes option is the executable that will
# return the ENC data. Remember that Salt will look for external nodes AND top
# files and combine the results if both are enabled!
#external_nodes: None

# The renderer to use on the minions to render the state data
#renderer: yaml_jinja

# The Jinja renderer can strip extra carriage returns and whitespace
# See http://jinja.pocoo.org/docs/api/#high-level-api
#
# If this is set to True the first newline after a Jinja block is removed
# (block, not variable tag!). Defaults to False, corresponds to the Jinja
# environment init variable "trim_blocks".
#jinja_trim_blocks: False
#
# If this is set to True leading spaces and tabs are stripped from the start
# of a line to a block. Defaults to False, corresponds to the Jinja
# environment init variable "lstrip_blocks".
#jinja_lstrip_blocks: False

# The failhard option tells the minions to stop immediately after the first
# failure detected in the state execution, defaults to False
#failhard: False

# The state_verbose and state_output settings can be used to change the way
# state system data is printed to the display. By default all data is printed.
# The state_verbose setting can be set to True or False, when set to False
# all data that has a result of True and no changes will be suppressed.
#state_verbose: True

# The state_output setting changes if the output is the full multi line
# output for each changed state if set to 'full', but if set to 'terse'
# the output will be shortened to a single line.  If set to 'mixed', the output
# will be terse unless a state failed, in which case that output will be full.
#state_output: full

#####      File Server settings      #####
##########################################
# Salt runs a lightweight file server written in zeromq to deliver files to
# minions. This file server is built into the master daemon and does not
# require a dedicated port.

# The file server works on environments passed to the master, each environment
# can have multiple root directories, the subdirectories in the multiple file
# roots cannot match, otherwise the downloaded files will not be able to be
# reliably ensured. A base environment is required to house the top file.
# Example:
# file_roots:
#   base:
#     - /srv/salt/
#   dev:
#     - /srv/salt/dev/services
#     - /srv/salt/dev/states
#   prod:
#     - /srv/salt/prod/services
#     - /srv/salt/prod/states
file_roots:
  base:
    - /srv/saltstack/salt

# The hash_type is the hash to use when discovering the hash of a file on
# the master server. The default is md5, but sha1, sha224, sha256, sha384
# and sha512 are also supported.
#hash_type: md5

# The buffer size in the file server can be adjusted here:
#file_buffer_size: 1048576

# A regular expression (or a list of expressions) that will be matched
# against the file path before syncing the modules and states to the minions.
# This includes files affected by the file.recurse state.
# For example, if you manage your custom modules and states in subversion
# and don't want all the '.svn' folders and content synced to your minions,
# you could set this to '/\.svn($|/)'. By default nothing is ignored.
#file_ignore_regex:
#  - '/\.svn($|/)'
#  - '/\.git($|/)'

# A file glob (or list of file globs) that will be matched against the file
# path before syncing the modules and states to the minions. This is similar
# to file_ignore_regex above, but works on globs instead of regex. By default
# nothing is ignored.
# file_ignore_glob:
#  - '*.pyc'
#  - '*/somefolder/*.bak'
#  - '*.swp'

# File Server Backend
# Salt supports a modular fileserver backend system, this system allows
# the salt master to link directly to third party systems to gather and
# manage the files available to minions. Multiple backends can be
# configured and will be searched for the requested file in the order in which
# they are defined here. The default setting only enables the standard backend
# "roots" which uses the "file_roots" option.
#
#fileserver_backend:
#  - roots
#
# To use multiple backends list them in the order they are searched:
#
#fileserver_backend:
#  - git
#  - roots
fileserver_backend:
  - git

  - roots

#
# Uncomment the line below if you do not want the file_server to follow
# symlinks when walking the filesystem tree. This is set to True
# by default. Currently this only applies to the default roots
# fileserver_backend.
#
#fileserver_followsymlinks: False
#
# Uncomment the line below if you do not want symlinks to be
# treated as the files they are pointing to. By default this is set to
# False. By uncommenting the line below, any detected symlink while listing
# files on the Master will not be returned to the Minion.
#
#fileserver_ignoresymlinks: True

#
# By default, the Salt fileserver recurses fully into all defined environments
# to attempt to find files. To limit this behavior so that the fileserver only
# traverses directories with SLS files and special Salt directories like _modules,
# enable the option below. This might be useful for installations where a file root
# has a very large number of files and performance is impacted. Default is False.
#
#fileserver_limit_traversal: False
#
# The fileserver can fire events off every time the fileserver is updated,
# these are disabled by default, but can be easily turned on by setting this
# flag to True
#fileserver_events: False
#
# Git fileserver backend configuration
#
# Gitfs can be provided by one of two python modules: GitPython or pygit2. If
# using pygit2, both libgit2 and git must also be installed.
#gitfs_provider: gitpython
#
# When using the git fileserver backend at least one git remote needs to be
# defined. The user running the salt master will need read access to the repo.
#
# The repos will be searched in order to find the file requested by a client
# and the first repo to have the file will return it.
# When using the git backend branches and tags are translated into salt
# environments.
# Note:  file:// repos will be treated as a remote, so refs you want used must
# exist in that repo as *local* refs.
gitfs_remotes:
  - git+ssh://git@github.com/users-formula.git
  - git://github.com/maybe-formula.git
  - git+ssh://git@github.com/what-formula.git
  - git+ssh://git@github.com/yes-formula.git
  - git+ssh://git@github.com/not-formula.git
  - git+ssh://git@github.com/rabbitmq-formula.git
  - git+ssh://git@github.com/sensu-formula.git
  - git://github.com/influxdb-formula.git
#
#gitfs_remotes:
#  - git://github.com/saltstack/salt-states.git
#  - file:///var/git/saltmaster
#
# The gitfs_ssl_verify option specifies whether to ignore ssl certificate
# errors when contacting the gitfs backend. You might want to set this to
# false if you're using a git backend that uses a self-signed certificate but
# keep in mind that setting this flag to anything other than the default of True
# is a security concern, you may want to try using the ssh transport.
#gitfs_ssl_verify: True
#
#
# The gitfs_root option gives the ability to serve files from a subdirectory
# within the repository. The path is defined relative to the root of the
# repository and defaults to the repository root.
#gitfs_root: somefolder/otherfolder

#####         Pillar settings        #####
##########################################
# Salt Pillars allow for the building of global data that can be made selectively
# available to different minions based on minion grain filtering. The Salt
# Pillar is laid out in the same fashion as the file server, with environments,
# a top file and sls files. However, pillar data does not need to be in the
# highstate format, and is generally just key/value pairs.
pillar_roots:
  base:
    - /srv/saltstack/pillar

#ext_pillar:
#  - hiera: /etc/hiera.yaml
#  - cmd_yaml: cat /etc/salt/yaml

# The pillar_gitfs_ssl_verify option specifies whether to ignore ssl certificate
# errors when contacting the pillar gitfs backend. You might want to set this to
# false if you're using a git backend that uses a self-signed certificate but
# keep in mind that setting this flag to anything other than the default of True
# is a security concern, you may want to try using the ssh transport.
#pillar_gitfs_ssl_verify: True

# The pillar_opts option adds the master configuration file data to a dict in
# the pillar called "master". This is used to set simple configurations in the
# master config file that can then be used on minions.
#pillar_opts: True

#####          Syndic settings       #####
##########################################
# The Salt syndic is used to pass commands through a master from a higher
# master. Using the syndic is simple, if this is a master that will have
# syndic servers(s) below it set the "order_masters" setting to True, if this
# is a master that will be running a syndic daemon for passthrough the
# "syndic_master" setting needs to be set to the location of the master server
# to receive commands from.

# Set the order_masters setting to True if this master will command lower
# masters' syndic interfaces.
#order_masters: False

# If this master will be running a salt syndic daemon, syndic_master tells
# this master where to receive commands from.
#syndic_master: masterofmaster

# This is the 'ret_port' of the MasterOfMaster
#syndic_master_port: 4506

# PID file of the syndic daemon
#syndic_pidfile: /var/run/salt-syndic.pid

# LOG file of the syndic daemon
#syndic_log_file: syndic.log

#####      Peer Publish settings     #####
##########################################
# Salt minions can send commands to other minions, but only if the minion is
# allowed to. By default "Peer Publication" is disabled, and when enabled it
# is enabled for specific minions and specific commands. This allows secure
# compartmentalization of commands based on individual minions.

# The configuration uses regular expressions to match minions and then a list
# of regular expressions to match functions. The following will allow the
# minion authenticated as foo.example.com to execute functions from the test
# and pkg modules.
#
#peer:
#  foo.example.com:
#    - test.*
#    - pkg.*
#
# This will allow all minions to execute all commands:
#
#peer:
#  .*:
#    - .*
#
# This is not recommended, since it would allow anyone who gets root on any
# single minion to instantly have root on all of the minions!
# Minions can also be allowed to execute runners from the salt master.
# Since executing a runner from the minion could be considered a security risk,
# it needs to be enabled. This setting functions just like the peer setting
# except that it opens up runners instead of module functions.
#
# All peer runner support is turned off by default and must be enabled before
# using. This will enable all peer runners for all minions:
#
#peer_run:
#  .*:
#    - .*
#
# To enable just the manage.up runner for the minion foo.example.com:
#
#peer_run:
#  foo.example.com:
#    - manage.up
#####         Mine settings     #####
##########################################
# Restrict mine.get access from minions. By default any minion has a full access
# to get all mine data from master cache. In acl definion below, only pcre matches
# are allowed.
#
# mine_get:
#   .*:
#     - .*
#
# Example below enables minion foo.example.com to get  'network.interfaces' mine data only
# , minions web* to get all network.* and disk.* mine data and all other minions won't get
# any mine data.
#
# mine_get:
#   foo.example.com:
#     - network.inetrfaces
#   web.*:
#     - network.*
#     - disk.*

#####         Logging settings       #####
##########################################
# The location of the master log file
# The master log can be sent to a regular file, local path name, or network
# location. Remote logging works best when configured to use rsyslogd(8) (e.g.:
# ``file:///dev/log``), with rsyslogd(8) configured for network logging. The URI
# format is: <file|udp|tcp>://<host|socketpath>:<port-if-required>/<log-facility>
#log_file: /var/log/salt/master
#log_file: file:///dev/log
#log_file: udp://loghost:10514

#log_file: /var/log/salt/master
#key_logfile: /var/log/salt/key

# The level of messages to send to the console.
# One of 'garbage', 'trace', 'debug', info', 'warning', 'error', 'critical'.
#log_level: warning

# The level of messages to send to the log file.
# One of 'garbage', 'trace', 'debug', info', 'warning', 'error', 'critical'.
#log_level_logfile: warning

# The date and time format used in log messages. Allowed date/time formating
# can be seen here: http://docs.python.org/library/time.html#time.strftime
#log_datefmt: '%H:%M:%S'
#log_datefmt_logfile: '%Y-%m-%d %H:%M:%S'

# The format of the console logging messages. Allowed formatting options can
# be seen here: http://docs.python.org/library/logging.html#logrecord-attributes
#log_fmt_console: '[%(levelname)-8s] %(message)s'
#log_fmt_logfile: '%(asctime)s,%(msecs)03.0f [%(name)-17s][%(levelname)-8s] %(message)s'

# This can be used to control logging levels more specificically.  This
# example sets the main salt library at the 'warning' level, but sets
# 'salt.modules' to log at the 'debug' level:
#   log_granular_levels:
#     'salt': 'warning',
#     'salt.modules': 'debug'
#

#log_granular_levels: {}

#####         Node Groups           #####
##########################################
# Node groups allow for logical groupings of minion nodes.
# A group consists of a group name and a compound target.
#
#nodegroups:
#  group1: 'L@foo.domain.com,bar.domain.com,baz.domain.com and bl*.domain.com'
#  group2: 'G@os:Debian and foo.domain.com'

#####     Range Cluster settings     #####
##########################################
# The range server (and optional port) that serves your cluster information
# https://github.com/grierj/range/wiki/Introduction-to-Range-with-YAML-files
#
#range_server: range:80

#####     Windows Software Repo settings #####
##############################################
# Location of the repo on the master
#win_repo: /srv/salt/win/repo

# Location of the master's repo cache file
#win_repo_mastercachefile: /srv/salt/win/repo/winrepo.p

# List of git repositories to include with the local repo

#win_gitrepos:
#  - 'https://github.com/saltstack/salt-winrepo.git'
alfredopalhares commented 9 years ago

@basepi Thanks for the explanation! With this I could narrow down the problem. Which is in fileclient while loop

Can you explain why there is a while True there? This commit what causing my minion thread to hang. The init_retries is being reinitialized to 10 every time after continue, since its inside a infinite loop

Just by moving the init_retries out of the while loop makes the script actually work. And it will finally give me an error:

myminion.mycompany.org:
    Data failed to compile:
----------
    Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2818, in call_highstate
    top = self.get_top()
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2320, in get_top
    tops = self.get_tops()
  File "/usr/lib/python2.7/dist-packages/salt/state.py", line 2193, in get_tops
    saltenv
  File "/usr/lib/python2.7/dist-packages/salt/fileclient.py", line 149, in cache_file
    return self.get_url(path, '', True, saltenv)
  File "/usr/lib/python2.7/dist-packages/salt/fileclient.py", line 521, in get_url
    return self.get_file(url, dest, makedirs, saltenv)
  File "/usr/lib/python2.7/dist-packages/salt/fileclient.py", line 1005, in get_file
    if not data['data']:
TypeError: string indices must be integers, not str

Which seems to be a jinja related error.

I would very happily submit a Pull request to fix this problem. But It seems that the approach is wrong in the first place.

basepi commented 9 years ago

Yep, that code definitely looks wrong. I expect he meant to initialize it one line above. Please feel free to submit a pull request! Easy fix, thanks for tracking that down.

whiteinge commented 9 years ago

This issue has been resolved in #19019. The fix will be in 2014.7.1.

whiteinge commented 9 years ago

@masterkorp your diagnosis looks spot-on, btw! Are you in a position to test that fix?

alfredopalhares commented 9 years ago

@whiteinge Yes I can test in a day or two.

wt commented 9 years ago

I have just added these patches to one of my masters. I will report back tomorrow on results.

wt commented 9 years ago

After 10 hours of running with the patch from #19019, I don't see minions locking up. I think this is looking really good.

wt commented 9 years ago

Still looking great after ~18 hours.

alfredopalhares commented 9 years ago

I can confirm that it seems to work great.

wt commented 9 years ago

Oh no, I have minions that are stuck again. Some have been stuck for 6 hours or so. Having said that, they did last a lot longer than without the patch.

cachedout commented 9 years ago

Sorry for the delay here. @wt Are you still experiencing this issue?

cachedout commented 9 years ago

One more ping, @wt. Is this problem ongoing, even with that patch in place?

wt commented 9 years ago

I don't actually know. I actually restart the minions and master once per day. I can tell you that highstates take much longer to run right before the restart. Sometimes they do not complete. Here's a log message I seem to get around that time:

[INFO    ] SaltReqTimeoutError: after 60 seconds. (Try 1 of 3)
[INFO    ] SaltReqTimeoutError: after 60 seconds. (Try 1 of 3)

In fact, I ended up getting this message when I attempted to do a manual highstate just before the restart of the master.

[exception text]
salt.exceptions.SaltClientError: Attempt to authenticate with the salt master failed

If I start a new manual highstate and restart the master shortly after, the minion just sits there. The last log message is:

[DEBUG   ] Decrypting the current master AES key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem

FWIW, I let this the minion seems to be stuck. I haven't seen any new log messages for about 15 minutes now.

Also, here's my version data:

# salt-call --versions
           Salt: 2014.7.1
         Python: 2.6.9 (unknown, Mar 18 2015, 19:35:56)
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
 msgpack-python: 0.4.4
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
        libnacl: Not Installed
         PyYAML: 3.10
          ioflo: Not Installed
          PyZMQ: 14.3.1
           RAET: Not Installed
            ZMQ: 3.2.5
           Mako: Not Installed

Note that I am not yet running 2014.7.2 since 2014.7.1 is the version in epel6.

wt commented 8 years ago

Is this still a thing?

rosscdh commented 7 years ago

yes.. suffering it now

Salt Version:
           Salt: 2016.11.5

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.21.1
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.4.3
         pygit2: Not Installed
         Python: 2.7.5 (default, Aug 18 2016, 15:58:25)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.2.1511 Core
        machine: x86_64
        release: 3.10.0-327.28.3.el7.x86_64
         system: Linux
        version: CentOS Linux 7.2.1511 Core
cachedout commented 7 years ago

@ross-p OK. Can we see a trace of the minion system calls while it's hung? Can you tell us what it was doing when it hung? Is it a particular state that's doing it? Do you have debug logs?

floridop commented 6 years ago

Hi all. I am experiencing this today for the first time with Salt master 2015.5.10, CentOS7 Salt minion 2017.8.13, CentOS7 I have many minions on these two versions and this one seems the only one experiencing the issue upon a specific state.sls call.

I tried to comment out commands that might hang but I feel this has nothing to do with the state.

It happens when I test the command, I didn't dare to run it for real since the test is not returning any output.

salt -v 'print2*' state.sls nisserver test=True

The state.sls is:

#include:
#  - iptables

{% if 'nisserver' in grains['roles'] %}
nisserver:
  pkg.installed:
    - name: ypserv
  service.running:
    - name: ypserv
    - enable: True
    - watch:
      - file: /etc/defaultdomain
      - file: nisports
      - file: /var/yp/Makefile
      - file: /var/yp/securenets
      - file: /var/yp/securenets_vespa
      - file: /etc/auto.master
      - file: /etc/auto.home
      - file: /etc/passwd
      - file: /etc/shadow
      - file: /etc/group

#yppasswdd:
#  service:
#    - running
#    - enable: True
#    - require:
#      - pkg: nisserver
#      - srv: ypserv
#    - watch:
#      - file: /var/yp/securenets
#      - file: /var/yp/securenets_vespa
#      - file: nisportspasswd
#      - file: /etc/auto.master
#      - file: /etc/auto.home
#      - file: /etc/passwd
#      - file: /etc/shadow
#      - file: /etc/group

/etc/defaultdomain:
{% if grains['os_family'] == 'RedHat' %}
  file.append:
     - name: /etc/sysconfig/network
     - text: 'NISDOMAIN=mytestdomain'
{% else %}
  file.managed:
     - source: salt://nisserver/defaultdomain.mftf
{%endif %}

nisports:
  file.append:
    - name: /etc/sysconfig/network
    - text: |
        YPSERV_ARGS="-p 944"
        YPXFRD_ARGS="-p 945"

nisportspasswd:
  file.replace:
    - name: /etc/sysconfig/yppasswdd
    - pattern: YPPASSWDD_ARGS=.*
    - repl: YPPASSWDD_ARGS="--port 946"
nisportspasswd:
  file.replace:
    - name: /etc/sysconfig/yppasswdd
    - pattern: YPPASSWDD_ARGS=.*
    - repl: YPPASSWDD_ARGS="--port 946"

/etc/passwd:
  file.append:
    - source: salt://nisserver/passwd.mftf

/etc/shadow:
  file.append:
    - source: salt://nisserver/shadow.mftf

/etc/group:
  file.append:
    - source: salt://nisserver/group.mftf

/var/yp/Makefile:
    file.managed:
    - user: root
    - group: root
    - mode: 644
    - source: salt://nisserver/Makefile.mftf.print2

/var/yp/securenets:
  file.managed:
    - user: root
    - group: root
    - mode: 644
    - source: salt://nisserver/securenets.mftf

/var/yp/securenets_vespa:
  file.managed:
    - user: root
    - group: root
    - mode: 644
    - source: salt://nisserver/securenets_vespa.mftf

# list of slaves, check if it works without ypinit
#/var/yp/ypservers:
#  file.managed:
#    - user: root
#    - group: root
#    - mode: 644
#    - source: salt://nisserver/ypservers.mftf

#refreshnis:
#  cmd.run:
#    - name: 'make -C /var/yp'
#    - watch:
#      - file: /var/yp/securenets
#      - file: /etc/auto.home
#      - file: /etc/passwd
#      - file: /etc/shadow
#      - file: /etc/group

#### server maps ####
/etc/auto.master:
  file.managed:
    - user: root
    - group: root
    - source: salt://nisserver/auto.master.mftf

/etc/auto.home:
  file.managed:
    - user: root
    - group: root
    - source: salt://nisserver/auto.home.mftf

/etc/auto.nodes:
  file.managed:
    - user: root
    - group: root
    - source: salt://nisserver/auto.nodes.mftf

The logs shows some of the states successfully completing, and then repeatedly the saltutil calls on jobids that are different from the state.sls calls, as well as lots of PIDs spawned:

2018-01-10 16:48:43,716 [salt.minion      ][INFO    ][3911] User root Executing command state.sls with jid 20180110164417561174
2018-01-10 16:48:43,736 [salt.minion      ][INFO    ][6291] Starting a new job with PID 6291
2018-01-10 16:48:43,865 [salt.state       ][INFO    ][6291] Loading fresh modules for state activity
2018-01-10 16:48:43,906 [salt.fileclient  ][INFO    ][6291] Fetching file from saltenv 'base', ** skipped ** latest already in cache u'salt://nisserver/init.sls'
2018-01-10 16:48:44,793 [salt.state       ][INFO    ][6291] Running state [ypserv] at time 16:48:44.793041
2018-01-10 16:48:44,794 [salt.state       ][INFO    ][6291] Executing state pkg.installed for ypserv
2018-01-10 16:48:44,795 [salt.loaded.int.module.cmdmod][INFO    ][6291] Executing command ['rpm', '-qa', '--queryformat', '%{NAME}_|-%{EPOCH}_|-%{VERSION}_|-%{RELEASE}_|-%{ARCH}_|-(none)\n'] in directory '/root'
2018-01-10 16:48:46,417 [salt.state       ][INFO    ][6291] Package ypserv is already installed
2018-01-10 16:48:46,418 [salt.state       ][INFO    ][6291] Completed state [ypserv] at time 16:48:46.418003
2018-01-10 16:48:46,428 [salt.state       ][INFO    ][6291] Running state [/etc/sysconfig/network] at time 16:48:46.428387
2018-01-10 16:48:46,429 [salt.state       ][INFO    ][6291] Executing state file.append for /etc/sysconfig/network
2018-01-10 16:48:46,433 [salt.state       ][INFO    ][6291] File /etc/sysconfig/network is in correct state
2018-01-10 16:48:46,433 [salt.state       ][INFO    ][6291] Completed state [/etc/sysconfig/network] at time 16:48:46.433311
2018-01-10 16:48:46,434 [salt.state       ][INFO    ][6291] Running state [/etc/sysconfig/network] at time 16:48:46.433714
2018-01-10 16:48:46,434 [salt.state       ][INFO    ][6291] Executing state file.append for /etc/sysconfig/network
2018-01-10 16:48:46,437 [salt.state       ][INFO    ][6291] File /etc/sysconfig/network is in correct state
2018-01-10 16:48:46,438 [salt.state       ][INFO    ][6291] Completed state [/etc/sysconfig/network] at time 16:48:46.437561
2018-01-10 16:48:46,438 [salt.state       ][INFO    ][6291] Running state [/var/yp/Makefile] at time 16:48:46.437937
2018-01-10 16:48:46,438 [salt.state       ][INFO    ][6291] Executing state file.managed for /var/yp/Makefile
2018-01-10 16:48:46,449 [salt.state       ][INFO    ][6291] The file /var/yp/Makefile is in the correct state
2018-01-10 16:48:46,449 [salt.state       ][INFO    ][6291] Completed state [/var/yp/Makefile] at time 16:48:46.449003
2018-01-10 16:48:46,449 [salt.state       ][INFO    ][6291] Running state [/var/yp/securenets] at time 16:48:46.449371
2018-01-10 16:48:46,450 [salt.state       ][INFO    ][6291] Executing state file.managed for /var/yp/securenets
2018-01-10 16:48:46,455 [salt.state       ][INFO    ][6291] The file /var/yp/securenets is in the correct state
2018-01-10 16:48:46,455 [salt.state       ][INFO    ][6291] Completed state [/var/yp/securenets] at time 16:48:46.455027
2018-01-10 16:48:46,455 [salt.state       ][INFO    ][6291] Running state [/var/yp/securenets_vespa] at time 16:48:46.455405
2018-01-10 16:48:46,456 [salt.state       ][INFO    ][6291] Executing state file.managed for /var/yp/securenets_vespa
2018-01-10 16:48:46,461 [salt.state       ][INFO    ][6291] The file /var/yp/securenets_vespa is in the correct state
2018-01-10 16:48:46,461 [salt.state       ][INFO    ][6291] Completed state [/var/yp/securenets_vespa] at time 16:48:46.460852
2018-01-10 16:48:46,461 [salt.state       ][INFO    ][6291] Running state [/etc/auto.master] at time 16:48:46.461212
2018-01-10 16:48:46,462 [salt.state       ][INFO    ][6291] Executing state file.managed for /etc/auto.master
2018-01-10 16:48:46,466 [salt.state       ][INFO    ][6291] The file /etc/auto.master is in the correct state
2018-01-10 16:48:46,467 [salt.state       ][INFO    ][6291] Completed state [/etc/auto.master] at time 16:48:46.466530
2018-01-10 16:48:46,467 [salt.state       ][INFO    ][6291] Running state [/etc/auto.home] at time 16:48:46.466901
2018-01-10 16:48:46,467 [salt.state       ][INFO    ][6291] Executing state file.managed for /etc/auto.home
2018-01-10 16:48:46,476 [salt.fileclient  ][INFO    ][6291] Fetching file from saltenv 'base', ** skipped ** latest already in cache u'salt://nisserver/auto.home.mftf'
2018-01-10 16:48:46,478 [salt.state       ][INFO    ][6291] File changed:
---
+++
OMITTING THIS INFO BUT FILE IS CHANGED ACCORDING TO THE STATE.SLS
2018-01-10 16:48:46,478 [salt.state       ][INFO    ][6291] Completed state [/etc/auto.home] at time 16:48:46.477973
2018-01-10 16:48:46,478 [salt.state       ][INFO    ][6291] Running state [/etc/passwd] at time 16:48:46.478347
2018-01-10 16:48:46,479 [salt.state       ][INFO    ][6291] Executing state file.append for /etc/passwd
2018-01-10 16:48:46,484 [salt.fileclient  ][INFO    ][6291] Fetching file from saltenv 'base', ** skipped ** latest already in cache u'salt://nisserver/passwd.mftf'
2018-01-10 16:48:48,732 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164422577112
2018-01-10 16:48:48,759 [salt.minion      ][INFO    ][6303] Starting a new job with PID 6303
2018-01-10 16:48:48,788 [salt.minion      ][INFO    ][6303] Returning information for job: 20180110164422577112
2018-01-10 16:48:58,765 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164432610674
2018-01-10 16:48:58,793 [salt.minion      ][INFO    ][6308] Starting a new job with PID 6308
2018-01-10 16:48:58,822 [salt.minion      ][INFO    ][6308] Returning information for job: 20180110164432610674
2018-01-10 16:49:08,781 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164442626191
2018-01-10 16:49:08,808 [salt.minion      ][INFO    ][6313] Starting a new job with PID 6313
2018-01-10 16:49:08,870 [salt.minion      ][INFO    ][6313] Returning information for job: 20180110164442626191
2018-01-10 16:49:18,796 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164452640849
2018-01-10 16:49:18,823 [salt.minion      ][INFO    ][6318] Starting a new job with PID 6318
2018-01-10 16:49:18,852 [salt.minion      ][INFO    ][6318] Returning information for job: 20180110164452640849
2018-01-10 16:49:28,817 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164502662446
2018-01-10 16:49:28,845 [salt.minion      ][INFO    ][6324] Starting a new job with PID 6324
2018-01-10 16:49:28,874 [salt.minion      ][INFO    ][6324] Returning information for job: 20180110164502662446
2018-01-10 16:49:38,845 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164512690689
2018-01-10 16:49:38,905 [salt.minion      ][INFO    ][6329] Starting a new job with PID 6329
2018-01-10 16:49:38,934 [salt.minion      ][INFO    ][6329] Returning information for job: 20180110164512690689
2018-01-10 16:49:48,851 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164522696811
2018-01-10 16:49:48,880 [salt.minion      ][INFO    ][6334] Starting a new job with PID 6334
2018-01-10 16:49:48,909 [salt.minion      ][INFO    ][6334] Returning information for job: 20180110164522696811
2018-01-10 16:49:58,867 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164532712111
2018-01-10 16:49:58,894 [salt.minion      ][INFO    ][6339] Starting a new job with PID 6339
2018-01-10 16:49:58,923 [salt.minion      ][INFO    ][6339] Returning information for job: 20180110164532712111
2018-01-10 16:50:08,899 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164542744583
2018-01-10 16:50:08,942 [salt.minion      ][INFO    ][6344] Starting a new job with PID 6344
2018-01-10 16:50:08,970 [salt.minion      ][INFO    ][6344] Returning information for job: 20180110164542744583
2018-01-10 16:50:18,925 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164552770688
2018-01-10 16:50:18,954 [salt.minion      ][INFO    ][6349] Starting a new job with PID 6349
2018-01-10 16:50:18,983 [salt.minion      ][INFO    ][6349] Returning information for job: 20180110164552770688
2018-01-10 16:50:28,939 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164602784237
2018-01-10 16:50:28,967 [salt.minion      ][INFO    ][6354] Starting a new job with PID 6354
2018-01-10 16:50:28,996 [salt.minion      ][INFO    ][6354] Returning information for job: 20180110164602784237
2018-01-10 16:50:38,964 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164612808915
2018-01-10 16:50:39,005 [salt.minion      ][INFO    ][6359] Starting a new job with PID 6359
2018-01-10 16:50:39,035 [salt.minion      ][INFO    ][6359] Returning information for job: 20180110164612808915
2018-01-10 16:50:48,992 [salt.minion      ][INFO    ][3911] User root Executing command saltutil.find_job with jid 20180110164622836910
2018-01-10 16:50:49,019 [salt.minion      ][INFO    ][6364] Starting a new job with PID 6364
2018-01-10 16:50:49,049 [salt.minion      ][INFO    ][6364] Returning information for job: 20180110164622836910
floridop commented 6 years ago

Note that the crazy spawning stops after I salt 'print2*' saltutil.kill_job 20180110164417561174 the jobids listed in that log are non-existing jobs in the system. I also tried removing the nisserver block alltogether and the behaviour is the same. How to debug further?

floridop commented 6 years ago

It is definitely a bug as the system becomes less responsive and python salt-minion hogs one core 100%.

floridop commented 6 years ago

Update: I commented out /etc/passwd editing and the minion completes its job. There must be something in the system that prevents the changes to /etc/passwd to happen; a manual merge works. I probably should use the shadow or user modules to handle these things instead. It's still a bit annoying that one has to go by trial and error with salt, it's never clear what the minion is doing and why, I should give a try to the latest releases maybe?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.