saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Error when trying to run an orch command if another orch is being run by the scheduler. #37994

Open gummeah opened 7 years ago

gummeah commented 7 years ago

Description of Issue/Question

I have two orchestration state files. The first one is run every 15 minutes by the scheduler:

schedule:
  http_frontend_haproxy:
    function: state.orchestrate
    minutes: 15
    args:
      - orch.http_frontend.haproxy

/srv/salt/orch/http_frontend/haproxy.sls:

orch_haproxy_conf_generate:
  salt.state:
    - tgt: 'config_server'
    - sls: 
      - orch.http_frontend.haproxy_conf_generate

orch_haproxy_conf_update:
  salt.state:
    - tgt: 'role:http_frontend'
    - tgt_type: 'grain'
    - sls: 
      - orch.http_frontend.haproxy_conf_update

The second one I run manually with the command:

salt-run state.orch orch.http_frontend.bird

/srv/salt/orch/http_frontend/bird.sls:

refresh_pillar:
  salt.function:
    - name: saltutil.refresh_pillar
    - tgt: 'role:http_frontend'
    - tgt_type: 'grain'

update_bird_config:
  salt.state:
    - tgt: 'role:http_frontend'
    - tgt_type: 'grain'
    - sls: 
      - http_frontend.bird_config

If the first orchestration is already running via the scheduler and I try to run the second one with salt-run state.orch, I get this error:

master:
    Data failed to compile:
----------
    The function "state.orchestrate" is running as PID 16907 and was started at 2016, Nov 30 18:18:56.071485 with jid 20161130181856071485

But everything is fine if I run both of them manually with these commands:

salt-run state.orch orch.http_frontend.haproxy
salt-run state.orch orch.http_frontend.bird

What am I doing wrong?

Versions Report

Salt Version:
    Salt: 2016.11.0

Dependency Versions:
    cffi: Not Installed
    cherrypy: 3.2.2
    dateutil: Not Installed
    gitdb: 0.6.4
    gitpython: 1.0.1
    ioflo: Not Installed
    Jinja2: 2.7.2
    libgit2: Not Installed
    libnacl: 1.4.3
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.4.8
    mysql-python: 1.2.3
    pycparser: Not Installed
    pycrypto: 2.6.1
    pygit2: Not Installed
    Python: 2.7.5 (default, Nov 20 2015, 02:00:19)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    RAET: Not Installed
    smmap: 0.9.0
    timelib: Not Installed
    Tornado: 4.2.1
    ZMQ: 4.1.4

System Versions:
    dist: centos 7.2.1511 Core
    machine: x86_64
    release: 3.10.0-327.28.2.el7.x86_64
    system: Linux
    version: CentOS Linux 7.2.1511 Core

gtmanfred commented 7 years ago

I believe this is the expected behaviour.

The master can only run one set of orchestration states at the same time.

When you are running them one command after another, the first one finishes running before the second one starts.

If you try to run it with --async, you should get the same error.

salt-run --async state.orch orch.http_frontend.haproxy
salt-run state.orch orch.http_frontend.bird

Thanks, Daniel

gummeah commented 7 years ago

No, the first one runs for a few minutes, since there is a lot of work to do. So I'm able to run the second one in another terminal window simultaneously with the first.

gummeah commented 7 years ago

And why does the reactor documentation suggest using orchestration with it (with the reactor system), then? Should only one orchestration be used? And how do you control this if you have multiple reactors with orchestration actions on them and multiple events arrive within a short period of time?
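
For reference, the pattern the reactor docs describe is roughly the sketch below: each matching event kicks off its own state.orchestrate run, which is exactly where the one-orchestration-at-a-time limit would bite. The event tag, file paths, and orchestration name here are hypothetical, and the exact reactor SLS syntax for calling a runner can vary between Salt versions:

reactor:
  - 'myapp/haproxy/update':
    - /srv/reactor/update_haproxy.sls

/srv/reactor/update_haproxy.sls:

run_haproxy_orch:
  runner.state.orchestrate:
    - mods: orch.http_frontend.haproxy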

gtmanfred commented 7 years ago

Ok, so this should definitely be the same behaviour between the two.

I think the goal is to make this configurable so that it can be run with concurrent from the scheduler: keep the same behaviour as right now by default, but allow it to be configured to run the way you want. Then we will update the behavior in Nitrogen.

gtmanfred commented 7 years ago

I was able to replicate this behavior as described with the following schedule.

[root@65a0c37fd433 /]# tail -c +0 /etc/salt/master.d/sched.conf /srv/salt/test.sls
==> /etc/salt/master.d/sched.conf <==
schedule:
  sleep:
    function: state.orch
    minutes: 20
    concurrent: True
    args:
      - test

==> /srv/salt/test.sls <==
deploy:
  salt.function:
    - tgt: '*'
    - name: test.sleep
    - arg:
      - 1000

and here is the output when the schedule is running.

[root@65a0c37fd433 /]# salt-run jobs.active
20161130193228085768:
    ----------
    Arguments:
        - 1000
    Function:
        test.sleep
    Returned:
    Running:
        |_
          ----------
          65a0c37fd433:
              2212
    StartTime:
        2016, Nov 30 19:32:28.085768
    Target:
        *
    Target-type:
        glob
    User:
        root
[root@65a0c37fd433 /]# salt-run state.orch test
65a0c37fd433_master:
    Data failed to compile:
----------
    The function "state.orch" is running as PID 2003 and was started at 2016, Nov 30 19:32:06.606691 with jid 20161130193206606691
retcode:
    1

This should be the same behavior as it is from the command line, but for some reason it appears that concurrent is set to True somewhere.

gtmanfred commented 7 years ago

It appears that you can pass the queue=True option on the command line, or in the scheduler or reactor, and if one state run is already running, the orchestrate will wait for it to finish instead of failing to run at all.
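
A minimal sketch of what that could look like, reusing the schedule and orchestration names from this issue and assuming the schedule forwards the option through its kwargs:

salt-run state.orch orch.http_frontend.bird queue=True

schedule:
  http_frontend_haproxy:
    function: state.orchestrate
    minutes: 15
    args:
      - orch.http_frontend.haproxy
    kwargs:
      queue: True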

clallen commented 7 years ago

Disregard, error on my end

twellspring commented 6 years ago

Is there any progress on having the ability to run multiple orchestration tasks? Orchestration is not very useful if it has to be single-threaded.

damon-atkins commented 6 years ago

@thatch45 Can we have a bit of an escalation for this issue, which has been open for a year? This undermines one of the selling points of Salt.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

stale[bot] commented 5 years ago

Thank you for updating this issue. It is no longer marked as stale.