saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

Orchestration Error on 2014.7.0 #18146

Closed dstokes closed 8 years ago

dstokes commented 9 years ago
$ salt-run state.orchestrate deploy.rollout
salt-master:
    Data failed to compile:
----------
    No matching sls found for 'deploy.rollout' in env 'base'

Rolling back to 2014.7.0rc2 fixes the issue. Still digging for a more descriptive error.

dmyerscough commented 9 years ago

@dstokes are you able to share your deploy.rollout?

dstokes commented 9 years ago
{%- set env = pillar.get('environment', '') %}
{%- set revision = pillar.get('revision', False) %}
{%- set target = "G@ec2_apps:" + pillar.tgt + " and ( G@ec2_roles:webserver or G@ec2_roles:*worker* )" %}

build_application:
  salt.state:
    - tgt: {{ target }}
    - tgt_type: compound
    - sls: deploy
    {%- if revision %}
    - pillar:
        orchestrated: True
        deploy_revision: {{ revision }}
    {%- endif %}

reload_latest:
  salt.state:
    - tgt: {{ target }}
    - tgt_type: compound
    - sls: deploy.reload_latest
    {%- if revision %}
    - pillar:
        deploy_revision: {{ revision }}
    {%- endif %}
    - require:
      - salt: build_application

create_slack_deployment_notification:
  cmd.run:
    - name: <redacted>
    - require:
      - salt: reload_latest

jfindlay commented 9 years ago

Thanks for reporting this @dstokes, we'll look into it.

cachedout commented 9 years ago

Is that {%- set env = pillar.get('environment', '') %} line the culprit here? I'm trying to narrow this down to something that we can start troubleshooting effectively.

dstokes commented 9 years ago

Removing everything but:

build_application:
  salt.state:
    - tgt: test
    - sls: deploy

Doesn't fix the problem. Same error as above. There's not a new requirement on orchestration file location, right?

cachedout commented 9 years ago

Not that I'm aware of. :]

This should be enough to try and reproduce this, though. We'll start digging in.

dstokes commented 9 years ago

We've been digging. It seems like we're seeing issue #5449, which is causing the orchestration failure along with a slew of other state-related bugs on latest stable. Attempting to show_sls either fails for valid state files or loads the similarly named pillar file instead of the state file. Our roots config is as follows:

file_roots:
  base:
    - /srv/salt-states/states

pillar_roots:
  base:
    - /srv/salt-states/pillar

iggy commented 9 years ago

I think we might be seeing something similar. In our dev environment (without gitfs) it works. In QA/Prod with gitfs it fails. When I throw the file on the filesystem like normal, it starts working in QA/Prod.

@dstokes are you using gitfs?

dstokes commented 9 years ago

@iggy i am not

somenick commented 9 years ago

I am seeing something similar, and it also looks like #5449, because I'm seeing Salt reference the corresponding pillar file instead of the state file when running state.show_sls. The original issue there was with orchestration as well, just like this one.

We are using multiple environments, and possibly multiple file_roots and pillar_roots entries for each env. There are 3 entries for each env's file_roots, but they are the same 3 entries. The pillar_roots have a single entry for base, and the other envs use it as a fallback.

We're also using gitfs for several formulas, but it hasn't been a problem in the past. The only thing new is the multiple envs and *_roots.
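
For reference, a rough sketch of the kind of layout I mean (env names and paths are made up, not our actual config):

file_roots:
  base:
    - /srv/salt/states
    - /srv/salt/formulas
    - /srv/salt/local
  qa:
    - /srv/salt/states
    - /srv/salt/formulas
    - /srv/salt/local

pillar_roots:
  base:
    - /srv/salt/pillar
  qa:
    - /srv/salt/pillar    # other envs just point back at the base entry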

@basepi any chance we can have a fix or a workaround anytime soon?

somenick commented 9 years ago

also, #16990 might be a duplicate

iggy commented 9 years ago

I ended up opening #19802 since I tracked down a workaround. I was using gitfs for everything, and once I put copies of my orchestrate files on the actual filesystem (so /srv/salt/backups/db/orchestrate.sls, etc.) everything started working fine.
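
In case it helps anyone else, the workaround boils down to something like this (the state ID, target, and sls names below are placeholders, not my real files):

# /srv/salt/backups/db/orchestrate.sls -- a copy kept on the real filesystem rather than only in gitfs,
# run with something like: salt-run state.orchestrate backups.db.orchestrate
backup_db:
  salt.state:
    - tgt: 'db*'          # placeholder target
    - sls: backups.db     # placeholder state to apply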

somenick commented 9 years ago

I'm trying to embed the formulas as well, but haven't managed to get it to work yet.

I am experiencing inconsistencies, though (I did before as well): restarting the master and minion seems to help some, but the effect doesn't stick in the long run.

Update:

The machine is its own master/minion, and the minion config has 'environment' set to something other than base.

Restarting the master helps, in a way that may be useful for tracking this down:

somenick commented 9 years ago

I finally tracked it down, and it didn't have (much) to do with orchestration, environments or multi-roots.

My specific problem was that I was doing a Jinja import in the pillar top.sls file, essentially factoring out some variables related to environments and orchestration.

Bottom line: that Jinja import eventually puts the master into a buggy state where it serves the pillar_roots instead of the file_roots. Once I remove the import in pillar/top.sls and restart the master, all is good again.
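
For reference, the offending pattern looked roughly like this (the file and variable names are made up for illustration; the point is the Jinja import at the top of pillar/top.sls):

# pillar/top.sls
{%- import 'envs.jinja' as envs %}   {# factored-out env/orchestration variables #}

base:
  '*':
    - common
  '{{ envs.deploy_target }}':
    - deploy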

msciciel commented 9 years ago

Is orchestration working with gitfs?

cachedout commented 9 years ago

@dstokes I tried again to replicate this and could not. What is the path to rollout.sls on your system?

dstokes commented 9 years ago

@cachedout states/deploy/rollout.sls

noegenesis commented 9 years ago

I've hit the same problem (also on 2014.7.0); calling e.g. sudo -u salt salt-run state.orchestrate setup currently fails with:

    Data failed to compile:
----------
    No matching sls found for 'setup' in env 'base'

Orchestration calls were actually working fine before I started to put some files in /srv/pillar. After adding some more logging messages to the code, I could confirm that file_roots in __opts__ being set to the pillar_roots is the cause, so this is indeed related to #5449.

After some digging, I could fix the issue as follows:

diff -c orig/salt/pillar/__init__.py /usr/lib/python2.7/dist-packages/salt/pillar/__init__.py
*** 128,134 ****
              self.functions = functions

          self.matcher = salt.minion.Matcher(self.opts, self.functions)
-         self.rend = salt.loader.render(self.opts, self.functions)
          # Fix self.opts['file_roots'] so that ext_pillars know the real
          # location of file_roots. Issue 5951
          ext_pillar_opts = dict(self.opts)
--- 132,138 ----
          # location of file_roots. Issue 5951
          ext_pillar_opts = dict(self.opts)
          ext_pillar_opts['file_roots'] = self.actual_file_roots
+         self.rend = salt.loader.render(ext_pillar_opts, self.functions)
          self.merge_strategy = 'smart'
          if opts.get('pillar_source_merging_strategy'):
              self.merge_strategy = opts['pillar_source_merging_strategy']

Regarding the way the opts are bleeding out, this seems to be happening in the Loader class (in salt/loader.py). Changing the line above instead to

self.rend = salt.loader.render(dict(self.opts, __marker=1), self.functions)

and then at the end of Loader's __init__ adding on

if '__marker' in opts:
    self.opts = dict(self.opts)
    del self.opts['file_roots']

also fixes the error. The logic in gen_module and gen_functions that overrides mod.__opts__ with self.opts should be the reason.

Finally, I suspect that the correct call may just be to use the unmodified opts, i.e.

self.rend = salt.loader.render(opts, self.functions)

since we probably also want to avoid forcing file_client to 'local', etc. Ideally, the logic to override __opts__ could be avoided altogether, since it feels like a source of unexpected behavior waiting to bite in another way, but it is beyond my knowledge of the codebase to suggest how.

noegenesis commented 9 years ago

Actually, changing the definition of self.rend either way breaks the master when it tries to render pillars for minions, so this does not seem like an option.

Right now I am restricting both of the mod.__opts__.update calls in the Loader class so that they do not override file_roots (in all cases, that is, not just when using the __marker). This fixes the original error without introducing any apparent problems.

frogunder commented 8 years ago

@dstokes, @somenick, @noegenesis - Is this still an issue for you?

frogunder commented 8 years ago

Since there has been no activity for a while, I will close this issue.