saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0
14.19k stars 5.48k forks source link

Top SLS compilation does not behave the same as Docs describe #12483

Closed driskell closed 9 years ago

driskell commented 10 years ago

Hi

(This is taken from my comments on closed issue #5440 - @basepi recommended a new issue so here it is! Thanks.)

I have 4 environments, "base", "qa", "dev", "master", in that order defined in file_roots. These are branches in a git and I have my own deployment of this to the salt server (I do not use gitfs).

Each has a version of the top.sls file containing all environments (as you'd expect since it same file, just different revisions different branches)

The documentation describes that if an environment has info in top.sls in "base", it will always override all other top.sls. This is not the case - and checking the code there is no distinction between "base" and any other name, and no specific decision on ordering (and it seems alphabetical) - so in my case, even though in the top.sls and file_roots, "qa" is second down, because it is last alphabetically, that overrides all other top.sls files in all other environments.

Based on the docs, it seems to give the idea that "base" is a special name and is always authoritative. It also explicitly says that if an environment is defined in multiple top.sls files - the definition in the top.sls for that environment will be preferred. But the code seems to say otherwise.

The code seems to simply merge and overwrite, in alphabetical order. Which contradicts entirely the documentation.

For reference, from http://docs.saltstack.com/en/latest/ref/states/top.html:

The following is what lead me to think "base" was a special name (it doesn't say the FIRST environment wins, it says the "base" environment wins.)

21.24.15.3. HOW TOP FILES ARE COMPILED ... 1 . The base environment's top file is processed first. Any environment which is defined in the base top.sls as well as another environment's top file, will use the instance of the environment configured in base and ignore all other instances. In other words, the base top file is authoritative when defining environments.

The following is the bit saying a top.sls in an environment is authoritative for that environment if it contains config for it:

3 . For environments other than base, the top file in a given environment will be checked for a section matching the environment's name. If one is found, then it is used. Otherwise, the remaining (non-base) environments will be checked in alphabetical order. In the below example, the qa section in /srv/salt/dev/top.sls will be ignored, but if /srv/salt/qa/top.sls were cleared or removed, then the states configured for the qa environment in /srv/salt/dev/top.sls will be applied.

Although if I'm honest, point 3 does seem to contradict point 2 (below) - I would've expected point 2 to at least make an exception for point 3 - or point 3 come first

2 . If, for some reason, the base environment is not configured in the base environment's top file, then the other environments will be checked in alphabetical order. The first top file found to contain a section for the base environment wins, and the other top files' base sections are ignored. So, provided there is no base section in the base top file, with the below two top files the dev environment would win out, and the common.centos SLS would not be applied to CentOS hosts.

@basepi:

@driskell Could I convince you to open a new issue and insert all of this awesome information and detective work into that new issue? I think what we will want to do is bring the code in line with the documentation (I still haven't tested this myself, so I haven't verified that I see the same thing you do) if possible. I think the documentation's way is a good way, but we definitely need to get them in line with each other one way or another.

I agree the documentation is (mostly) a good way, and the behaviour should match. Assuming I haven't read the doc wrong though, and "base" is meant to be special and authoritative and overriding, then I think that should be configurable so I can call it something else, so maybe a "main_env: base" option with that as default.

Thanks guys!

basepi commented 10 years ago

Thanks for filing this!

driskell commented 10 years ago

Proposed a fix if it helps!

basepi commented 10 years ago

Awesome! I'll leave review of that to @thatch45. =)

cachedout commented 10 years ago

Flagged as 'Fixed Pending Verification' while we wait for that review.

driskell commented 10 years ago

OK so the fix is not viable as I expected - though I think it was a good POC of what the docs say.

Has there been any discussions on what should happen? Or what recommended approach for me to take?

I think I can rename my environments aDev bQA cProd and things will work better, but that just seems a hack. Or I would need to take my top.sls out of Git and hold them separately, so that each one only contains directives for servers in its environments (thus merging doesn't overwrite anything). But I still worry about someone with access to QA accidentally using a production server name instead of QA and it overriding - so I'd have to use the aDev bQA cProd still.

Maybe I'm approaching this completely wrong though.

basepi commented 10 years ago

I think what it comes down to is that we've never really had testing in place for this kind of workflow. In general we just recommend that people don't have conflicts across environments for their top.sls files, because it tends to be implicit, rather than explicit, which in my opinion isn't good (and is harder to maintain).

So we just need to deep dive this and either change the documentation or get the actual compilation process in line with the documentation.

driskell commented 10 years ago

It seems that #9616 is a variation of this issue?

Please let me know if you start discussing this. I am more than willing to accept something where you have an option to enable the top compilation described in the docs.

I think it is important that not only states can be version controlled, but top.sls also. This workflow I used is exactly how I would proceed when using GitFS backend. But I do require top.sls to be in each branch and version controlled, so changes can be promoted along with state changes. Otherwise I end up merging in state changes from QA to Production but have to prevent top.sls merging too somehow... not even sure how you do this without a dirty hack :-(

Jason

driskell commented 10 years ago

Also - I'm willing to help implement the proposed solution when one is decided.

basepi commented 10 years ago

No worries, we'll keep you posted. Just need to find a good time to have a discussion about this.

eyj commented 9 years ago

I would strongly suggest that a note is added to the docs right now that they are incorrect. We just set up a new system that strongly relied on the behaviour described in the documentation, and now have to find out that is not implemented that way.

basepi commented 9 years ago

You're right, we need to update the docs, especially since it's taken so long for us to have time to investigate this.

tehmaspc commented 9 years ago

ditto; been scratching my head as to why my env ordering isn't working. thanks for updating the docs - but finding that the docs say one thing and saltstack does something else is a sad trend i keep running into :(

cheers

basepi commented 9 years ago

@tehmaspc Thanks for the feedback. We've recently hired some services people and should have more core engineering time in the coming months to fix issues like this and get the general bug count under control.

tehmaspc commented 9 years ago

@basepi - awesome; for the record Helium fixed up a bunch of things for us - so it's not all bad :)

cheers,

driskell commented 9 years ago

For anyone arriving here needing a limited alternative, I name the per-env top.sls "_top.sls" in the repository. And I made "top.sls" a symbolic link to "../../base/states/_top.sls" (adjust as necessary) This makes the top.sls in every env point to the _top.sls in the base environment, always. And you keep the ability to merge top changes between environments. Of course, it means base always wins so not exactly a workaround - just a limited alternative.

It'll be good to get something in place to fix it up completely though! I'm watching in the sidelines if you need any help.

tehmaspc commented 9 years ago

Thanks @driskell - I'll give that a shot; I think I'm having the issue as xcorvis on the other issue linked here: https://github.com/saltstack/salt/issues/5440 ; but I need to peruse that better. I did notice yesterday after trying to debug on one of my minion that it indeed was pulling env info from an env I didn't expect and everything I'm trying to do formula wise is via the git backend. Hopefully I can find a solution that works because I'm trying to create a base, dev, stg, prd environment setup like yourself and I have been trying to put a separate top.sls file in each of these env dirs.

driskell commented 9 years ago

@tehmaspc The current functionality just browses the environment top.sls in order, overwriting envs as it goes along. So it will probably pull the top.sls data from "stg" as it's alphabetically last. For me it was pulling qa.

driskell commented 9 years ago

@tehmaspc I can see that other ticket was 2013 - I think when I last looked at this it used to be random (unordered dictionary) so probably based on some hash of some sort, but at some point the code was changed to be an ordered dictionary. So if you tried to reproduce #5440 now it might be it's reverse alphabetical order.

tehmaspc commented 9 years ago

@driskell

so one thing i'm pretty sure is broken and I just verified in my setup is that gitfs backends are not being searched properly in environments OTHER than 'base'. In my non-base environment I get the following error when trying to verify the state of my minion:

root@REDACTED:/srv/salt/base# salt 'utopia-REDACTED' state.show_highstate utopia-REDACTED:

My 'utopia' env top.sls file:

root@REDACTED:/srv/salt/base# cat ../utopia/top.sls utopia: 'utopia-REDACTED-*':

BUT, if I have this same mongodb in base I am able to get the highstate info. I can also clone the mongodb formula into my utopia env dir completely and get the highstate info as well. Something is up w/ gitfs backends in non-base envs.

This is a different problem than this original issue but because we are all doing gitfs backends it becomes part of the problem w/ gitfs backends and multiple envs. I'll be doing more testing w/ envs today to see what also doesn't work in my desired setup. FYI - I didn't mention above but my full list of envs is base, dev, stg, prd, utopia.

UPDATE:

some more info - I believe the following is a good sign - the minions I want in my utopia env are listing their states from the env properly. I can also see this reflected on the minion's cache dir as well - which is expected from the show_top execution below:

root@REDACTED:/srv/salt# salt '*' state.show_top utopia-REDACTED:

base:
    - core
utopia:
    - mongodb

utopia-REDACTED:

base:
    - core
utopia:
    - mongodb

utopia-REDACTED:

base:
    - core
utopia:
    - mongodb

dns1-REDACTED:

base:
    - core

dns2-REDACTED:

base:
    - core

The following is my file_roots config and the reason it has base in each environment is because I went with this setup based on the info here - http://docs.saltstack.com/en/latest/ref/file_server/file_roots.html#directory-overlay:

file_roots: base:

UPDATE2: Well I just realized that if I'm trying to apply a formula to a minion in an environment other than base than that formula's git repo needs a branch other than master. For my case my mongodb salt formula repo needed a 'utopia' branch and after adding that remote branch to the git repo I can properly do show_top. I totally overlooked this since I assumed even in a non-base environment the master branch of any git repo would be used. Thus, it appears for the time being my use case is working for me unless I'm missing something else.

Thanks, Tehmasp

clinta commented 9 years ago

Count me among those who found this too late after painfully discovering that these don't work they way I'd expect.

I think a simpler scheme would be that an environments definition must be in that environments top file, and all other environments in that top file are ignored. So in the base top.sls, the contents of the base environment is read, and everything else is ignored. In the dev environment's top.sls the dev environment is read and everything else is ignored.

This would make the merging of top files much simpler and intuitive, while still allowing them to be version controlled and merged with a normal git workflow.

basepi commented 9 years ago

I actually do like this idea, I'll have to think on it. It would obviously have to be config-gated, as plenty of people are using a single topfile in the base environment these days, but I think this would be good to add.

alf commented 9 years ago

Just had a fight with #17261 and I see that this is related so I figured I'd point this out. The fix I'm proposing for that issue would probably help out a lot for this issue as well.

The logical fix for this problem would be to not use an unordered set in BaseHighState.get_envs but rather use an OrderedDict and then return it as a list.

driskell commented 9 years ago

@alf Not sure I understand. The get_tops returns an OrderedDict so it doesn't look like the get_envs matters at all. The main issue reported here is the top compilation is purely alphabetical order, and nothing else, which is completely other than the seemingly complex process described in the documentation.

The code seems to simply merge and overwrite, in alphabetical order. Which contradicts entirely the documentation.

The entire logic for this issue is within merge_tops (used to be) and I did put forward the correct fix to make everything function exactly as described in the documentation (see #12537) - however, because it was such a substantial behaviour change it couldn't be accepted at the time.

Although, @basepi I don't mind refactoring and testing a new patch fenced by config if it helps speed things along? Just need some thought on what the process SHOULD be and I can implement it I think.

Not sure how much this will all impact GitFS backend but I presume it will be the same as I guess it uses the same compilation process?

Edited to fix PR link and merge_tops reference

alf commented 9 years ago

According to the python stdlib documentation:

An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged.

Since we iterate over the result from get_envs and add to the OrderedDict returned from get_tops, it is get_envs that decides the order.

Since a set in python is unordered, the order is not decided. It might appear to be alphabetical but that is unspecified behaviour.

At least that's how I read the code.

alf commented 9 years ago

I also get the impression that by making this simple change the code will do what the original author intended and documented.

driskell commented 9 years ago

Ah I see! Sorry. It's odd that when I tested originally I always had it alphabetical... maybe random chance of the unordered set.

Though I think fixing that would only ensure it was alphabetical then, with the env list having base first and the rest alphabetical (rather than unordered). Ultimately, merge_tops still merges everything though, and I can see the merging will be more orderly. Edited to clarify: merged output will be different, but still merged

I think based on your analysis then my #12537 was insufficient - but both combined it would work exactly to the docs I bet (since my patch assumed alphabetical environment list)

So I think if we keep existing behaviour, and allow for better arrangement, we're looking at max 3 options:

  1. Option for "ordered_top_files" - default false.
  2. Option for "top_file_method" - default "merge". Available: "merge", "same", "preferred"
  3. Option for "preferred_top" - default "base".

So then we have the following:

  1. Default is the same as now. Unordered merging of top files if you mix them up (maybe a major release can API change and change defaults... and remove the ordered_top_files...)
  2. With top_file_method "same", we prefer the current env's top file, and then if current env doesn't exist in its own top file, we look for it like normal either unordered or ordered
  3. With top_file_method "preferred" we can say, if current env is listed in the "preferred_top" env's top file, use it, otherwise fall back to "same"

So essentially here, with "ordered_top_files" and "top_file_method" set to "preferred" - we're then doing exactly what documentation says.

I think this all works in theory - the only confusing bit is the ordered_top_files. It's just a case of - are we happy to enforce the ordering? Bearing in mind someone somewhere is bound to be relying on it. Maybe it is considered worthwhile though.

What are people's thoughts?

driskell commented 9 years ago

Just to add - I did like the documentations way, thus why I'm still wanting to discuss it. If simplification is desired, maybe the fallbacks on the top_file_methods proposal can be removed though, so "same" is ONLY the same, and "preferred" is ONLY the preferred (only that top file is used). I'd be happy to go with the "same" option then and the top file within an env only affects that env.

alf commented 9 years ago

I agree that fixing the bug in get_envs is not sufficient to get the behaviour specified in the documentation.

alf commented 9 years ago

Looking a bit more at the code, I see that Pillars.get_top behaves similar but different from BaseHighState.get_top. Does anyone know whether the behaviour in Pillars.get_top is specified anywhere? Are they different by design or by accident?

arnisoph commented 9 years ago

Could someone please sum up what's currently working differently from the behavior described in the docs? I see a mighty wall of text in this ticket and would need some time to understand and verify what's the actual problem.

driskell commented 9 years ago

@bechtoldt In the docs, HOW TOP FILES ARE COMPILED, is not how it currently works. ( http://docs.saltstack.com/en/latest/ref/states/top.html )

The code seems to simply merge and overwrite, in arbitrary order. Which contradicts entirely the documentation.

(In my experience it is merge and overwrite alphabetical - but others here have reported arbitrary.)

driskell commented 9 years ago

I would rather have it behave like the docs say (or almost like they say) than just updating the docs. Current behaviour is awkward with git branches.

Hope this quickly gives you enough insight. I am known for my essays (sorry).

alf commented 9 years ago

In short, the docs specify the order to be used when merging the tops from different environments. See How Top Files are Compiled. We experience and see from the source that the documentation is wrong and the actual result seems to be unspecified. There is already a warning in the documentation saying as much. We were hoping it could be solved. :)

alf commented 9 years ago

Issue #19802 might be related.

samart commented 9 years ago

we might be running into this while trying to get svnfs and envs working. this implies environment functionality is pretty much broken?

basepi commented 9 years ago

It implies that the best way to handle environments currently is to not have topfiles with conflicts top level keys. Only define an environment once, in a single topfile, because the merging is not currently happening as described in the docs.

ReifiedException commented 9 years ago

Since this problem has been around for almost a year, is not it better to change the documentation so that it reflects the current state of affairs? I mean not only a reference to the issue #12483, added to http://docs.saltstack.com/en/latest/ref/states/top.html, but also recommendations on the management of the multiple environments.

I wonder if ongoing release candidate treats top files differently... According to Changelog, it does not.

basepi commented 9 years ago

I don't expect the topfile compilation to have changed recently, but you're right, I still don't think it matches the docs and it should. However, we'll probably just fix the functionality, rather than taking the time to test the current state of things sufficiently to document it.

BIAndrews commented 9 years ago

This is a pretty big problem for us as well. We aren't able to implement a multi environment setup tracked in git by branches because of this.

BIAndrews commented 9 years ago

It seems to me a simplified solution would be to enable the salt-master to specify a default environment for all minions that do not specify an environment in the minion client config. Then we can be assured which directory top.sls will be used and the process would be git repo friendly. This is how puppet enterprise does it and it works very well. Compiling a top file from multiple environments and alphanumeric issues just seem like the whole process is very over complicated.

packeteer commented 9 years ago

well this problem just took Salt out of the running here. We're going with Puppet

ajmath commented 9 years ago

+1 to this. Just spent the last 2 hours tracking this down and working around it. It's a crapshoot if any of my highstates will succeed.

tehmaspc commented 9 years ago

FWIW, here is what my team does to pull out env from git branches but equally importantly to keep us from having to change our top.sls env files:

We have a template file ('env_top.sls') in our salt master formula that looks like the following:

env_top.sls:

{%- raw %}
{% set roles = salt['grains.get']('roles', '') %}
{%- if roles %}
{%- endraw %}
{{ env }}:
  'env:{{ env }}':
{%- raw %}
    - match: grain
    {%- for role in roles %}
    - {{ role }}
    {%- endfor %}
{% endif %}
{%- endraw %}

base_top.sls:

base:

  '*':
    - core

This file gets layed down for every env we desire with the following state code in this same salt master formula - you can see how many envs we have:

...

# Setup our environment top.sls files
/srv/salt/base/top.sls:
  file.managed:
    - source: salt://salt/files/base_top.sls
    - makedirs: True

{% set envs = ['dev', 'stg', 'prd', 'inf'] %}
{% for env in envs %}
/srv/salt/{{ env }}/top.sls:
  file.managed:
    - source: salt://salt/templates/env_top.sls
    - template: jinja
    - makedirs: True
    - context:
        env: {{ env }}
{% endfor %}

...

Thus, on our Salt master server we get the following env directory structure: /srv/salt/base/top.sls /srv/salt/dev/top.sls /srv/salt/stg/top.sls /srv/salt/prd/top.sls /srv/salt/inf/top.sls

Notice, we never have to change these top.sls files BUT we are completely grain backed. So we keep important grains on all our Salt servers that allow the following approach to work. Each server per it's env has a 'roles' grain list for figuring out what state files it should have applied. Additionally, since we do not have to update these files and because of the issues in this thread I decided to serve these top.sls files above from the FS directly instead of keeping them in separate Git branches.

Our main idea for matching grains and using a template file like above came from this section of the Salt docs: http://docs.saltstack.com/en/latest/topics/targeting/grains.html#matching-grains-in-the-top-file

If our strategy above works for you - you might want to take our approach to Salt envs. Just wanted to share back our strategy.

BTW - we do pillars similarly - via template pillar top.sls files - but we keep a separate git repo to manage all our pillar data / etc.

Cheers, Tehmasp

thatch45 commented 9 years ago

Some crazy lie somehow got into the docs a while back and it has cause many issues. We will fix the docs.

garo commented 9 years ago

This is also problematic for us: What we want is to have base environment to follow the master branch and then a separated production branch. The idea is that every change goes first to the master branch where it's tested against few servers. If it works it's then rebased to production which the majority of our severs follow.

Note that all our virtual machines / servers described here are getting production traffic. The two branches (master/production) exists just to ensure that for example autoscaling servers will always be able to start because we guarantee that the production branch will work in all cases (by peer reviewing etc).

jamiemoore commented 9 years ago

+1 @BIAndrews . Don't compile the top file at all. Just specify the environment on the minion. Puppet has this right.

jmelfi commented 9 years ago

+1 @BIAndrews @jamiemoore

We use puppet currently to deploy changing values from our code checked out of git branches (single repo). Puppet determines from the node itself what env it should be. We have two repos for puppet and hiera with branches and this works well.

It would be beneficial to be able to have something similar for salt where the salt repo can logically define the envs and pillars can provide the abstracted data. Really hoping for a change and update here since we are beginning our evaluation of continuing with puppet or another solution. This feature problem removes salt for us for now.

cachedout commented 9 years ago

Whew, this issue took a long time to read! ;]

There are a lot of solid suggestions here and good feedback. The use cases surrounding version-controlled SLS files in multiple environments are especially compelling and make me feel like we need to add additional behavior to Salt to facilitate workflows where it makes sense to have per-environment top files.

Here is what I would like to propose:

1) We adopt a merge strategy configuration option similar to the pillar merge strategy that's currently controlled by pillar_source_merging_strategy. For the interested, this code is fairly straightforward to follow. The default merging strategy would not change. Somebody would have to really come up with a compelling reason to do so, since it would break many existing installations, in my opinion.

2) Like @basepi, I like this suggestion from @clinta of being able to allow each environment to only use high data for its own environment defined in the top file present in that environment, disregarding all others. I propose making this an additional merge strategy in the configuration option described above.

3) Finally, I propose an additional merge strategy option that would allow an environment to be selected from which all high data would be taken from. This could even default to base if we like. This would only use the selected environment as the authoritative source for high data, disregarding all others.

Does this proposal adequately cover the use cases outlined? All suggestions or feedback are very welcome. :]

garo commented 9 years ago

The features @cachedout suggested sounds good.

Now I know this feature request isn't directly related to this issue but please also consider if #24622 could be implemented at the same time for pillars.

cachedout commented 9 years ago

Hi @garo. I'll take a look at it and see. Thanks. We'll track the progress of that as a separate feature request, though. Follow #24622 for progress.

Anybody else have feedback before we get to writing this?