saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

Custom grains (/srv/salt/_grains/foo.py) not available during salt-run state.orchestrate run #19658

Closed iggy closed 9 years ago

iggy commented 9 years ago

We have a custom grain module that grabs tags from GCE's metadata service and stuffs that into a grain (https://github.com/saltstack/salt-contrib/blob/master/grains/gce.py). When trying to use that grain in an orchestrate runner that we use, we aren't getting that grain set. This works fine everywhere else we use it.

Let me know if you need more info.

salt-run --versions
           Salt: 2014.1.10
         Python: 2.7.3 (default, Mar 13 2014, 11:03:55)
         Jinja2: 2.6
       M2Crypto: 0.21.1
 msgpack-python: 0.1.10
   msgpack-pure: Not Installed
       pycrypto: 2.6
         PyYAML: 3.10
          PyZMQ: 13.1.0
            ZMQ: 3.2.3
rallytime commented 9 years ago

Thanks for reporting this issue @iggy. Is it possible for you to give this another try after upgrading to our latest release of 2014.7.0? I wonder if this has been fixed up already.

iggy commented 9 years ago

I can't use 2014.7.0 because we use compound matching in mine.get (which someone disabled in 2014.7 and later releases of 2014.1).

basepi commented 9 years ago

I'LL NEVER LIVE THAT DOWN!

basepi commented 9 years ago

But seriously, the 2014.7.1 release is in internal testing, so your suffering should end soon, @iggy. ;)

iggy commented 9 years ago

We skipped 2014.7. I'm running a snapshot of 2015.2 (bf9c989) in development and the problem appears to be fixed (although there's another one that I'll have to file a different bug about). I can't say when it was fixed exactly.

basepi commented 9 years ago

Thanks, @iggy.

iggy commented 9 years ago

Sorry. I was wrong. This still isn't working.

basepi commented 9 years ago

To clarify: this still isn't working on 2015.2?

iggy commented 9 years ago

I'm actually on devel now. Not working. I had changed my orchestrate job to look at a different (built-in) grain as a workaround.

iggy commented 9 years ago

Looks like they aren't working in pillars either. In the below, dev is in tags, therefore log_level: debug and log_level_logfile: debug should be set and fileserver_backend shouldn't have git in it. This worked fine under 2014.1 (and as far as I know 2014.7).

/srv/pillars/salt/master.sls

salt:
  master:
    # should be able to remove these at some point when everything is managed
    # via salt-cloud, this is mostly for "static" systems before salt was in use
    open_mode: True
    auto_accept: True
    preserve_minion_cache: True
    # couple niceties
    timeout: 30
    state_verbose: False
    #state_output: mixed
{% if 'dev' in salt['grains.get']('tags') %}
    log_level: debug
    log_level_logfile: debug
{% endif %}
    # FIXME not working -bwj
    peer:
      .*:
        - network.interfaces
        - network.get_hostname
    fileserver_backend:
      - roots
{% if 'dev' not in salt['grains.get']('tags') %}
      - git
{% endif %}
    gitfs_remotes:
      - https://github.com/iggy/salt-formula.git
      - https://github.com/OnCenterSoftware/zookeeper-formula.git
      - https://github.com/OnCenterSoftware/postgres-formula.git
      - https://github.com/OnCenterSoftware/pam-ldap-formula.git
      - https://github.com/OnCenterSoftware/rsyslog-formula.git
      - https://github.com/OnCenterSoftware/graphite-formula.git
      - https://github.com/OnCenterSoftware/collectd-formula.git
      - https://github.com/OnCenterSoftware/aptly-formula.git
      #- https://github.com/iggy/postgres-formula.git
      - git+ssh://git@salt-eap-formula-github.com/OnCenterSoftware/eap-formula.git
      - git+ssh://git@salt-activemq-formula-github.com/OnCenterSoftware/activemq-formula.git
      - git+ssh://git@salt-wildfly-formula-github.com/OnCenterSoftware/wildfly-formula.git
      # dist files are kept in google git because unlike github, they don't
      # randomly delete our binary files... yet
      #- https://source.developers.google.com/p/cosmic-octane-595/ # check this out manually for now... it doesn't change much and it gobbles up memory in salt
      - git+ssh://git@salt-states-github.com/OnCenterSoftware/salt_states.git
      # FIXME use our formulas and setup the ssh keys for all that
    file_roots:
      base:
        - /srv/salt/
        - /srv/dist/
{% if 'dev' not in salt['grains.get']('tags') %}
    ext_pillar:
      - git: master git+ssh://git@salt-pillars-github.com/OnCenterSoftware/salt_pillars.git
{% endif %}
    pillar_roots:
      base:
        - /srv/pillar/
bjackson@dev-salt01:~$ sudo salt-call grains.get tags
local:
    - salt
    - minion
    - master
    - dev
bjackson@dev-salt01:~$ sudo salt-call pillar.item salt 
local:
    ----------
    salt:
        ----------
        cloud:
            ----------
            folders:
                - cloud.providers.d/key
                - cloud.profiles.d
                - cloud.maps.d
            profiles:
                None
            providers:
                None
        master:
            ----------
            auto_accept:
                True
            ext_pillar:
                |_
                  ----------
                  git:
                      master git+ssh://git@salt-pillars-github.com/OnCenterSoftware/salt_pillars.git
            file_roots:
                ----------
                base:
                    - /srv/salt/
                    - /srv/dist/
            fileserver_backend:
                - roots
                - git
            gitfs_remotes:
                - https://github.com/iggy/salt-formula.git
                - https://github.com/OnCenterSoftware/zookeeper-formula.git
                - https://github.com/OnCenterSoftware/postgres-formula.git
                - https://github.com/OnCenterSoftware/pam-ldap-formula.git
                - https://github.com/OnCenterSoftware/rsyslog-formula.git
                - https://github.com/OnCenterSoftware/graphite-formula.git
                - https://github.com/OnCenterSoftware/collectd-formula.git
                - https://github.com/OnCenterSoftware/aptly-formula.git
                - git+ssh://git@salt-eap-formula-github.com/OnCenterSoftware/eap-formula.git
                - git+ssh://git@salt-activemq-formula-github.com/OnCenterSoftware/activemq-formula.git
                - git+ssh://git@salt-wildfly-formula-github.com/OnCenterSoftware/wildfly-formula.git
                - git+ssh://git@salt-states-github.com/OnCenterSoftware/salt_states.git
            open_mode:
                True
            peer:
                ----------
                .*:
                    - network.interfaces
                    - network.get_hostname
            pillar_roots:
                ----------
                base:
                    - /srv/pillar/
            preserve_minion_cache:
                True
            state_verbose:
                False
            timeout:
                30
        minion:
            ----------
            master:
                dev-salt01
        no_install_packages:
            True
iggy commented 9 years ago

Even easier...

test:
  - test1
  - test.{{ salt['grains.get']('ocsenv') }}
bjackson@dev-salt01:~$ sudo salt-call grains.get ocsenv
local:
    dev
bjackson@dev-salt01:~$ sudo salt-call pillar.get test
local:
    - test1
    - test.
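
The trailing `test.` above is consistent with the lookup falling back to an empty default when the custom grain is missing. A minimal sketch of that behaviour, assuming `grains.get` returns an empty string for an unknown key; `grains_get`, `render_test_pillar` and both grain dictionaries are hypothetical stand-ins, not Salt code:

```python
def grains_get(grains, key, default=""):
    """Simplified stand-in for grains.get: a dict lookup with an empty-string default."""
    return grains.get(key, default)


def render_test_pillar(grains):
    """Build the same two-item list as the 'test' pillar in the report."""
    return ["test1", "test." + grains_get(grains, "ocsenv")]


# Hypothetical grain sets: one where the custom grain was loaded, one where it was not.
with_custom = {"nodename": "dev-salt01", "ocsenv": "dev"}
without_custom = {"nodename": "dev-salt01"}

print(render_test_pillar(with_custom))
print(render_test_pillar(without_custom))
```

No error is raised in the second case, which may be why the problem only shows up as a silently truncated value.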
basepi commented 9 years ago

Thanks, @iggy.

iggy commented 9 years ago

I tried to do a git bisect to find the failure point. Unfortunately salt is not very bisectable (I got a lot of pillar errors on a bunch of runs). Below is the best data I could come up with. Good luck. Let me know if there's anything you want me to try.

bjackson@dev-salt01:~/salt$ git bisect skip 
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
583fa3d6cc480c0ba0a4562154de4d4130bedc44
db3d1a7426415df93cb5e84461da747b6d72955d
589eb58d95a09c466a4dd188bd9decc4a1ca1c4b
ebf67823971111b0988864f491eb62861cd8e7ca
480ee9842718a4c4216343cb5a6cfa15b3208a67
cb43d8649f0fefb28111499ac8241c693a92cc14
191682302def43be7a556ef0aacc70c4a7fbc3ed
bf89ca1bc89ae99f95d841de553683df2cf2f681
586b713b5ffc3b3b5f28e23824084ef95f1ea8b0
a08ab220e7923d2e7d8b075ff724ce5766addf0b
258c978b1110ad4c2f54be2689cf4a3b22e33798
ab2cbd928063f03d04a6d369a65c239386b98405
234b9c210e742b13b6f7beddbfb63408cc9d9073
f79a04ca6a595f9e2f533d19292e0551649b3d11
1402496e86199354f59161bd86ba399a88873c48
4260d211ddce44bb4bdf8988736b58ef2024a149
b16d828896d6a483e502bd835cc55ec66ab4911e
c346737914668ae09d13b079385e637f7bc33704
329c015ff6b835e83b0047d0d6d646fc322d65f9
509b4352f227fd969f6f1f4439be9010f6fcf0a6
39d899115cae9bbb9159676b272f158186e212a4
5f271dd5228a36cabb490dc54c22be9f810f6a1a
fa84d722e975f7db17a103b93e8dbbca06d6a7c9
We cannot bisect more!
iggy commented 9 years ago
bjackson@dev-salt01:~/salt$ git log | egrep -A1 '(583fa3d6cc480c0ba0a4562154de4d4130bedc44|db3d1a7426415df93cb5e84461da747b6d72955d|589eb58d95a09c466a4dd188bd9decc4a1ca1c4b|ebf67823971111b0988864f491eb62861cd8e7ca|480ee9842718a4c4216343cb5a6cfa15b3208a67|cb43d8649f0fefb28111499ac8241c693a92cc14|191682302def43be7a556ef0aacc70c4a7fbc3ed|bf89ca1bc89ae99f95d841de553683df2cf2f681|586b713b5ffc3b3b5f28e23824084ef95f1ea8b0|a08ab220e7923d2e7d8b075ff724ce5766addf0b|258c978b1110ad4c2f54be2689cf4a3b22e33798|ab2cbd928063f03d04a6d369a65c239386b98405|234b9c210e742b13b6f7beddbfb63408cc9d9073|f79a04ca6a595f9e2f533d19292e0551649b3d11|1402496e86199354f59161bd86ba399a88873c48|4260d211ddce44bb4bdf8988736b58ef2024a149|b16d828896d6a483e502bd835cc55ec66ab4911e|c346737914668ae09d13b079385e637f7bc33704|329c015ff6b835e83b0047d0d6d646fc322d65f9|509b4352f227fd969f6f1f4439be9010f6fcf0a6|39d899115cae9bbb9159676b272f158186e212a4|5f271dd5228a36cabb490dc54c22be9f810f6a1a|fa84d722e975f7db17a103b93e8dbbca06d6a7c9)'
commit 583fa3d6cc480c0ba0a4562154de4d4130bedc44
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit db3d1a7426415df93cb5e84461da747b6d72955d
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit ebf67823971111b0988864f491eb62861cd8e7ca
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit 480ee9842718a4c4216343cb5a6cfa15b3208a67
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit bf89ca1bc89ae99f95d841de553683df2cf2f681
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit 586b713b5ffc3b3b5f28e23824084ef95f1ea8b0
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit ab2cbd928063f03d04a6d369a65c239386b98405
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit f79a04ca6a595f9e2f533d19292e0551649b3d11
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit 1402496e86199354f59161bd86ba399a88873c48
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit c346737914668ae09d13b079385e637f7bc33704
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit 329c015ff6b835e83b0047d0d6d646fc322d65f9
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit 5f271dd5228a36cabb490dc54c22be9f810f6a1a
Author: Thomas Jackson <jacksontj.89@gmail.com>

So maybe @jacksontj should be looking at this bug?
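
The same tally can be done on the pasted `git log` output directly. A rough sketch, assuming the `commit` / `Author:` line layout shown above; `authors_by_count` is a hypothetical helper and `SAMPLE` is an abbreviated copy of that output:

```python
from collections import Counter

# Abbreviated sample of the `git log | egrep -A1 ...` output pasted above.
SAMPLE = """\
commit 583fa3d6cc480c0ba0a4562154de4d4130bedc44
Author: Thomas Jackson <jacksontj.89@gmail.com>
--
commit db3d1a7426415df93cb5e84461da747b6d72955d
Author: Thomas Jackson <jacksontj.89@gmail.com>
"""


def authors_by_count(log_text):
    """Count how many listed commits each author wrote."""
    counts = Counter()
    for line in log_text.splitlines():
        if line.startswith("Author:"):
            counts[line[len("Author:"):].strip()] += 1
    return counts


for author, n in authors_by_count(SAMPLE).most_common():
    print(n, author)
```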

iggy commented 9 years ago

Here is my simplified grain:

/srv/salt/_grains/ocsenv.py

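def gce_proj_metadata():
    # Simplified custom grain: a public function that returns a dict of grains
    return {'ocsenv': 'dev'}

if __name__ == '__main__':
    # print() with a single argument behaves the same on Python 2.7 and Python 3
    print(gce_proj_metadata())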

Here is the output from a working version:

bjackson@dev-salt01:~/salt$ sudo salt-call grains.get ocsenv ; sudo salt-call pillar.get test
local:
    dev
local:
    - test1
    - test.dev
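
For context on why a module this small is enough: a grains module is consumed by calling its public functions and merging the dictionaries they return. The sketch below imitates that idea only; `collect_grains`, `_private_helper` and `module_namespace` are hypothetical, and Salt's real loader does considerably more (syncing, caching, error handling).

```python
import types


def gce_proj_metadata():
    # Same shape as the simplified grain above: a public function returning a dict.
    return {"ocsenv": "dev"}


def _private_helper():
    # Leading underscore: skipped by the collector below.
    return {"ignored": True}


def collect_grains(namespace):
    """Merge the dicts returned by every public function in `namespace`.

    Hypothetical stand-in for a grains loader, not Salt's implementation.
    """
    grains = {}
    for name, obj in sorted(namespace.items()):
        if name.startswith("_") or not isinstance(obj, types.FunctionType):
            continue
        result = obj()
        if isinstance(result, dict):
            grains.update(result)
    return grains


module_namespace = {
    "gce_proj_metadata": gce_proj_metadata,
    "_private_helper": _private_helper,
}
print(collect_grains(module_namespace))
```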
jacksontj commented 9 years ago

@basepi Turns out there aren't any tests around the _pillar call on the master-- meaning this specific piece of magic is un-tested :( My PR (#21001) fixes this, but we should get tests in to cover this feature.

jacksontj commented 9 years ago

FTR the above PR will fix pillars, but not orchestrate (at least I can't confirm it).

jacksontj commented 9 years ago

After looking some more I'm not sure how/if grains were ever accessible in the orchestrate system. From looking at the code and docs it seems that the closest that's supported is grain-based targeting.

iggy commented 9 years ago

I can verify #21001 fixes the issue with the command line test. Orchestrate is still not working. I just went back and looked and we were using standard grains before (instead of our custom grains). So it may very well have been broken all along and these just hit me both at the same time so I assumed they were related.

TL;DR custom grains in pillars are fixed, still don't work in orchestrate

jacksontj commented 9 years ago

@iggy Can you paste an example of using grains with orchestrate?

iggy commented 9 years ago

Well, if you have the previously mentioned ocsenv grain, we have the following:

# The backups work by coalescing the application (TODO verify that's actually working) and then having the salt
# master call the google API to do the actual volume snapshots (--tags db) and
# then the application servers end the coalesce.

# each database server should
# FIXME orchestrate and gitfs don't play well together
# https://github.com/saltstack/salt/issues/19802
# TODO get storage only key instead of using my key
# TODO we need to wait till the coalesce is actually finished in the state before we do_snapshot
# FIXME need to enable WAL logging in postgres for coalesce to actually work
# requires logging into the google api as root
coalesce_db:
  salt.function:
    - name: postgres.psql_query
    - tgt: 'tags:db'
    - tgt_type: grain
    - arg:
      - "select pg_start_backup('foo');"

# we want our salt master to run the cloud snapshot script
# TODO at some point move this to use salt.cloud.* functionality
db_snapshot:
  salt.function:
    - name: cmd.run
    # FIXME switch to complex match, but it wasn't working
    - tgt: 'tags:master'
    - tgt_type: grain
    - arg:
      - /home/bjackson/deploy/cloud.py snapshot --tags db -- {{ salt['grains.get']('ocsenv') }}

unfreeze_db:
  salt.function:
    - name: postgres.psql_query
    - tgt: 'tags:db'
    - tgt_type: grain
    - arg:
      - "select pg_stop_backup();"

A minimal example:

test:
  salt.function:
    - name: cmd.run
    - tgt: 'minion'
    - arg:
      - /usr/bin/touch /tmp/test-{{ salt['grains.get']('ocsenv') }}-test
jacksontj commented 9 years ago

Hmm, that one may be a bit complicated. The way the jinja templating works (at least everywhere else) is that we run that template before we pass it down to the runner -- so we template that entire YAML file. At that point it's master-wide, not minion-specific -- so "grains" doesn't mean anything. We could do the same templating further down the stack (right before the module execution), but it would be inconsistent with the other jinja templating stuff -- since you couldn't (for example) use grains anywhere outside of arg/kwargs. I'm not sure if that's more confusing than it's worth or not -- or if we should come up with some other markup which makes it more obvious that it is different.
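
The ordering described here (render the whole file once on the master, then dispatch) can be sketched as follows. This is an illustration under stated assumptions, not Salt's renderer: `render` is a regex stand-in for Jinja, `orchestrate` is a hypothetical helper, and the minion IDs `db01` / `db02` and the grain dictionary are made up.

```python
import re

# Matches the {{ salt['grains.get']('<key>') }} expression used in the SLS above.
GRAIN_CALL = re.compile(r"\{\{\s*salt\['grains\.get'\]\('(\w+)'\)\s*\}\}")


def render(template, grains):
    """Replace each grains.get expression with a value from `grains` ('' if missing)."""
    return GRAIN_CALL.sub(lambda m: str(grains.get(m.group(1), "")), template)


def orchestrate(template, master_grains, targets):
    """Render once with the master's grains, then hand the same string to every target."""
    rendered = render(template, master_grains)
    return {minion_id: rendered for minion_id in targets}


MASTER_GRAINS = {"nodename": "dev-salt01"}  # hypothetical master grains
TEMPLATE = "/usr/bin/touch /tmp/test-{{ salt['grains.get']('nodename') }}-test"

jobs = orchestrate(TEMPLATE, MASTER_GRAINS, ["db01", "db02"])
for minion_id, cmd in sorted(jobs.items()):
    print(minion_id, cmd)
```

Every target receives an identical command because the expression was already resolved before targeting; nothing minion-specific is left to substitute.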

iggy commented 9 years ago

Well, grains at least partially work. The way we were doing it was using salt['grains.get']('nodename') ... So it's just the custom ones that aren't working.

jacksontj commented 9 years ago

Yeah, it's kinda odd because the "grains" you get there are the grains for the master host -- not the targeted minion. The "salt" object is all of the execution modules, and since this jinja templating runs on the master it'll get the master's grains (without some magic).

iggy commented 9 years ago

In my particular case, that would be enough (it's an environment that is going to be common to everything that connects to that particular master). But I do see how it could get confusing quickly.

How hard would it be to enable custom modules in the master? I can add something to the docs to try to make it clear what is and isn't available in orchestrate sls files.

iggy commented 9 years ago

That PR didn't actually fix the original bug. It fixed another one (pillars didn't have custom grains) that I wrongly thought was connected. I think custom grains in orchestrate (or other runners) are either non-fixable (and the behavior should be better documented somewhere) or a long-term fix.

jacksontj commented 9 years ago

@iggy I think you should open another issue for the feature request. This issue was labeled as a "bug" and since the one which was a bug is fixed, it makes some sense to close.

In that issue we can decide if we can support it, or document that we don't -- because I see that it is a bit confusing if you don't have the context of how it's actually executed in the daemon.