saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

KeyError in file.recurse state #8861

Closed WeLoveJesusChrist closed 10 years ago

WeLoveJesusChrist commented 10 years ago

Hello everyone, I'm new to this forum as well as SaltStack.

So I have a few servers running SaltStack (one Salt master, which is also a Salt minion, and the other servers are Salt minions). I have an .sls file that installs Python on all the servers when I run the command: salt '*' state.highstate

However, with the file.recurse function, I tell the master to transfer all the files located under salt://mainFiles/ to the appropriate places on all the servers (/etc, /opt). When I tell each server to run its highstate individually (salt 'system1' state.highstate), it is ALWAYS 100% successful. But when I run the command on all of them at the same time (salt '*' state.highstate), file.recurse randomly fails. It reports that the source file located under salt://|mainFiles/.... doesn't exist (notice the | bar right after the two forward slashes). Any help?

WeLoveJesusChrist commented 10 years ago

Oh, I forgot to add: when I run it, it may fail, saying one file is missing. When I re-run it right afterward, it says a different file is missing. Sometimes there are no failures at all! It's completely random! And yes, I made sure the files do exist. An example:

Source file salt://|files/opt/ellington/ellington/chats/permissions.py not found

basepi commented 10 years ago

Are you only using file_roots or are you using one of the alternative fileserver backends? Additionally, what version of salt are you running? (salt --versions-report)

WeLoveJesusChrist commented 10 years ago

Hi basepi. Sorry for the delay. Lots of things happened these past 2 weeks, so I had to put this project on hold.

"Are you only using file_roots or are you using one of the alternative fileserver backends?" I don't understand this question.. What do you mean?

Version:

    Salt: 0.17.2
    Python: 2.6.6 (r266:84292, Nov 22 2013, 12:16:22)
    Jinja2: 2.2.1
    M2Crypto: 0.20.2
    msgpack-python: 0.1.13
    msgpack-pure: Not Installed
    pycrypto: 2.0.1
    PyYAML: 3.10
    PyZMQ: 2.2.0.1
    ZMQ: 3.2.4

basepi commented 10 years ago

If you can just show us your master config, that would answer my first question. =)

WeLoveJesusChrist commented 10 years ago

Lol sorry basepi... I honestly am not doing this on purpose... Master config file... where is this? You don't mean the top.sls file do you?

edit: If referring to /etc/salt/master, that thing is completely commented out.

basepi commented 10 years ago

Ah, thanks, /etc/salt/master was what I was wondering about.

This is a very strange issue, we'll see if we can reproduce it.

WeLoveJesusChrist commented 10 years ago

Thanks. I really appreciate you guys taking the time to help me out. =) As I have stated...

salt '*' state.highstate gives each of my 4 servers a chance of failing on file.recurse... salt 'mediaserver' state.highstate is 100% successful. Never an error.

It's when I tell all 4 servers to do it that there is a chance of failing. But when I tell one specific server to run, no errors...

basepi commented 10 years ago

This is a weird one. Are there errors in the minion or master logs when these failures occur?

WeLoveJesusChrist commented 10 years ago

I don't know the answer to that.... I am on the Salt-Master server, I run salt '*' state.highstate, and it will be printed out on my Salt-Master Server. I presume master? If I didn't answer the question, can you tell me the log's location so I may look into it? Thanks basepi, always.

basepi commented 10 years ago

The logs are in /var/log/salt/master and /var/log/salt/minion

WeLoveJesusChrist commented 10 years ago

thanks. It's in the Minion. The master doesn't have the error in its log.

WeLoveJesusChrist commented 10 years ago

***doesn't have "that" specific error in the Master log.

basepi commented 10 years ago

Do you see any errors or warnings in those logs? Just want to make sure there's not some underlying issue that's causing this.
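One way to answer this without reading the whole file is to filter the log by level. The following is a minimal sketch, assuming the log lines use the `[logger ][LEVEL ]` layout seen in the excerpts in this thread; `find_problems` is a hypothetical helper, not part of Salt.

```python
import re

# Matches the "][LEVEL ]" part of the "[logger ][LEVEL ]" prefix
# that appears in the log excerpts quoted in this thread.
LEVEL_RE = re.compile(r"\]\[(ERROR|WARNING|CRITICAL)\s*\]")


def find_problems(lines):
    """Return (level, line) pairs for every warning-or-worse log line."""
    problems = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            problems.append((match.group(1), line.rstrip("\n")))
    return problems
```

It could be fed an open file handle for /var/log/salt/minion or /var/log/salt/master. Continuation lines of a traceback carry no level prefix, so only the first line of each exception is reported.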

WeLoveJesusChrist commented 10 years ago

I looked into the log... there is a lot. I'm only going to give you the portion with the errors/warnings that occurred after my last Git update (which resolved a problem).


Cleaning up...

2013-12-10 14:10:49,801 [salt.loaded.int.render.yaml ][WARNING ] Duplicate Key: "mont1" found in salt:// environment=base
2013-12-10 14:11:13,380 [salt.state ][ERROR ] An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1278, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1655, in recurse
    manage_file(dest, src)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1578, in manage_file
    **pass_kwargs)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1955, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'

2013-12-11 18:44:31,142 [salt.loaded.int.render.yaml ][WARNING ] Duplicate Key: "mont1" found in salt:// environment=base
2013-12-11 18:55:28,101 [salt.state ][ERROR ] An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1278, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1655, in recurse
    manage_file(dest, src)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1578, in manage_file
    **pass_kwargs)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1955, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'

basepi commented 10 years ago

Before you edited your comment, it looked as though there was a git error in your log -- was that the problem you said your git update resolved?

Also, would you mind upgrading to the recently-released 0.17.4? It looks like you may have an include bug that we have since resolved.

WeLoveJesusChrist commented 10 years ago

May I ask how to do this?? Since I have already installed it using wget and curl as shown here: http://docs.saltstack.com/topics/installation/index.html

I ran it again, but it is still version 0.17.2.

WeLoveJesusChrist commented 10 years ago

Oh, and just so you know, I did try yum update salt... it said it is up to date.
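Whether an upgrade actually took effect can be checked by comparing the version that salt --versions-report prints against the requested 0.17.4. This is a minimal sketch with hypothetical helper names; it assumes plain dotted numeric versions and does not handle suffixes such as release candidates.

```python
def parse_version(text):
    """Turn a dotted version such as '0.17.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(installed, required="0.17.4"):
    """Return True when the installed version is older than the required one."""
    return parse_version(installed) < parse_version(required)
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.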

basepi commented 10 years ago

Can you try again? 0.17.4 should have been pushed to epel stable end of last week.

WeLoveJesusChrist commented 10 years ago

I have just updated it and ran state.highstate. The first few times it failed. Then it passed 100% twice in a row, then it failed again... I don't know, could this be a problem on my end instead of SaltStack's, basepi? Here is the minion log from one of my Salt minion servers.

/var/log/salt/minion

2013-12-11 18:55:28,615 [salt.state ][ERROR ] An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1278, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1655, in recurse
    manage_file(dest, src)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1578, in manage_file
    **pass_kwargs)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1955, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'

2013-12-12 11:37:44,402 [salt.state ][ERROR ] An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1278, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1655, in recurse
    manage_file(dest, src)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1578, in manage_file
    **pass_kwargs)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1955, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'

2014-01-07 12:05:53,470 [salt.state ][ERROR ] An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1305, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1655, in recurse
    manage_file(dest, src)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1578, in manage_file
    **pass_kwargs)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1972, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'

2014-01-07 12:12:06,752 [salt.state ][ERROR ] An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1305, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1655, in recurse
    manage_file(dest, src)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1578, in manage_file
    **pass_kwargs)
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1972, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'
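Every traceback above ends at the subscript `source_sum['hsum']`, which means the dict that was supposed to carry the source checksum arrived without an `hsum` key. The sketch below is not Salt's code and not the upstream fix; it only illustrates, with a hypothetical function name, how a plain subscript raises `KeyError` on such a dict while `.get()` lets the caller report the missing checksum instead.

```python
def checksums_differ(source, source_sum, name_sum):
    """Compare checksums, reporting a missing source checksum explicitly.

    Returns a (differ, error) pair. `error` is a message when `source_sum`
    has no 'hsum' key, rather than letting KeyError propagate.
    """
    if not source:
        # No source to compare against, so nothing differs.
        return False, None
    hsum = source_sum.get("hsum")
    if hsum is None:
        return False, "no checksum returned for source %s" % source
    return hsum != name_sum, None
```

A message naming the source would at least tell the reader which lookup came back empty, which the bare `KeyError: 'hsum'` does not.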

shantanub commented 10 years ago

I get very similar errors as well even without file.recurse. If I execute a state that manages a file, it works perfectly fine when executed on one minion, but if I use a grain to target a group of minions (say the rack the hosts are in) or *, some random minions will fail with the same cryptic hsum error.

It's not physical host or vm specific from what I can tell, though I'm definitely seeing it a lot with the hadoop physical nodes I'm standing up which is really annoying.

Repeating the call to the failed minions works fine but as you can imagine this is incredibly tedious to trace through outputs and repeat the calls individually to each minion where it failed.
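Re-running only the minions that failed can be scripted instead of traced by hand. The sketch below is a rough illustration with hypothetical helper names. It assumes the results have already been collected into a dict that maps each minion ID to its per-state return data, where every state entry carries a boolean `result`; the command it builds uses salt's `-L` list targeting.

```python
def failed_minions(results):
    """Return the sorted IDs of minions with at least one failed state.

    `results` is assumed to map minion ID -> {state tag: {"result": bool, ...}}.
    A minion whose return is not a dict (for example a list of error
    strings) is also counted as failed.
    """
    failed = []
    for minion, states in results.items():
        if not isinstance(states, dict):
            failed.append(minion)
        elif any(state.get("result") is False for state in states.values()):
            failed.append(minion)
    return sorted(failed)


def retry_command(minions, function="state.highstate"):
    """Build a list-targeted salt command line for the given minions."""
    return "salt -L '%s' %s" % (",".join(minions), function)
```

This only automates the workaround; it does not address why the first run fails.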

I don't recall this happening in 0.16, though I don't know that I had as many minions back then (not sure if the number of minions really matters, but in a group of 16+ nodes at least one seems to always fail, and as KatoneVi has noted, it appears to be a random minion in the group).

    State: - file
    Name:      /managed/scripts/configure_phy_bonding.py
    Function:  managed
        Result:    False
        Comment:   An exception occurred in this state: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/state.py", line 1305, in call
    *cdata['args'], **cdata['kwargs'])
  File "/usr/lib/python2.6/site-packages/salt/states/file.py", line 1157, in managed
    dir_mode)
  File "/usr/lib/python2.6/site-packages/salt/modules/file.py", line 1972, in manage_file
    if source and source_sum['hsum'] != name_sum:
KeyError: 'hsum'

WeLoveJesusChrist commented 10 years ago

Did you get your answer yet? If not, send me your code and I will tell you how. -_-

shantanub commented 10 years ago

Not yet. Is this code-specific, or are you talking about the state itself? I see this in quite a few of my states that manage files.

Is this something simple? What's your fix/workaround?

WeLoveJesusChrist commented 10 years ago

Just that one file. I'll highlight the area. It needs a unique ID.

WeLoveJesusChrist commented 10 years ago

I don't have my laptop with me, so either I help now or I help you late tomorrow. It's the first line in the box of commands.

WeLoveJesusChrist commented 10 years ago

To clarify... just the one Salt .sls file where you experience the problem -_-

WeLoveJesusChrist commented 10 years ago

Found it. Look at my_id_1 and my_id_2. They must be different from each other.

    my_id_1:
      file.managed:

    my_id_2:
      file.managed:

My problem was that I ran it in a for loop, which repeats the ID that is supposed to be unique, and that is bad:

    For(
    my_id_1:
      file.managed:

Find a way to change my_id_1 to something unique throughout the ENTIRE Salt run.
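The duplicate-ID problem described in this comment can be illustrated with a plain Python dict standing in for the state mapping. This is an analogy under stated assumptions, not Salt's renderer: `render_states` is a hypothetical function, and the only thing the logs in this thread confirm about Salt itself is that it warns about a duplicate key.

```python
def render_states(paths, unique_ids):
    """Build a state mapping the way a loop in an SLS template might."""
    states = {}
    for index, path in enumerate(paths):
        # With a fixed ID, every iteration writes to the same key.
        state_id = "my_id_%d" % index if unique_ids else "my_id_1"
        states[state_id] = {"file.managed": [{"name": path}]}
    return states
```

With a fixed ID only the last file survives in the mapping; deriving the ID from the loop variable keeps one entry per file.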

cachedout commented 10 years ago

The hsum issues and the file.recurse issue are probably related and both fixed in #8653.

shantanub commented 10 years ago

@KatoneVI: That isn't the issue I'm having. My state compiles and runs just fine and I'm not recursing so the filenames aren't being repeated in the state description. I would be getting errors about duplicates and the state wouldn't compile if I had the problem you described.

file.managed itself is reporting hsum errors for a handful of minions when I target groups of minions, but it runs without errors when I target a single minion. Here's an example of my cronjobs.archiveLifeCycleControllerLogs state:

{% if grains['manufacturer'] == 'Dell Inc.' %} 

  {% set hostList = ['PowerEdge R720xd','PowerEdge R720'] %} 

  {% if grains['productname'] in hostList %} 

/managed/scripts/archiveLifeCycleControllerLogs.py:
  file:
    - managed
    - source: salt://cronjobs/files/archiveLifeCycleControllerLogs.py
    - mode: 744
    - user: root
    - group: root

archive life cycle controller logs cronjob:
  cron:
    - present
    - name: /managed/scripts/archiveLifeCycleControllerLogs.py
    - user: root
    - minute: random
    - require:
      - file: /managed/scripts/archiveLifeCycleControllerLogs.py

  {% endif %} 

{% endif %}

cachedout commented 10 years ago

@shantanub As I mentioned above, those hsum errors should have been resolved by #8653. Since this is marked as a duplicate of that bug and that bug has since been closed, I'm going to close this one too. If this error returns for you on a version of Salt with the fix for #8653 applied, please leave a comment and we'll re-open this. Thanks.