saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0
14.19k stars 5.48k forks source link

salt 2014.7.0 reports zero changes as change (file.replace, cron.present) #18312

Closed gpkvt closed 9 years ago

gpkvt commented 9 years ago

After upgrading to 2014.7.0 some of my states are reported as a change, but they didn't change anything:

         ID: /var/drbd/www/html/ilias4/Services/Calendar/classes/class.ilDatePresentation.php
    Function: file.replace
      Result: None
     Comment: No changes were made
     Started: 11:04:43.935129
    Duration: 5.811 ms
     Changes:  

          ID: /usr/local/bin/ilias_cron.sh 2>/dev/null
    Function: cron.present
      Result: None
     Comment: Cron /usr/local/bin/ilias_cron.sh 2>/dev/null is set to be updated
     Started: 11:04:43.794855
    Duration: 125.617 ms
     Changes:   

This leads to false positives in a script monitoring pending changes via Icinga/Nagios.

Previous version works fine.

rallytime commented 9 years ago

@gpkvt Thanks for the report. It seems to me that your first state indicates that no changes were made, while your second state indicates that changes will be made. Are you running the command with a test=True? What do your states look like? What command are your running to execute the states?

djs52 commented 9 years ago

I confirm this -- I see it on any state that uses file.replace. Running a state with test=True produces, for example,

          ID: deadline-scheduler
    Function: file.replace
        Name: /etc/default/grub
      Result: None
     Comment: No changes were made
     Started: 18:31:06.400215
    Duration: 1.372 ms
     Changes:   

Note no changes are listed. Running the state "for real" without test=True, the ID is not listed, and no changes are made (as expected). This is a regression against 2014.1.0

gpkvt commented 9 years ago

@rallytime Indeed it was run with test=True. But the cronjob is in place and would not be updated when you run the state without test=True.

The state looks like this, all other cron.present-States were affected as well, cron.absent worked fine:

/usr/local/bin/ilias_cron.sh 2>/dev/null:
  cron:
  - present
  - user: root
  - minute: 55
  - require:
    - file: /usr/local/bin/ilias_cron.sh

It was executed by salt '*' state.highstate test=True followed by salt '*' state.highstate which produced the following result:

          ID: /usr/local/bin/ilias_cron.sh 2>/dev/null
    Function: cron.present
      Result: True
     Comment: Cron /usr/local/bin/ilias_cron.sh 2>/dev/null already present
     Started: 20:25:42.807124
    Duration: 126.945 ms
     Changes:
rallytime commented 9 years ago

Ah ok, I get it now. Thanks for the follow-up. So it looks like both of those states have a bug in them where when the states are run with test=True.

To summarize, the bug is that when both of these states are run with test=True, they are reporting Result: None which indicates that changes will be made when the state is run without test=True. However, if changes are to take place, Comment should report that changes will be made and Changes should report what change will occur. If not changes are to be made, the Result should be True. This fix should also ensure that the states do not actually execute when test=True is applied.

JensRantil commented 9 years ago

I can also recreate this issue. I think the severity of this could actually be higher. It's fairly common to restart processes after file.replace, which happened in my case.

@rallytime Just curious, why is this issue in "Blocked" milestone? What information do you need?

Let me know if you need anything.

robbydyer commented 9 years ago

+1 for higher severity. Having processes bounce themselves because of detected changes that didn't actually occur is bad news.

rallytime commented 9 years ago

Sorry for the late reply here, @JensRantil. There isn't a "Blocked" milestone on this anymore. It was blocked when we were waiting for more info, but I switched it to "Approved" when I added the bug/severity/etc labels.

Bumping the severity now.

JensRantil commented 9 years ago

Could there even be a hotfix for this one, perhaps?

cachedout commented 9 years ago

This may have been introduced with #12294 and fixed by dfa47a98

Could somebody else please verify that this the correct fix for the file.replace bug and we'll backport it.

JensRantil commented 9 years ago

@cachedout I can test this. Just making sure, should it be applied on master or minion?

cachedout commented 9 years ago

@JensRantil Minion

JensRantil commented 9 years ago

@JensRantil Minion

Cool. Rolled this out using the _states directory from master. Made a change to the state comments just to make sure my execution was actually using the updated module.

This may have been introduced with #12294 and fixed by dfa47a9. Could somebody else please verify that this the correct fix for the file.replace bug and we'll backport it.

I can confirm that dfa47a9 file.replace now correctly reports No changes needed to be made when in fact it is so. I can also confirm that Changes would have been made also is reported correctly when there are such. I ran these tests using salt minion state.highstate test=True. Let me know if you'd like me to test anything else.

Also, it looks like the color of the patch now is in green (see attached screenshot). It used to be yellow along with other changes. Is that a separate issue or related?

1__jrantil_tink-salt_____ssh_

cachedout commented 9 years ago

Cool, glad the functionality is fixed. I think the colours should be a separate issue if you wouldn't mind filing it as one. If you're satisfied with where we are on this one, I'll close this one. Thanks!

JensRantil commented 9 years ago

:+1: Filed a separate issue about the coloring here: https://github.com/saltstack/salt/issues/19958

JensRantil commented 9 years ago

Just noticed this didn't make it into 2014.7.1. Will it make it into 2014.7.2? I believe it's an important fix to get in there.

rallytime commented 9 years ago

@JensRantil It looks like that fix was made against the develop branch, and not the 2014.7 branch. I'll look at backporting #15701 so it will be available in the 2014.7.2 release.

JensRantil commented 9 years ago

Awesome, @rallytime! Thanks!

MoonSweep commented 9 years ago

I don't know if it's the right place to report this but feel free to direct me otherwise (private mail to Joe Healy maybe, who maintains the Debian packges ?).

With the new Debian package 2014.7.1+ds-1 update (not the one in the official Debian repos, but the one in the official Salt repos), not only this problem wasn't fixed, but now it also happens on file.managed (in addition to file.replace). Please fix this ASAP since all those false positives highlighted in yellow, whereas no change would actually be made, make test=True pretty useless.

joehealy commented 9 years ago

Hello,

regressions are painful.

Is the patch at https://github.com/saltstack/salt/pull/15701/files all that is required? to fix both the file.replace issue and file.managed issue or are they separate?

If there is a simple patch, I'm happy and able to do another set of packages tonight AU time (ie next few hours).

Cheers,

Joe

MoonSweep commented 9 years ago

Sorry, I can't answer that. All I can say is that for weeks this happened to file.replace, and now with the latest package update it happens to file.managed too.

rallytime commented 9 years ago

@MoonSweep The fix in question was not included in the 2014.7.1 release, so this isn't actually a packaging issue. This bug wasn't included in the 2014.7 branch until after 2014.7.1 was cut. It will be included in the 2014.7.2 release though with this change #20154.

Now, I haven't tested that this fix addresses file.managed, since the original issue was only for file.replace. Can you open a new issue so we can make sure that this gets addressed on its own?

@joehealy You don't need to worry about starting new packages for this issue. This particular fix will be in 2014.7.2, which should come out MUCH faster than 2014.7.1.

MoonSweep commented 9 years ago

Done: https://github.com/saltstack/salt/issues/20220

As for the Debian packages, an update with the backported fix(es) without waiting for 2014.7.2 to be released would still be much appreciated, because Salt is currently pretty unusable as is.

rallytime commented 9 years ago

Thanks for filing that new issue, @MoonSweep. As for whether or not those fixes should be backported and re-packaged, I will leave that up to @joehealy. However, please note that we should have 2014.7.2 released with in the two of weeks or so.

joehealy commented 9 years ago

In general, if there is a clear and simple patch that fixes a critical issue, I'm happy to do a new package release including the patch. I get a bit reluctant when the issue is difficult to reproduce or test, the fix is not clear or there is a simple workaround.

In this case, given it is a state that is "faulty", you could put a fixed version into /srv/salt/_states/file.py and sync that out to the minions. Given this is possible, I'm actually pretty reluctant to do a new package release just for this.

rallytime commented 9 years ago

Good call @joehealy. I should have suggested that earlier, but didn't think to. Thanks!

MoonSweep commented 9 years ago

Sorry, but I'd qualify this workaround as really dirty because it would "pollute" the filesystem of the master and all minions with a file that doesn't belong to any package, and as I understand it, would supersede not only current file.py, but future versions as well, unless I remember to remove these copies when the packages are upgraded to 2014.7.2. This completely defeats the very purpose of having a packaging system in the first place. For me, it's unacceptable, so I'll just wait until 2014.7.2 is packaged and suspend all sls-writing activity until then.

FWIW, when I started using Salt some month ago, I was very enthusiastic because it's actively developped and the community is very responsive; but after using it all this time, I'm thinking more and more about choosing another configuration management software. With all the time I invested in learning Salt skills it would be a PITA, but if the general QA doesn't improve soon, I won't have any choice.

Don't get me wrong, I'm not telling this to anger you or mock you or something, but if you keep this mindset, Salt will always have this amateurish feel that currently prevents it to reach the big market. For now I use it at home to manage half a dozen hosts and VMs but in its current state I wouldn't take the risk to use it at work to manage hundreds of hosts.

terminalmage commented 9 years ago

@MoonSweep You might be misunderstanding how syncing custom types works. Any files synced from _states, _modules, etc. are synced within the minion cachedir (/var/cache/salt/minion/....) and used instead of the version installed by the package. They don't replace the files installed by the package, so this would not qualify as "polluting" the filesystem to me. Otherwise, any cached files, not to mention the log files created by Salt, would also have to be considered filesystem pollution.

basepi commented 9 years ago

@terminalmage hit it on the head.

And correct me if I'm wrong, but all we're talking about here is a rogue None in a test=True run, right? I'm struggling to see why that would cause a need to "suspend all sls-writing activity" anyway.

In any case, we're sorry that your bugfix didn't make it into 2014.7.1, but were we to never package until all the bugs were fixed, we would never get any releases out. We're hoping to have 2014.7.2 out in the next week or two, and it will include this fix. Sorry for the inconvenience.

MoonSweep commented 9 years ago

@terminalmage No, I didn't misunderstand. If I implement this workaround, I'll have the real state file (from the package) and the one in the cachedir, in two separate places, and the latter will be used instead of the former (hence my use of the word supersede and not replace). My point is still valid: not only will this pollute the filesystem, but if I omit to remove those additional files before upgrading the package, they will be used in place of the package's ones, which is likely to bring a lot of confusion in the future.

@basepi How is one supposed to write/update/debug sls files when testing a highstate returns half of yellow output whereas only one or two files would actually be changed ? With --state-output=terse, the output messages mislead the user with a Result=differs yellow line for each file.replace and file.managed function call; you have to run the highstate with --state-output=full AND ignore/disable color output to tell if a change would actually be made or not. Admit that running an 80+ functions highstate with full output, and having to scrollback to carefully read each result instead or relying on colors, even for a limited (8 in my case) number of minions, is not a comfortable way to test sls files, to say the least. I'd rather wait for 2014.7.2 to be released and packaged, and hope that both issues will be fixed in the upgrade. If not, this will likely be the last straw for me.

terminalmage commented 9 years ago

Honestly, I'm just baffled at the pushback on this. I didn't think it was that inconvenient to remember to remove a file and resync. In fact, I've spoken with many users who are thrilled with the fact they're able to push modifications to states/modules/etc. like this, since it allows them to immediately apply workarounds and even copy in new functions slated for a future release so they can access them early.

I guess it's all a matter of perspective.

whiteinge commented 9 years ago

@MoonSweep the reason you can override core modules with the /srv/salt/_states mechanism is for exactly this kind of thing (among other use-cases as well).

Large software projects have bugs. Salt makes this bug extremely easy to work around until a fix can be released.

If you're interested in contributing a regression test for this module we would be delighted. I'd even be happy to walk you through the process if you're interested and not already familiar with it. That is the best way to improve the quality of Salt moving forward.

gpkvt commented 9 years ago

@MoonSweep Another option to get your files patched, without the need to remember to remove anything just ship the patched files out via salt-state and check for a specific salt-version:

{% if salt['grains.get']('saltversion') == '2014.7.1' %}
patchsalt_file:
  file:
  - managed
  - source: salt://minions/file.py
  - name: /usr/share/pyshared/salt/modules/file.py
{% endif %}
MoonSweep commented 9 years ago

@terminalmage Yes, I guess it's all a matter of perspective.

@whiteinge This is where we disagree. Modules are not meant to be overridden with a mechanism that's internal to Salt, because this impedes on the package manager's job. If a bug is discovered, it should be fixed and an updated version should be released, it's not up to the sysadmin to override modules to work around bugs. If a bug is laying around for too long, it means that there's a problem with the release frequency.

@gpkvt Overwriting files belonging to a package ? This is even worse than every workaround mentioned before. First of all, /usr and everything below it (except /usr/local) is package manager-land, period. It is NOT meant to be fiddled with in any way; only files in /etc and /var are meant to be manually modified by the sysadmin. Secondly, dpkg maintains a list of MD5 sums of all files managed by it. This feature is primarily used to tell if the sysadmin modified a configuration file, so as not to overwrite those changes when the package is upgraded; or to tell if a file has been altered, for example by malware. Messing with this list's integrity by modifying/overwriting something located in /usr is just asking for trouble; this could be done properly by diverting the file beforehand with dpkg-divert, so as to let the package manager know about this local modification, although it would only add to the burden of having to revert those modifications once the package is upgraded (because even with the version check, this would eventually become cruft, and have to be removed from the sls files at some point for clarity, in addition to un-diverting the file). But again, this impedes on the package manager's job.

Guys, I appreciate that you're trying your best to propose various ways of working around this bug, but like @terminalmage said, this is all a matter of perspective. Maybe I'm being overly strict here, but for a production environment, workarounds of this type are just unacceptable; only a new package with a module containing the fix would be an adequate solution.

whiteinge commented 9 years ago

@MoonSweep it's not a disagreement, it's a design decision. Overriding core modules has been in Salt since the beginning. You, as the sysadmin, have the choice of either working around a bug or waiting for the fix. A workaround is obviously not a replacement for packages and is, by definition, temporary.

You have chosen to wait for the fix. It's your choice and it's a fine choice. As mentioned above this fix is slated for the next point-release.

If you wish to speed a given fix, participation providing information, assistance, and attention on this issue tracker is invaluable. Salt is a large project, composed of 1100 execution functions, several hundred state functions, and many thousands of users spread across many platforms.

If you are interested in getting involved, you can read our Community Contribution Guide. Myself or anyone else on the Salt team would be more than happy to help if you have questions.