saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

archive.extracted doesn't extract when it should. #45363

Open dhs-rec opened 6 years ago

dhs-rec commented 6 years ago

Description of Issue/Question

I've written a state for installing the Sysinternals Suite from a ZIP archive:

{% set version = salt['pillar.get']('sysinternals:version', '20160829') %}
sysinternals:
  archive.extracted:
    - name: C:/Tools/SysinternalsSuite
    - source: salt://Sysinternals/SysinternalsSuite-{{ version }}.zip?saltenv=base
    - source_hash: salt://Sysinternals/sha512sums?saltenv=base
    - source_hash_name: SysinternalsSuite-{{ version }}.zip
    - source_hash_update: True
    - clean: True
    - enforce_toplevel: False
    - require:
      - file: sysinternals
  file.directory:
    - name: C:/Tools/SysinternalsSuite
    - makedirs: True

Now, when I execute this state for the first time, it creates the directory and extracts the archive as expected. When executed a second time (without changing the {{ version }}), it does nothing, as expected. It also extracts as expected when given a different {{ version }} for the first time. But here's where the fun starts: If I pass in the previous {{ version }} again, it doesn't extract again, but it should.

Setup

See above.

Steps to Reproduce Issue

First run:

root@master ~ # salt myminion state.apply test
myminion:
  Name: C:/Tools/SysinternalsSuite - Function: file.directory - Result: Changed Started: - 12:32:29.254000 Duration: 3092.0 ms
  Name: C:/Tools/SysinternalsSuite - Function: archive.extracted - Result: Changed Started: - 12:32:32.347000 Duration: 9905.0 ms

Summary for myminion
------------
Succeeded: 2 (changed=2)
Failed:    0
------------
Total states run:     2
Total run time:  12.997 s

2nd run, same version:

root@master ~ # salt myminion state.apply test
myminion:
  Name: C:/Tools/SysinternalsSuite - Function: file.directory - Result: Clean Started: - 12:33:02.737000 Duration: 4055.0 ms
  Name: C:/Tools/SysinternalsSuite - Function: archive.extracted - Result: Clean Started: - 12:33:06.793000 Duration: 6484.0 ms

Summary for myminion
------------
Succeeded: 2
Failed:    0
------------
Total states run:     2
Total run time:  10.539 s

3rd run, different version:

root@master ~ # salt myminion state.apply test pillar='{"sysinternals": {"version": "20161118"}}'
myminion:
  Name: C:/Tools/SysinternalsSuite - Function: file.directory - Result: Clean Started: - 12:37:31.944000 Duration: 6956.0 ms
  Name: C:/Tools/SysinternalsSuite - Function: archive.extracted - Result: Changed Started: - 12:37:38.901000 Duration: 10016.0 ms

Summary for myminion
------------
Succeeded: 2 (changed=1)
Failed:    0
------------
Total states run:     2
Total run time:  16.972 s

4th run, back to default version:

root@master ~ # salt myminion state.apply test
myminion:
  Name: C:/Tools/SysinternalsSuite - Function: file.directory - Result: Clean Started: - 12:38:04.759000 Duration: 4554.0 ms
  Name: C:/Tools/SysinternalsSuite - Function: archive.extracted - Result: Clean Started: - 12:38:09.314000 Duration: 6143.0 ms

Summary for myminion
------------
Succeeded: 2
Failed:    0
------------
Total states run:     2
Total run time:  10.697 s

Versions Report

root@master ~ # salt --versions-report
Salt Version:
           Salt: 2017.7.2

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.3
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.12 (default, Nov 20 2017, 18:23:56)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: Ubuntu 16.04 xenial
         locale: UTF-8
        machine: x86_64
        release: 4.4.0-1041-aws
         system: Linux
        version: Ubuntu 16.04 xenial

Same version on Windows minion.

dhs-rec commented 6 years ago

Looking at the minion cache, I see .hash files in C:\salt\var\cache\salt\minion\files\base, which contain SHA256 checksums of the ZIPs (_SysinternalsSuite-20160829.zip.hash and _SysinternalsSuite-20161118.zip.hash). However, the checksums in my file given via source_hash: are SHA512 ones.
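A minimal sketch of why that mismatch matters: a SHA-512 digest from source_hash can never equal the SHA-256 string in the cached .hash file, since the two algorithms produce different values of different lengths. The payload below is a stand-in for the ZIP bytes, not the real archive:

```python
import hashlib

def digest(data: bytes, algo: str) -> str:
    """Hex digest of data under the named hash algorithm."""
    h = hashlib.new(algo)
    h.update(data)
    return h.hexdigest()

payload = b"stand-in for the ZIP bytes"  # hypothetical, not the real archive
cached = digest(payload, "sha256")       # what the minion's .hash file stores
given = digest(payload, "sha512")        # what source_hash supplies
print(len(cached), len(given))           # 64 128
print(cached == given)                   # False: the comparison can never succeed
```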

garethgreenaway commented 6 years ago

Seems like extraction only happens if the archive has to be downloaded and cached locally; if it's already in the cache, it might not extract.

dhs-rec commented 6 years ago

BTW: "- clean: True" also doesn't work. If I manually create a file or directory inside C:/Tools/SysinternalsSuite after the first run above, it's still there after the 3rd.

dhs-rec commented 6 years ago

Any news on this?

dhs-rec commented 5 years ago

It's been some time since I last asked...

terminalmage commented 5 years ago

The way that archive.extracted determines whether or not to extract is based solely on the filenames, not the contents or checksums of the individual files. So, if you change the version of the archive and all of the relative paths inside the zip file exist under the destination directory on the minion, Salt will not extract the archive.

For the state to extract when the contents of the archive change would require additional logic to compare checksums. It would also likely require the archive to be extracted to a temporary location first, which we have taken special care to avoid since larger archives could fill up a smaller partition if extracted twice.

All of this would slow down the state, which is one reason this was not implemented. If this were to be added, it would need to be gated behind an argument and ideally disabled by default.
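The filename-only check described above can be sketched roughly like this (a simplified illustration, not Salt's actual implementation; `needs_extraction` is a hypothetical helper):

```python
import os
import zipfile

def needs_extraction(zip_path: str, dest: str) -> bool:
    """Return True only if some member path is missing under dest.

    Mirrors the behavior described above: file contents and checksums
    are never consulted, so swapping archive versions whose member
    names overlap reports nothing to do.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return any(
            not os.path.exists(os.path.join(dest, name))
            for name in zf.namelist()
        )
```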

JimKlapwijk commented 3 years ago

Is there any update on this? I wasn't able to figure out a way around it, like copying the archive to /tmp and then using file.recurse to move all the files over to the correct location; however, file.recurse only accepts salt:// as a source.

Thanks.

dhs-rec commented 3 years ago

@terminalmage, I'm not after deciding whether or not to extract based on archive content. I just want the state to behave as expected, i.e. when up-/downgrading versions, based on archive name and checksum (see original description).

You also wrote above:

So, if you change the version of the archive and all of the relative paths inside the zip file exist under the destination directory on the minion, Salt will not extract the archive.

This is not true, as you can also see in the original description. archive.extracted extracts the archive anew if I change the version (i.e. the archive name) and checksum, as expected, but it doesn't extract again when I change them back to their previous values, which is not what I'd expect. And that's my main point here.

However, since this has just had its 3rd anniversary, it might be time to check again...

JimKlapwijk commented 3 years ago

I ran into this with Node Exporters from Prometheus. I extract them directly in /usr/bin, and changing version from 1.0.1 to 1.1.0 and back doesn't extract the files again.

The following example is from https://docs.saltproject.io/en/3000/ref/states/all/salt.states.archive.html#module-salt.states.archive, but it doesn't work as expected either: it still doesn't extract the updated .tar.gz.

graylog2-server:
  archive.extracted:
    - name: /opt/
    - source: https://github.com/downloads/Graylog2/graylog2-server/graylog2-server-0.9.6p1.tar.lzma
    - source_hash: md5=499ae16dcae71eeb7c3a30c75ea7a1a6
    - source_hash_update: True

dhs-rec commented 3 years ago

BTW: @terminalmage, there's a compromise between extracting based on archive filename/checksum and archive content: archive table of contents. With both tar and unzip one can just list the contents of an archive without actually unpacking it and then compare the given names and sizes to what's already in the destination directory.
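The table-of-contents comparison could be sketched like this (not Salt code; `toc_matches` is a hypothetical helper using the stdlib zipfile module; tarfile.getmembers() exposes the same names and sizes for tarballs):

```python
import os
import zipfile

def toc_matches(zip_path: str, dest: str) -> bool:
    """Compare the archive's listing (names and sizes) against dest
    without extracting anything. A missing member or a size mismatch
    means the destination no longer matches this archive version.
    """
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = os.path.join(dest, info.filename)
            if not os.path.isfile(target):
                return False
            if os.path.getsize(target) != info.file_size:
                return False
    return True
```

Sizes won't catch every changed file (two versions can coincide in length), but unlike the filename-only check it detects most up-/downgrades without a temporary extraction.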

JimKlapwijk commented 3 years ago

@dhs-rec I think it should be more something like rsync, which checks the actual hash of the file.

My workaround:

download_extract_tarball:
  archive.extracted:
    - name: /usr/sbin/
    - source: {{ args['url'] }}
{% if args['hash'] is defined %}
    - source_hash: {{ args['hash'] }}
    - skip_verify: False
{% else %}
    - skip_verify: True
{% endif %}
    - overwrite: True
    - archive_format: 'tar'
    - user: {{ args['user'] }}
    - group: {{ args['group'] }}
    - options: '--strip-components 1'
    - enforce_toplevel: False
    - require:
      - group: {{ args['group'] }}_group
    - watch_in:
      - service: {{ exporter }}_restart
    - unless:
      - /usr/sbin/{{ exporter }} --version 2>&1 >/dev/null | grep {{ args['version'] }}

It just checks the version and if that's different it overwrites. Not ideal, and not for every situation, but for mine this works.

Garagoth commented 1 year ago

@terminalmage in other words, it is impossible to use archive.extracted to manage files if the archive name changes with each release, as it does with most stuff published on GitHub, correct?

I think most archivers are able to list an archive's contents together with at least the file size, which could be used to determine whether a file has changed (its size will differ from what's on the filesystem). tar supports that (at least the tarfile library you are using), and zip supports it as well. I think that would solve a lot of cases.

For cases where this is not possible, maybe instead of extracting files from the archive to check the checksum of every file (which would be ideal to guard against file corruption), a simple option to archive.extracted would suffice: local_checksum_file: /file/with/source_hash/of/source/that/was/extracted.txt

And then if it is set and contents of that file differ from source_hash (or file does not exist), it means that extraction is needed.
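That local_checksum_file idea can be sketched in a few lines (hypothetical helpers illustrating the proposal, not an existing Salt option):

```python
import os

def extraction_needed(marker_path: str, source_hash: str) -> bool:
    """Re-extract when the marker is absent or records a different hash."""
    if not os.path.isfile(marker_path):
        return True
    with open(marker_path) as f:
        return f.read().strip() != source_hash

def record_extraction(marker_path: str, source_hash: str) -> None:
    """After a successful extraction, persist the hash that was applied."""
    with open(marker_path, "w") as f:
        f.write(source_hash + "\n")
```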

Right now the workaround would be to use file.managed to download the file and then extract it in cmd.wait if there were changes. Not ideal, as it requires the archive to be kept on all minions just to be able to watch for changes to that file.

jerrykan commented 1 year ago

Managing versioned archives has always been my biggest annoyance when working with Salt. Previously the "recommended" way to deal with them was by using pillar data to force updating an archive (see: https://github.com/saltstack/salt/issues/40484#issuecomment-292055505) but this always seemed very unsatisfactory to me.

I was looking through the Salt issues again to see if there has been any improvement in the situation, and reading through this issue it appears not. However, as I was reading, a nice-ish workaround came to mind: we could manage extracting the archive based on a version marker file.

{%- set archive_source = "https://example.com/path/to/archive_1.0.tar.gz" -%}
{%- set archive_version_marker = '.salt-archive-filename_' ~ archive_source.split('/')[-1] -%}

extract-archive-file: 
  archive.extracted:
    - name: /opt/archive
    - source: {{ archive_source }}
    - skip_verify: True 
    - enforce_toplevel: False
    # The following options ensure the correct version is extracted
    - overwrite: {{ not salt['file.file_exists']('/opt/archive/' ~ archive_version_marker) }}
    - clean_parent: True

extract-archive-file-version-marker:
  file.managed:
    - name: /opt/archive/{{ archive_version_marker }}
    - require:
      - archive: extract-archive-file

How this works is that when the archive is extracted, an empty file whose name contains the archive filename (a version marker file) is created in the directory where the archive is extracted to. The next time Salt is run, it checks whether this marker exists; if it does not (because the source filename changed), overwrite becomes True and clean_parent wipes the old contents before the new archive is extracted and a fresh marker is created.

This obviously doesn't cover all possible scenarios, but it should help in a few.

An even better solution would be if Salt provided an option to natively support creating the version marker file (or some other "hidden" file with useful metadata - possibly containing the entire source URL and hash) and use that as a basis to determine if the archive file should be extracted or not.