Open cpiment opened 5 months ago
There's a related archived ticket in https://puppet.atlassian.net/browse/PUP-9971
Just a clarification: if you specify the desired checksum type and value in the manifest, then the need for ETag goes away:
file { '/tmp/file.txt':
  ensure         => file,
  source         => 'http://httpstat.us/200',
  checksum       => 'sha256',
  checksum_value => 'f9bafc82ba5f8fb02b25020d66f396860604f496ca919480147fa525cb505d88',
}
But if you want the latest file from the server without having to make changes to the manifest, then ETag or some other HTTP-based versioning is needed. That means the agent would need to store the etag/version locally for any file that it's managing. I think the only hard part there is making sure we prune the local state when we are no longer managing a file (or files), like we had to do in the state file.
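The pruning step mentioned above could look roughly like the following Ruby sketch. It assumes the recorded ETags live in a plain hash keyed by file path. The helper name `prune_etag_state` and both of its arguments are hypothetical and not part of Puppet.

```ruby
# Hypothetical helper, not a Puppet API.
# `state` maps a managed file path to the last ETag seen for it.
# `managed_paths` lists the paths present in the current catalog.
# Returns a new hash without entries for files that are no longer managed.
def prune_etag_state(state, managed_paths)
  state.select { |path, _etag| managed_paths.include?(path) }
end
```

Returning a new hash rather than mutating the input keeps the sketch easy to reason about. A real implementation would also have to write the pruned state back to disk.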
Thanks for your reply. The disadvantage of using that method is that you have to keep track of the checksum value in your code every time the file changes. It would be great if Puppet could handle that automatically.
@joshcooper why would you need to maintain state? Puppet would just check the value of the ETag header in response to an HTTP HEAD request, just like is done with the existing header checks.
Migrated issue to PUP-12033
@kenyon with the other headers that @cpiment cited--all of the various checksums--we don't have to maintain state because we can independently calculate those checksums and compare them with what the source is offering. An Etag can be many different things. From MDN:
Typically, the ETag value is a hash of the content, a hash of the last modification timestamp, or just a revision number. For example, a wiki engine can use a hexadecimal hash of the documentation article content.
We don't know for sure what an Etag will be. It isn't guaranteed to be the same thing across services, or even something that we can independently calculate. So to add support, Puppet would need to maintain state.
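To make the distinction concrete, here is a minimal Ruby sketch of the two kinds of check. Both method names are hypothetical and are not taken from Puppet. The first check needs only the local content, while the second cannot be answered without a value remembered from an earlier run.

```ruby
require 'digest'

# A checksum header can be verified without any memory of earlier runs:
# hash the local content and compare it with what the server advertises.
def checksum_in_sync?(local_content, advertised_sha256)
  Digest::SHA256.hexdigest(local_content) == advertised_sha256.downcase
end

# An ETag is opaque, so the only possible check is equality with the value
# recorded on an earlier run. `recorded_etag` is nil on the first run.
def etag_in_sync?(recorded_etag, advertised_etag)
  !recorded_etag.nil? && recorded_etag == advertised_etag
end
```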
@mhashizume ah yeah duh, thanks.
I suppose the file resource could grow to have an etag parameter. Maybe that's what @cpiment is suggesting? 🤷
Hi @kenyon, not really. An Etag parameter would work the same as the checksum_value that currently exists: the user's code is responsible for keeping the Etag value up to date. It would be better if it could be handled automatically.
It would be better if it could be handled automatically.
For this to work, the agent would need to store the ETag value for each managed file. When the agent next runs, it will request file metadata (done via a HEAD request).
It would need to extract the ETag header from the HTTP response, like we do for MD5/SHA* checksums.
And it would need to compare the new ETag against what it recorded earlier. If the values are different, then the agent knows the file needs updating.
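The steps above (HEAD request, extract the header, compare with the recorded value) can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: `fetch_etag`, `sync_if_changed` and the `state` hash are hypothetical names, and none of this is how Puppet's own HTTP client or DataSync code is structured.

```ruby
require 'net/http'
require 'uri'

# Issue a HEAD request and return the ETag header, or nil if the server
# does not send one.
def fetch_etag(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)['ETag']
  end
end

# `state` is a hypothetical path => ETag hash kept between agent runs.
# Yields to the caller (which would download the file) only when the ETag
# is missing or differs from the recorded one, then records the new value.
# Returns true if a sync happened.
def sync_if_changed(state, path, new_etag)
  return false if !new_etag.nil? && state[path] == new_etag

  yield
  state[path] = new_etag
  true
end
```

Recording the new ETag only after the block returns means a failed download leaves the old value in place, so the next run tries again.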
This happens in the DataSync module, which is mixed into the content and source parameters, since there are two different ways of managing file content.
It might be possible to store the ETag metadata in Puppet::Util::Storage. It's currently used to record when resources were last checked and, if necessary, synced.
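As a rough illustration of what persisting that metadata between runs involves, the sketch below keeps a path => ETag hash in a YAML file. It deliberately does not use Puppet::Util::Storage. The helper names `load_etags` and `save_etags` and the state file location are assumptions made for the example.

```ruby
require 'yaml'

# Read the recorded ETags from a YAML file. Returns an empty hash when the
# file does not exist yet or is empty.
def load_etags(state_file)
  return {} unless File.exist?(state_file)

  YAML.safe_load(File.read(state_file)) || {}
end

# Write the path => ETag hash back to disk as YAML.
def save_etags(state_file, etags)
  File.write(state_file, YAML.dump(etags))
end
```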
I was thinking that puppet/archive might be a good place to implement this.
Here is what appears to be an identical request, which, funnily, I closed: https://github.com/voxpupuli/puppet-archive/issues/363
Here is a possible implementation: https://github.com/voxpupuli/puppet-archive/pull/281#issuecomment-555208460
I was thinking that puppet/archive might be a good place to implement this.
It seems that puppet/archive handles compressed files, but implementing it in the File resource would solve this for any kind of file.
Here is a possible implementation: voxpupuli/puppet-archive#281 (comment)
I think this implementation works assuming that the S3 bucket Etag header is an MD5 checksum of the file. However, an Etag might or might not be a hash of the file. It may instead be a representation of the file version that must change every time the file contents change (see MDN docs).
Hi @joshcooper, is there any news on this issue or do you have any plan for this to be implemented?
Thanks in advance for your help
Use Case
When sourcing files from an http(s) source, in order to check whether the file is already present, Puppet searches for these headers:
X-Checksum-Sha256
X-Checksum-Sha1
X-Checksum-Md5
Content-MD5
Last-Modified
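The lookup over the headers listed above can be sketched as follows. This is a simplified illustration, not Puppet's actual code: it assumes the order of the list is also the lookup priority, and `first_metadata_header` is a hypothetical name.

```ruby
# Return [header_name, value] for the first listed header present in the
# response, or nil when none of them is present.
# Assumption: the list order above is the priority order.
def first_metadata_header(headers)
  candidates = %w[X-Checksum-Sha256 X-Checksum-Sha1 X-Checksum-Md5 Content-MD5 Last-Modified]
  # HTTP header names are case-insensitive, so compare in lowercase.
  normalized = headers.transform_keys { |key| key.to_s.downcase }
  candidates.each do |name|
    value = normalized[name.downcase]
    return [name, value] if value
  end
  nil
end
```

With this lookup, a response that carries only an Etag header yields nil, so there is nothing to compare the local file against.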
Some servers (such as GitLab) do not provide any of these headers, but they do provide an Etag header, which indicates the version of the resource that is going to be served.
Describe the Solution You Would Like
Modify the code responsible for retrieving the metadata from http(s) resources (I think it is this) to take the Etag header into account with higher priority than Last-Modified, since it seems that Last-Modified should be considered a fallback when there is no Etag (as stated in MDN).
Describe Alternatives You've Considered
Use another resource to download files, but I have not found any module on the Forge that uses Etag as metadata for the version of the file.
Additional Context
N/A