spenczar / lektor-s3

Plugin to deploy a Lektor project to an S3 bucket
MIT License

Large files get re-uploaded on every deploy #8

Open krosaen opened 8 years ago

krosaen commented 8 years ago

When deploying a Lektor site that hosts a podcast, every mp3 file is re-uploaded on each deploy:

$ AWS_PROFILE=lektor-deploy lektor deploy
Deploying to S3
  Build cache: /Users/krosaen/Library/Caches/Lektor/builds/47d71c9cd6c3bea470ee10d73758dde5
  Target: s3://brosaen.com
  adding podcasts/pistons/27/.__transQy8kFO
  adding podcasts/pistons/27/brosaen-episode-27.mp3
  adding podcasts/pistons/27/index.html
  updating index.html
  updating podcasts/pistons.xml
  updating podcasts/pistons/1/brosaen-episode-1.mp3
  updating podcasts/pistons/1/index.html
  updating podcasts/pistons/10/brosaen-episode-10.mp3
  updating podcasts/pistons/10/index.html
  updating podcasts/pistons/11/brosaen-episode-11.mp3
  updating podcasts/pistons/11/index.html
  updating podcasts/pistons/12/brosaen-episode-12.mp3
...

I'm guessing this is due to, as the code states,

MD5s can be stored in the 'ETag' field of S3 objects. The field doesn't store the MD5 in two cases: objects uploaded with Multipart Upload and objects encrypted with SSE-C or SSE-KMS. In those cases, we'll just return an empty string.

So perhaps this is a feature request: is there a way to set the ETag field for larger objects?
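To make the behaviour in the quoted comment concrete, here is a minimal sketch of that kind of check, not the plugin's actual code. The names `md5_from_etag` and `file_md5` are hypothetical. It assumes that multipart ETags look like `<hex>-<part count>`, which is what S3 returns in practice but is not something the docs guarantee. It also cannot detect the SSE-C / SSE-KMS case, where the ETag still looks like 32 hex characters; that would need the object's encryption metadata.

```python
import hashlib
import re

# A plain MD5 digest is exactly 32 hexadecimal characters.
_MD5_HEX = re.compile(r"^[0-9a-fA-F]{32}$")


def md5_from_etag(etag: str) -> str:
    """Return the MD5 carried by an ETag, or "" if it does not look like one.

    Assumption: multipart ETags carry a "-N" suffix, so they fail the
    32-hex-character check and yield "".
    """
    value = etag.strip().strip('"')  # S3 returns the ETag wrapped in quotes
    return value.lower() if _MD5_HEX.match(value) else ""


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 of a local file in chunks, so large mp3s are not
    read into memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With a check like this, any file whose remote ETag yields `""` can never compare equal to the local MD5, so it is uploaded again on every deploy, which matches the log above.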

spenczar commented 8 years ago

Amazon's docs say that, for multipart uploads, the ETag "will not necessarily be an MD5 hash of the object data." They don't actually say what it will be, just that it isn't the md5.

So, we don't have a confident way to know that the mp3s have not changed.

One option is to just look at file size and last-modified time for files that don't have usable ETags. This seems a little dangerous because the file contents could change without changing the size and we wouldn't know to push the update. Maybe this fallback could be off by default, with a config option to enable it for users who understand the risk.
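The opt-in fallback described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `LocalFile` and `RemoteObject` types and the `trust_size_and_mtime` flag are hypothetical names, not existing lektor-s3 options, and timestamps are assumed to be comparable seconds since the epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalFile:
    size: int      # bytes
    mtime: float   # seconds since the epoch
    md5: str       # hex digest of the local contents


@dataclass
class RemoteObject:
    size: int             # bytes
    last_modified: float  # seconds since the epoch
    md5: str              # "" when the ETag is not a usable MD5


def needs_upload(local: LocalFile,
                 remote: Optional[RemoteObject],
                 trust_size_and_mtime: bool = False) -> bool:
    """Decide whether a local file should be pushed to the bucket."""
    if remote is None:
        return True  # nothing in the bucket yet
    if remote.md5:
        # Usable ETag: compare content hashes, the safe path.
        return local.md5 != remote.md5
    if trust_size_and_mtime:
        # Risky opt-in: treat the file as unchanged when the sizes match
        # and the local copy is not newer than the uploaded one.
        unchanged = (local.size == remote.size
                     and local.mtime <= remote.last_modified)
        return not unchanged
    # Default: without a usable ETag, always re-upload.
    return True
```

Keeping the flag off by default preserves the current always-upload behaviour, so a same-size content change is only missed by users who opted in.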

krosaen commented 8 years ago

Thanks for the additional details, Spencer, that makes sense. I can certainly give something like what you suggest a try if the redeployment of the mp3s really becomes a hassle.