hughsk / s3-sync

A streaming interface for uploading multiple files to S3.

hash comparisons with s3 headers #6

Closed jameswyse closed 11 years ago

jameswyse commented 11 years ago

Hey mate,

Just been updating some apps to use this module (thanks!) and ran into an issue where the files were always being re-uploaded even if they already exist. Looks like it's because the S3 ETag header never matches the generated md5:

    if (res.statusCode === 404 || res.headers.etag !== '"' + details.md5 + '"') return uploadFile(details, next)

That's because hashFile includes the headers and destination in the hash. Commenting out these lines fixes it:

    hash.update(JSON.stringify([
        options.headers
      , destination
    ]))

It might be better to hash just the file contents first for the ETag comparison, and then update the hash with the metadata before storing it in leveldb. What do you think?
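Something like this is roughly what I mean (untested, and the function/property names here are just for illustration):

    var crypto = require('crypto')
    var fs = require('fs')

    // rough sketch only - the real hashFile signature probably differs
    function hashFile(details, options, destination, callback) {
      var hash = crypto.createHash('md5')
      var stream = fs.createReadStream(details.path)

      stream.on('data', function (chunk) { hash.update(chunk) })
      stream.on('error', callback)
      stream.on('end', function () {
        // plain content hash - this is what should line up with the S3 ETag
        details.md5 = hash.digest('hex')

        // mix the headers/destination into a separate value for the leveldb
        // cache, so toggling gzip etc. still invalidates the cache without
        // breaking the ETag comparison
        details.cacheKey = crypto.createHash('md5')
          .update(details.md5)
          .update(JSON.stringify([options.headers, destination]))
          .digest('hex')

        callback(null, details)
      })
    }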

If I get some time this week I'll take a look and send a PR :)

hughsk commented 11 years ago

Hey man!

Ah, good catch - I'd forgotten we'd switched to using the ETag header when I made this change. I added the headers to the hash for situations where gzip was enabled/disabled and the like but the cache wasn't being invalidated, so I might go back to using x-amz-meta-* headers instead of the ETag.
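Roughly what I'm thinking, if it helps (untested, knox calls from memory, and the x-amz-meta-syncfilehash header name is just a placeholder):

    // on upload: stash our own content hash in a custom metadata header
    client.putFile(details.fullPath, destination, {
      'x-amz-meta-syncfilehash': details.md5
    }, function (err, res) {
      // handle err / res.statusCode here as usual
    })

    // on check: compare against that header instead of the ETag, so gzip
    // and friends can't throw the comparison off
    client.head(destination).on('response', function (res) {
      if (res.statusCode === 404 ||
          res.headers['x-amz-meta-syncfilehash'] !== details.md5) {
        return uploadFile(details, next)
      }
      next()
    }).end()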

If I get the chance today I'll see if I can fix it, but otherwise a PR is welcome :)

jameswyse commented 11 years ago

sweet :)

You can also send a Content-MD5 header with the actual upload, which should cause putFile to call back with an error if the file gets corrupted en route.
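For reference, something like this (from memory - the header value has to be the base64 digest rather than hex, and the variable names are just placeholders):

    var crypto = require('crypto')
    var fs = require('fs')

    // Content-MD5 wants the base64 encoding of the raw digest, not hex
    var md5 = crypto.createHash('md5')
      .update(fs.readFileSync(details.fullPath))
      .digest('base64')

    client.putFile(details.fullPath, destination, {
      'Content-MD5': md5
    }, function (err, res) {
      // S3 rejects the upload (BadDigest) if the body doesn't match the header
    })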

edit: actually it looks like knox takes care of this automatically :)

hughsk commented 11 years ago

Should be fixed in s3-sync@0.4.0, give it a shot and let me know! :)