s3tools / s3cmd

Official s3cmd repo -- Command line tool for managing S3 compatible storage services (including Amazon S3 and CloudFront).
https://s3tools.org/s3cmd
GNU General Public License v2.0

Options to no-check-md5 and skip-existing are ignored #882

Open · chrisspen opened this issue 7 years ago

chrisspen commented 7 years ago

I'm trying to sync two S3 buckets with the command:

s3cmd sync --delete-removed --no-check-md5 --skip-existing s3://mysourcebucket s3://mydestbucket

and every time I run it, it reports:

Summary: 1598 source files to copy, 0 files at destination to delete

Even if I let it complete and then re-run it, it reports that it has to transfer all 1598 files over again, indicating that it's ignoring either the --no-check-md5 or the --skip-existing option.

The expected behavior, especially when --skip-existing is specified, is for s3cmd to completely ignore the files it's already transferred. Failing to do this for buckets with a large number of files means s3cmd wastes a huge amount of time re-transferring files unnecessarily. If the transfer completes and I re-run it with no changes made to any of the files, I would expect s3cmd to report "0 source files to copy".
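As a point of reference, here is a minimal Python sketch of the comparison --skip-existing implies under this reading (assumed semantics, not s3cmd's actual code): a file is skipped purely because its key exists at the destination, with no size, mtime, or MD5 checks.

    # Hypothetical sketch, not s3cmd's implementation: --skip-existing
    # reduces the sync decision to a key-existence check.
    def files_to_copy(source_keys, dest_keys):
        """Return the source keys absent from the destination."""
        existing = set(dest_keys)
        return [key for key in source_keys if key not in existing]

    # With no changes on either side, a re-run finds nothing to copy:
    assert files_to_copy(["a", "b"], ["a", "b"]) == []
    assert files_to_copy(["a", "b"], ["a"]) == ["b"]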

jchook commented 6 years ago

I am also having this issue.

s3cmd sync --requester-pays --skip-existing --no-check-md5 s3://mybucket/myprefix .

Steps to reproduce:

  1. Start sync with the command above
  2. Interrupt sync after transferring about 50/100 files
  3. Run the exact same command

Expected behavior

I expected s3cmd to ignore existing files, and continue syncing at file 51/100.

Actual behavior

s3cmd does not ignore existing files in the download folder. It starts from the very first file, re-downloading and overwriting existing files.
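One way to confirm the rewriting, sketched in Python (the s3cmd invocation is the one from this report; the snapshot logic itself is illustrative): record local mtimes, re-run the same sync, and compare.

    # Snapshot local mtimes before and after re-running the sync; any
    # changed entry was rewritten despite --skip-existing.
    import os
    import subprocess

    def snapshot(root):
        """Map relative path -> mtime for every file under root."""
        mtimes = {}
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                mtimes[os.path.relpath(path, root)] = os.path.getmtime(path)
        return mtimes

    before = snapshot(".")
    subprocess.run(["s3cmd", "sync", "--requester-pays", "--skip-existing",
                    "--no-check-md5", "s3://mybucket/myprefix", "."], check=True)
    after = snapshot(".")
    rewritten = sorted(p for p in before if p in after and after[p] != before[p])
    print(f"{len(rewritten)} pre-existing files were rewritten")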


fviard commented 6 years ago

Thank you for your reports, both of you; they helped me narrow down the two cases that produce this issue.

I will look for a fix for these issues.

jimmywan commented 6 years ago

Any updates on this? Unfortunately s4cmd doesn't appear to support this either. :(

matthewboman commented 4 years ago

Still getting this issue with --skip-existing between two buckets

kakadais commented 3 years ago

Does this only happen between two buckets?

In my case, syncing from local storage to a bucket, --skip-existing looks like the default behavior: existing files are skipped automatically, so now I'm searching for the opposite option, to overwrite existing files.

So, any clear explanation?

jakubgs commented 2 years ago

Also seeing this issue. Specifying --skip-existing and --no-check-md5 has no effect:

INFO: Running stat() and reading/calculating MD5 values on X files, this may take some time...

robgwin commented 5 months ago

Bug still exists in 2024 with s3cmd version 2.4.0.

I am syncing an entire bucket from AWS S3 to local:

s3cmd sync --skip-existing --no-check-md5 s3://my-bucket /local/path/to/my-bucket

If I interrupt and restart, all existing files are downloaded again, every time.

robgwin commented 5 months ago

After more research (i.e., comparing against a different sync utility, rclone), it appears that s3cmd is failing to account for the local system's timezone when setting the modification time.

For example, the AWS S3 website shows the file's modification time as: April 23, 2014, 21:15:52 (UTC-07:00)

That's a PDT display because I'm in California. The UTC equivalent is: 2014-04-24 04:15:52 UTC

When I copy that file with s3cmd sync, the resulting local file has this modification time: 2014-04-24 04:15:52 PDT

So my guess is that s3cmd is using a UTC datetime string to set the local mod time, which the local system (macOS in my case) interprets as being in its own timezone. That ends up being the completely wrong time, which, besides causing this particular sync issue, is a bad thing in and of itself.

It should probably set mod times with a Unix timestamp instead?
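For illustration, a minimal Python sketch (not s3cmd's actual code, and with the timestamp format simplified) of how that failure mode arises, and how converting to a Unix timestamp first avoids it:

    import calendar
    import time

    # Last-Modified as S3 reports it, always UTC (format simplified here).
    last_modified = "2014-04-24T04:15:52Z"
    parsed = time.strptime(last_modified, "%Y-%m-%dT%H:%M:%SZ")

    # Buggy: time.mktime() treats the struct_time as LOCAL time, so on a
    # machine in US Pacific time the file gets stamped 04:15:52 PDT.
    wrong_ts = time.mktime(parsed)

    # Correct: calendar.timegm() treats the struct_time as UTC, yielding
    # the true Unix timestamp regardless of the local timezone.
    right_ts = calendar.timegm(parsed)

    print(wrong_ts - right_ts)  # gap matches the local offset from UTC

    # os.utime(path, (right_ts, right_ts)) takes Unix timestamps, so the
    # timezone ambiguity disappears once the conversion is done correctly.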