larrabee / s3sync

Really fast sync tool for S3
GNU General Public License v3.0
476 stars 68 forks source link
amazon ceph-radosgw golang multithreading s3 sync sync-s3-bucket

S3Sync

Really fast sync tool for S3

Go Report Card GoDoc Build Status

Features

Key feature: very high speed.
Avg listing speed around 5k objects/sec for S3.
With 128 workers we get avg sync speed around 2k obj/sec (small objects 1-20 kb) (limited by 1Gb uplink).

Limitations

Usage

>> s3sync --help
Really fast sync tool for S3
VersionId: dev, commit: none, built at: unknown
Usage: cli [--sn] [--sk SK] [--ss SS] [--st ST] [--sr SR] [--se SE] [--tn] [--tk TK] [--ts TS] [--tt TT] [--tr TR] [--te TE] [--s3-retry S3-RETRY] [--s3-retry-sleep S3-RETRY-SLEEP] [--s3-acl S3-ACL] [--s3-storage-class S3-STORAGE-CLASS] [--s3-keys-per-req S3-KEYS-PER-REQ] [--fs-file-perm FS-FILE-PERM] [--fs-dir-perm FS-DIR-PERM] [--fs-disable-xattr] [--fs-atomic-write] [--filter-ext FILTER-EXT] [--filter-not-ext FILTER-NOT-EXT] [--filter-ct FILTER-CT] [--filter-not-ct FILTER-NOT-CT] [--filter-after-mtime FILTER-AFTER-MTIME] [--filter-before-mtime FILTER-BEFORE-MTIME] [--filter-modified] [--filter-exist] [--filter-not-exist] [--filter-dirs] [--filter-not-dirs] [--workers WORKERS] [--debug] [--sync-log] [--sync-log-format SYNC-LOG-FORMAT] [--sync-progress] [--on-fail ON-FAIL] [--error-handling ERROR-HANDLING] [--disable-http2] [--list-buffer LIST-BUFFER] [--ratelimit-objects RATELIMIT-OBJECTS] [--ratelimit-bandwidth RATELIMIT-BANDWIDTH] SOURCE TARGET

Positional arguments:
  SOURCE
  TARGET

Options:
  --sn                   Don't sign request to source AWS for anonymous access
  --sk SK                Source AWS key
  --ss SS                Source AWS session secret
  --st ST                Source AWS token
  --sr SR                Source AWS Region
  --se SE                Source AWS Endpoint
  --tn                   Don't sign request to target AWS for anonymous access
  --tk TK                Target AWS key
  --ts TS                Target AWS secret
  --tt TT                Target AWS session token
  --tr TR                Target AWS Region
  --te TE                Target AWS Endpoint
  --s3-retry S3-RETRY    Max numbers of retries to sync file
  --s3-retry-sleep S3-RETRY-SLEEP
                         Sleep interval (sec) between sync retries on error
  --s3-acl S3-ACL        S3 ACL for uploaded files. Possible values: private, public-read, public-read-write, aws-exec-read, authenticated-read, bucket-owner-read, bucket-owner-full-control
  --s3-storage-class S3-STORAGE-CLASS
                         S3 Storage Class for uploaded files.
  --s3-keys-per-req S3-KEYS-PER-REQ
                         Max numbers of keys retrieved via List request [default: 1000]
  --fs-file-perm FS-FILE-PERM
                         File permissions [default: 0644]
  --fs-dir-perm FS-DIR-PERM
                         Dir permissions [default: 0755]
  --fs-disable-xattr     Disable FS xattr for storing metadata
  --fs-atomic-write      Enable FS atomic writes. New files will be written to temp file and renamed
  --filter-ext FILTER-EXT
                         Sync only files with given extensions
  --filter-not-ext FILTER-NOT-EXT
                         Skip files with given extensions
  --filter-ct FILTER-CT
                         Sync only files with given Content-Type
  --filter-not-ct FILTER-NOT-CT
                         Skip files with given Content-Type
  --filter-after-mtime FILTER-AFTER-MTIME
                         Sync only files modified after given unix timestamp
  --filter-before-mtime FILTER-BEFORE-MTIME
                         Sync only files modified before given unix timestamp
  --filter-modified      Sync only modified files
  --filter-exist         Sync only files, that exist in target storage
  --filter-not-exist     Sync only files, that doesn't exist in target storage
  --filter-dirs          Sync only files, that ends with slash (/)
  --filter-not-dirs      Skip files that ends with slash (/)
  --workers WORKERS, -w WORKERS
                         Workers count [default: 16]
  --debug, -d            Show debug logging
  --sync-log             Show sync log
  --sync-log-format SYNC-LOG-FORMAT
                         Format of sync log. Possible values: json
  --sync-progress, -p    Show sync progress
  --on-fail ON-FAIL, -f ON-FAIL
                         Action on failed. Possible values: fatal, skip, skipmissing (DEPRECATED, use --error-handling instead) [default: fatal]
  --error-handling ERROR-HANDLING
                         Controls error handling. Sum of the values: 1 for ignoring NotFound errors, 2 for ignoring PermissionDenied errors OR 255 to ignore all errors
  --disable-http2        Disable HTTP2 for http client
  --list-buffer LIST-BUFFER
                         Size of list buffer [default: 1000]
  --ratelimit-objects RATELIMIT-OBJECTS
                         Rate limit objects per second
  --ratelimit-bandwidth RATELIMIT-BANDWIDTH
                         Set bandwidth rate limit, byte/s, Allow suffixes: K, M, G
  --help, -h             display this help and exit
  --version              display version and exit

Examples:

SOURCE and TARGET should be a directory. Syncing of single file are not supported (This will not work s3sync --sk KEY --ss SECRET s3://shared/megafile.zip fs:///opt/backups/s3/)

You can use filters.

Install

Download binary from Release page.
Or use docker image larrabee/s3sync like this:

docker run --rm -ti larrabee/s3sync --tk KEY2 --ts SECRET2 --sk KEY1 --ss SECRET1 -w 128 s3://shared/test/ s3://shared_new

Building

Minimum go version: 1.13
Build it with:

go mod vendor
go build -o s3sync ./cli 

Using module

You can easy use s3sync in your application. See example in cli/ folder.

License

GPLv3

Notes

s3sync is a non-destructive one-way sync: it does not delete files in the destination or source paths that are out of sync.