peak / s5cmd

Parallel S3 and local filesystem execution tool.
MIT License

feat: optional --range argument for cp to download single part of object #772

Open mackenzie-grimes-noaa opened 1 week ago

Adds an optional string argument --range to the cp command. It exposes the existing Range header of AWS GetObject, so that only a specific byte range of the source object is copied.

s5cmd users can now set this header themselves and download only a specific part of their source object.

Example:

s5cmd cp --range bytes=500-999 's3://mybucket/foo/bar/file.txt' my_partial_file.txt
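
Since the value is passed through as an HTTP Range header, both endpoints are inclusive, so bytes=500-999 should produce a 500-byte file. A minimal sketch for checking a download against the expected size is below. The helper name range_length is hypothetical and not part of s5cmd, and it only handles the closed START-END form, not open-ended forms such as bytes=500-.

```shell
# range_length: print how many bytes a "bytes=START-END" value covers.
# Hypothetical helper for illustration; not part of s5cmd.
range_length() {
  case "$1" in
    bytes=*-*) ;;  # require the "bytes=" unit prefix and a dash
    *) echo "unsupported range: $1" >&2; return 1 ;;
  esac
  spec=${1#bytes=}   # "500-999"
  start=${spec%%-*}  # text before the first dash
  end=${spec#*-}     # text after the first dash
  # Reject empty or non-numeric endpoints (open-ended or malformed ranges).
  case "$start" in ''|*[!0-9]*) echo "unsupported range: $1" >&2; return 1 ;; esac
  case "$end" in ''|*[!0-9]*) echo "unsupported range: $1" >&2; return 1 ;; esac
  if [ "$end" -lt "$start" ]; then
    echo "end is before start: $1" >&2
    return 1
  fi
  echo $(( end - start + 1 ))  # both endpoints are inclusive
}
```

For the example above, range_length bytes=500-999 prints 500, which can be compared with the output of wc -c < my_partial_file.txt after the download.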

When --range is specified, the --concurrency and --part-size arguments are redundant, since only one part is downloaded, using a single thread.

Note: AWS GetObject only supports specifying a single byte range, so we are also constrained by this limitation. An s5cmd user would have to run multiple cp commands to download, for example, bytes ranging from 100-199 (bytes=100-199) and from 300-399 (bytes=300-399).
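
The multi-range workaround could be scripted roughly as follows. This is a sketch under assumptions: it relies on the --range flag proposed in this PR, the function name print_range_cp_commands and the .partN output naming are hypothetical, and it only prints the commands rather than running them.

```shell
# print_range_cp_commands SRC DEST_PREFIX RANGE...
# Prints (does not run) one s5cmd cp command per byte range.
# Assumes the --range flag proposed in this PR; the .partN file
# naming is a hypothetical convention chosen for this sketch.
print_range_cp_commands() {
  src=$1
  dest_prefix=$2
  shift 2
  i=0
  for r in "$@"; do
    printf "s5cmd cp --range %s '%s' %s.part%d\n" "$r" "$src" "$dest_prefix" "$i"
    i=$(( i + 1 ))
  done
}

print_range_cp_commands 's3://mybucket/foo/bar/file.txt' my_partial_file.txt \
  bytes=100-199 bytes=300-399
```

This prints two cp commands, one writing my_partial_file.txt.part0 and one writing my_partial_file.txt.part1. Each range lands in its own file, so the caller decides whether and how to combine them.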

Solves this Issue: https://github.com/peak/s5cmd/issues/756