It's an extremely common use case to want to copy a file to/from/between S3 buckets. The dst.write_text(src.read_text()) solution is simple, but not viable for large files.
Luckily, I came across this gem by @gabrieldemarmiesse using shutil.copyfileobj. This works well for all combinations of pathlib.Path and s3path.S3Path as source and destination.
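A minimal sketch of that pattern (the bucket and file names below are hypothetical, and this is my paraphrase rather than a verbatim copy of that comment):

import shutil
from pathlib import Path
from s3path import S3Path

src = S3Path('/bucket-name/big-file.bin')  # could equally be a pathlib.Path
dst = Path('/tmp/big-file.bin')            # or an s3path.S3Path

# Open both ends ourselves and stream the data in chunks,
# so the whole file never has to fit in memory.
with src.open('rb') as f_src, dst.open('wb') as f_dst:
    shutil.copyfileobj(f_src, f_dst)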
That comment is very difficult to discover, especially when searching for issues. Some more prominent discussions are https://github.com/liormizr/s3path/issues/98 (no solution) and https://github.com/liormizr/s3path/issues/44 (overly-complicated solution). Thus I think we should codify the solution in the documentation to make it easily discoverable.
I tried a few things to optimize this code:
There is also a function called shutil.copyfile that works with pathlib.Path objects. Unfortunately this function calls
open(p, 'rb')
which fails on s3path.S3Path objects with
FileNotFoundError: [Errno 2] No such file or directory: '/bucket-name/filename'
Therefore, it's necessary to open the file handles ourselves.
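For illustration, this is roughly what the failing call looks like (paths hypothetical):

import shutil
from pathlib import Path
from s3path import S3Path

# shutil.copyfile applies the builtin open() to each path, so the S3 "path"
# is treated as a local filesystem path that does not exist.
shutil.copyfile(S3Path('/bucket-name/filename'), Path('/tmp/filename'))
# raises FileNotFoundError: [Errno 2] No such file or directory: '/bucket-name/filename'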
Note that shutil.copyfileobj has an optional length argument for the buffer size. Its default value is defined in CPython's shutil module.
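For reference, in recent CPython versions the relevant lines of Lib/shutil.py look roughly like this (paraphrased; the exact value can differ by version and platform):

COPY_BUFSIZE = 1024 * 1024 if _WINDOWS else 64 * 1024  # 1 MiB on Windows, 64 KiB elsewhere

def copyfileobj(fsrc, fdst, length=0):
    # A length of 0 (the default) falls back to COPY_BUFSIZE.
    if not length:
        length = COPY_BUFSIZE
    ...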
A few quick experiments with my setup (Linux with fiber internet) show that the copy duration is insensitive to length until it drops to the ~1024 range, so I don't think we should suggest modifying this parameter.
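If someone does want to experiment with it anyway, a sketch of that kind of timing check might look like this (paths hypothetical):

import shutil
import time
from pathlib import Path
from s3path import S3Path

src = S3Path('/bucket-name/big-file.bin')  # hypothetical test object
dst = Path('/tmp/big-file.bin')

# Time the copy with a few different buffer sizes.
for length in (1024, 64 * 1024, 1024 * 1024):
    start = time.perf_counter()
    with src.open('rb') as f_src, dst.open('wb') as f_dst:
        shutil.copyfileobj(f_src, f_dst, length)
    print(f'length={length}: {time.perf_counter() - start:.1f} s')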