Closed four43 closed 1 year ago
Hi @four43, Interesting topic.
When I want to upload file usually I do it like this:
local_file = Path('...')
remote_file = S3Path('...')
remote_file.write_bytes(local_file.read_bytes())
what you are saying is something else, you want "upload & delete source" right? If I remember correctly, The reason that we have a type modification is that we didn't implement cross types behaviors yet.
I'll raise it in our todo list, and update on this issue
That's really inefficient because it reads the entire file into memory. Using streams like boto3's upload methods are much better. I think Path's .rename interface is the most intuitive but it's limited and not extensible :-/. Pathlib is just kind of a bummer
we are using smart-open for optimization of the file object (not boto3's upload methods directly) Have you tested it? Maybe I can play with there parameters or something...
For example: https://github.com/RaRe-Technologies/smart_open/blob/develop/howto.md#how-to-write-to-s3-efficiently
Thanks for following up. Yeah I don't think it's inherently an issue with
S3Path, it's more of an issue with how Pathlib handles
self.rename(other)
. It doesn't check what the "other" object it's
renaming to, so it just tries file system stuff. I'm not sure how you're
going to hook into Path to get this functionality. Seems like it's almost
outside of your scope. If I were designing this interface for Python I'd
have rename open a readable stream from self, open a writable stream from
other and pipe them together. They just went about it in kind of an
inextensible way.
-Seth
On Mon, Sep 13, 2021 at 1:46 PM liormizr @.***> wrote:
we are using smart-open https://github.com/RaRe-Technologies/smart_open for optimization of the file object (not boto3's upload methods directly) Have you tested it? Maybe I can play with there parameters or something...
For example: https://github.com/RaRe-Technologies/smart_open/blob/develop/howto.md#how-to-write-to-s3-efficiently
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/liormizr/s3path/issues/88#issuecomment-918476793, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD6HDV2YXOV6CQNWMEBYADUBZBI5ANCNFSM5C5PFDKA .
For future readers, I currently use this, it works fine for big files too.
import shutil
def transfer_bytes(src: Path, dst: Path):
"""Performs a copy by reading and writing bytes. Useful when
src and dst are not in the same filesystem.
"""
with src.open("rb") as src_f:
with dst.open("wb") as dst_f:
shutil.copyfileobj(src_f, dst_f)
Crossing type boundaries to easily "upload" a file doesn't seem to work. Is there a better way to do this?
pathlib's path is just running their
_NormalAccessor
which is usingos.rename
it seems like. It doesn't really check thetarget
inPath.rename(target)