liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0

`Path.rename(S3Path)` doesn't work #88

Closed: four43 closed this issue 1 year ago

four43 commented 3 years ago

Crossing type boundaries to easily "upload" a file doesn't seem to work. Is there a better way to do this?

pathlib's `Path.rename` just runs its `_NormalAccessor`, which calls `os.rename`; it never checks the type of the target passed to `Path.rename(target)`.
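To make the failure mode concrete, here is a minimal illustration (not s3path code; the literal "s3://..." string is just a stand-in for whatever the S3Path target gets coerced to) showing that `os.rename` treats any target as a local filesystem path:

```python
import os
import tempfile
from pathlib import Path

# Path.rename(target) ends up in os.rename(self, target), which
# interprets the target as a local path regardless of its type.
src = Path(tempfile.mkdtemp()) / "data.txt"
src.write_text("hello")

try:
    # Stand-in for what rename(S3Path(...)) effectively attempts:
    os.rename(src, "s3://bucket/key")
    failed = False
except OSError:
    # The destination directory "s3://bucket" does not exist locally.
    failed = True
```

The source file is untouched and the rename never reaches S3, which matches the behavior reported above.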

liormizr commented 3 years ago

Hi @four43, Interesting topic.

When I want to upload a file, I usually do it like this:

from pathlib import Path
from s3path import S3Path

local_file = Path('...')
remote_file = S3Path('...')
remote_file.write_bytes(local_file.read_bytes())

What you are describing is something else: you want "upload & delete source", right? If I remember correctly, the reason for this gap is that we haven't implemented cross-type behaviors yet.

I'll add it to our todo list and post updates on this issue.

four43 commented 3 years ago

That's really inefficient because it reads the entire file into memory. Using streams, as boto3's upload methods do, is much better. I think Path's .rename interface is the most intuitive, but it's limited and not extensible :-/. Pathlib is just kind of a bummer

liormizr commented 3 years ago

We are using smart-open to optimize the file object (not boto3's upload methods directly). Have you tested it? Maybe I can play with its parameters or something...

For example: https://github.com/RaRe-Technologies/smart_open/blob/develop/howto.md#how-to-write-to-s3-efficiently

four43 commented 3 years ago

Thanks for following up. Yeah, I don't think it's inherently an issue with S3Path; it's more an issue with how pathlib handles self.rename(other). It doesn't check what kind of object "other" is, so it just tries filesystem operations. I'm not sure how you could hook into Path to get this functionality; it seems almost outside your scope. If I were designing this interface for Python, I'd have rename open a readable stream from self, open a writable stream from other, and pipe them together. They just went about it in kind of an inextensible way.
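A sketch of that stream-piping idea, assuming both path types expose .open() as Path and S3Path do (the helper name cross_rename is made up for illustration):

```python
import shutil
import tempfile
from pathlib import Path


def cross_rename(src: Path, dst: Path) -> None:
    """Hypothetical cross-type rename: stream src into dst in
    chunks, then delete the source (rename = copy + unlink)."""
    with src.open("rb") as src_f, dst.open("wb") as dst_f:
        shutil.copyfileobj(src_f, dst_f)
    src.unlink()


# Demo on two local paths; with s3path, dst could be an S3Path.
base = Path(tempfile.mkdtemp())
a, b = base / "a.bin", base / "b.bin"
a.write_bytes(b"payload")
cross_rename(a, b)
```

shutil.copyfileobj copies in fixed-size chunks, so the whole file is never held in memory, unlike the read_bytes/write_bytes approach.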

-Seth


gabrieldemarmiesse commented 1 year ago

For future readers, I currently use this, it works fine for big files too.

import shutil
from pathlib import Path

def transfer_bytes(src: Path, dst: Path):
    """Performs a copy by reading and writing bytes in chunks.
    Useful when src and dst are not on the same filesystem.
    """
    with src.open("rb") as src_f:
        with dst.open("wb") as dst_f:
            shutil.copyfileobj(src_f, dst_f)
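A quick end-to-end check of the helper on local paths (the function is repeated here only so the snippet runs standalone; with s3path, dst could be an S3Path instead of a local Path):

```python
import shutil
import tempfile
from pathlib import Path


def transfer_bytes(src: Path, dst: Path):
    """Same helper as above, repeated so this snippet is self-contained."""
    with src.open("rb") as src_f:
        with dst.open("wb") as dst_f:
            shutil.copyfileobj(src_f, dst_f)


base = Path(tempfile.mkdtemp())
local_src = base / "in.txt"
local_dst = base / "out.txt"
local_src.write_text("streamed in chunks, not loaded into memory at once")
transfer_bytes(local_src, local_dst)
```

Note that unlike rename, this leaves the source file in place; delete it afterwards if you want move semantics.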