liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0

[Question/discussion] Why does S3Path's smart_open usage default to compression='disable'? #93

Open Kache opened 2 years ago

Kache commented 2 years ago

Seeing as how S3Path leverages smart_open, I was surprised that:

S3Path('/mybucket/mypath/file.csv.gz').open()

Did not "autodetect" compression from file extension, as smart_open does by default:

smart_open.open(S3Path('/mybucket/mypath/file.csv.gz'))

As it turns out, S3Path sets compression='disable': https://github.com/liormizr/s3path/blob/master/s3path.py#L388

Any particular reason?
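To make the difference concrete without touching S3, here is a minimal stdlib-only sketch of the two behaviours being compared. The helper `open_local` is hypothetical and is not smart_open's implementation; it only mimics "infer from the `.gz` extension" versus "disable" on a local file, reusing the two `compression` values mentioned in this thread.

```python
import gzip
import os
import tempfile


def open_local(path, compression="infer_from_extension"):
    """Hypothetical helper mimicking the two modes discussed above.

    Not smart_open's code: it only understands the ".gz" extension.
    """
    if compression == "infer_from_extension" and path.endswith(".gz"):
        # Transparently decompress while reading.
        return gzip.open(path, "rb")
    # compression == "disable": return the bytes exactly as stored.
    return open(path, "rb")


# Create a small gzip-compressed CSV in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "file.csv.gz")
with gzip.open(path, "wb") as f:
    f.write(b"a,b\n1,2\n")

with open_local(path) as f:
    inferred = f.read()  # decompressed CSV bytes

with open_local(path, compression="disable") as f:
    raw = f.read()  # still gzip-compressed bytes
```

With `compression="disable"` the caller gets the gzip stream itself and has to decompress it manually, which is the behaviour that caused the surprise described above.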

liormizr commented 2 years ago

Hi @Kache, thank you for opening the discussion. The whole idea of S3Path is not to change, or add unexpected behaviour to, the standard pathlib API. A developer who knows pathlib should already know everything they need to use S3 with s3path.

Now, over the past year we did add some custom features that relate to our specific implementation, as well as some features that are specific to S3. If you (or anybody else) have a new API suggestion that won't interfere with that approach, I'm open to discussing it :-)

Kache commented 2 years ago

Ah, that makes sense. Thanks.

With that understanding, then my suggestion is to expose:

S3Path('/mybucket/mypath/file.csv.gz').smart_open()

# equivalent to
import smart_open

smart_open.open(S3Path('/mybucket/mypath/file.csv.gz'))

The method could pass arguments through to smart_open as well.

I don't feel too strongly about it though, up to you. (Close issue if you'd like) I'll be calling smart_open myself in the meantime.
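In the meantime, the "call smart_open myself" workaround might look roughly like the sketch below. The function name `smart_open_path` and its `opener` parameter are hypothetical and not part of s3path. The sketch assumes the path object offers `as_uri()` returning an `s3://...` URI, and that the third-party `smart_open` package is installed whenever no custom `opener` is supplied.

```python
def smart_open_path(path, mode="r", *, opener=None, **kwargs):
    """Hypothetical helper: open a path via smart_open, passing kwargs through.

    `path` is assumed to provide `as_uri()` (for an S3 path this is assumed
    to yield "s3://bucket/key"). `opener` exists only so the helper can be
    exercised without network access; it defaults to smart_open.open.
    """
    if opener is None:
        import smart_open  # third-party; imported lazily on first real use

        opener = smart_open.open
    # Extra keyword arguments (for example compression=...) are forwarded as-is.
    return opener(path.as_uri(), mode, **kwargs)
```

Usage would then be along the lines of `smart_open_path(S3Path('/mybucket/mypath/file.csv.gz'))`, leaving smart_open's own default (inferring compression from the extension) in effect unless the caller overrides `compression`.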

liormizr commented 2 years ago

Hi @Kache, sounds good.

It will take me some time to work on it. If you want this feature sooner, you are more than welcome to contribute and create a PR.