liormizr / s3path

s3path is a pathlib extension for AWS S3 Service
Apache License 2.0
206 stars 39 forks source link

Patching pathlib to instantiate [Pure]S3Path class #106

Closed petrus-v closed 1 year ago

petrus-v commented 1 year ago

This proposal patch pathlib Path and PurePath in order to instantiate S3Path or PureS3Path class in case developer provide "s3://" uri. This allow to abstract the fact that we are using s3 path using PurePath/Path as factory that developer do not care that much which class is used behind the scene !

petrus-v commented 1 year ago

Probably it could be nicer to move this code in a init_monkey_patch_pathlib method letting developer know about that behaviour, what do you think ?

According your preference regarding this proposal I would like to add some documentation about it !

liormizr commented 1 year ago

Hi @petrus-v First thing I want to tank you for the idea I love when developers come and suggest how to improve the library and thinking outside the box :smile:.

At the beginning of this project I thought about it. How to integrate s3path to pathlib and what what those is actually means that s3path is another extension for pathlib.

Now, the goal for this library was always the same:

  1. simple and small s3 sdk with path API
  2. simple and small library that is verry easy to us and understand whats going on

The conclusion whats to create a small extension for pathlib, if pathlib have win/linux objects to add s3 objects with the same api.

Now I feel that mocking path will change this. It will make s3path a framework that change the standard library behavior. As a result s3path will be less explicit and will add side effects that we won't expect from a small library that extend pathlib capabilities

I tell you what I would have done if I had more time.

creating a new package called urilib :smile:. Adding there windows , linux, s3, ftp, ... Like this we will have one source to all the extensions that we will find/build without changing the standard library behavior

What do you think?

petrus-v commented 1 year ago

Thanks for your message !

When I was first looking for a library that use pathlib interface to manage s3 files, Because I've already some scripts that user pathlib interfaces that I wanted to use with files stored in S3 buckets. I was on the way think to create it from scratch but probably with https://github.com/s3tools/s3cmd instead boto3.

As much as possible I try to avoid to re-build the wheel and get experience from existing libraries. Thanks again for this nice works ;)

My time that I'm able to spend on this project, as yours, is also not infinite !

Does your idea with urilib is to keep using pathlib interface ?

My love on that would be contribute to python core to make pathlib a real interface and implement a factory pattern with hooks that we can implement properly new classes; s3 / ftp / mail? / ... in separate python lib.

Thinking more and trying to understand your proposal while writing this... If we start writting urilib I would make similar mokey patch (visible in this PR) and probably add an extension mechanism based on python entrypoints (https://docs.python.org/3/library/importlib.metadata.html#entry-points ) to be able to declare extension and be able to provide different interfaces outside of that lib.

So I would keep s3path implementation as it and add the factory implementation here. Probably make urilib not as required dependency so developer could keep working as today or with urilib.

I'm not sure if I'm clear and if we share the same target ?!

Other Idea as mentioned before

On this PR what could be done in short term is to move the monkey path in a dedicated method and let developer decide if he want to patch the standard library by calling such method, which shouldn't impact that much #107.

liormizr commented 1 year ago

I think that we are talking about the same thing

Something like if you do this:

.. code:: python

>>> from pathlib import Path

You get the regular std implementation

But if you will do like this:

.. code:: python

>>> from uri import Path
>>> p = Path('<protocol>:/...')

You will get S3Path or WindowsPath or PosixPath or in the future new protocols like HTTP/FTP/Google object storage ... Or if you want more explicit we can do:

.. code:: python

>>> from uri import get_path
>>> p = get_path('<protocol>:/...')

with the same results ...

I don't have a problem to add configuration for this for S3Path if it's easier for you and you still think that it can be a good feature for this library

Something like

.. code:: python

>>> from s3path import path_mocker
>>> path_mocker()

But in this approach Basically it means that you have to call it in the start of your program to make it work

petrus-v commented 1 year ago

Let's take the time to create this new package probably called something like URI Pathlib Factory uri-pathlib-factory that would provide primitives to implement new backends and make this s3path one of them !

I'm fine with that one but with no real preference ! The important things to me is that the factory use the same signature than pathlib.Path and pathlib.PurePath.

>>> from uri import Path
>>> p = Path('<protocol>:/...')

Do you want to host this package on your gh account closed to s3path, create a new organisation or I can host it ? Your choice is mine !

liormizr commented 1 year ago

Sound great! About the hosting that's your call :-) I'm OK with what ever you think

If the two projects get bigger and bigger we can always combine them and create one organisation in my opinion

Do you agree?

petrus-v commented 1 year ago

I'm closing this one because move patch and factory mechanism outside of this lib #111