Open barneygale opened 1 year ago
Thanks @barneygale for letting us know As I looking at your PR with the new _PathBase implementation it doesn't really solve the issues that we have with pathlib extensions & integrations.
Instead of basing our Accessor layer in Accessor class (the old design that was removed from the pathlib few versions ago) basing S3Path from Path or from _PathBase it's not really matter from our point of view. When it will be in the python version we will do the change.
The issue that we do have is with how other packages react to path objects.
for example Pandas In Pandas, this is the api for read_csv (reading a csv file). if you are working with strings:
# for local path
pandas.read_csv('/path/to/table.csv', ...)
# or if you want to work with uri representation
pandas.read_csv('file://localhost/path/to/table.csv', ...)
And when you want S3 you can do like this:
# for S3 uri representation
pandas.read_csv('s3://path/to/table.csv', ...)
Now when you are working with Pandas and Path objects.
Pandas are using the __fspath__
method to understand what to do.
This brings two main issues.
from pathlib import Path
from s3path import S3Path
local_path = Path('/path/to/table.csv') pandas.read_csv(local_path, ...) # it's the same like passing '/path/to/table.csv'
s3_path = S3Path('/path/to/table.csv') pandas.read_csv(s3_path, ...) # it's the same like passing '/path/to/table.csv' instead of s3://path/to/table.csv
2. the second issue is that there is multiple ways to define uri
When we are asking from Path the uri it's different between what we are giving and when Panda expect from us
``` python
from pathlib import Path
local_path = Path('/path/to/table.csv')
pandas.read_csv(local_path.as_uri(), ...) # This will failed. file:///path/to/table.csv is not what Pandas expect
# not the same like:
pandas.read_csv('file://localhost/path/to/table.csv', ...) # This will failed. file:///path/to/table.csv is not what Pandas expect
My point is not that Pandas have issues, Pandas was just one example.
I think that we still have to think about the general purpose and design of methods like __str__
, __fspath__
, __repr__
.
Thinking about moving us for more uri's representation and how to give packages the tools / guidns to relay on them.
make sense? What do you think about that?
Thanks for the feedback, that makes perfect sense.
My hope is that APIs like pandas.read_csv()
will start accepting PathBase
objects, roughly like this:
def read_csv(path: pathlib.PathBase | os.PathLike):
if not isinstance(path, pathlib.PathBase):
path = pathlib.Path(path)
with path.open() as f:
...
It will take a while for third-party packages to begin accepting PathBase
, similar to how it took a while for them to begin accepting os.PathLike
.
Does that make sense?
On __fspath__()
: the implementation of PathBase
does not define __fspath__()
, because it would result in nonsense like open(ZipPath('README'))
opening a file called 'README' in the current directory -- very surprising!
On URIs: the implementation of PathBase.as_uri()
raises UnsupportedOperation
, because the path type might not use file:
URIs. You're right that file:
URIs are implemented differently across different packages. I'm not sure if leaning more heavily into URIs is the right way to go - it would involve adding, I think, a new __uri__()
magic method, and probably a registry of scheme handlers.
with open(path) as f:
Oh, I wasn’t expecting that for Path classes, only methods!
How can the builtin open
work with a TarPath for example?
Make perfect sense @barneygale Thanks for your answer
Regarding _PathBase
we will do the move when it will be available
Will you keep the privet name "_PathBase" or will it be changed to a public class?
with open(path) as f:
Oh, I wasn’t expecting that for Path classes, only methods! How can the builtin
open
work with a TarPath for example?
Shit sorry, horrible typo on my part. Will correct.
Regarding
_PathBase
we will do the move when it will be available Will you keep the privet name "_PathBase" or will it be changed to a public class?
It'll become a public class, hopefully in time for Python 3.13 (next year)
Hello - I've published CPython's private PathBase
class in a PyPI package: https://pypi.org/project/pathlib-abc/0.1.0/. No docs yet but I should have them ready soon.
If the PyPI package succeeds and matures, I'm hopeful we can make PathBase
public in Python itself. Hope that's helpful. Grateful for any feedback you might have!
@barneygale, I was just reading the Python 3.13 release notes. Congrats on the inclusion of your pathlib.Path.from_uri()
!!!
@barneygale Why from_uri
was added to the Path class and not to the PurePath class?
Both as_uri()
and from_uri()
are strictly impure because they use os.fsencode()
and os.fsdecode()
, which can vary from system to system. I also adjusted the documentation for as_uri()
so that it's documented as part of Path
rather than PurePath
, with a note:
For historical reasons, this method is also available from PurePath objects. However, its use of os.fsencode() makes it strictly impure.
Both
as_uri()
andfrom_uri()
are strictly impure because they useos.fsencode()
andos.fsdecode()
, which can vary from system to system. I also adjusted the documentation foras_uri()
so that it's documented as part ofPath
rather thanPurePath
, with a note:For historical reasons, this method is also available from PurePath objects. However, its use of os.fsencode() makes it strictly impure.
In my logic all IO oriented factuality need to be in the Path level And all string/path structure parsing need to be in the PurePath level
How ever I full understand the logic you specify :-)
I've logged a CPython PR that adds a private
pathlib._VirtualPath
class:https://github.com/python/cpython/pull/106337
If/when that lands, it's not much more work to drop the underscore and introduce
pathlib.VirtualPath
to the world. This would be Python 3.13 at the earliest.Would it be suitable for use in s3path? It would be great to hear your feedback!