Closed barneygale closed 8 months ago
Any public API that Pathy could hang off of would be very useful. Please feel free to update this issue when there is a build version that I can use.
Hello - I've published CPython's private PathBase
class in a PyPI package: https://pypi.org/project/pathlib-abc/0.1.0/. No docs yet but I should have them ready soon.
If the PyPI package succeeds and matures, I'm hopeful we can make PathBase
public in Python itself. Let me know if you have any problems/questions? Thanks!
I was about to resort to doing something similar to this package but baked into Pathy, so you saved me a lot of time and duplication. It's especially nice knowing this might feed back into Python and that this package could serve as a fallback after that.
I will spend some time today looking into integrating this with Pathy and ask questions as they arise.
Thanks for publishing a package that contains this functionality 🙇
@barneygale, So far, so good; there are some things I have to refactor to account for the pathlib_abc library not providing a concrete file system implementation, but I think it will ultimately simplify things.
The main question that's come up is whether you plan to add type hints or would be willing to accept a PR to add them. I haven't delved deep into why, but mypy/Pylance both dislike the pathlib_abc module here and there, warning about Unknown base class types and improper subclass signatures. I can add type ignores or reimplement the properties to add signatures, but that feels kind of icky.
Here are some examples of what I mean
PathBase.root
is partially known: str | bytes | Unknown
Overridden methods have an implicit return type that conflicts with the concrete implementation expectations
The typing bits aside, there are no blockers so far; I should be able to open a PR soon.
I'm adding support for __lt__
to support sorted
, and I notice the pathlib class has "_cparts" cached property that's used in the < operator:
@property
def _cparts(self):
# Cached casefolded parts, for hashing and comparison
try:
return self._cached_cparts
except AttributeError:
self._cached_cparts = self._flavour.casefold_parts(self._parts)
return self._cached_cparts
Can you tell me anything about the significance of this cache? I'm all for caches when needed, but that you've left it out of the ABC library makes me wonder how much it contributes? Also, I see that you've cached most properties in the ABC classes, so I wonder if there's maybe a simpler solution you had in mind with comparable performance using the existing cached properties?
It's only important in pathlib because Windows paths need to perform case-insensitive comparisons, and even then I'm not sure if it has a huge effect!
FWIW I've just removed most/all caching in the ABCs. PyPI release coming up...
One troubling thing I found is that Pathy
will no longer inherit from pathlib.Path
while using this library. This comes with some negative side-effects, like I noted in another issue, where a serialization library that spaCy uses has a function to coerce strings to Path objects if they're not already that type:
def force_path(location, require_exists=True):
if not isinstance(location, Path):
location = Path(location)
if require_exists and not location.exists():
raise ValueError(f"Can't read file: {location}")
return location
Pathy objects used to make it through unscathed, but now they get corrupted and transformed into Path objects that don't work.
Pathy isn't so widely used that it will be a significant problem, but it's something to keep in mind. Do you see any workarounds I might be missing that could keep this compatibility?
That sounds like a bug in spacy, judging by this bit of their CONTRIBUTING.md
:
Code that interacts with the file-system should accept objects that follow the
pathlib.Path
API, without assuming that the object inherits frompathlib.Path
. If the function is user-facing and takes a path as an argument, it should check whether the path is provided as a string. Strings should be converted topathlib.Path
objects.
(edit: sorry, just read your comment on the issue and realised you're ahead of me!)
They might be interested in depending on pathlib_abc
- I'll reach out.
Yeah, it was just an example of the type of side-effect that could happen in other libraries. They have a more "correct" implementation in spaCy directly, i.e.
def ensure_path(path: Any) -> Any:
"""Ensure string is converted to a Path.
path (Any): Anything. If string, it's converted to Path.
RETURNS: Path or original argument.
"""
if isinstance(path, str):
return Path(path)
else:
return path
So this is probably a bug in srsly
and how it interacts with spaCy.
With pathlib_abc
that method could be implemented as follows:
from pathlib_abc import PathBase
from pathlib import Path
def ensure_path(path: PathBase | os.PathLike) -> PathBase:
if not isinstance(path, PathBase):
path = Path(path)
return path
But srsly
seems to be a low-dependencies affair, so it could instead adopt spaCy's string check behaviour.
In general I don't think it's correct to inherit from pathlib.Path
unless your object represents a local filesystem path. That's sort-of the whole point of Path
.
In general I don't think it's correct to inherit from
pathlib.Path
unless your object represents a local filesystem path.
I agree with you on that. I always felt awkward using internals and inheriting behavior that didn't apply to my use-case. Still, sometimes we work with solutions that aren't ideal because they're easily grasped.
Moving to pathlib_abc feels much cleaner. It will be nice not to refactor Pathy again next year and the year after with each new Python release because the pathlib internals were refactored. 😅
I've finished work on the PR to replace pathlib.Path base with pathlib_abc. While it's not required, @barneygale if you'd like to review my changes, I'd welcome any feedback you have.
FYI: I pinned the version of pathlib_abc, so if you make significant changes, let me know, and I'll update it.
:tada: This issue has been resolved in version 0.11.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
@barneygale the pathlib_abc experience has been great so far. Reopening this in the event you wanted to follow up more. Github auto closed it because I noted it in the PR. 😅
One thing I noticed while working on this is that pathlib_abc has some methods that only make sense for file systems. For cloud storage they're just extra code that wouldn't really work as expected if you called it.
I took a quick pass and trimmed the PathBase
class def to include the methods I think could be removed and exist solely in pathlib proper. They don't harm Pathy, but they don't help either.
class PathBase(PurePathBase):
def is_mount(self):
...
def is_symlink(self):
...
def is_junction(self):
...
def is_block_device(self):
...
def is_char_device(self):
...
def is_fifo(self):
...
def is_socket(self):
...
def readlink(self):
...
def symlink_to(self, target, target_is_directory=False):
...
def hardlink_to(self, target):
...
None of the cloud providers support symbolic links, so at best we could implement it with application logic and metadata. That might be a good argument for keeping it, given the utility of symlinks.
If you implement stat()
you should find that the is_
methods return False
, and you can ignore the other methods if your filesystem doesn't support em (they'll raise UnsupportedOperation
). They make sense in some situations, e.g. TarPath
in the docs supports most of these methods.
They make sense in some situations, e.g. TarPath in the docs supports most of these methods.
Sure, that makes sense. Like I said, they don't hurt anything, I just wanted to provide some feedback.
What's your stance on conda-forge? Pathy has a conda recipe and it can't resolve pathlib_abc 😓 I'm not sure how many people consume pathy through conda-forge.
FYI, I've published pathlib-abc 0.2.0. Changelog: https://pathlib-abc.readthedocs.io/en/latest/changes.html
What's your stance on conda-forge? Pathy has a conda recipe and it can't resolve pathlib_abc 😓 I'm not sure how many people consume pathy through conda-forge.
(As an observer) conda-forge is community-led and anyone can submit a package as long as they're prepared to maintain the recipe; you don't need permission. The main question is whether @barneygale wants to officially support and maintain it himself. I'd raise this in the pathlib_abc issue tracker rather than here.
Note that you can have multiple maintainers, so you could submit the package with both you and Barney listed as maintainers, if that's what he wants.
FYI, I've published pathlib-abc 0.2.0. Changelog: https://pathlib-abc.readthedocs.io/en/latest/changes.html
I've been working on some updates to my Mathy project recently. I'll check out the latest pathlib_abc sometime soon.
I'd raise this in the pathlib_abc issue tracker rather than here.
Yeah, well Barney did ask for questions on this issue, but you're probably right.
I'll close this issue for now.
I've logged a CPython PR that adds a private
pathlib._VirtualPath
class:https://github.com/python/cpython/pull/106337
If/when that lands, it's not much more work to drop the underscore and introduce
pathlib.VirtualPath
to the world. This would be Python 3.13 at the earliest.Would it be suitable for use in pathy? It would be great to hear your feedback!