Open samwaseda opened 7 months ago
💯 I am huge fan of this. Maybe it will require merging DirectoryObject
and FileObject
together.
The biggest catch I can think of will be to ensure that we exclusively extend rather than modify pathlib.Path
, as I'm sure there's a billion places where people rely on the pattern if isinstance(thing, pathlib.Path): thing_that_depends_on_Pathlike_behavior(thing)
, and it would be nice if it was safe to simply plug our objects in under these conditions.
Somewhat related, there has been some activity in pyiron_base
modifying the working directory behaviour, and it would be lovely if the snippets
solution could be used across the pyiron-verse. To that end, it would be good to double check what changes were made and why, and see whether we can adapt what's done here to accomplish the same thing.
Similarly, the stuff with browsing zipped content might be a cool feature to add in.
Maybe it will require merging
DirectoryObject
andFileObject
together.
Yeah that's actually my biggest concern. Right now I really like the fact that the files and directories are separated, because that's somewhat crucial that each node has its own working directory and file objects are derived from them, and all this is properly conceptualised.
It sounds like we're on the same page (albeit your page is more densely packed with knowledge). I guess a reasonable attack is to simply try moving forward with the re-parenting as see how it turns out.
I'm still a fan of this, is it still on your radar @samwaseda?
My complaints/wishlist atm is:
DirectoryObject
automatically .create()
s the directory whether I like it or notpathlib
and os
have no built-in method for recursively deleting all empty directories in a tree -- maybe we could get a tidy(root: Path | None)
method for this?Path
for type checks, etc.I gave it a try, but this turned out to be a bit more complicated than I thought. I cannot remember what was the problem but I think I often got into conflicts with some existing methods
- DirectoryObject automatically .create()s the directory whether I like it or not
I thought our strategy was to create it first and delete it if it's not used? But ok I can also see that the object should be smart enough to know it.
And I just realized that my last comment didn't answer your question at all. I guess the answer is yes, I still would like to give it a try, but maybe it's also good to think about what kind of interface we want. Since you raised this issue again, I presume you have something in mind?
For not creating stuff we don't need, I'd be content to start with modifying DirectoryObject.__init__
to take create: bool = True
as a kwarg, which is an easy backwards-compatible change. But I agree we should think about the desired interface, and a solution like that will get trampled by any inheritance from pathlib.Path
. Still, not a bad place to start.
For .tidy()
I was looking at this SO post, but I haven't implemented it myself anywhere.
I just asked ChatGPT what it thinks and this is the answer I got:
Deriving FileObject
and DirectoryObject
from pathlib.Path
is a possibility, but it would significantly change how these classes work and their interaction with the Path
object. Let's break down what this would entail and whether it makes sense.
DirectoryObject
from pathlib.Path
Pros:
Path
would allow you to leverage all the methods and properties of Path
directly within DirectoryObject
, reducing redundancy and making the class more powerful.DirectoryObject
class feel more "native" within the Python ecosystem.Cons:
DirectoryObject
and the general-purpose Path
class. For example, the create()
method and other custom logic would either need to override existing methods or be added in a way that doesn't conflict with Path
's existing behavior.Path
isn't quite what you'd expect for a DirectoryObject
. This could introduce subtle bugs or require extensive overrides.Path
is a general-purpose class, while DirectoryObject
is more specialized. Forcing DirectoryObject
to be a subclass of Path
could result in an awkward API if you're not careful.FileObject
from pathlib.Path
Pros:
DirectoryObject
, inheriting from Path
could reduce redundancy and allow you to leverage built-in methods for file manipulation.Cons:
FileObject
is currently more than just a file path; it's a wrapper that includes logic for writing, copying, etc. This added behavior might not fit naturally into the Path
subclass and could lead to confusion.If you decide to derive FileObject
and DirectoryObject
from pathlib.Path
, you would need to be very careful about maintaining the specialized behavior you want these classes to have. You'd have to override several methods to ensure that the classes behave in the specific ways you've designed them to.
However, considering the specific behaviors you’ve implemented (such as creating directories, managing file content, categorizing folder items, and the custom error handling), it might be more maintainable to not subclass from Path
and instead keep your current composition approach.
A more Pythonic approach might be to continue using composition, where DirectoryObject
and FileObject
wrap a Path
object internally (which you’re already doing). This gives you full control over the behavior while still leveraging Path
's functionality.
For instance:
class DirectoryObject:
def __init__(self, directory: str | Path | DirectoryObject):
self.path = Path(directory) if not isinstance(directory, DirectoryObject) else directory.path
self.create()
# The rest of the methods would interact with self.path
This way, you don't inherit all of Path
's behavior, but you can still use it internally, ensuring that DirectoryObject
and FileObject
behave exactly as you intend, without the risk of inherited behavior interfering with your design.
In summary, while inheriting from pathlib.Path
is possible, it would likely introduce complexity without significant benefit. The current approach using composition is more straightforward and gives you more control over the classes' behavior.
I guess the short answer is ChatGPT thinks we shouldn't
Hmm, #28 is not really what I had in mind. I was thinking of something more dramatic like replacing both classes with a single SuperPath(Path)
that has the extra functionality we need. In some cases we wouldn't need to transfer the current files
content because Path
already has the functionality -- e.g. Path.write_text
looks like a reasonable substitute for DirectoryObject.write
. In other places, Path
is just not flexible/powerful enough -- e.g. Path.rmdir
doesn't take any arguments and fails if the directory has content...if I want to delete it even when there's content there I have to import shutil
or whatever, which is nonsense.
Maybe we're coming at this from the wrong perspective though. Perhaps we should start with this: (1) where is pyiron_snippets.files
currently used, and what aspects of it are being useful? And (2) where are we currently using Path
and running into things we really wish were on-liners but aren't?
In https://github.com/pyiron/pyiron_workflow/pull/413 I no longer depend directly on files
, and that was the only use case I was aware of. However, there I run into a few places with pathlib
is a bit disappointing, e.g. I have snippets like
if filename.parent.exists() and not any(filename.parent.iterdir()):
filename.parent.rmdir()
But what I really want is just filename.parent.rmdir(only_empty=True)
Concretely, I'm imagining something like this to solve my issues:
from pathlib import Path, PosixPath
class SuperPosixPath(PosixPath):
def super_rmdir(
self,
missing_ok=True,
only_empty=True,
):
if not self.exists():
if missing_ok:
return
else:
raise FileNotFoundError()
elif self.exists() and not self.is_dir():
raise NotADirectoryError()
for sub in self.iterdir():
if sub.is_dir():
SuperPosixPath(sub).super_rmdir(only_empty=only_empty)
elif not only_empty:
sub.unlink() # Probably need to care about kwargs here for edge cases
if not any(self.iterdir()):
self.rmdir()
p = Path("my_example")
p1 = p / "sub"
p1f = p1 / "foo.txt"
p2 = p / "sub2"
p1.mkdir(parents=True, exist_ok=True)
p2.mkdir(parents=True, exist_ok=True)
p1f.write_text("blah blah")
super_p = SuperPosixPath(p)
super_p.super_rmdir()
print(p.is_dir(), p1.is_dir(), p1f.is_file(), p2.exists())
# True True True False
super_p.super_rmdir(only_empty=False)
print(p.exists())
# False
super_p.super_rmdir(only_empty=False)
# Does nothing, missing_ok defaults to true!
Of course work needs to be done to handle the Posix/Windows/Path
issue, and here I just make a brand new method instead of overriding and haven't given any real thought to what the best choice is for that, but the point here is is anyhow just to convey the concept.
Currently
DirectoryObject
is its own class. There are, however, functionalities that are given almost identically inPath
frompathlib
(whichDirectoryObject
anyway heavily uses). Furthermore there are functionalities that I would love to have but are not implemented inDirectoryObject
yet, such aspath_1 / path_2
etc. So I thought maybe it makes more sense to deriveDirectoryObject
fromPath
, so that people can straightforwardly use the same functionalities. What do you think? @liamhuber