Derive `DirectoryObject` from `Path`?

samwaseda commented 7 months ago

Currently DirectoryObject is its own class. There are, however, functionalities that are given almost identically in Path from pathlib (which DirectoryObject anyway heavily uses). Furthermore there are functionalities that I would love to have but are not implemented in DirectoryObject yet, such as path_1 / path_2 etc. So I thought maybe it makes more sense to derive DirectoryObject from Path, so that people can straightforwardly use the same functionalities. What do you think? @liamhuber

liamhuber commented 7 months ago

💯 I am huge fan of this. Maybe it will require merging DirectoryObject and FileObject together.

The biggest catch I can think of will be to ensure that we exclusively extend rather than modify pathlib.Path, as I'm sure there's a billion places where people rely on the pattern if isinstance(thing, pathlib.Path): thing_that_depends_on_Pathlike_behavior(thing), and it would be nice if it was safe to simply plug our objects in under these conditions.

liamhuber commented 7 months ago

Somewhat related, there has been some activity in pyiron_base modifying the working directory behaviour, and it would be lovely if the snippets solution could be used across the pyiron-verse. To that end, it would be good to double check what changes were made and why, and see whether we can adapt what's done here to accomplish the same thing.

Similarly, the stuff with browsing zipped content might be a cool feature to add in.

samwaseda commented 6 months ago

Maybe it will require merging DirectoryObject and FileObject together.

Yeah that's actually my biggest concern. Right now I really like the fact that the files and directories are separated, because that's somewhat crucial that each node has its own working directory and file objects are derived from them, and all this is properly conceptualised.

liamhuber commented 6 months ago

It sounds like we're on the same page (albeit your page is more densely packed with knowledge). I guess a reasonable attack is to simply try moving forward with the re-parenting as see how it turns out.

liamhuber commented 1 month ago

I'm still a fan of this, is it still on your radar @samwaseda?

My complaints/wishlist atm is:

DirectoryObject automatically .create()s the directory whether I like it or not
pathlib and os have no built-in method for recursively deleting all empty directories in a tree -- maybe we could get a tidy(root: Path | None) method for this?
Not inheriting from Path for type checks, etc.

samwaseda commented 1 month ago

I gave it a try, but this turned out to be a bit more complicated than I thought. I cannot remember what was the problem but I think I often got into conflicts with some existing methods

DirectoryObject automatically .create()s the directory whether I like it or not

I thought our strategy was to create it first and delete it if it's not used? But ok I can also see that the object should be smart enough to know it.

samwaseda commented 1 month ago

And I just realized that my last comment didn't answer your question at all. I guess the answer is yes, I still would like to give it a try, but maybe it's also good to think about what kind of interface we want. Since you raised this issue again, I presume you have something in mind?

liamhuber commented 1 month ago

For not creating stuff we don't need, I'd be content to start with modifying DirectoryObject.__init__ to take create: bool = True as a kwarg, which is an easy backwards-compatible change. But I agree we should think about the desired interface, and a solution like that will get trampled by any inheritance from pathlib.Path. Still, not a bad place to start.

For .tidy() I was looking at this SO post, but I haven't implemented it myself anywhere.

samwaseda commented 1 month ago

I just asked ChatGPT what it thinks and this is the answer I got:

Deriving FileObject and DirectoryObject from pathlib.Path is a possibility, but it would significantly change how these classes work and their interaction with the Path object. Let's break down what this would entail and whether it makes sense.

Deriving `DirectoryObject` from `pathlib.Path`

Pros:

Inheritance from Path would allow you to leverage all the methods and properties of Path directly within DirectoryObject, reducing redundancy and making the class more powerful.
It would simplify path manipulation and make the DirectoryObject class feel more "native" within the Python ecosystem.

Cons:

You would lose the clear separation between the custom behavior in DirectoryObject and the general-purpose Path class. For example, the create() method and other custom logic would either need to override existing methods or be added in a way that doesn't conflict with Path's existing behavior.
You might encounter situations where the behavior of a method inherited from Path isn't quite what you'd expect for a DirectoryObject. This could introduce subtle bugs or require extensive overrides.
Path is a general-purpose class, while DirectoryObject is more specialized. Forcing DirectoryObject to be a subclass of Path could result in an awkward API if you're not careful.

Deriving `FileObject` from `pathlib.Path`

Pros:

Similar to DirectoryObject, inheriting from Path could reduce redundancy and allow you to leverage built-in methods for file manipulation.
Operations like checking if a file exists or getting its absolute path would become simpler and more intuitive.

Cons:

The same issues with potentially conflicting behaviors and API awkwardness apply here.
FileObject is currently more than just a file path; it's a wrapper that includes logic for writing, copying, etc. This added behavior might not fit naturally into the Path subclass and could lead to confusion.

Conclusion

If you decide to derive FileObject and DirectoryObject from pathlib.Path, you would need to be very careful about maintaining the specialized behavior you want these classes to have. You'd have to override several methods to ensure that the classes behave in the specific ways you've designed them to.

However, considering the specific behaviors you’ve implemented (such as creating directories, managing file content, categorizing folder items, and the custom error handling), it might be more maintainable to not subclass from Path and instead keep your current composition approach.

Alternative Approach: Composition Instead of Inheritance

A more Pythonic approach might be to continue using composition, where DirectoryObject and FileObject wrap a Path object internally (which you’re already doing). This gives you full control over the behavior while still leveraging Path's functionality.

For instance:

class DirectoryObject:
    def __init__(self, directory: str | Path | DirectoryObject):
        self.path = Path(directory) if not isinstance(directory, DirectoryObject) else directory.path
        self.create()

    # The rest of the methods would interact with self.path

This way, you don't inherit all of Path's behavior, but you can still use it internally, ensuring that DirectoryObject and FileObject behave exactly as you intend, without the risk of inherited behavior interfering with your design.

In summary, while inheriting from pathlib.Path is possible, it would likely introduce complexity without significant benefit. The current approach using composition is more straightforward and gives you more control over the classes' behavior.

samwaseda commented 1 month ago

I guess the short answer is ChatGPT thinks we shouldn't

liamhuber commented 1 month ago

Hmm, #28 is not really what I had in mind. I was thinking of something more dramatic like replacing both classes with a single SuperPath(Path) that has the extra functionality we need. In some cases we wouldn't need to transfer the current files content because Path already has the functionality -- e.g. Path.write_text looks like a reasonable substitute for DirectoryObject.write. In other places, Path is just not flexible/powerful enough -- e.g. Path.rmdir doesn't take any arguments and fails if the directory has content...if I want to delete it even when there's content there I have to import shutil or whatever, which is nonsense.

Maybe we're coming at this from the wrong perspective though. Perhaps we should start with this: (1) where is pyiron_snippets.files currently used, and what aspects of it are being useful? And (2) where are we currently using Path and running into things we really wish were on-liners but aren't?

In https://github.com/pyiron/pyiron_workflow/pull/413 I no longer depend directly on files, and that was the only use case I was aware of. However, there I run into a few places with pathlib is a bit disappointing, e.g. I have snippets like

        if filename.parent.exists() and not any(filename.parent.iterdir()):
            filename.parent.rmdir()

But what I really want is just filename.parent.rmdir(only_empty=True)

Concretely, I'm imagining something like this to solve my issues:

from pathlib import Path, PosixPath

class SuperPosixPath(PosixPath):

    def super_rmdir(
        self, 
        missing_ok=True, 
        only_empty=True,
    ):
        if not self.exists():
            if missing_ok:
                return
            else:
                raise FileNotFoundError()
        elif self.exists() and not self.is_dir():
            raise NotADirectoryError()

        for sub in self.iterdir():
            if sub.is_dir():
                SuperPosixPath(sub).super_rmdir(only_empty=only_empty)
            elif not only_empty:
                sub.unlink()  # Probably need to care about kwargs here for edge cases

        if not any(self.iterdir()):
            self.rmdir()

p = Path("my_example")
p1 = p / "sub"
p1f = p1 / "foo.txt"
p2 = p / "sub2"
p1.mkdir(parents=True, exist_ok=True)
p2.mkdir(parents=True, exist_ok=True)
p1f.write_text("blah blah")

super_p = SuperPosixPath(p)
super_p.super_rmdir()
print(p.is_dir(), p1.is_dir(), p1f.is_file(), p2.exists())
# True True True False
super_p.super_rmdir(only_empty=False)
print(p.exists())
# False
super_p.super_rmdir(only_empty=False)
# Does nothing, missing_ok defaults to true!

Of course work needs to be done to handle the Posix/Windows/Path issue, and here I just make a brand new method instead of overriding and haven't given any real thought to what the best choice is for that, but the point here is is anyhow just to convey the concept.

pyiron / pyiron_snippets