pyiron / pyiron_snippets

Short, dependency-free pieces of code that are useful for pyiron (and python in general)
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Derive `DirectoryObject` from `Path`? #13

Open samwaseda opened 7 months ago

samwaseda commented 7 months ago

Currently DirectoryObject is its own class. There are, however, functionalities that are given almost identically in Path from pathlib (which DirectoryObject anyway heavily uses). Furthermore there are functionalities that I would love to have but are not implemented in DirectoryObject yet, such as path_1 / path_2 etc. So I thought maybe it makes more sense to derive DirectoryObject from Path, so that people can straightforwardly use the same functionalities. What do you think? @liamhuber

liamhuber commented 7 months ago

💯 I am huge fan of this. Maybe it will require merging DirectoryObject and FileObject together.

The biggest catch I can think of will be to ensure that we exclusively extend rather than modify pathlib.Path, as I'm sure there's a billion places where people rely on the pattern if isinstance(thing, pathlib.Path): thing_that_depends_on_Pathlike_behavior(thing), and it would be nice if it was safe to simply plug our objects in under these conditions.

liamhuber commented 7 months ago

Somewhat related, there has been some activity in pyiron_base modifying the working directory behaviour, and it would be lovely if the snippets solution could be used across the pyiron-verse. To that end, it would be good to double check what changes were made and why, and see whether we can adapt what's done here to accomplish the same thing.

Similarly, the stuff with browsing zipped content might be a cool feature to add in.

samwaseda commented 6 months ago

Maybe it will require merging DirectoryObject and FileObject together.

Yeah that's actually my biggest concern. Right now I really like the fact that the files and directories are separated, because that's somewhat crucial that each node has its own working directory and file objects are derived from them, and all this is properly conceptualised.

liamhuber commented 6 months ago

It sounds like we're on the same page (albeit your page is more densely packed with knowledge). I guess a reasonable attack is to simply try moving forward with the re-parenting as see how it turns out.

liamhuber commented 1 month ago

I'm still a fan of this, is it still on your radar @samwaseda?

My complaints/wishlist atm is:

samwaseda commented 1 month ago

I gave it a try, but this turned out to be a bit more complicated than I thought. I cannot remember what was the problem but I think I often got into conflicts with some existing methods

  • DirectoryObject automatically .create()s the directory whether I like it or not

I thought our strategy was to create it first and delete it if it's not used? But ok I can also see that the object should be smart enough to know it.

samwaseda commented 1 month ago

And I just realized that my last comment didn't answer your question at all. I guess the answer is yes, I still would like to give it a try, but maybe it's also good to think about what kind of interface we want. Since you raised this issue again, I presume you have something in mind?

liamhuber commented 1 month ago

For not creating stuff we don't need, I'd be content to start with modifying DirectoryObject.__init__ to take create: bool = True as a kwarg, which is an easy backwards-compatible change. But I agree we should think about the desired interface, and a solution like that will get trampled by any inheritance from pathlib.Path. Still, not a bad place to start.

For .tidy() I was looking at this SO post, but I haven't implemented it myself anywhere.

samwaseda commented 1 month ago

I just asked ChatGPT what it thinks and this is the answer I got:

Deriving FileObject and DirectoryObject from pathlib.Path is a possibility, but it would significantly change how these classes work and their interaction with the Path object. Let's break down what this would entail and whether it makes sense.

Deriving DirectoryObject from pathlib.Path

Pros:

Cons:

Deriving FileObject from pathlib.Path

Pros:

Cons:

Conclusion

If you decide to derive FileObject and DirectoryObject from pathlib.Path, you would need to be very careful about maintaining the specialized behavior you want these classes to have. You'd have to override several methods to ensure that the classes behave in the specific ways you've designed them to.

However, considering the specific behaviors you’ve implemented (such as creating directories, managing file content, categorizing folder items, and the custom error handling), it might be more maintainable to not subclass from Path and instead keep your current composition approach.

Alternative Approach: Composition Instead of Inheritance

A more Pythonic approach might be to continue using composition, where DirectoryObject and FileObject wrap a Path object internally (which you’re already doing). This gives you full control over the behavior while still leveraging Path's functionality.

For instance:

class DirectoryObject:
    def __init__(self, directory: str | Path | DirectoryObject):
        self.path = Path(directory) if not isinstance(directory, DirectoryObject) else directory.path
        self.create()

    # The rest of the methods would interact with self.path

This way, you don't inherit all of Path's behavior, but you can still use it internally, ensuring that DirectoryObject and FileObject behave exactly as you intend, without the risk of inherited behavior interfering with your design.

In summary, while inheriting from pathlib.Path is possible, it would likely introduce complexity without significant benefit. The current approach using composition is more straightforward and gives you more control over the classes' behavior.

samwaseda commented 1 month ago

I guess the short answer is ChatGPT thinks we shouldn't

liamhuber commented 1 month ago

Hmm, #28 is not really what I had in mind. I was thinking of something more dramatic like replacing both classes with a single SuperPath(Path) that has the extra functionality we need. In some cases we wouldn't need to transfer the current files content because Path already has the functionality -- e.g. Path.write_text looks like a reasonable substitute for DirectoryObject.write. In other places, Path is just not flexible/powerful enough -- e.g. Path.rmdir doesn't take any arguments and fails if the directory has content...if I want to delete it even when there's content there I have to import shutil or whatever, which is nonsense.

Maybe we're coming at this from the wrong perspective though. Perhaps we should start with this: (1) where is pyiron_snippets.files currently used, and what aspects of it are being useful? And (2) where are we currently using Path and running into things we really wish were on-liners but aren't?

In https://github.com/pyiron/pyiron_workflow/pull/413 I no longer depend directly on files, and that was the only use case I was aware of. However, there I run into a few places with pathlib is a bit disappointing, e.g. I have snippets like

        if filename.parent.exists() and not any(filename.parent.iterdir()):
            filename.parent.rmdir()

But what I really want is just filename.parent.rmdir(only_empty=True)

Concretely, I'm imagining something like this to solve my issues:

from pathlib import Path, PosixPath

class SuperPosixPath(PosixPath):

    def super_rmdir(
        self, 
        missing_ok=True, 
        only_empty=True,
    ):
        if not self.exists():
            if missing_ok:
                return
            else:
                raise FileNotFoundError()
        elif self.exists() and not self.is_dir():
            raise NotADirectoryError()

        for sub in self.iterdir():
            if sub.is_dir():
                SuperPosixPath(sub).super_rmdir(only_empty=only_empty)
            elif not only_empty:
                sub.unlink()  # Probably need to care about kwargs here for edge cases

        if not any(self.iterdir()):
            self.rmdir()

p = Path("my_example")
p1 = p / "sub"
p1f = p1 / "foo.txt"
p2 = p / "sub2"
p1.mkdir(parents=True, exist_ok=True)
p2.mkdir(parents=True, exist_ok=True)
p1f.write_text("blah blah")

super_p = SuperPosixPath(p)
super_p.super_rmdir()
print(p.is_dir(), p1.is_dir(), p1f.is_file(), p2.exists())
# True True True False
super_p.super_rmdir(only_empty=False)
print(p.exists())
# False
super_p.super_rmdir(only_empty=False)
# Does nothing, missing_ok defaults to true!

Of course work needs to be done to handle the Posix/Windows/Path issue, and here I just make a brand new method instead of overriding and haven't given any real thought to what the best choice is for that, but the point here is is anyhow just to convey the concept.