jacebrowning / datafiles

A file-based ORM for Python dataclasses.
https://datafiles.readthedocs.io
MIT License
198 stars 18 forks source link

Optional "self." in patterns, frozen dataclass support, and more #288

Closed bnorick closed 1 year ago

bnorick commented 1 year ago

There are four main features / improvements in this PR which I'll cover below. I want to start by saying the test suite was infinitely helpful in developing this PR and I hope to learn from your tests in my own work.

  1. It seemed to me that the ability to forgo all the self. strings in patterns would be nice, so I made this an option on config.Meta which maintains the previous behavior by default but could be changed in the future (probably a major version, it would be a breaking change to make Meta.use_self=False the default, users could be informed that the old functionality can be maintained by setting Meta.use_self=True). The following example demonstrates the change.

    @datafiles.datafile('{a}.toml', use_self=False, manual=True)
    class T:
    a: int
    b: int = 2
  2. I added support for frozen dataclasses. This mostly involved switching to object.__setattr__ in a few places. I also added some protection against loading instances of frozen dataclasses more than once, as this could mutate the frozen object. The example below would fail previously, but works now.

    @datafiles.datafile('{self.a}.toml', manual=True)
    @dataclasses.dataclass(frozen=True)
    class T:
    a: int
    b: int = 2
  3. I added kwarg passthrough for any time datafiles creates a dataclass for you. The example above can become:

    @datafiles.datafile('{self.a}.toml', manual=True, frozen=True)
    class T3:
    a: int
    b: int = 2
  4. I updated Manager.get to be a bit more flexible. It now allows only the arguments which are required for path construction, i.e., those which appear in the pattern to be passed either as args or kwargs. Previously, passing as kwargs only when multiple non-default fields existed would fail as in the example below.

    
    @datafiles.datafile('{self.a}.toml', manual=True)
    class T2:
    a: int
    b: int

T2.objects.get(a=5)

Traceback (most recent call last):

File "", line 1, in

File "/path/to/repo/datafiles/manager.py", line 43, in get

instance = self.model(*args, **kwargs)

^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/path/to/repo/datafiles/model.py", line 82, in modified_init

init(self, *args, **kwargs)

TypeError: T2.init() got multiple values for argument 'a'



NOTE: For all examples in this PR, I had a `5.toml` file in the cwd with the content `b = 4`.

To wrap things up, I want to mention that I need some guidance on adding tests for the new features as that is something I wasn't sure about. That said, I have of course tested things manually. However, for the test suite I wanted to know where would be the best place to put tests for `Meta.use_self=False` and frozen dataclass support.

I hope you find these features and improvements worth adding, otherwise I'll have to maintain a fork (as I need these features myself) and who wants to do that!
jacebrowning commented 1 year ago

Thanks for putting this together! Features 3 and 4 seem valuable to add with no further discussion. Can you open two separate PRs with each of those changes? This will make it easier to see how it impacts the test suite and so we can discuss what additional tests should be added. :v:

bnorick commented 1 year ago

Just so I am clear, one PR with 1 and 2 and another with 3 and 4?

Also, realized I forgot to run the code formatting, etc. I'll make sure to do that too.

jacebrowning commented 1 year ago

One PR with feature 3 and one PR with feature 4.

bnorick commented 1 year ago

Sure. Can we discuss 1 and 2 here?

jacebrowning commented 1 year ago

Sure!

For feature 1, it sounds like your use case is purely syntactic connivence? Is explicitly having to define "{self.a}.toml" blocking you from accomplishing something with datafiles?

For feature 2, it sounds like if datafile(..., frozen=True) is defined then you want the model to be loaded exactly once, never saved, and have the same properties as a frozen dataclass. Do I have that right?

bnorick commented 1 year ago

For feature 1, it sounds like your use case is purely syntactic connivence? Is explicitly having to define "{self.a}.toml" blocking you from accomplishing something with datafiles?

I suppose it's not blocking me, but I find it ugly and I am personally less likely to use things that look ugly to me. If that makes sense. Without this one, I'd suffer through (probably), but I wouldn't like it.

For feature 2, it sounds like if datafile(..., frozen=True) is defined then you want the model to be loaded exactly once, never saved, and have the same properties as a frozen dataclass. Do I have that right?

I would want to be able to save in the case of a new instance (i.e., not on disk), but an instance which is on disk should only be loadable once so that its value doesn't change in the course of execution, yes. The frozen dataclass which underlies the datafile model will handle the quasi-immutability of the instance after creation, so datafiles just needs to prevent multiple loads. I took the easy route there and made the default behavior to block more than one load, but if a user explicitly passed _first_load=True then I think it'd get around that. However, if they're doing that then they explicitly want multiple loads on a frozen instance for some reason and they accept the risk.

jacebrowning commented 1 year ago

Thanks for the explanation!

I'd be open to reviewing a separate PR for feature 2 as well. It looks like the actually change is quite small.

As for feature 1, I'll have to think about it some more. I don't really want to add another configuration option (use_self) at this time, but I think it would be required if this is a breaking change.

bnorick commented 1 year ago

Actually, for feature 1, come to think of it I believe this can be done without a configuration option. The pattern can just be scanned for placeholders of the form {a} or {self.a} and we can determine up front what style is in use. If a mix, a ValueError can be raised.

I don't see much downside to this added flexibility, do you?

bnorick commented 1 year ago

Closing this as all components have been split into separate PRs at this point and we can continue discussion in the relevant PRs.