nucleic / atom

Memory efficient Python objects
https://atom.readthedocs.io/en/latest/
Other
361 stars 50 forks source link

Move __getstate__ and __setstate__ to catom #182

Closed frmdstryr closed 1 year ago

frmdstryr commented 1 year ago

If the benchmarks are accurate, this provides a slight improvement (depending on the py version) for serialization speed.

The current python version can be sped up by moving the slots check into atom meta as done here. If that is done this is still about 15% faster on 3.11.

codecov-commenter commented 1 year ago

Codecov Report

Merging #182 (959167e) into main (11fb42c) will decrease coverage by 0.06%. The diff coverage is 90.72%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #182 +/- ## ========================================== - Coverage 89.33% 89.27% -0.06% ========================================== Files 55 56 +1 Lines 5165 5342 +177 Branches 332 339 +7 ========================================== + Hits 4614 4769 +155 - Misses 535 558 +23 + Partials 16 15 -1 ```
MatthieuDartiailh commented 1 year ago

I do not have time for a complete review (in particular of the C part), but here are just some thoughts.

First, I think it will be very nice to speed up pickling. Second, as we address this there are related two things I would like to address too:

Both points should be easy to address but the second may come with some speed penalty. Benchmarks will be handy then.

frmdstryr commented 1 year ago

Would it make sense to pass the frozen state as an arg to new https://github.com/nucleic/atom/blob/main/atom/atom.py#L619 ?

I'm not sure if doing it in setstate make senses because the object could be unfrozen (assuming it uses a flag to somehow set the frozen state).

MatthieuDartiailh commented 1 year ago

The issue I see is that if we freeze the object in new we won't be able to set any member in setstate. I am not sure I follow your argument about freezing based on a flag. If we pickle a frozen atom my naive idea is that the unpickled version should be frozen. So setting the value in setstate does not seem absurd to me.

Out of curiosity did you consider or try to also implement reduce_ex and newargs in C ?

MatthieuDartiailh commented 1 year ago

For the sake of readability I would use --frozen instead of -f to mark an instance is frozen.

MatthieuDartiailh commented 1 year ago

Below I try to summarize reasons and ways we may want to customize when a member value is made part of an Atom state.

Tuning of getstate behavior:

No matter what we do for the last bullet point I feel that having a getstate mode like we do for getattr, setattr, validate etc is the way to get enough versatility and won't be surprising to people using Atom. The enum value would be:

The method to retrieve whether or not a member should be included would take the atom instance and return a bool. I think the metadata value should take precedence and if set we should ignore the mode. We cannot easily bypass looking the key in teh dictionary since the metadata dictionary is mutable.

@frmdstryr WDYT of the above ?

It is definitively more work than what you have already done but I think it would be worth it. I can help with any part of the above plan if you want.

frmdstryr commented 1 year ago

This is all kind of more than I intended for with this PR... (just slightly faster version of the current behavior).

Anyways, in my opinion if atom is going to have a way to customize the state I don't think choosing whether the value is included or not is flexible enough. It's fine for pickle but not for serializing to something like json which cannot handle py objects.

With atom-db I added a way to tag a member with custom functions to serialize/deserialize the member (eg https://github.com/codelv/atom-db/blob/master/tests/test_json.py#L35). IMO that is the best. Its simple, yet still flexible because the user has full control on what goes in and out.

MatthieuDartiailh commented 1 year ago

I am sorry I am somehow hijacking your PR but the current behavior has already bitten me several times (in particular the impossibility to pickle an Atom with a Constant member) and I saw this PR as a good opportunity to fix the "broken" parts.

What bothers me about your approach is that __getstate__ is meant for pickle not to serialize to a different format. I have used a trick similar to yours to serialize atom object to INI but I feel this is orthogonal to __getstate__ and pickle.

If you need to go through your custom serialization, deserialization function, am I right to think that this PR does not help with speed ?

MatthieuDartiailh commented 1 year ago

Regarding for example json, the addition of a new dunder was recently discussed on Discourse https://discuss.python.org/t/introduce-a-json-magic-method/21768.

MatthieuDartiailh commented 1 year ago

@frmdstryr could you comment on how a faster getstate in C helps with your case of using tags ?

For the pure pickle use case, I believe the plan I proposed make sense (except we could even avoid the tag and the issue of using a name other people may already rely on). I will try to work on it when I have the cycles.

frmdstryr commented 1 year ago

You're correct the speed difference is negligible. Adding the enum you proposed makes sense.

MatthieuDartiailh commented 1 year ago

Could you enable edit by maintainers ? I have implemented the changes described above but cannot push.

frmdstryr commented 1 year ago

It is checked? I have no branch protection stuff setup, but maybe its because its the main branch? IDK. image

Feel free to just close and open a new one.

MatthieuDartiailh commented 1 year ago

my bad I worked on your frmdstryr/move-get-set-state-to-catom branch and failed to notice that the PR was from main...

frmdstryr commented 1 year ago

Oh, I had to move it to main to get actions to run.

MatthieuDartiailh commented 1 year ago

I will go through the changes again to check I do not mess up something while rebasing/cherry-picking but otherwise this should be good to go (modulo the missing documentation). @frmdstryr can you have a look.

MatthieuDartiailh commented 1 year ago

Thanks for having a look @frmdstryr . I still need to go through everything once but I am wondering, should we add support for magic method name in AtomMeta for this use case ?

It may clash with existing names which would make the change more intrusive than I would like but not having it will be surprising since it would the only behavior missing it. Do you have an opinion ?

MatthieuDartiailh commented 1 year ago

@frmdstryr I went ahead and added and documented a magic method for getstate for consistency. Do you have an opinion ?

I was thinking of explaining how to directly alter the member mode which feels more useful to simply forbid the pickling of a member but I did not find a place where it would nicely fit.

frmdstryr commented 1 year ago

Agree. I would just add it to the customization page.

MatthieuDartiailh commented 1 year ago

Thanks @frmdstryr for initiating the work on this. It is always extremely motivating for me to work on a feature I know you need.

frmdstryr commented 1 year ago

Thanks!