Closed frmdstryr closed 1 year ago
Merging #182 (959167e) into main (11fb42c) will decrease coverage by
0.06%
. The diff coverage is90.72%
.
:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more
I do not have time for a complete review (in particular of the C part), but here are just some thoughts.
First, I think it will be very nice to speed up pickling. Second, as we address this there are related two things I would like to address too:
Both points should be easy to address but the second may come with some speed penalty. Benchmarks will be handy then.
Would it make sense to pass the frozen state as an arg to new https://github.com/nucleic/atom/blob/main/atom/atom.py#L619 ?
I'm not sure if doing it in setstate make senses because the object could be unfrozen (assuming it uses a flag to somehow set the frozen state).
The issue I see is that if we freeze the object in new we won't be able to set any member in setstate. I am not sure I follow your argument about freezing based on a flag. If we pickle a frozen atom my naive idea is that the unpickled version should be frozen. So setting the value in setstate does not seem absurd to me.
Out of curiosity did you consider or try to also implement reduce_ex and newargs in C ?
For the sake of readability I would use --frozen
instead of -f
to mark an instance is frozen.
Below I try to summarize reasons and ways we may want to customize when a member value is made part of an Atom state.
Tuning of getstate behavior:
cppy::ptr value( atom->get_slot( member->index ) );
)No matter what we do for the last bullet point I feel that having a getstate mode like we do for getattr, setattr, validate etc is the way to get enough versatility and won't be surprising to people using Atom. The enum value would be:
The method to retrieve whether or not a member should be included would take the atom instance and return a bool. I think the metadata value should take precedence and if set we should ignore the mode. We cannot easily bypass looking the key in teh dictionary since the metadata dictionary is mutable.
@frmdstryr WDYT of the above ?
It is definitively more work than what you have already done but I think it would be worth it. I can help with any part of the above plan if you want.
This is all kind of more than I intended for with this PR... (just slightly faster version of the current behavior).
Anyways, in my opinion if atom is going to have a way to customize the state I don't think choosing whether the value is included or not is flexible enough. It's fine for pickle but not for serializing to something like json which cannot handle py objects.
With atom-db I added a way to tag a member with custom functions to serialize/deserialize the member (eg https://github.com/codelv/atom-db/blob/master/tests/test_json.py#L35). IMO that is the best. Its simple, yet still flexible because the user has full control on what goes in and out.
I am sorry I am somehow hijacking your PR but the current behavior has already bitten me several times (in particular the impossibility to pickle an Atom with a Constant member) and I saw this PR as a good opportunity to fix the "broken" parts.
What bothers me about your approach is that __getstate__
is meant for pickle not to serialize to a different format. I have used a trick similar to yours to serialize atom object to INI but I feel this is orthogonal to __getstate__
and pickle.
If you need to go through your custom serialization, deserialization function, am I right to think that this PR does not help with speed ?
Regarding for example json, the addition of a new dunder was recently discussed on Discourse https://discuss.python.org/t/introduce-a-json-magic-method/21768.
@frmdstryr could you comment on how a faster getstate in C helps with your case of using tags ?
For the pure pickle use case, I believe the plan I proposed make sense (except we could even avoid the tag and the issue of using a name other people may already rely on). I will try to work on it when I have the cycles.
You're correct the speed difference is negligible. Adding the enum you proposed makes sense.
Could you enable edit by maintainers ? I have implemented the changes described above but cannot push.
It is checked? I have no branch protection stuff setup, but maybe its because its the main branch? IDK.
Feel free to just close and open a new one.
my bad I worked on your frmdstryr/move-get-set-state-to-catom branch and failed to notice that the PR was from main...
Oh, I had to move it to main to get actions to run.
I will go through the changes again to check I do not mess up something while rebasing/cherry-picking but otherwise this should be good to go (modulo the missing documentation). @frmdstryr can you have a look.
Thanks for having a look @frmdstryr . I still need to go through everything once but I am wondering, should we add support for magic method name in AtomMeta for this use case ?
It may clash with existing names which would make the change more intrusive than I would like but not having it will be surprising since it would the only behavior missing it. Do you have an opinion ?
@frmdstryr I went ahead and added and documented a magic method for getstate for consistency. Do you have an opinion ?
I was thinking of explaining how to directly alter the member mode which feels more useful to simply forbid the pickling of a member but I did not find a place where it would nicely fit.
Agree. I would just add it to the customization page.
Thanks @frmdstryr for initiating the work on this. It is always extremely motivating for me to work on a feature I know you need.
Thanks!
If the benchmarks are accurate, this provides a slight improvement (depending on the py version) for serialization speed.
The current python version can be sped up by moving the slots check into atom meta as done here. If that is done this is still about 15% faster on 3.11.