Closed mvcisback closed 6 years ago
Recur is not very efficient as it is. The library as a whole has not had a lot of performance testing.
Recur currently works by breaking apart and reassembling every object in the object graph except for those of the requested type. This means that whole branches of the tree are recreated (reallocated) even when nothing in them has changed. I just made a commit to a new recur-performance
branch which fixes this, but the extra work needed to check if an object needs to change could cost more time than what is saved by not making copies. I'd appreciate if you could test this for your use case.
Recur does not currently use the setattr
hook. It could be made to do so, but I think it would decrease performance in the general case because there would be more copying. Right now, the hooks that it uses are to_iter
and from_iter
(Recur assumes that iterable objects iterate over all their contents). I don't know if your objects are supposed to be iterable, but you could try implementing that hook anyway and I think that memoizing it should improve performance in your case. It would also help if the hook could iterate over only a subset of your object's attributes. It might be worth adding a "traversable" hook of some kind for situations such as this. If the iter hooks help then let me know and I'll figure out something more suitable.
By the way, the implementation of Recur (at least, the bit that matters) can be found at line 101 of the lenses/optics/traversals.py
file, if you want to take a look at it. All you need to know is that the folder
method must take a state and return an iterable of all the foci, and that the builder
method takes the old state and an iterable of new foci (called values
) and sets those values on the state in the same order that they would have been focused by folder
. folder
and builder
are required by the parent class, but the other methods are just helper methods for those two.
Note that this library will never beat custom traversal code in terms of performance because it has to be quite generic to support all the fancy features. Haskell can inline function calls, but python can't.
I am not very concerned about performance, but at the moment not memoizing seems to lead to unreasonably inefficient code. I will try your new branch and look into the Recur implementation to see if I can get some reasonable performance without too much work.
Sorry, I was maybe a little bit ramble-y.
My main suggestion is that you should write to_iter
and from_iter
hooks and memoize them, because that's your only option currently. I'm looking to improve the hooks (add more and make them more useful and discoverable) so information about how well that works for you would be useful.
I ask you to test the branch not because I think it will be a large improvement for you, but because you already have a good test case to stress test it.
That makes sense. From what I can see, memoizing from_iter
would make a difference. That said, its not obvious to how memoizing to_iter
makes much of a difference.
In particular, it seems to me that self.build_object
will still get called to many times since it's before the memoized from_iter
.
https://github.com/ingolemo/python-lenses/blob/master/lenses/optics/traversals.py#L162
I think ideally the lenses
library would check if the objects are hashable and memoize accordingly. For example, it looks like https://github.com/ingolemo/python-lenses/blob/master/lenses/optics/traversals.py#L150 could be memoized.
In my case, everything is immutable and hashable so this would effectively implement my current traversal.
I guess you might need an additional hook to tell lenses
that an object is immutable. Perhaps something like:
from typing import NamedTuple
class A(NamedTuple):
data: str
_lens_is_immutable = True
I only mentioned to_iter
because it's behaviour needs to be complimentary with from_iter
, so depending on what you're doing you may have had to implement both. I didn't mean to imply that memoizing to_iter
would help. It won't.
You're also right about build_object
being called before from_iter
being an issue.
build_object
couldn't be memoized as it was because of the way that it progressively consumes the values
. I was hesitant to add memoization because I thought it would be difficult. But, I've played around a little and managed to refactor the build functions in such a way that adding memoization was possible. It was not as hard as I'd feared. Please test your code on the new master branch and see if they help (remove any iter hooks, at least temporarily).
This refactoring has made that old branch unnecessary because the new implementation requires essentially the same trick in order to even work, so I don't need you to test it any more. Incidentally, the new implementation makes use of the setattr
hook for the same reason, not that you need it anymore.
I'm not really sure how an is_immutable
hook would help. Could you elaborate?
This looks like the right direction! I'll take a look later tonight or tomorrow morning.
I just mentioned the is_immutable
issue because if the object being rewritten is mutable, then you might need to deep copy either way. Perhaps this isn't an issue, but it felt like maybe a good way to cue the lenses library that this object is supposed to be immutable, so this is a safe optimization.
This seemed to help a little bit. I'll try to run a larger example on Monday.
Hi, The lenses library was invaluable to helping write the initial prototype of my sequential circuit library pyAiger. I mainly used the
lens
api to perform very recursive updates, for example, to turn latches into inputs:Unfortunately, due to scaling issues I've had to rewrite a number of methods. The key problem is that the same sub-circuit appears multiple times in the data structure. Thus, the
Recur
lens goes down the same path multiple times. My new hard-coded traversals are similarly recursive, but memoize to avoid going down the same path. How could I tellpython-lenses
to do this? For reference, this datastructure is essentially a forest with many of the trees overlapping. Is the best way to do this to memoize thesetattr
hook?