ingolemo / python-lenses

A python lens library for manipulating deeply nested immutable structures
GNU General Public License v3.0
310 stars 19 forks source link

[Help Wanted] - Memoizing when modifying similar subparts. #18

Closed mvcisback closed 6 years ago

mvcisback commented 6 years ago

Hi, The lenses library was invaluable to helping write the initial prototype of my sequential circuit library pyAiger. I mainly used the lens api to perform very recursive updates, for example, to turn latches into inputs:

circ = bind(self).Recur(LatchIn) \
                 .Filter(lambda x: x.name in latches) \
                 .modify(lambda x: Input(name))

Unfortunately, due to scaling issues I've had to rewrite a number of methods. The key problem is that the same sub-circuit appears multiple times in the data structure. Thus, the Recur lens goes down the same path multiple times. My new hard-coded traversals are similarly recursive, but memoize to avoid going down the same path. How could I tell python-lenses to do this? For reference, this datastructure is essentially a forest with many of the trees overlapping. Is the best way to do this to memoize the setattr hook?

ingolemo commented 6 years ago

Recur is not very efficient as it is. The library as a whole has not had a lot of performance testing.

Recur currently works by breaking apart and reassembling every object in the object graph except for those of the requested type. This means that whole branches of the tree are recreated (reallocated) even when nothing in them has changed. I just made a commit to a new recur-performance branch which fixes this, but the extra work needed to check if an object needs to change could cost more time than what is saved by not making copies. I'd appreciate if you could test this for your use case.

Recur does not currently use the setattr hook. It could be made to do so, but I think it would decrease performance in the general case because there would be more copying. Right now, the hooks that it uses are to_iter and from_iter (Recur assumes that iterable objects iterate over all their contents). I don't know if your objects are supposed to be iterable, but you could try implementing that hook anyway and I think that memoizing it should improve performance in your case. It would also help if the hook could iterate over only a subset of your object's attributes. It might be worth adding a "traversable" hook of some kind for situations such as this. If the iter hooks help then let me know and I'll figure out something more suitable.

By the way, the implementation of Recur (at least, the bit that matters) can be found at line 101 of the lenses/optics/traversals.py file, if you want to take a look at it. All you need to know is that the folder method must take a state and return an iterable of all the foci, and that the builder method takes the old state and an iterable of new foci (called values) and sets those values on the state in the same order that they would have been focused by folder. folder and builder are required by the parent class, but the other methods are just helper methods for those two.

ingolemo commented 6 years ago

Note that this library will never beat custom traversal code in terms of performance because it has to be quite generic to support all the fancy features. Haskell can inline function calls, but python can't.

mvcisback commented 6 years ago

I am not very concerned about performance, but at the moment not memoizing seems to lead to unreasonably inefficient code. I will try your new branch and look into the Recur implementation to see if I can get some reasonable performance without too much work.

ingolemo commented 6 years ago

Sorry, I was maybe a little bit ramble-y.

My main suggestion is that you should write to_iter and from_iter hooks and memoize them, because that's your only option currently. I'm looking to improve the hooks (add more and make them more useful and discoverable) so information about how well that works for you would be useful.

I ask you to test the branch not because I think it will be a large improvement for you, but because you already have a good test case to stress test it.

mvcisback commented 6 years ago

That makes sense. From what I can see, memoizing from_iter would make a difference. That said, its not obvious to how memoizing to_iter makes much of a difference.

In particular, it seems to me that self.build_object will still get called to many times since it's before the memoized from_iter. https://github.com/ingolemo/python-lenses/blob/master/lenses/optics/traversals.py#L162

I think ideally the lenses library would check if the objects are hashable and memoize accordingly. For example, it looks like https://github.com/ingolemo/python-lenses/blob/master/lenses/optics/traversals.py#L150 could be memoized.

In my case, everything is immutable and hashable so this would effectively implement my current traversal.

mvcisback commented 6 years ago

I guess you might need an additional hook to tell lenses that an object is immutable. Perhaps something like:

from typing import NamedTuple

class A(NamedTuple):
    data: str
    _lens_is_immutable = True
ingolemo commented 6 years ago

I only mentioned to_iter because it's behaviour needs to be complimentary with from_iter, so depending on what you're doing you may have had to implement both. I didn't mean to imply that memoizing to_iter would help. It won't.

You're also right about build_object being called before from_iter being an issue.

build_object couldn't be memoized as it was because of the way that it progressively consumes the values. I was hesitant to add memoization because I thought it would be difficult. But, I've played around a little and managed to refactor the build functions in such a way that adding memoization was possible. It was not as hard as I'd feared. Please test your code on the new master branch and see if they help (remove any iter hooks, at least temporarily).

This refactoring has made that old branch unnecessary because the new implementation requires essentially the same trick in order to even work, so I don't need you to test it any more. Incidentally, the new implementation makes use of the setattr hook for the same reason, not that you need it anymore.

I'm not really sure how an is_immutable hook would help. Could you elaborate?

mvcisback commented 6 years ago

This looks like the right direction! I'll take a look later tonight or tomorrow morning.

I just mentioned the is_immutable issue because if the object being rewritten is mutable, then you might need to deep copy either way. Perhaps this isn't an issue, but it felt like maybe a good way to cue the lenses library that this object is supposed to be immutable, so this is a safe optimization.

mvcisback commented 6 years ago

This seemed to help a little bit. I'll try to run a larger example on Monday.