holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License
2.7k stars 402 forks source link

Dynamic Callback memoization breaks for non-hashable stream values #1067

Closed philippjfr closed 7 years ago

philippjfr commented 7 years ago

In #1063 we added memoization for stream values by generating a key from the callback args, kwargs and all stream values. This is fragile because certain types like lists cannot be hashed. We can either cast to hashable types or simply use the string repr of the key.

jlstevens commented 7 years ago

I'm not super comfortable using repr for this purpose although it should work. Probably the best thing is to implement the following behavior:

  1. Assume the key is hashable and try to use it.
  2. Fallback to the repr because something in the key is unhashable.
  3. Skip memoisation (maybe the repr can fail).

In no circumstance should there be an error - the worst case should be that you lose the advantage of memoization because the repr of the unhashable object is unstable.

The other solution is to limit the types of allowed stream parameter - but that is also tricky. If you specify a param.List, the elements of the list could be anything. It probably isn't worth restricting what streams can do for the sake of a performance optimization.

The only other thing we could do is warn when falling back to repr. The idea would be to recommend that the user uses hashable stream values (e.g tuples instead of lists). Then again, if you want to use dictionaries as the stream value, there is no easy hashable alternative (a tuple of tuples?). If only Python had a frozendict....

philippjfr commented 7 years ago

I'm not a huge fan of having it fall back to a different behavior. I do think it should a single approach, although I do recognize your worries about stability of the repr so I'm wondering whether we should just traverse the key and replace unhashable objects:

What else would be left? I believe most other types should hash correctly or am I wrong?

jbednar commented 7 years ago

I would think that solid handling for lists and dicts is very important, and I don't see any reason why converting them to tuples or tuples of tuples, respectively, would be an issue, apart from needing to do so recursively.

jlstevens commented 7 years ago

We would at least want to support OrderedDict as well as dict. I prefer the idea of recursive replacement with hashables but I am still wondering whether our suggested replacements cover everything. Do we ever use defaultdicts with param.Dict?

philippjfr commented 7 years ago

Both OrderedDict and defaultdict are dict subclasses so are covered by dict.

jlstevens commented 7 years ago

I wouldn't have bet that cyordereddict also being a subclass of dict (being cython and all) but thankfully it is.

jlstevens commented 7 years ago

More importantly, neither numpy arrays nor pandas Dataframes are hashable. I wouldn't recommend using them as stream parameters, but there is nothing stopping you from trying.

jlstevens commented 7 years ago

In fact, numpy arrays are a good example of why it is worth avoiding string reprs. As the repr of a large enough numpy array gets truncated with ... in the middle, two arrays with the same values around the edges but different values in the middle would map to the same key.

philippjfr commented 7 years ago

For numpy arrays a SHA might work not sure what to do about dataframes and other types.

jbednar commented 7 years ago

Seems like a SHA is a good fallback, because it's better to have it mistakenly report that something has changed when it hasn't (e.g. if something trivial is reordered inside in a way that doesn't actually matter) than to not update the plot when something truly has changed.

jlstevens commented 7 years ago

Addressed in the above PR. Closing.

github-actions[bot] commented 1 day ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.