Closed philippjfr closed 7 years ago
I'm not super comfortable using repr
for this purpose although it should work. Probably the best thing is to implement the following behavior:
In no circumstance should there be an error - the worst case should be that you lose the advantage of memoization because the repr of the unhashable object is unstable.
The other solution is to limit the types of allowed stream parameter - but that is also tricky. If you specify a param.List
, the elements of the list could be anything. It probably isn't worth restricting what streams can do for the sake of a performance optimization.
The only other thing we could do is warn when falling back to repr
. The idea would be to recommend that the user uses hashable stream values (e.g tuples instead of lists). Then again, if you want to use dictionaries as the stream value, there is no easy hashable alternative (a tuple of tuples?). If only Python had a frozendict....
I'm not a huge fan of having it fall back to a different behavior. I do think it should a single approach, although I do recognize your worries about stability of the repr so I'm wondering whether we should just traverse the key and replace unhashable objects:
What else would be left? I believe most other types should hash correctly or am I wrong?
I would think that solid handling for lists and dicts is very important, and I don't see any reason why converting them to tuples or tuples of tuples, respectively, would be an issue, apart from needing to do so recursively.
We would at least want to support OrderedDict
as well as dict. I prefer the idea of recursive replacement with hashables but I am still wondering whether our suggested replacements cover everything. Do we ever use defaultdicts with param.Dict
?
Both OrderedDict and defaultdict are dict subclasses so are covered by dict.
I wouldn't have bet that cyordereddict also being a subclass of dict (being cython and all) but thankfully it is.
More importantly, neither numpy arrays nor pandas Dataframes are hashable. I wouldn't recommend using them as stream parameters, but there is nothing stopping you from trying.
In fact, numpy arrays are a good example of why it is worth avoiding string reprs. As the repr of a large enough numpy array gets truncated with ...
in the middle, two arrays with the same values around the edges but different values in the middle would map to the same key.
For numpy arrays a SHA might work not sure what to do about dataframes and other types.
Seems like a SHA is a good fallback, because it's better to have it mistakenly report that something has changed when it hasn't (e.g. if something trivial is reordered inside in a way that doesn't actually matter) than to not update the plot when something truly has changed.
Addressed in the above PR. Closing.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
In #1063 we added memoization for stream values by generating a key from the callback args, kwargs and all stream values. This is fragile because certain types like lists cannot be hashed. We can either cast to hashable types or simply use the string repr of the key.