Closed saraedum closed 10 years ago
Branch: u/saraedum/ticket/16316
Dependencies: #16251
Author: Julian Rueth
New commits:
7a0f094 | Introduced _cache_key for sage objects |
62c9681 | made polynomials with unhashable coefficients unhashable |
fa16bc7 | Merge branch 'u/saraedum/ticket/16251' of git://trac.sagemath.org/sage into ticket/16316 |
798aaf8 | Implemented _cache_key() for polynomials |
877302e | Enable caching for non-hashable objects |
Your proposal is not nearly as scary as your description would indicate. We do not want to cache unhashable things (in fact, we can't). Instead what you're doing is creating a generic method `_create_key`, which provides the default key when the object is passed to a `@cached_function`. For example, this would allow us to have matrices/graphs automatically return immutable copies of themselves (something I think I could leverage and find useful for passing through `UniqueRepresentation.__classcall__`).
I'm thinking we should also do automatic conversions for mutable Python objects such as `list -> tuple` and `set -> frozenset`. Yet we should make really sure that we're not slowing things down in doing this. The same goes for memory leaks.
Simon, do you have any thoughts on this?
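The `list -> tuple` / `set -> frozenset` conversion discussed above could look like this minimal sketch (plain Python; `normalize_key` is a hypothetical helper, not Sage code):

```python
# Hypothetical helper (not part of Sage): recursively convert mutable
# built-in containers into hashable counterparts before using them as
# cache keys.
def normalize_key(obj):
    if isinstance(obj, list):
        return tuple(normalize_key(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(normalize_key(item) for item in obj)
    return obj

# A list argument now yields a usable dictionary key:
key = normalize_key([1, [2, 3], {4, 5}])
cache = {key: "some result"}
```

The performance question raised above is whether doing this on every call costs more than it saves; a `try: hash(obj)` fast path could skip the conversion entirely for already-hashable arguments.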
It seems that the problem really has to do with key comparison in dictionaries, not with hashing or caching specifically. Given (1) Python's use of `==` for key comparison and (2) Sage's lenient implementation of `==` for elements, we basically cannot use Sage elements as dictionary keys.

One would like a third comparison operator, say `equal` [1], that regards two objects as equal if and only if all their "mathematically relevant" properties (e.g. value and precision for real or p-adic numbers, but not memory location) are the same. Our caching problems would then disappear if keys were compared using `equal`. In this hypothetical case, the correct property for hash functions would be `X equal Y => hash(X) == hash(Y)`.
Do I understand correctly that the purpose of `cache_key` can be viewed as emulating the "missing" `equal` by assigning to each object `X` a (hashable) object `cache_key(X)` satisfying the condition `cache_key(X) == cache_key(Y) <=> X equal Y`?

If so, does the implementation in this ticket satisfy this condition? (I am mainly worried that the parent of an element `x` does not seem to be encoded in `cache_key(x)`.)
[1] I just looked up the different comparison functions (`=`, `eq`, `eql`, `equal` and `equalp`) in Lisp, and `equal` seems to be the closest to what we would need, hence I am using that notation. However, Lisp uses `eql` by default for hash tables.
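As an illustration of the condition discussed above, here is a minimal sketch in plain Python (the `cache_key` function and `Toy` class are hypothetical stand-ins, not Sage's actual `_cache_key` machinery):

```python
def cache_key(x):
    """Return x itself if hashable, otherwise fall back to a hashable surrogate."""
    try:
        hash(x)
        return x
    except TypeError:
        return x._cache_key()

class Toy:
    """Stand-in for an unhashable element, e.g. a p-adic number."""
    __hash__ = None  # deliberately unhashable, like Sage's p-adics
    def __init__(self, value, precision):
        self.value, self.precision = value, precision
    def _cache_key(self):
        # the "mathematically relevant" data: value and precision
        return (type(self).__name__, self.value, self.precision)

# Two distinct instances with the same relevant data share one cache entry:
cache = {cache_key(Toy(5, 20)): "result"}
assert cache[cache_key(Toy(5, 20))] == "result"
```

Note that the surrogate tuple's `==` and `hash` are automatically consistent with each other, which is exactly the property the raw element lacks.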
Replying to @pjbruin:
> It seems that the problem really has to do with key comparison in dictionaries, not with hashing or caching specifically. Given (1) Python's use of `==` for key comparison and (2) Sage's lenient implementation of `==` for elements, we basically cannot use Sage elements as dictionary keys.

This is not really the topic of the ticket, but I do not agree. It depends on what you want to do. If you have sufficient control over which elements are put in the dictionary, then you can use Sage objects. Of course you are right that our `==` causes a lot of trouble.
> One would like a third comparison operator, say `equal` [1], that regards two objects as equal if and only if all their "mathematically relevant" properties (e.g. value and precision for real or p-adic numbers, but not memory location) are the same. Our caching problems would then disappear if keys were compared using `equal`. In this hypothetical case, the correct property for hash functions would be `X equal Y => hash(X) == hash(Y)`.

Such an operator might be helpful and would really fix trouble with caching. Then again, when are two objects `equal`? Do their parents have to be identical? Or just `equal`? It is not clear to me whether there can be a generic implementation that makes everybody happy. If you are aware of the current limitations, then you can work around the problems.
> - Do I understand correctly that the purpose of `cache_key` can be viewed as emulating the "missing" `equal` by assigning to each object `X` a (hashable) object `cache_key(X)` satisfying the condition `cache_key(X) == cache_key(Y) <=> X equal Y`?

Yes and no. The idea is really to give a hash to elements which don't have one normally. The intention was to do something similar to what you propose above: two unhashable elements in the same parent with equal cache keys should behave identically in any reasonable computation. (There is of course room for interpretation in this 'definition'.)
> - If so, does the implementation in this ticket satisfy this condition? (I am mainly worried that the parent of an element `x` does not seem to be encoded in `cache_key(x)`.)

I did this on purpose. I replicate the current behaviour in Sage, i.e., if you mix elements with different parents in one dictionary, then you really need to know what you are doing. Why not include the parent? Well, are two parents equal if they are `==`-equal, or identical, and what if they cannot be hashed (which will be the case for `p`-adic extension fields)?

I have no strong opinion on whether or not to include the parent, but I think it can be more confusing than helpful. Also, keys might unnecessarily be rather large. If your application will only get elements from a single parent, but you store all the data needed to create the extension field with every element in a cache, that could really be a waste. We could of course only store `id(parent)`; but I'm not sure whether that is too extreme.
Replying to @saraedum:
> Replying to @pjbruin:
>
> > It seems that the problem really has to do with key comparison in dictionaries, not with hashing or caching specifically. Given (1) Python's use of `==` for key comparison and (2) Sage's lenient implementation of `==` for elements, we basically cannot use Sage elements as dictionary keys.
>
> This is not really the topic of the ticket, but I do not agree.
Sorry if this sounded off-topic; I was thinking of the dictionaries used in caching, but didn't focus on that because I thought my "objections" were applicable to dictionaries in Sage in general.
I also wanted to avoid focusing on hashing because actually I don't fully understand what the criteria are for objects to be hashable. The only thing that is clear to me is that if the "mathematical meaning" can change over time (e.g. if the object is a list whose length can change, or a matrix on which you can perform row/column operations), then the object should not be hashable. But what about p-adic elements? These are supposedly fixed in time. Why exactly do we want elements of `Zmod(n)` and `RR` to be hashable, but not elements of p-adic fields? Are p-adic fields treated differently because different elements of the same parent can satisfy `a == b` without being `equal` (e.g. if they have different precision)? In that case, shouldn't power series also be unhashable?
> It depends on what you want to do. If you have sufficient control over which elements are put in the dictionary, then you can use Sage objects.

Sure, but `@cached_function` should also work for user functions, and we have no control over what kind of arguments the user is going to feed those. It would not be strange for an unsuspecting user to have some complicated function of one variable accepting elements of arbitrary p-adic fields, for example. Then the easiest way to cache them would be to stick `@cached_function` on the function, but that could easily lead to problems when you don't store the parent, couldn't it?
> Such an operator might be helpful and would really fix trouble with caching. Then again, when are two objects `equal`? Do their parents have to be identical? Or just `equal`? It is not clear to me whether there can be a generic implementation that makes everybody happy.

I would say the only option (for caching purposes) is to make two elements `equal` only if their parents are identical; otherwise you will get cases where the cached function returns values in wrong (equal but not identical) parents. Hence I would translate this hypothetical `equal` (for elements) in terms of `cache_key` as `x equal y <=> parent(x) is parent(y) and cache_key(x) == cache_key(y)`.
> I have no strong opinion on whether or not to include the parent but I think it can be more confusing than helpful. Also keys might unnecessarily be rather large. If your application will only get elements from a single parent but you store all the data needed to create the extension field with every element in a cache, that could be really a waste. We could of course only store `id(parent)`; but I'm not sure whether that is too extreme.

Actually that was the option I was thinking of; it should be fast and won't waste too much memory. Alternatively, there could be both an "internal" `cache_key` (for use with `@cached_method`, where the parent is known) that does not include the parent `id`, and an "external" one (for use with `@cached_function`) that does include the parent `id`.
Replying to @pjbruin:
> Replying to @saraedum:
>
> > Replying to @pjbruin:
>
> I also wanted to avoid focusing on hashing because actually I don't fully understand what the criteria are for objects to be hashable. The only thing that is clear to me is that if the "mathematical meaning" can change over time (e.g. if the object is a list whose length can change, or a matrix on which you can perform row/column operations), then the object should not be hashable. But what about p-adic elements? These are supposedly fixed in time. Why exactly do we want elements of `Zmod(n)` and `RR` to be hashable but not elements of p-adic fields? Are p-adic fields treated differently because different elements of the same parent can satisfy `a == b` without being `equal` (e.g. if they have different precision)?
Right, there is no mathematical justification for `p`-adics not being hashable. It is an implementation detail, because their `==` cannot be made compatible with `hash`.
> In that case, shouldn't power series also be unhashable?

Absolutely.

> It depends on what you want to do. If you have sufficient control over which elements are put in the dictionary, then you can use Sage objects.
> Sure, but `@cached_function` should also work for user functions, and we have no control over what kind of arguments the user is going to feed those. It would not be strange for an unsuspecting user to have some complicated function of one variable accepting elements of arbitrary p-adic fields, for example. Then the easiest way to cache them would be to stick `@cached_function` on the function, but that could easily lead to problems when you don't store the parent, couldn't it?

I see. I guess this means that we should at least have that if `a != b`, then `some_cached_function(a)` should have a chance to be `!= some_cached_function(b)`.
> > Such an operator might be helpful and would really fix trouble with caching. Then again, when are two objects `equal`? Do their parents have to be identical? Or just `equal`? It is not clear to me whether there can be a generic implementation that makes everybody happy.
>
> I would say the only option (for caching purposes) is to make two elements `equal` only if their parents are identical, otherwise you will get cases where the cached function returns values in wrong (equal but not identical) parents. Hence I would translate this hypothetical `equal` (for elements) in terms of `cache_key` as `x equal y <=> parent(x) is parent(y) and cache_key(x) == cache_key(y)`.
Ok.
> > I have no strong opinion on whether or not to include the parent but I think it can be more confusing than helpful. Also keys might unnecessarily be rather large. If your application will only get elements from a single parent but you store all the data needed to create the extension field with every element in a cache, that could be really a waste. We could of course only store `id(parent)`; but I'm not sure whether that is too extreme.
>
> Actually that was the option I was thinking of; it should be fast and won't waste too much memory. Alternatively, there could be both an "internal" `cache_key` (for use with `@cached_method`, where the parent is known) that does not include the parent `id`, and an "external" one (for use with `@cached_function`) that does include the parent `id`.
Imho, `@cached_method` and `@cached_function` should not differ in such a way. Storing the parent's `id` is not a waste of memory and should hardly lead to any surprises.
I'll push a new commit which implements the inclusion of the parent. Thanks for joining this discussion btw.
Reviewer: Peter Bruin
Changed branch from u/saraedum/ticket/16316 to u/pbruin/16316-cache_key
Looks good, passes doctests and (together with #16317) resolves a doctest failure in my upcoming branch on #11474. The reviewer patch only contains a few typographical and spelling fixes.
You must change the wording of "caching elements which are unhashable"; this is not correct, as you cannot cache (i.e. set as the key in a dict) anything without a (well-defined) hash. Instead say something like:

    This element is normally unhashable, so we use the function ``_cache_key``
    to create a hashable key for the object for the cache.
Replying to @tscrim:
> You must change the wording of "caching elements which are unhashable"; this is not correct as you cannot cache (i.e. set as the key in a dict) anything without a (well-defined) hash. Instead say something like:
>
>     This element is normally unhashable, so we use the function ``_cache_key``
>     to create a hashable key for the object for the cache.

I don't think that the word "cache" is necessarily connected to "hashing". A hash table is one way of caching, but there are others. (I just checked: https://en.wikipedia.org/wiki/Cache_%28computing%29 does not contain the word 'hash'.)

Is it acceptable to you if we leave the current wording?
However, we are using hash tables (in the form of `MonoDict`, `TripleDict`, and `WeakValueDictionary`) for our cache. So this does need to be changed.
I do not really see the problem here. The objects may be "unhashable" in the sense that they don't have a `__hash__()` method, but they don't change over time, so we can consistently attach a hash value to them for the purpose of caching by hashing the `cache_key` instead.
It may be useful to emphasize that it is not the input of a cached function that is cached, but the output. The cache (dictionary) key is not necessarily the input itself, it is just some (hashable) object containing enough information to distinguish different input values and that can be consistently extracted from a given input value. Of course in most cases the key is the input, but it does not have to be, as this ticket shows.
Maybe the formulation "caching works for objects which are not hashable" that appears in several doctests is slightly vague, but I personally won't insist on changing it.
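The point that the cache key is merely *extracted* from the input can be sketched as follows (a hypothetical decorator for illustration, not Sage's `@cached_function`):

```python
import functools

def cached_function(key=lambda args: args):
    """Cache outputs under a key extracted from the inputs."""
    def decorator(f):
        cache = {}
        @functools.wraps(f)
        def wrapper(*args):
            k = key(args)          # the key need not be the input itself
            if k not in cache:
                cache[k] = f(*args)
            return cache[k]
        return wrapper
    return decorator

calls = {"count": 0}

@cached_function()
def square(n):
    calls["count"] += 1
    return n * n

square(3); square(3)
assert calls["count"] == 1  # second call was served from the cache
```

The cache stores the *output* `9` under the key `(3,)`; any key-extraction function that distinguishes different inputs would work equally well.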
It's about how we are implementing our caching, in that we require a well-defined `__hash__()` method. The reason why we require elements to be immutable is that the hash value often depends on the object's state (e.g., the elements in a list) in order to find the object in the cache.

Formally, all we need to do caching is a function to the integers that tells us where in the cache an element should go. Our current implementation of caching requires the object to have this function, called `__hash__()`. So all we require is that objects never change their key; in general the objects can still be mutable.
As for `@cached_function`, the input is the key and the output is the value of the cache, and the input needs to be hashable because of this (as the ticket description demonstrates). What this ticket does is create an intermediate function from our input to an object which does have a well-defined hash, to create the key for the cache. Okay, perhaps I'm overreaching in saying that it is incorrect, but it is misleading. I'm just asking for a more detailed explanation in the doctests.
Actually...why not just implement a custom `__hash__` function for the required elements? This seems like overkill, and a custom `__hash__` would also allow you to use this as a key for dictionaries, sets, `UniqueRepresentation`, etc.
Replying to @tscrim:
> Actually...why not just implement a custom `__hash__` function for the required elements? This seems like overkill and a custom `__hash__` would also allow you to use this as a key for dictionaries, sets, `UniqueRepresentation`, etc.
I'm not sure I understand. If p-adics implemented a non-trivial `__hash__` then this could never be made to be consistent with our current `==`. The intermediate function creates something with a new `==` which is then consistent with its `__hash__`. Does this answer your question?
Changed branch from u/pbruin/16316-cache_key to u/saraedum/ticket/16316
Replying to @saraedum:
> I'm not sure I understand. If p-adics implemented a non-trivial `__hash__` then this could never be made to be consistent with our current `==`. The intermediate function creates something with a new `==` which is then consistent with its `__hash__`. Does this answer your question?
First I'm just making sure it is different than the hash requirements. By unique you mean this bit of doc:
An implementation must make sure that for elements ``a`` and ``b``, if
``a != b``, then also ``a._cache_key() != b._cache_key()``. In pratice
this means that the ``_cache_key`` should always include the ``id`` of
the parent.
the contrapositive of which is: if `a._cache_key() == b._cache_key()`, then `a == b` (assuming that `==` in `p`-adics satisfies `a == b` iff `not (a != b)`). Now this is the opposite of the requirement for hashing: `a == b` implies `hash(a) == hash(b)`. In order to retrieve elements from the cache, you need that `a._cache_key()` never changes over the lifetime of `a`. Enforcement of this is no different from that of `__hash__`.
I'm good with the new wording (BTW, there's a typo (pratice) and this docstring needs `::`).
However, there is a code smell here to me. The more I think about it, the more I feel what should be done is that `==` be strengthened (i.e. fewer things compare as equal) and a `__hash__` method implemented. Yet that did answer my question.
Now for some timings, before:
sage: @cached_function
....: def foo(x):
....: return x
....:
sage: R.<x> = QQ[]
sage: C = crystals.Tableaux(['A',2], shape=[2,1])
sage: p = R(range(100))
sage: q = R(range(10000))
sage: pl = R(range(1000000))
sage: m = matrix(10, 10, range(100))
sage: %timeit foo(5)
1000000 loops, best of 3: 1.46 µs per loop
sage: %timeit foo(C)
1000000 loops, best of 3: 942 ns per loop
sage: %timeit foo(p)
1000 loops, best of 3: 387 µs per loop
sage: %timeit foo(q)
10 loops, best of 3: 29.5 ms per loop
sage: %timeit foo(pl)
1 loops, best of 3: 2.85 s per loop
sage: foo(m) # fails as it should
with the branch:
sage: %timeit foo(5)
1000000 loops, best of 3: 1.51 µs per loop
sage: %timeit foo(C)
1000000 loops, best of 3: 976 ns per loop
sage: %timeit foo(p)
1000 loops, best of 3: 390 µs per loop
sage: %timeit foo(q)
10 loops, best of 3: 30.4 ms per loop
sage: %timeit foo(pl)
1 loops, best of 3: 3.07 s per loop
sage: foo(m) # Still fails
So there's no statistically significant slowdown that I can see.
Some other design questions/comments:
- Is there some reason why you didn't `cpdef _cache_key(self)` or make it into a `@lazy_attribute`, since it cannot change over the lifetime of the object? The latter would actually speed things up for things whose hash is long to compute (e.g. large degree dense polynomials).
- I'm worried about not using the hash for polynomials for caching. I feel that `_cache_key` should return `self` for every hashable object. For example, this no longer would cache as the same element:
sage: R.<x> = QQ[]
sage: p = R(range(100))
sage: hash(p)
1520766690567384545
sage: hash(p._cache_key())
-4253877232222459194
sage: R.<x> = ZZ[]
sage: p = R(range(100))
sage: hash(p)
1520766690567384545
sage: hash(p._cache_key())
6965062123311622150
So now you're requiring everyone to be very careful about the parent an object belongs to.
- I'm pretty sure using the parent's `id` will break pickling with the cache. So suppose you create parent `A` and store an element `a` in a cache `C`. Now you pickle and unpickle `a`, `A`, and `C`. The `id` of `A` is different than before the pickling and unpickling, so we are no longer able to retrieve `a` from `C`:

sage: R = QQ['x']
sage: p = R(range(100))
sage: hash(p._cache_key())
-4253877232222459194
sage: dumps(p)
# This is some pickle
# Start a new sage session:
sage: s = # The pickle above
sage: hash(loads(s)._cache_key())
2293648738429524678
sage: hash(loads(s)) # This is persistent across sessions
1520766690567384545
I think both of the last two are issues that need to be addressed.
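The pickling problem described above can be reproduced with plain Python objects standing in for Sage parents (all names here are illustrative, not Sage code):

```python
import pickle

class Parent:
    """Stand-in for a Sage parent; equal instances hash equally."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        return isinstance(other, Parent) and self.name == other.name
    def __hash__(self):
        return hash(self.name)

A = Parent("Zp(5)")
cache = {(id(A), "element data"): "cached result"}

# A pickle round trip yields an equal parent with a different id ...
A2 = pickle.loads(pickle.dumps(A))
assert A2 == A and A2 is not A

# ... so an id-based key can no longer find the cached result:
assert (id(A2), "element data") not in cache

# Keying on the parent object itself survives the round trip,
# since the key's hash and equality are based on the parent's data:
cache2 = {(A, "element data"): "cached result"}
assert (A2, "element data") in cache2
```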
Replying to @tscrim:
> Replying to @saraedum:
>
> > I'm not sure I understand. If p-adics implemented a non-trivial `__hash__` then this could never be made to be consistent with our current `==`. The intermediate function creates something with a new `==` which is then consistent with its `__hash__`. Does this answer your question?
>
> First I'm just making sure it is different than the hash requirements. By unique you mean this bit of doc:
>
>     An implementation must make sure that for elements ``a`` and ``b``, if
>     ``a != b``, then also ``a._cache_key() != b._cache_key()``. In pratice
>     this means that the ``_cache_key`` should always include the ``id`` of
>     the parent.
>
> the contrapositive of which is: if `a._cache_key() == b._cache_key()`, then `a == b` (assuming that `==` in `p`-adics satisfies `a == b` iff `not (a != b)`).
Right.
> I'm good with the new wording

Good.

> (BTW, there's a typo (pratice) and this docstring needs `::`).

Fixed.

> However there is a code smell to me.

Sorry, I'm not familiar with this term.

> The more I think about it, the more I feel what should be done is `==` be strengthened (i.e. less things compare as equal) and a `__hash__` method implemented. Yet that did answer my question.

I can understand this point. Strengthening `==` would be a fairly huge project, affecting a lot of code. Many people would be opposed to this (including myself).
> - Is there some reason why you didn't `cpdef _cache_key(self)`

This could be changed.

> or make it into a `@lazy_attribute` since it cannot change over the lifetime of the object?

I do not understand `@lazy_attribute` very well. What would be the implications of this? If it can't hurt us, then I don't mind changing it.
> - I'm worried about not using the hash for polynomials for caching. I feel that `_cache_key` should return `self` for every hashable object.

`_cache_key()` is only invoked if the object is unhashable. For most hashable objects it raises an error.

> For example, this no longer would cache as the same element:

I don't think so. `_cache_key` would never be called in this case.

> So now you're requiring everyone to be very careful about the parent an object belongs to.

This is a compromise for elements which are unhashable. This was already discussed above.
> - I'm pretty sure using the parent's `id` will break pickling with the cache. So suppose you create parent `A`, store an element `a` in a cache `C`. Now you pickle and unpickle `a`, `A`, and `C`. The `id` of `A` is different than before the pickle and unpickling, so we are no longer able to retrieve `a` from `C`:

I agree. This is a very serious problem.

> I think both of the last two are issues that need to be addressed.

From my point of view, only the last point needs to be addressed.
Branch pushed to git repo; I updated commit sha1. New commits:
c0ab188 | fixed docstring |
It seems to me that the only option to resolve the issue with `id(parent)` is to include the actual parent in the cache key.

This is not very nice for performance, but assuming that cached results are fairly expensive to calculate, it is probably the best solution. The whole machinery is only really relevant for p-adics at the moment. If we ever run into performance issues because the parents are cached, then we can still add a parameter to `@cached_method` which strips the parent from the cache key (e.g. because the function checks that all inputs belong to the same parent). For the moment I would rather not implement functionality that solves performance issues which we do not have yet.
New commits:
c0ab188 | fixed docstring |
Branch pushed to git repo; I updated commit sha1. New commits:
22d1d8e | Include parent instead of id(parent) as first entry of a cache key |
> `_cache_key()` is only invoked if the object is unhashable. For most hashable objects it raises an error.
Ah, I misinterpreted a part of the code. Since you only call the function `_cache_key` (in `cachefunc.pyx`) when `hash` doesn't work, the first try-except block in the function is redundant. I've reworked some things with this in mind, which should simplify things.
Also, instead of raising a `NotImplementedError`, I think we should raise a `TypeError` to be consistent with unhashable elements. I tried to implement a version using `@lazy_attribute`, but it doesn't currently have full support for Cython inheritance.
Changed branch from u/saraedum/ticket/16316 to u/tscrim/ticket/16316
If you're happy with my changes, then it's a positive review. Thanks for dealing with my nagging.
Also, in short: `@lazy_attribute` makes a method (with no input args) behave like an attribute and is only called when necessary. IIRC it's faster than usual Python attribute access and than using `@cached_method` with no args.
New commits:
7a9a6b2 | Review changes. |
Changed branch from u/tscrim/ticket/16316 to u/saraedum/ticket/16316
Changed branch from u/saraedum/ticket/16316 to u/tscrim/ticket/16316
Replying to @tscrim:
> If you're happy with my changes, then it's a positive review.
It's a very good idea to include the description in the summary of `cachefunc.pyx`. But I also think it should be in `SageObject._cache_key`; if somebody is wondering what it is good for, that is the first place where they will look for it. It is a duplication of docstrings, but I hope it is acceptable.
I made a few changes to your commit:
- I removed some debug statements.
- Your changes to `_cache_key` change the current behaviour. Say you have a tuple of a polynomial over the rationals and a p-adic number. The old version would only call `_cache_key` of the p-adic number and leave the polynomial. Your version unpacks both, which is unnecessary. I restored my version of the code.
- The default implementation of `_cache_key` should raise an error. If somebody calls it (misunderstanding what it is good for), then it makes sense to warn people that they did something wrong: the code in `SageObject._cache_key()` will never be executed in a legitimate context. I do not mind raising a `TypeError` instead of a `NotImplementedError`.

> Thanks for dealing with my nagging.

No worries. After all, you found a very serious bug (`id(parent)`).
Changed reviewer from Peter Bruin to Peter Bruin, Travis Scrimshaw
Replying to @tscrim:
> Also in short, `@lazy_attribute` makes a method (with no input args) behave like an attribute and is only called when necessary. IIRC it's faster than usual python attribute access and using `@cached_method` with no args.
That's not correct. At least when you have a Python class, a lazy attribute will be a plain Python attribute after the first access. Indeed, a lazy attribute `L` defined for some class `C` has a `__get__` method, so that `L.__get__(O, C)` assigns the output value to `O.L`, for an instance `O` of `C`. In other words, it overrides itself by the return value, and thus is exactly as fast as a usual Python attribute after the first access.
However, if `C` is a Cython class, attribute assignment does not always work. Hence, in this case, `L.__get__(O, C)` will not be able to assign the return value to `O.L`. As a consequence, the `__get__` method will be called upon each single access. Hence, if I see that correctly, the performance will be worse than calling a method.
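The mechanism Simon describes can be sketched in pure Python (a hypothetical `lazy_attribute`, not Sage's actual implementation):

```python
class lazy_attribute:
    """Non-data descriptor: computes once, then becomes a plain attribute."""
    def __init__(self, f):
        self.f = f
        self.__name__ = f.__name__
    def __get__(self, obj, cls):
        if obj is None:
            return self
        value = self.f(obj)
        # Override the descriptor with an ordinary instance attribute; all
        # later accesses are then plain (fast) attribute lookups.  This
        # assignment is exactly what can fail for Cython classes lacking a
        # __dict__, in which case __get__ runs on every access instead.
        setattr(obj, self.__name__, value)
        return value

class C:
    calls = 0
    @lazy_attribute
    def expensive(self):
        C.calls += 1
        return 42

o = C()
assert o.expensive == 42
assert o.expensive == 42
assert C.calls == 1  # computed once; second access hit the instance dict
```

Because the descriptor defines only `__get__` (no `__set__`), the instance `__dict__` entry shadows it after the first access, which is why this trick requires assignable instance attributes.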
Replying to @saraedum:
> Replying to @tscrim:
>
> > - I'm pretty sure using the parent's `id` will break pickling with the cache. So suppose you create parent `A`, store an element `a` in a cache `C`. Now you pickle and unpickle `a`, `A`, and `C`. The `id` of `A` is different than before the pickle and unpickling, so we are no longer able to retrieve `a` from `C`:
>
> I agree. This is a very serious problem.
Hmm, I see. But if we store the parent itself, then I think we have to make sure that all unhashable elements belong to parents satisfying unique representation. Otherwise, calling a cached function on an element whose parent is equal but not identical to the parent of an element that was used before could return a cached result that lives in this equal-but-not-identical parent.
Changed branch from u/tscrim/ticket/16316 to u/saraedum/ticket/16316
Replying to @saraedum:
> It's a very good idea to include the description in the summary of `cachefunc.pyx`. But I also think it should be in `SageObject._cache_key`; if somebody is wondering what it is good for, that is the first place where they will look for it. It is a duplication of docstrings but I hope it is acceptable.
It's a private method only used in one place, so the chances of someone finding it randomly are very slim, but okay, it doesn't really matter.
> I made a few changes to your commit:
>
> - I removed some debug statements.

Whoops, forgot about those.

> - Your changes to `_cache_key` change the current behaviour. Say you have a tuple of a polynomial over the rationals and a p-adic number. The old version would only call `_cache_key` of the p-adic number and leave the polynomial. Your version unpacks both which is unnecessary. I restored my version of the code.
> - The default implementation of `_cache_key` should raise an error. If somebody calls it (misunderstanding what it is good for), then it makes sense to warn people that they did something wrong: the code in `SageObject._cache_key()` will never be executed in a legitimate context. I do not mind raising a `TypeError` instead of a `NotImplementedError`.
Restoring your branch so I can look again.
New commits:
b41c455 | restore previous version of _cache_key() in cachefunc.pyx |
37aa276 | reverted changes which treated _cache_key like an attribute |
5586f56 | removed debug output |
83944fa | Restored old version of SageObject._cache_key() but raising a TypeError instead of a NotImplementedError. |
Ah, now I see why you're trying to hash every time inside the function `_cache_key`, which is why `SageObject._cache_key` should be the thing raising the error. Although what I had before, with proper placement in the class hierarchy, would not have unpacked the polynomial. Defining `_cache_key` for all polynomials led to some of my confusion before.
I'm also not in favor of raising an error in `SageObject._cache_key` because an object is unhashable. What I had before allowed more flexibility in use; say I wanted to do my own custom cache which uses the `_cache_key` method.
So overall I feel like this is a heavy-handed approach. However I don't have a strong enough use case to oppose this getting in as is. So positive review.
Replying to @pjbruin:
> Hmm, I see. But if we store the parent itself, then I think we have to make sure that all unhashable elements belong to parents satisfying unique representation. Otherwise, calling a cached function on an element whose parent is equal but not identical to the parent of an element that was used before could return a cached result that lives in this equal-but-not-identical parent.
No, this isn't a problem because equal parents have equal hashes.
Replying to @simon-king-jena:
> That's not correct. At least when you have a Python class, a lazy attribute will be a plain python attribute after the first access. Indeed, a lazy attribute `L` defined for some class `C` has a `__get__` method, so that `L.__get__(O, C)` assigns the output value to `O.L`, for an instance `O` of `C`. In other words, it overrides itself by the return value and thus is exactly as fast as a usual python attribute, after the first access.
Ah right. Then perhaps I was thinking that a `@cached_method` with no args was faster than a Python attribute access? *shrugs*
> However, if `C` is a Cython class, attribute assignment does not always work. Hence, in this case, `L.__get__(O,C)` will not be able to assign the return value to `O.L`. By consequence, the `__get__` method will be called upon each single access. Hence, if I see that correctly, the performance will be worse than calling a method.
Hmm...is that an issue with the extension class not necessarily having a `__dict__`, and related to `@lazy_attribute` not always working with (Cython) inheritance?
Caching does currently not work for unhashable elements

In this case `a` should not be hashable, since its current `operator==` could never be made consistent with a non-trivial hash value (cf. #11895). However, such elements should be cacheable.

This ticket adds a `_cache_key()` method for such elements which can be used to hash and compare unhashable elements. That method should return a tuple (or anything else that is hashable) which uniquely identifies the element, e.g. for p-adics the precision and a lift of the element to the integers.

Depends on #16251
CC: @simon-king-jena
Component: misc
Author: Julian Rueth
Branch/Commit:
aa037de
Reviewer: Peter Bruin, Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/16316