Closed simon-king-jena closed 11 years ago
Note that the patch also needed to change some auxiliary class CartanType_simple_finite
, which is used to unpickle some old data. It used to inherit from object
, but for an incompatibility of Cython types it has to inherit from UniqueRepresentation
instead.
For the timings, I use MatrixSpace
, which inherits from UniqueRepresentation
:
sage: isinstance(MatrixSpace(GF(3),2,3), UniqueRepresentation)
True
With sage-5.6.rc0 plus #14017:
sage: %time L = [MatrixSpace(GF(3),n) for n in range(10000)]
CPU times: user 1.94 s, sys: 0.05 s, total: 2.00 s
Wall time: 2.00 s
sage: %time D = dict(zip(L,range(len(L))))
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
sage: MS = MatrixSpace(GF(3),10000)
sage: MS in D
False
sage: timeit("MS in D", number = 10^6)
1000000 loops, best of 3: 552 ns per loop
sage: MS = L[5000]
sage: MS in D
True
sage: timeit("MS in D", number = 10^6)
1000000 loops, best of 3: 540 ns per loop
Adding the patch from here:
sage: %time L = [MatrixSpace(GF(3),n) for n in range(10000)]
CPU times: user 1.96 s, sys: 0.04 s, total: 2.00 s
Wall time: 2.00 s
sage: %time D = dict(zip(L,range(len(L))))
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
sage: MS = MatrixSpace(GF(3),10000)
sage: MS in D
False
sage: timeit("MS in D", number = 10^6)
1000000 loops, best of 3: 187 ns per loop
sage: MS = L[5000]
sage: MS in D
True
sage: timeit("MS in D", number = 10^6)
1000000 loops, best of 3: 176 ns per loop
Hence, the time drops by 2/3. I did run make ptest successfully. Needs review!
Cc to Nicolas as the original author of UniqueRepresentation
.
I just did some experiments with the tp_hash function from Python's C-API: It is 1/3 faster than a cythoned hash. I'll try to do similar things with the comparison methods.
While we are at speeding up UniqueRepresentation
, I think we should actually refactor it. Namely, UniqueRepresentation
serves two purposes, namely (1) caching, and (2) comparison and hash by identity.
Some classes misuse UniqueRepresentation
by only using feature (1), overriding comparison in a way that violates the unique representation condition. See my monologue at sage-devel.
I suggest to split the two features.
FWIW, I finished a more experimental and rather intrusive version of UniqueRepresentation
.
Idea:
Create a cdef function, that results in faster C code than what Cython makes of
def __hash__(self):
return id(self)
UniqueRepresentation
. Likewise for tp_richcompare.It remains possible to override those parts of comparison that can't be decided by looking at identity (such as "a<b" if "a is not b").
It would be great to just define the fast hash and comparison for UniqueRepresentation
itself, but alas it seems that subclasses forget these settings, whether they override __hash__
or not. See the comments on sage-devel.
I still think it is a good idea to separate UniqueRepresentation
from CachedRepresentation
, but I am not so sure about enforcing the uniqueness behaviour, without the possibility to override it---this wouldn't be pythonic...
Let the patchbot do some work:
Apply trac14054_fast_methods.patch
From sage-devel, I understand that people think that separating the cache feature from the uniqueness feature of UniqueRepresentation
is a good idea.
However, in my old patch, I was enforcing uniqueness behaviour for instances of UniqueRepresentation
. This isn't pythonic. Hence, I do differently in the new patch version.
Question
In the current implementation, inheritance from UniqueRepresentation
will overload rich comparison (==, >=, !=, etc.) inherited from a base class, but it will not overload comparison (cmp). Do you think that both should be overloaded?
Patchbot reported two failures with the previous patch version. I guess that's because of an additional dependency. So, as soon as I have a decent internet connection, I'll download the latest beta, and rebase on top of it.
Apply trac14054_fast_methods.patch
Other question: Does provide_hash_by_id
really make sense to have?
The updated patch should make the coverage script happy, but three tests with 5.7.rc0 currently fail. I am downloading 5.7.rc0 now.
Apply trac14054_fast_methods.patch
I am not qualified to look over this ticket, but glancing at it I spotted this
308 complete = complete = self.complete()
which looks like a typo.
Replying to @sagetrac-jlopez:
I am not qualified to look over this ticket, but glancing at it I spotted this
308 complete = complete = self.complete()
which looks like a typo.
Yes, thank you for spotting it! Will remove it when I rebase the patch against 5.7.rc0.
Hey Simon,
Let me know when this is ready for review again.
Best,
Travis
Reviewer: Travis Scrimshaw
The patch should now be ready for review.
Two questions that I'd like to have addressed:
provide_hash_by_id
, but I don't use it. Shall it be deleted?hash(a)!=hash(b)
but cmp(a,b)==0
. Does this violate the contract of hash functions? Or is it enough that hash(a)!=hash(b)
implies a!=b
?Pathbot:
Apply trac14054_fast_methods.patch
Description changed:
---
+++
@@ -1,5 +1,15 @@
-`UniqueRepresentation` provides a comfortable way to create unique parent structures, and automatically provides a hash and certain comparison methods. Problem: It relies on a metaclass, namely `ClasscallMetaclass` and thus has to be a Python class. That's bad for speed.
+`UniqueRepresentation` provides a comfortable way to create unique parent structures, and automatically provides a hash and certain comparison methods.
-Here, I suggest to create a new cdef class `UniqueRepresentation_c` that provides hash and comparison, and let `UniqueRepresentation` inherit from it, just adding the classcall method.
+Problems:
-The problem is that `UniqueRepresentation` relies on cached_method, and this decorator had problems to work on methods defined in Cython with varargs and keywords arguments. That is fixed in #14017, which is thus a dependency.
+- It relies on a metaclass, namely `ClasscallMetaclass` and thus has to be a Python class. That's bad for speed.
+- It combines two features, namely a cache and unique instance behaviour. But in many some cases we want a cache so that two distinct instances can still be equal.
+
+Here, I suggest to
+
+1. separate the two features, by creating a new class `sage.unique_representation.CachedRepresentation` as one base of `UniqueRepresentation`.
+2. create a new cdef class `sage.misc.fast_methods.WithRichCmpById`, that provides hash and rich comparison, as expected for unique instance behaviour.
+
+__Apply__
+
+- [attachment: trac14054_fast_methods.patch](https://github.com/sagemath/sage-prod/files/10657049/trac14054_fast_methods.patch.gz)
To answer the question about the hash contract:
Apparently dictionaries use the rich comparison methods and not cmp:
sage: class Bla(object):
....: def __hash__(self):
....: return 2
....: def __cmp__(self, other):
....: return 0
....: def __eq__(self, other):
....: return self is other
....: def __ne__(self, other):
....: return self is not other
....:
sage: a = Bla()
sage: b = Bla()
sage: a==b
False
sage: hash(a)==hash(b)
True
sage: cmp(a,b)
0
sage: D = {a:1}
Since hash(a)==hash(b)
, a and b belong to the same hash bucket of the dictionary. And comparison by cmp tells that they are equal. But we still have:
sage: D[b]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-8-6cf1ee6b63d5> in <module>()
----> 1 D[b]
KeyError: <__main__.Bla object at 0x5064310>
Hence, dictionaries use comparison by ==
, and thus my patch does the right thing, IMHO.
Replying to @simon-king-jena:
- I introduce
provide_hash_by_id
, but I don't use it. Shall it be deleted?
Yes, you should. I'm pretty sure changing type->tp_hash
after calling PyType_Ready()
is not supported by the Python C-API (see the inheritance problems we saw).
- Is it enough to override the rich comparison methods? Currently, one can have
hash(a)!=hash(b)
butcmp(a,b)==0
.Does this violate the contract of hash functions? Or is it enough that
hash(a)!=hash(b)
impliesa!=b
?
I think Python requires hash(a)!=hash(b)
implies not(a==b)
, which Python does not enforce to be the same thing. In any case, I'm pretty sure that "rich comparison" is fully exhausted before trying to use __cmp__
, which is only there because of backward compatibility.
There's another peculiarity for membership and lookup in python:
sage: class neq(object):
....: def __eq__(self,other):
....: return False
....: def __ne__(self,other):
....: return True
....:
sage: a=neq()
sage: V={a}
sage: a in V
True
sage: sage: [a == v for v in V]
[False]
They explicitly mention this in the documentation: because hash collisions are rare, they first test "is" for dict lookup before trying "==" (it's cheaper).
Replying to @nbruin:
Replying to @simon-king-jena:
- I introduce
provide_hash_by_id
, but I don't use it. Shall it be deleted?Yes, you should.
OK.
Does this violate the contract of hash functions? Or is it enough that
hash(a)!=hash(b)
impliesa!=b
?I think Python requires
hash(a)!=hash(b)
impliesnot(a==b)
,
Right, that's a difference.
In any case, I'm pretty sure that "rich comparison" is fully exhausted before trying to use
__cmp__
, which is only there because of backward compatibility.
Nope! See my post on sage-devel.
If you have a non-cdef class, then it seems __richcmp__
is simply ignored (but methods like __eq__
have precedence over __cmp__
). In a cdef class, __richcmp__
has priority over __cmp__
when deciding binary relations such as ==, <, <=, etc. But for cmp(,), __cmp__
will be used, even if there is __richcmp__
!
So, the decisive question is: Do dictionaries compare stuff by cmp(a,b)==0 or by a==b?
It is the latter, and thus making __richcmp__
compatible with __hash__
was the right thing to do.
For now, it needs work, because I will delete the C API hack, and apparently some script of the patchbot has a complaint...
Attachment: trac14054_fast_methods.patch.gz
Separate cache and uniqueness.
Done!
It seems that the patchbot complains about an increased startup time. How did that happen? Is it because of a slightly longer mro for UniqueRepresentation
?
Patchbot:
Apply trac14054_fast_methods.patch
What's your rationale to define 'a < b' etc. when a is b
? I agree that it's likely that your answers are what any ordering override will want, but Python does not mandate any particular properties of semantics of these operators. So I think the proper thing is to just not do anything there. It's going to be a rare occasion anyway, so a little speed gain won't be very noticeable anyway.
The behaviour you are implementing is much easier to explain if you don't override those cases: Then you can just say that WithRichCmpById
provides __hash__
, __eq__
and __ne__
, period.
I think WithEqualityById
would make a better conceptual name that is less dependent on the implementation, by the way. Your motivation here seems to be as much to provide conceptual units in preference of just technical implementation tools as to provide more efficient implementations. Someone writing a Python class may not know what "richcomp" is, and doesn't need to.
Hey,
Replying to @nbruin:
What's your rationale to define 'a < b' etc. when
a is b
? I agree that it's likely that your answers are what any ordering override will want, but Python does not mandate any particular properties of semantics of these operators. So I think the proper thing is to just not do anything there. It's going to be a rare occasion anyway, so a little speed gain won't be very noticeable anyway.
So how would you want comparisons in the complex numbers to behave? In python, they return an error:
sage: complex(2+2*i) < complex(2-2*i)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-ce2e269601f6> in <module>()
----> 1 complex(Integer(2)+Integer(2)*i) < complex(Integer(2)-Integer(2)*i)
TypeError: no ordering relation is defined for complex numbers
However in sage they are not so well-behaved (see #14088):
sage: CC(1+2*i) < CC(2-2*i)
True
so there's no prescribed sage way.
As far as the actual comparison code, I think it would be better to do something like
if m == 2:
return True
elif m == 3:
return False
else:
return m == 1 or m == 5
but this is a micro-optimization, so feel free to ignore this.
The behaviour you are implementing is much easier to explain if you don't override those cases: Then you can just say that
WithRichCmpById
provides__hash__
,__eq__
and__ne__
, period.I think
WithEqualityById
would make a better conceptual name that is less dependent on the implementation, by the way. Your motivation here seems to be as much to provide conceptual units in preference of just technical implementation tools as to provide more efficient implementations. Someone writing a Python class may not know what "richcomp" is, and doesn't need to.
I believe the response to both of these points depends on how we want to do comparison between objects (when there is not a [natural] ordering).
Last thing for now, shouldn't we issue a deprecation warning for FastHashable_class
since it has changed locations?
Thanks,
Travis
The issue is that if someone wants to define
class C(WithEqualityById):
def __cmp__(self,other):
return -1
they will find that
sage: a=C()
sage: a < a
False
which might surprise them, because they thought that WithEqualityById
only affected (in)equality testing and let ordering comparisons fall through to __cmp__
if implemented. With the current code, this is not the case if the two arguments are identical.
There have been extensive discussions about what inequality testing SHOULD be in sage and for the most part the implementation here is staying clear of the topic (which I think is a good thing). There's just this one small optimization that's probably usually OK, but if left out makes for much more predictable behaviour.
Replying to @nbruin:
What's your rationale to define 'a < b' etc. when
a is b
? I agree that it's likely that your answers are what any ordering override will want, but Python does not mandate any particular properties of semantics of these operators.
Python does not. But I think "comparison by identity" implies a semantics. And that is: a < a
is False, a <= a
is True.
The behaviour you are implementing is much easier to explain if you don't override those cases: Then you can just say that
WithRichCmpById
provides__hash__
,__eq__
and__ne__
, period.
OK.
I think
WithEqualityById
would make a better conceptual name that is less dependent on the implementation, by the way.
Good idea.
Replying to @tscrim:
Last thing for now, shouldn't we issue a deprecation warning for
FastHashable_class
since it has changed locations?
I think I introduced it. But when? Or what ticket? I don't think anyone used it.
Replying to @simon-king-jena:
The behaviour you are implementing is much easier to explain if you don't override those cases: Then you can just say that
WithRichCmpById
provides__hash__
,__eq__
and__ne__
, period.OK.
I think
WithEqualityById
would make a better conceptual name that is less dependent on the implementation, by the way.Good idea.
Note: I would not like the name "WithCmpById
", because cmp
relies on __cmp__
(which is not touched) even if __richcmp__
exists.
Replying to @tscrim:
Last thing for now, shouldn't we issue a deprecation warning for
FastHashable_class
since it has changed locations?I think I introduced it. But when? Or what ticket? I don't think anyone used it.
It was in #11900. We could of course keep it in sage.categories.category_singleton, but I think that's not the right place. Note that I remove the dependency of CategorySingleton
on FastHashable_class
, because the new cythoned hash is slightly faster.
This will not go into 5.7 anyway. Hence, I was rebasing against #6495 (which is a minor change anyway). In addition, I took into account your comments:
WithEqualityById
.FastHashable_class
back into category_singleton (wrong as this may be...)I hope this addresses all complaints.
Apply trac14054_fast_methods-5.8.patch
Changed dependencies from #14017 to #14017, #6495
Description changed:
---
+++
@@ -12,4 +12,4 @@
__Apply__
-- [attachment: trac14054_fast_methods.patch](https://github.com/sagemath/sage-prod/files/10657049/trac14054_fast_methods.patch.gz)
+- [attachment: trac14054_fast_methods-5.8.patch](https://github.com/sagemath/sage-prod/files/10657050/trac14054_fast_methods-5.8.patch.gz)
Oops, I forgot to rename two occurrences of WithRichCmpById
into WithEqualityById
...
Should now pass all tests.
Apply trac14054_fast_methods-5.8.patch
Strange. I can not replicate the error reported by the patchbot.
Neither can I by hand or testing the individual files. It has to be something since the testbot can reproduce it... Have you tried running testall by chance? I ran tests in the categories
folder than on free_module.py
and that passed.
It must have something to testing things in the right order...I'm wondering if the problem lies with free_module.tensor_constructor()
being a cached_method and something is being improperly stored...
Replying to @tscrim:
It must have something to testing things in the right order...I'm wondering if the problem lies with
free_module.tensor_constructor()
being a cached_method and something is being improperly stored...
The failing assertion is
File "/mnt/storage2TB/patchbot/Sage/sage-5.8.beta0/local/lib/python/site-packages/sage/categories/modules_with_basis.py", line 1387, in __call__
assert(x.parent() is self.domain())
I can hardly imagine that there is yet another premature deallocation via weak references in play. After all, both x._parent
and self._domain
are just plain strong references.
This parent mismatch happens in a morphism call. If a morphism gets registered in the wrong slot in a conversion/coercion then it could indeed end up getting mismatched input (shouldn't __call__
be a little more permissive about its arguments by the way? Certainly, an error would be more appropriate than an assert if this is something that actually can go wrong)
Anyway, TripleDict
does store homomorphisms and we've recently seen it's not entirely safe wrt. garbage collections that can happen during update procedures. That can cause homomorphisms to get registered under the wrong domain/codomain keys.
I think it's extremely unlikely that this is happening in this particular situation, but if all other conceivable alternatives are impossible ... (since the corruption would be dependent on a GC happening at just the wrong time, it would be extremely fickle behaviour that is very hard to reproduce).
Of course, if this test is using a map that wasn't retrieved from a TripleDict
or a MonoDict
the above explanation is impossible.
Replying to @nbruin:
This parent mismatch happens in a morphism call. If a morphism gets registered in the wrong slot in a conversion/coercion then it could indeed end up getting mismatched input
If it is in a wrong slot (you mean: Addressed by a non-identic copy of a parent?) then it should simply not occur here, because TripleDict
would compare by identity.
(shouldn't
__call__
be a little more permissive about its arguments by the way? Certainly, an error would be more appropriate than an assert if this is something that actually can go wrong)
No, I think an assertion is the right thing to do here. Perhaps one could provide the assertion with an error message that names the two parents that don't match. In that way, we could at least see what two parents are involved.
Note that my patch changes CombinatorialFreeModule
from UniqueRepresentation
to CachedRepresentation
, since it overrides the equality tests. So, it could actually be that the two parents are genuinely non-unique. And since a cached_method is involved here, which does comparison by equality and not identity, we could really be in trouble here.
Of course, it could be that changing to CachedRepresentation
was wrong in this case.
Replying to @simon-king-jena:
If it is in a wrong slot (you mean: Addressed by a non-identic copy of a parent?) then it should simply not occur here, because
TripleDict
would compare by identity.
No, what I meant is: bucket position to write value into gets determined, dictionary gets changed due to a GC, value gets placed into bucket position determined earlier, which now belongs to an entirely different key triple. Without #13387 we couldn't strictly exclude that from happening, but it would need extreme bad luck.
Your hypothesis sounds much more probable.
By the way: One reason for removing FastHashable_class
from sage.categories.category_singleton is the fact that the hash inherited from the cythoned version of UniqueRepresentation
is faster than what was provided by FastHashable_class
.
Replying to @simon-king-jena:
By the way: One reason for removing
FastHashable_class
from sage.categories.category_singleton is the fact that the hash inherited from the cythoned version ofUniqueRepresentation
is faster than what was provided byFastHashable_class
.
I didn't want it to not be removed, but I thought it was suppose to have a deprecation warning saying the class had moved/changed namespaces?
Replying to @tscrim:
I didn't want it to not be removed, but I thought it was suppose to have a deprecation warning saying the class had moved/changed namespaces?
Would you mind to just leave it in its original space (being an orphan, though)?
Otherwise: How does one deprecate a class that has no init method?
Replying to @simon-king-jena:
Otherwise: How does one deprecate a class that has no init method?
"Had" no init method, I should say.
Anyway, if I move it to a different file, then the import statement "from sage.categories.category_singleton import FastHashable_class
" should result in a deprecation warning. How can this be achieved?
Let's test whether the safer use of callbacks for weak references to Homsets stored in a TripleDict
from #14159 will fix the problem here.
Apply https://github.com/sagemath/sage-prod/files/10657050/trac14054_fast_methods-5.8.patch.gz
Changed dependencies from #14017, #6495 to #14017, #6495, #14159
Patchbot,
Apply trac14054_fast_methods-5.8.patch
Hooray, I can reproduce the assertion error (it only occurs with the patch from here)! So, that should make it possible to debug it.
Replying to @simon-king-jena:
Would you mind to just leave it in its original space (being an orphan, though)?
Otherwise: How does one deprecate a class that has no init method?
What I've done is turned the class into a function and have that function return the desired class after issuing the deprecation warning. I guess an alternative would be to implement a custom __call__
which does the same as above.
How are you reproducing the assertion error?
Replying to @tscrim:
How are you reproducing the assertion error?
By running ./sage -t -force_lib devel/sage/sage/combinat/free_module.py
with sage-5.8.beta0 plus #12313, #13387, #14159 and #14054.
Replying to @tscrim:
Replying to @simon-king-jena: What I've done is turned the class into a function and have that function return the desired class after issuing the deprecation warning.
Perhaps like this:
sage: from sage.misc.superseded import deprecated_function_alias
sage: g = deprecated_function_alias(13109, number_of_partitions)
I hesitate to do things like
from sage.misc.superseded import deprecated_function_alias
from sage.misc.fast_methods import FastFashable_class as FH_class
FastHashable_class = deprecated_function_alias(14054,FH_class)
in sage.categories.category_singleton:
FastHashable_class
is a base class, and hence one would like to inherit from it. But doing the above, FastHashable_class
would not be a class but a function in sage.categories.category_singleton.Is there a "lazy import statement with deprecation"? Then, one would not actually import FastHashable_class
in category_singleton, but from sage.categories.category_singleton import FastHashable_class
would result in a deprecation warning. And we can't cope with the compile time cimport problem anyway.
Replying to @simon-king-jena:
Perhaps like this:
sage: from sage.misc.superseded import deprecated_function_alias sage: g = deprecated_function_alias(13109, number_of_partitions)
That might work, but I haven't tried it (in principle, I don't see why it wouldn't). What I've done is like this:
def DeprecatedClass(self, arg1, arg2, ..., *old_args, **old_kwds):
from sage.misc.superseded import deprecation
deprecation(14054, "DepC is deprecated. Use path.to.new_mod.NewClass instead")
# Do whatever needs to be done to convert to NewClass's inputs
from path.to.new_mod import NewClass
return NewClass(arg1, arg2, ..., *old_args, **old_kwds)
However that's a good point about the inheritance...perhaps something like
class Dep(NewClass):
def __init__(self, arg1, arg2, ..., *args, **kwds):
from sage.misc.superseded import deprecation
deprecation(14054, "DepC is deprecated. Use path.to.new_mod.NewClass instead")
NewClass.__init__(self, arg1, arg2, ..., *args, **kwds)
?
Replying to @tscrim:
However that's a good point about the inheritance...perhaps something like
class Dep(NewClass): def __init__(self, arg1, arg2, ..., *args, **kwds): from sage.misc.superseded import deprecation deprecation(14054, "DepC is deprecated. Use path.to.new_mod.NewClass instead") NewClass.__init__(self, arg1, arg2, ..., *args, **kwds)
?
Well, that would still mean that we need to actually import it.
I am tempted to ask what sage-devel has to say about this issue.
Replying to @simon-king-jena:
Well, that would still mean that we need to actually import it.
I thought if we turn it into a python class, we wouldn't need to cimport it and it would only be imported if the old class was actually used? (To me honest, part of me is still wondering if we even really need a deprecation warning since it is such a low-level base class.)
I am tempted to ask what sage-devel has to say about this issue.
That might be for the best.
UniqueRepresentation
provides a comfortable way to create unique parent structures, and automatically provides a hash and certain comparison methods.Problems:
ClasscallMetaclass
and thus has to be a Python class. That's bad for speed.Here, I suggest to
sage.unique_representation.CachedRepresentation
as one base ofUniqueRepresentation
.sage.misc.fast_methods.WithRichCmpById
, that provides hash and rich comparison, as expected for unique instance behaviour.Apply
Depends on #14017 Depends on #6495 Depends on #14182 Depends on #14040 Depends on #14011
CC: @nthiery
Component: performance
Keywords: cython UniqueRepresentation
Author: Simon King
Reviewer: Travis Scrimshaw
Merged: sage-5.8.beta4
Issue created by migration from https://trac.sagemath.org/ticket/14054