Closed tscrim closed 7 years ago
Branch pushed to git repo; I updated commit sha1. New commits:
7373869 | Speedup and improvement to dictionary addition and related methods. |
Branch pushed to git repo; I updated commit sha1. New commits:
f082104 | Reviewer changes by Nicolas. |
Hmmm... slightly worrisome that there is an ordering change to elements in that failing `categories/finite_dimensional_algebras_with_basis.py` test, which this ticket should not have caused, AFAICS. Will investigate.
So the issue is that the example finite monoid does not have an ordering on its elements. What I am thinking of doing is implementing a `_cmp_` by using the `_cmp_by_value` of `ElementWrapper`. This will make this doctest far less fragile in the future (it's somewhat surprising that it is not machine-dependent as is), but it then becomes less of a minimal implementation.
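To illustrate the idea, here is a minimal, hypothetical sketch (plain Python, not Sage's actual `ElementWrapper`; the class name `Wrapper` is made up) of comparing wrapped elements by their underlying value, which is what makes the doctest order deterministic and machine-independent:

```python
import functools

@functools.total_ordering
class Wrapper(object):
    """Toy stand-in for an element-wrapper class."""
    def __init__(self, value):
        self.value = value  # the wrapped payload, assumed comparable

    def __eq__(self, other):
        return isinstance(other, Wrapper) and self.value == other.value

    def __lt__(self, other):
        # order by the wrapped value, not by memory address or hash,
        # so sorted() gives the same order on every machine
        return self.value < other.value

elements = [Wrapper(v) for v in ("ba", "ab", "aa")]
print([w.value for w in sorted(elements)])  # ['aa', 'ab', 'ba']
```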
Branch pushed to git repo; I updated commit sha1. New commits:
d2a704c | 20680: define a total order on Monoids().Finite().example() for more deterministic tests |
Branch pushed to git repo; I updated commit sha1. New commits:
4e912ff | Merge branch 'develop' into t/20680/public/combinat/improve_dict_addition-20680 |
Is there anything combinatorics-specific to `dict_addition.pyx`? If not, it should not be in the `combinat` directory. Perhaps move to `data_structures`?
Work Issues: Move from combinat
Here are some benchmarks: first with plain Sage 5.3, then with Travis's optimizations, and then after the code refactoring of commit ee10e70daec5704b4f2c8f63c2cf2e3c525e9311. Sorry it's a bit long; the timings fluctuate quite a bit, which makes this painful to do.
Rough interpretation: with Travis's optimizations the speed gain fluctuates between 1x and 3x, and the refactoring does no harm; it could even improve the situation slightly.
For each n = 0, 1000, 2000, 4000, we time the addition of two dictionaries of length n: three times for fully overlapping dictionaries, three times for half-overlapping dictionaries, and three times for non-overlapping dictionaries.
def Xn(n): return {i: i for i in range(n)}
def Yn(n): return {i: i for i in range(n, 2*n)}
def Zn(n): return {i: i for i in range(2*n, 3*n)}
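For reference, here is a pure-Python sketch of the operation being benchmarked; the actual `dict_addition` is compiled Cython, so this is only a semantic model, not the measured code:

```python
def dict_add(*dicts):
    """Merge the input dictionaries, summing the values of colliding keys."""
    result = {}
    for D in dicts:
        for key, value in D.items():
            if key in result:
                result[key] += value   # overlapping key: accumulate
            else:
                result[key] = value    # new key: copy the value over
    return result

# Small versions of the three benchmark regimes:
X = {i: i for i in range(3)}       # fully overlapping with itself
Z = {i: i for i in range(6, 9)}    # disjoint from X
print(dict_add(X, X))  # {0: 0, 1: 2, 2: 4}
print(dict_add(X, Z))  # union of the two dicts, no collisions
```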
Setup:
sage: blasadd = sage.combinat.dict_addition.dict_addition
Running the tests for a given n:
X = Xn(n); Y = Yn(n); Z = Zn(n)
%timeit blasadd([X,X])
%timeit blasadd([X,X])
%timeit blasadd([X,X])
%timeit blasadd([X,Y])
%timeit blasadd([X,Y])
%timeit blasadd([X,Y])
%timeit blasadd([X,Z])
%timeit blasadd([X,Z])
%timeit blasadd([X,Z])
Timings:
n = 0
1000000 loops, best of 3: 1.12 µs per loop
1000000 loops, best of 3: 662 ns per loop
1000000 loops, best of 3: 684 ns per loop
1000000 loops, best of 3: 646 ns per loop
1000000 loops, best of 3: 1.04 µs per loop
1000000 loops, best of 3: 941 ns per loop
1000000 loops, best of 3: 918 ns per loop
1000000 loops, best of 3: 705 ns per loop
1000000 loops, best of 3: 670 ns per loop
n = 1000
1000 loops, best of 3: 365 µs per loop
1000 loops, best of 3: 405 µs per loop
1000 loops, best of 3: 402 µs per loop
1000 loops, best of 3: 462 µs per loop
1000 loops, best of 3: 212 µs per loop
1000 loops, best of 3: 376 µs per loop
1000 loops, best of 3: 465 µs per loop
1000 loops, best of 3: 481 µs per loop
1000 loops, best of 3: 467 µs per loop
n = 2000
1000 loops, best of 3: 835 µs per loop
1000 loops, best of 3: 369 µs per loop
1000 loops, best of 3: 785 µs per loop
1000 loops, best of 3: 504 µs per loop
1000 loops, best of 3: 968 µs per loop
1000 loops, best of 3: 561 µs per loop
1000 loops, best of 3: 618 µs per loop
1000 loops, best of 3: 675 µs per loop
1000 loops, best of 3: 1.02 ms per loop
n = 4000
1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 1.19 ms per loop
1000 loops, best of 3: 1.17 ms per loop
1000 loops, best of 3: 1.82 ms per loop
1000 loops, best of 3: 1.67 ms per loop
1000 loops, best of 3: 1.5 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 1.1 ms per loop
1000 loops, best of 3: 1.53 ms per loop
Setup:
blasadd = sage.combinat.dict_addition.dict_add
Running the tests for a given n:
X = Xn(n); Y = Yn(n); Z = Zn(n)
%timeit blasadd(X,X)
%timeit blasadd(X,X)
%timeit blasadd(X,X)
%timeit blasadd(X,Y)
%timeit blasadd(X,Y)
%timeit blasadd(X,Y)
%timeit blasadd(X,Z)
%timeit blasadd(X,Z)
%timeit blasadd(X,Z)
Timings:
n = 0:
1000000 loops, best of 3: 677 ns per loop
1000000 loops, best of 3: 574 ns per loop
1000000 loops, best of 3: 1.01 µs per loop
1000000 loops, best of 3: 985 ns per loop
1000000 loops, best of 3: 640 ns per loop
1000000 loops, best of 3: 635 ns per loop
1000000 loops, best of 3: 664 ns per loop
1000000 loops, best of 3: 489 ns per loop
1000000 loops, best of 3: 600 ns per loop
n = 1000: ~.3ms
1000 loops, best of 3: 357 µs per loop
1000 loops, best of 3: 353 µs per loop
1000 loops, best of 3: 373 µs per loop
1000 loops, best of 3: 282 µs per loop
1000 loops, best of 3: 283 µs per loop
1000 loops, best of 3: 259 µs per loop
1000 loops, best of 3: 302 µs per loop
1000 loops, best of 3: 315 µs per loop
1000 loops, best of 3: 313 µs per loop
n = 2000: ~.7 ms
1000 loops, best of 3: 787 µs per loop
1000 loops, best of 3: 778 µs per loop
1000 loops, best of 3: 747 µs per loop
1000 loops, best of 3: 175 µs per loop
1000 loops, best of 3: 343 µs per loop
1000 loops, best of 3: 256 µs per loop
1000 loops, best of 3: 682 µs per loop
1000 loops, best of 3: 330 µs per loop
1000 loops, best of 3: 662 µs per loop
n = 4000: ~1.2 ms
1000 loops, best of 3: 1.15 ms per loop
1000 loops, best of 3: 1.46 ms per loop
1000 loops, best of 3: 1.2 ms per loop
1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 1.09 ms per loop
1000 loops, best of 3: 800 µs per loop
1000 loops, best of 3: 985 µs per loop
1000 loops, best of 3: 915 µs per loop
1000 loops, best of 3: 1.08 ms per loop
Setup:
blasadd = sage.data_structures.blas_dict.add
Running the tests for a given n:
X = Xn(n); Y = Yn(n); Z = Zn(n)
%timeit blasadd(X,X)
%timeit blasadd(X,X)
%timeit blasadd(X,X)
%timeit blasadd(X,Y)
%timeit blasadd(X,Y)
%timeit blasadd(X,Y)
%timeit blasadd(X,Z)
%timeit blasadd(X,Z)
%timeit blasadd(X,Z)
Timings:
n = 0: ~0.2–0.6 µs
1000000 loops, best of 3: 578 ns per loop
1000000 loops, best of 3: 603 ns per loop
1000000 loops, best of 3: 227 ns per loop
1000000 loops, best of 3: 226 ns per loop
1000000 loops, best of 3: 391 ns per loop
1000000 loops, best of 3: 572 ns per loop
1000000 loops, best of 3: 566 ns per loop
10000000 loops, best of 3: 515 ns per loop
1000000 loops, best of 3: 217 ns per loop
n = 1000: ~.3 ms
10000 loops, best of 3: 299 µs per loop
1000 loops, best of 3: 351 µs per loop
1000 loops, best of 3: 334 µs per loop
1000 loops, best of 3: 256 µs per loop
1000 loops, best of 3: 255 µs per loop
1000 loops, best of 3: 258 µs per loop
1000 loops, best of 3: 78.1 µs per loop
1000 loops, best of 3: 279 µs per loop
1000 loops, best of 3: 283 µs per loop
n = 2000: ~.6 ms
1000 loops, best of 3: 704 µs per loop
1000 loops, best of 3: 709 µs per loop
1000 loops, best of 3: 697 µs per loop
1000 loops, best of 3: 174 µs per loop
1000 loops, best of 3: 334 µs per loop
10000 loops, best of 3: 502 µs per loop
1000 loops, best of 3: 204 µs per loop
1000 loops, best of 3: 612 µs per loop
1000 loops, best of 3: 591 µs per loop
n = 4000: ~1 ms
1000 loops, best of 3: 1 ms per loop
1000 loops, best of 3: 960 µs per loop
1000 loops, best of 3: 903 µs per loop
1000 loops, best of 3: 658 µs per loop
1000 loops, best of 3: 1.04 ms per loop
1000 loops, best of 3: 723 µs per loop
1000 loops, best of 3: 1.15 ms per loop
1000 loops, best of 3: 1.16 ms per loop
1000 loops, best of 3: 984 µs per loop
Branch pushed to git repo; I updated commit sha1. New commits:
ee10e70 | 20680: refactored the internal dict_* methods of (Combinatorial)FreeModule with a BLAS-style API |
18dcef3 | Merge branch 'develop' into t/20680/public/combinat/improve_dict_addition-20680 |
bff94d1 | 20680: typo fix |
300b5e7 | 20680: micro bug fix (wrong argument order) |
Replying to @jdemeyer:

> Is there anything combinatorics-specific to `dict_addition.pyx`? If not, it should not be in the `combinat` directory. Perhaps move to `data_structures`?
I agree. I'll chat with Travis about whether we do it now or in a later ticket.
After discussion with Travis: we are hesitating between `sage.modules` and `sage.data_structures`; Travis is more in favor of the latter. Any opinion?
I'll fix some remaining doctest failures in the meantime.
I also mentioned earlier that `data_structures` seems like a better choice.
Replying to @jdemeyer:

> I also mentioned earlier that `data_structures` seems like a better choice.

Your comment was about `data_structures` versus `combinat`, when we are hesitating between `modules` and `data_structures` :-)
I'll do the move this afternoon.
Branch pushed to git repo; I updated commit sha1. New commits:
be78e1f | 20680: refactored the logic of dict_addition.iaxpy for speed (hopefully) and compactness |
Branch pushed to git repo; I updated commit sha1. New commits:
7749836 | 20680: updated the deprecation aliases w.r.t. the previous move and documented that deprecation |
Branch pushed to git repo; I updated commit sha1. New commits:
07b303e | 20680: UTF-8 fix + note about Python3 compatibility |
Obvious design question: why don't you implement this as a subclass of `dict`, such that you could actually write `a * D` instead of `scal(a, D)`?
I understand that it might be more work this way, but it does seem the most natural thing to do, and it would result in more readable and more efficient code.
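A minimal sketch of this suggestion, for the sake of discussion; the class name `BlasDict` is made up here, and the real coefficients would be ring elements rather than ints:

```python
class BlasDict(dict):
    """Toy dict subclass on which linear-algebra operations are operators."""

    def __add__(self, other):
        # out-of-place addition: merge, summing values of colliding keys
        result = BlasDict(self)
        for key, value in other.items():
            result[key] = result.get(key, 0) + value
        return result

    def __rmul__(self, a):
        # handles a * D (scalar on the left); a symmetric __mul__
        # would cover D * a
        return BlasDict((key, a * value) for key, value in self.items())

D = BlasDict({'x': 1, 'y': 2})
E = BlasDict({'y': 3})
print(2 * D + E)  # {'x': 2, 'y': 7}
```

Note that, as discussed below in the thread, `2 * D + E` here builds an intermediate dict for `2 * D`, whereas a fused `axpy`-style call can do it in one pass.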
This should be avoided:
`__cmp__ = ElementWrapper._cmp_by_value`
as it will not be supported in Python 3.
Cython knows how to copy dicts efficiently, so you can replace `PyDict_Copy(D)` by `D.copy()` (assuming that `D` is declared as `dict`).
I think you should also specify exactly what mathematical assumptions you make on `K`. For example, you assume that `bool(x)` implies `bool(-x)`, and you assume that `-1 * x == -x`.
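These two assumptions can be stated as a quick sanity check one could run on sample coefficients; this is only an illustration of the assumptions being asked about, not code from the branch:

```python
from fractions import Fraction

def satisfies_assumptions(x):
    """Check the two coefficient assumptions on a single sample element x."""
    # a nonzero element must have a nonzero negative
    # (otherwise zero-stripping after negation would drop wrong terms)
    if bool(x) and not bool(-x):
        return False
    # scaling by -1 must agree with negation
    return -1 * x == -x

# holds for ordinary exact coefficients:
assert satisfies_assumptions(Fraction(3, 7))
assert satisfies_assumptions(-2)
```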
This is false:

    .. TODO::

        Upon migrating to Python 3, change .iteritems below to .items. We
        don't want to do it now as this is a speed-critical location.

Cython supports `.iteritems()` for objects typed as `dict`, so you can just keep `.iteritems()` regardless of the Python version.
I think the `remove_zeros` flag of `iaxpy` is too confusing. You don't really define what happens if `remove_zeros=False` (which keys would appear with a zero value?). I guess that every key which appears in `X` or `Y` should appear in the result, but that is not currently the case.
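For concreteness, here is a hedged pure-Python sketch of a BLAS-style in-place "axpy" on dicts, `Y <- a*X + Y`, with a `remove_zeros` flag; the semantics shown (strip all zero values at the end when the flag is set) follow my reading of the discussion, not necessarily the final Cython code:

```python
def iaxpy(a, X, Y, remove_zeros=True):
    """In-place Y <- a*X + Y on dicts keyed by basis indices."""
    for key, x in X.items():
        Y[key] = Y.get(key, 0) + a * x
    if remove_zeros:
        # drop every key whose value is now zero, so the result has
        # strict support like a CombinatorialFreeModule element
        for key in [k for k, v in Y.items() if not v]:
            del Y[key]
    return Y

Y = {1: 5, 2: -6}
iaxpy(2, {1: 1, 2: 3}, Y)   # key 2: 2*3 + (-6) = 0, so it is stripped
print(Y)  # {1: 7}
```

With `remove_zeros=False` the zero at key 2 would be kept, which is the behavior whose exact specification is being questioned here.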
Branch pushed to git repo; I updated commit sha1. New commits:
9570c65 | Doing some changes asked for by Jeroen and some other little doc tweaks. |
comment:22 - This would be quite difficult to do because we would have to handle `a * D` and `D * a`, which would take more time because they would both pass through `_mul_`. Furthermore, we couldn't do `a*D + E` in one single operation.
comment:23 - I think this was needed at the time due to the absence of coercion for comparisons for `ElementWrapper`.
comment:24 - Done.
comment:25 - I took a crack at it. Let me know if you still think it is unclear.
Changed work issues from Move from combinat to none
Replying to @tscrim:

> comment:22 - This would be quite difficult to do because we would have to handle `a * D` and `D * a`, which would take more time because they would both pass through `_mul_`.

More precisely, `_mul_` would not be involved, since `D` would not be an `Element`. But the coercion model would be involved. If the coercion model is too slow for these purposes, we really should fix that. It makes no sense to make code more complicated just to avoid the coercion model.
> Furthermore, we couldn't do `a*D + E` in one single operation.

True, but then you make a method to do that in a single operation.
Regarding [comment:27], I think the phrase "values are zero after the addition has been performed" is still not clear. Do you mean values such that `a * x` is non-zero and `y` is non-zero but `a * x + y` is zero? If so, what is the rationale for keeping those zeros but not the cases where `y` is zero and `a * x` is zero?
Replying to @tscrim:

> comment:22 - This would be quite difficult to do because we would have to handle `a * D` and `D * a`, which would take more time because they would both pass through `_mul_`. Furthermore, we couldn't do `a*D + E` in one single operation.

OK, new try: how about subclassing `dict` and making all operations methods of that class? That would avoid the inefficiency issues with the coercion model.
(Edit: wrong ticket)
Replying to @jdemeyer:

> Replying to @tscrim:
>
> > comment:22 - This would be quite difficult to do because we would have to handle `a * D` and `D * a`, which would take more time because they would both pass through `_mul_`.
>
> More precisely, `_mul_` would not be involved, since `D` would not be an `Element`. But the coercion model would be involved. If the coercion model is too slow for these purposes, we really should fix that. It makes no sense to make code more complicated just to avoid the coercion model.
Sorry, that should have been `__mul__`, but you still need to spend some cycles differentiating between `a * D` and `D * a` in Cython. At least, I could not get `__radd__` to work on a Cython class (different ticket), and I'm assuming that extends to `__rmul__`.
Replying to @jdemeyer:

> OK, new try: how about subclassing `dict` and making all operations methods of that class? That would avoid the inefficiency issues with the coercion model.
If we make all operations methods of that class, then all I see is unneeded complexity, because we'd still be making (essentially) function calls everywhere, but we would also have to differentiate between `dict` and `BLASdict`.
Replying to @jdemeyer:

> Regarding [comment:27], I think the phrase "values are zero after the addition has been performed" is still not clear. Do you mean values such that `a * x` is non-zero and `y` is non-zero but `a * x + y` is zero? If so, what is the rationale for keeping those zeros but not the cases where `y` is zero and `a * x` is zero?
I think you're misparsing something. The word "value" is as in the key-value pairs of a dictionary (because everything we are doing here is on dictionaries). In some cases, someone might want a basis element that has a coefficient of 0. For instance, it takes longer to remove these coefficients, and if you're doing a lot of additions with full support, you may only want to remove the basis elements with a 0 coefficient after you're all done.
Actually, a slightly radical proposal: how about splitting this off altogether as a separate spkg, since it is independent of Sage?
Replying to @tscrim:

> Actually, a slightly radical proposal: how about splitting this off altogether as a separate spkg, since it is independent of Sage?

Even if you do that, you could still develop it within Sage and then split it off.
Replying to @tscrim:

> Replying to @jdemeyer:
>
> > Regarding [comment:27], I think the phrase "values are zero after the addition has been performed" is still not clear. Do you mean values such that `a * x` is non-zero and `y` is non-zero but `a * x + y` is zero? If so, what is the rationale for keeping those zeros but not the cases where `y` is zero and `a * x` is zero?
>
> I think you're misparsing something.

If I am misparsing something, it probably means that the documentation wasn't clear. My question remains: if `remove_zeros=False`, exactly which keys will appear with a zero value?
We improve the speed of methods like `dict_addition` in order to improve the speed of addition in `CombinatorialFreeModule`.

CC: @sagetrac-sage-combinat @nthiery
Component: performance
Keywords: combinatorial free module, addition, days79
Author: Travis Scrimshaw, Nicolas M. Thiéry
Branch/Commit: 95aacfa
Reviewer: Nicolas M. Thiéry, Jeroen Demeyer
Issue created by migration from https://trac.sagemath.org/ticket/20680