Closed simon-king-jena closed 10 years ago
Nicolas, do you think that the new code layout (in particular for Category.join
) is sound?
Repeating the benchmarks:
sage: from sage.categories.category_with_axiom import canonicalize_axioms, all_axioms
sage: L = ["Commutative", "Connected", "Commutative", "WithBasis", "Finite"]
sage: %timeit canonicalize_axioms(all_axioms, L)
100000 loops, best of 3: 3.81 µs per loop
sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()]
sage: %timeit Category._sort(L)
100000 loops, best of 3: 6.01 µs per loop
sage: %timeit Category._sort_uniq(L)
100000 loops, best of 3: 15.9 µs per loop
sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()]
sage: %timeit Category.join(L)
1000 loops, best of 3: 510 µs per loop
Replying to @simon-king-jena:
Something more about the join:
- The cache key is computed with _sort_uniq. Fine, this is cythonised already.
- There remains a new cython function to create the flattened, normalised etc. list of super categories of the to-be-created join.
Category.join
would be simply like this:def join(...): T = _sort_uniq(...) try: if as_list: return Category._join_cache[T]._super_categories return Category._join_cache[T] except KeyError: pass S = create_super_categories(T) # this will be a list! if as_list: return S # or do caching? J = JoinCategory(S) _join_cache[T] = J return J
I think this would be a good compromise between cythonisation and "keeping the function where it belongs". In particular, the documentation would stay in
Categories.join
.
So the core of the join algorithm would be in
create_super_categories
, right? Since it's non trivial, we should
strive hard to keep it easy to find to read, and to debug. But yes,
this can be an option if we believe the difference is important enough.
Since
_sort
,_sort_uniq
and_flatten_categories
are not part of the docs anyway (or am I mistaken?), I think there should be no problem to move these over to the helper module.
I think I ended up adding them to the docs. But they are explicitly private methods, and thus susceptible to be moved at any time if this suits us.
Now, since they are static methods, maybe you could do
from ... import _sort_uniq
class Category:
_sort_uniq = _sort_uniq
and then the code could keep using Category._sort_uniq as before, right?
Cheers,
Replying to @nthiery:
Nice. How much would you say is due to the "cythonization" of all_axioms and friends, and how much to the cythonization of the full file?
Well, it's used in sort_uniq
:-) And we might want to use it in _super_categories
at some point. But I agree that's it's not critical. You could also just leave it in the Python file; if not just for testing purposes.
Cheers, Nicolas
Replying to @nthiery:
and then the code could keep using Category._sort_uniq as before, right?
Why should we add an attribute look-up?
PS: Of course, currently some tests will file due to some wrong import locations. I am running make ptest
now to get an overview.
PPS: We should find better benchmark tests.
On the one hand, we have the creation of the cache key, and there should be a test (with warm cache) that sees how much faster the cache key creation becomes.
On the other hand, we have the "category mangling" for finding the super categories of the join category. There should be a test that works around the cache. Say, with using a series of joins where the base ring varies.
Surprisingly, I only get few errors. But this one is annoying:
Failed example:
Category.join((Sets(), Rings(), Monoids()), as_list=True)
Expected:
[Category of rings]
Got:
[Category of rngs, Category of semirings]
But this is just due to my using the cache if as_list=True
. I must check whether the category in the cache is a JoinCategory
. If it isn't, then the cache item is returned, if it is, then the list of super categories is returned.
Branch pushed to git repo; I updated commit sha1. New commits:
cfe01ae | Fix Category.join with as_list=True. Fix doctests in sage.categories |
Dear Nicolas,
all tests in sage.categories pass. Some of the new stuff needs to be documented and tested. But please first comment if the code distribution of Python vs. Cython works for you.
And next I'll try to find better benchmarks.
Here is another benchmark. Idea: We have a list of tuples of categories, with varying base rings. For each tuple, we form the join category, first with cold cache, then with warm cache.
With #15801:
sage: L = [(Coalgebras(GF(q,'a')), Sets.Finite(), CommutativeRings(), SimplicialComplexes()) for q in prime_powers(2,50000)]
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 21.1 s, sys: 136 ms, total: 21.2 s
Wall time: 21.2 s
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 230 ms, sys: 1 ms, total: 231 ms
Wall time: 231 ms
With this branch:
sage: L = [(Coalgebras(GF(q,'a')), Sets.Finite(), CommutativeRings(), SimplicialComplexes()) for q in prime_powers(2,50000)]
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 18.7 s, sys: 172 ms, total: 18.9 s
Wall time: 18.9 s
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 150 ms, sys: 0 ns, total: 150 ms
Wall time: 162 ms
Discussion of the benchmark
FWIW, all tests pass. More tests of new functions need to be added.
Branch pushed to git repo; I updated commit sha1. New commits:
1e68e53 | Add doctests, remove a not used function |
Author: Simon King
Ping - needs rebasing.
Changed branch from u/SimonKing/ticket/16296 to public/categories/speed_improvements_category_with_axiom-16296
Reviewer: Travis Scrimshaw
I've rebased it (it was trivial) and removed get_all_axioms
since this is still in python and we can access the module level attribute all_axioms
easily (before this was needed since [Simon] had made all_axioms
a cdef attribute). Since my change was removing an unused method (except in testing), I'm going to set this to positive review since the patch LGTM.
New commits:
7432463 | Merge branch 'u/SimonKing/ticket/16296' of trac.sagemath.org:sage into public/categories/speed_improvements_category_with_axiom-16296 |
72d1365 | Very minor review changes. |
bc11438 | Removed get_all_axioms. |
5bad2ff | Replaced get_all_axioms with all_axioms in doctests. |
Changed branch from public/categories/speed_improvements_category_with_axiom-16296 to 5bad2ff
With the "axiomatic" approach towards categories introduced in #10963, we are now using join categories far more frequently than in the past. This involves many little operations (sorting, computation of index, ...) that should be made as fast as possible.
Hence, this ticket is about using Cython and optimizing basic algorithms in sage.categories.category and friends.
Depends on #10963 Depends on #15801 Depends on #16309
CC: @nthiery @mezzarobba
Component: categories
Keywords: cython performance categories
Author: Simon King
Branch/Commit:
5bad2ff
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/16296