sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.34k stars 453 forks source link

Speed improvements for categories with axioms #16296

Closed simon-king-jena closed 10 years ago

simon-king-jena commented 10 years ago

With the "axiomatic" approach towards categories introduced in #10963, we are now using join categories far more frequently than in the past. This involves many little operations (sorting, computation of index, ...) that should be made as fast as possible.

Hence, this ticket is about using Cython and optimizing basic algorithms in sage.categories.category and friends.

Depends on #10963 Depends on #15801 Depends on #16309

CC: @nthiery @mezzarobba

Component: categories

Keywords: cython performance categories

Author: Simon King

Branch/Commit: 5bad2ff

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/16296

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from d7f8250 to df486b8

simon-king-jena commented 10 years ago
comment:46

Nicolas, do you think that the new code layout (in particular for Category.join) is sound?

Repeating the benchmarks:

sage: from sage.categories.category_with_axiom import canonicalize_axioms, all_axioms
sage: L = ["Commutative", "Connected", "Commutative", "WithBasis", "Finite"]
sage: %timeit canonicalize_axioms(all_axioms, L)
100000 loops, best of 3: 3.81 µs per loop
sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()]
sage: %timeit Category._sort(L)
100000 loops, best of 3: 6.01 µs per loop
sage: %timeit Category._sort_uniq(L)
100000 loops, best of 3: 15.9 µs per loop
sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()]
sage: %timeit Category.join(L)
1000 loops, best of 3: 510 µs per loop
nthiery commented 10 years ago
comment:47

Replying to @simon-king-jena:

Something more about the join:

  • The cache key is computed with _sort_uniq. Fine, this is cythonised already.
  • There remains a new cython function to create the flattened, normalised etc. list of super categories of the to-be-created join.
  • Category.join would be simply like this:

    def join(...):
        T = _sort_uniq(...)
        try:
            if as_list:
                return Category._join_cache[T]._super_categories
            return Category._join_cache[T]
        except KeyError:
            pass
        S = create_super_categories(T) # this will be a list!
        if as_list:
            return S # or do caching?
        J = JoinCategory(S)
        _join_cache[T] = J
        return J

    I think this would be a good compromise between cythonisation and "keeping the function where it belongs". In particular, the documentation would stay in Categories.join.

So the core of the join algorithm would be in create_super_categories, right? Since it's non trivial, we should strive hard to keep it easy to find to read, and to debug. But yes, this can be an option if we believe the difference is important enough.

Since _sort, _sort_uniq and _flatten_categories are not part of the docs anyway (or am I mistaken?), I think there should be no problem to move these over to the helper module.

I think I ended up adding them to the docs. But they are explicitly private methods, and thus susceptible to be moved at any time if this suits us.

Now, since they are static methods, maybe you could do

from ... import _sort_uniq

class Category:
    _sort_uniq = _sort_uniq

and then the code could keep using Category._sort_uniq as before, right?

Cheers,

nthiery commented 10 years ago
comment:48

Replying to @nthiery:

Nice. How much would you say is due to the "cythonization" of all_axioms and friends, and how much to the cythonization of the full file?

Well, it's used in sort_uniq :-) And we might want to use it in _super_categories at some point. But I agree that's it's not critical. You could also just leave it in the Python file; if not just for testing purposes.

Cheers, Nicolas

simon-king-jena commented 10 years ago
comment:49

Replying to @nthiery:

and then the code could keep using Category._sort_uniq as before, right?

Why should we add an attribute look-up?

simon-king-jena commented 10 years ago
comment:50

PS: Of course, currently some tests will file due to some wrong import locations. I am running make ptest now to get an overview.

simon-king-jena commented 10 years ago
comment:51

PPS: We should find better benchmark tests.

On the one hand, we have the creation of the cache key, and there should be a test (with warm cache) that sees how much faster the cache key creation becomes.

On the other hand, we have the "category mangling" for finding the super categories of the join category. There should be a test that works around the cache. Say, with using a series of joins where the base ring varies.

simon-king-jena commented 10 years ago
comment:52

Surprisingly, I only get few errors. But this one is annoying:

Failed example:
    Category.join((Sets(), Rings(), Monoids()), as_list=True)
Expected:
    [Category of rings]
Got:
    [Category of rngs, Category of semirings]

But this is just due to my using the cache if as_list=True. I must check whether the category in the cache is a JoinCategory. If it isn't, then the cache item is returned, if it is, then the list of super categories is returned.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from df486b8 to cfe01ae

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

cfe01aeFix Category.join with as_list=True. Fix doctests in sage.categories
simon-king-jena commented 10 years ago
comment:54

Dear Nicolas,

all tests in sage.categories pass. Some of the new stuff needs to be documented and tested. But please first comment if the code distribution of Python vs. Cython works for you.

And next I'll try to find better benchmarks.

simon-king-jena commented 10 years ago
comment:55

Here is another benchmark. Idea: We have a list of tuples of categories, with varying base rings. For each tuple, we form the join category, first with cold cache, then with warm cache.

With #15801:

sage: L = [(Coalgebras(GF(q,'a')), Sets.Finite(), CommutativeRings(), SimplicialComplexes()) for q in prime_powers(2,50000)]
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 21.1 s, sys: 136 ms, total: 21.2 s
Wall time: 21.2 s
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 230 ms, sys: 1 ms, total: 231 ms
Wall time: 231 ms

With this branch:

sage: L = [(Coalgebras(GF(q,'a')), Sets.Finite(), CommutativeRings(), SimplicialComplexes()) for q in prime_powers(2,50000)]
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 18.7 s, sys: 172 ms, total: 18.9 s
Wall time: 18.9 s
sage: %time C = [Category.join(Cats) for Cats in L]
CPU times: user 150 ms, sys: 0 ns, total: 150 ms
Wall time: 162 ms

Discussion of the benchmark

simon-king-jena commented 10 years ago
comment:56

FWIW, all tests pass. More tests of new functions need to be added.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1e68e53Add doctests, remove a not used function
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from cfe01ae to 1e68e53

simon-king-jena commented 10 years ago

Author: Simon King

tscrim commented 10 years ago
comment:59

Ping - needs rebasing.

tscrim commented 10 years ago

Changed branch from u/SimonKing/ticket/16296 to public/categories/speed_improvements_category_with_axiom-16296

tscrim commented 10 years ago

Reviewer: Travis Scrimshaw

tscrim commented 10 years ago

Changed commit from 1e68e53 to 5bad2ff

tscrim commented 10 years ago
comment:60

I've rebased it (it was trivial) and removed get_all_axioms since this is still in python and we can access the module level attribute all_axioms easily (before this was needed since [Simon] had made all_axioms a cdef attribute). Since my change was removing an unused method (except in testing), I'm going to set this to positive review since the patch LGTM.


New commits:

7432463Merge branch 'u/SimonKing/ticket/16296' of trac.sagemath.org:sage into public/categories/speed_improvements_category_with_axiom-16296
72d1365Very minor review changes.
bc11438Removed get_all_axioms.
5bad2ffReplaced get_all_axioms with all_axioms in doctests.
vbraun commented 10 years ago

Changed branch from public/categories/speed_improvements_category_with_axiom-16296 to 5bad2ff