Increase speed for Coxeter groups, Weyl groups, and quantum Bruhat graph

tscrim commented 8 years ago

The primary goal of this ticket is to improve the creation speed for the quantum Bruhat graph. We do this in a number of ways:

Better management of data associated to lattice.nonparabolic_positive_roots.
Implement a (temporary) cache of the lengths of elements.

In addition, we also provide some general speedups to all matrix groups and Coxeter groups that came from looking into the above improvements. The net result is over 12x speedup of the creation of the quantum Bruhat graph:

sage: W = WeylGroup(['D',5], prefix='s')
sage: %time G = W.quantum_bruhat_graph()
CPU times: user 14 s, sys: 60.6 ms, total: 14 s
Wall time: 14 s

whereas previously this took over 3 minutes to compute. The downside is this has a larger memory footprint because of the temporary cache, but repeatedly computing the lengths of the elements was far too expensive.

This also includes a speedup of iterating over the entire Coxeter/Weyl group.

CC: @sagetrac-sage-combinat @anneschilling @sagetrac-mshimo @nthiery @darijgr @fchapoton @stumpc5 @jplab

Component: combinatorics

Keywords: quantum bruhat graph

Author: Travis Scrimshaw

Branch/Commit: 0ce02a6

Reviewer: Frédéric Chapoton

Issue created by migration from https://trac.sagemath.org/ticket/19821

tscrim commented 8 years ago

New commits:

`0b4069c`	`Speedup has_right_descent() for Coxeter groups.`
`c2c3f17`	`Speedup quantum_bruhat_graph by doing some things locally.`
`2b1a117`	`Changed libs.gap.element.GapElement.matrix() to avoid matrix constructor.`
`2426155`	`Use a specialized version of GapElement.matrix() to avoid some overhead.`
`4caa2bf`	`Don't store matrix() of Weyl group elements and some cleanup.`
`fcf0e35`	`Added length cache to quantum_bruhat_graph().`
`0b5fcec`	`Get a little more speed by using the matrices as keys for the length cache.`

tscrim commented 8 years ago

Branch: public/combinat/speedup_coxeter_weyl_matrix_groups-19821

tscrim commented 8 years ago

Commit: 0b5fcec

tscrim commented 8 years ago

comment:2

Also, _finite_recognition was no longer needed since we now have Coxeter types. One question I have for CoxeterGroup is should the default ring for simply-laced types be ZZ? The code which compares elements in the universal cyclotomic field to 0 is horribly slow and is the main reason why iteration over an instance of CoxeterGroup (over the UCF) takes so long. Compare:

sage: WC = CoxeterGroup(['D',4], base_ring=ZZ)
sage: %timeit L = [x for x in WC]
100 loops, best of 3: 12.1 ms per loop
sage: WC = CoxeterGroup(['D',4]) # base_ring=UniversalCyclotomicField()
sage: %timeit L = [x for x in WC]
1 loops, best of 3: 224 ms per loop

sage: WC = CoxeterGroup(['D',5], base_ring=ZZ)
sage: %timeit L = [x for x in WC]
10 loops, best of 3: 152 ms per loop
sage: WC = CoxeterGroup(['D',5]) # base_ring=UniversalCyclotomicField()
sage: %timeit L = [x for x in WC]
1 loops, best of 3: 3.97 s per loop

tscrim commented 8 years ago

Description changed:

--- 
+++ 
@@ -12,3 +12,5 @@
 Wall time: 14 s

whereas previously this took over 3 minutes to compute. The downside is this has a larger memory footprint because of the temporary cache, but repeatedly computing the lengths of the elements was far too expensive. + +This also includes a speedup of iterating over the entire Coxeter/Weyl group.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 0b5fcec to f673aec

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

f673aec A better method to find the minimal length coset representatives.

darijgr commented 8 years ago

comment:4

A few questions:

1) What is a quantum Bruhat graph, and where can I read about it?

2) Have the indirect doctests from _finite_recognition been salvaged?

Also, when changing DiGraph, can you please add the right format in analogy to https://github.com/sagemath/sagetrac-mirror/commit/6cb47c00ca2315caf9b873506a0ea167d6706c41 ? Thank you!

tscrim commented 8 years ago

comment:5

Replying to @darijgr:

1) What is a quantum Bruhat graph, and where can I read about it?

It is the usual Bruhat poset but with added "quantum" edges that are used for many purposes. One particular use in Sage is for specializations of Macdonald polynomials at t=0. See, e.g.,

2) Have the indirect doctests from _finite_recognition been salvaged?

Not exactly, but essentially the same tests in recognize_coxeter_type_from_matrix in sage.combinat.root_system.coxeter_matrix. So I would say everything is covered.

Also, when changing DiGraph, can you please add the right format in analogy to https://github.com/sagemath/sagetrac-mirror/commit/6cb47c00ca2315caf9b873506a0ea167d6706c41 ? Thank you!

Technically I didn't change the input for the digraph, but I can add it in if you're not going to make other additions.

darijgr commented 8 years ago

comment:6

I'm not making any additions today, as I'd have to learn this stuff first (and I don't know if I have time for that) and even then I wouldn't be sure of the thinking behind the removal of __matrix. (This will be so much easier once I'm in Minneapolis...)

Thanks for the references!

videlec commented 8 years ago

comment:7

For UCF comparison, it will be faster after #19825 (it avoids QQbar usage).

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

af33acf Added explicit format indicator for DiGraph constructor for Darij.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from f673aec to af33acf

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e014037 Making some additional improvements.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from af33acf to e014037

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

1129ab7 Removing comment about bottlenecks.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from e014037 to 1129ab7

tscrim commented 8 years ago

comment:11

Replying to @darijgr:

I'm not making any additions today, as I'd have to learn this stuff first (and I don't know if I have time for that) and even then I wouldn't be sure of the thinking behind the removal of __matrix. (This will be so much easier once I'm in Minneapolis...)

Added. The __matrix was just duplicating what matrix() already does and resulted in quite a big speedup to iteration (it cut it in half IIRC).

Replying to @videlec:

For UCF comparison, it will be faster after #19825 (it avoids QQbar usage).

With #19825 and my recent changes, it just takes 4 seconds on my laptop to iterate over D₅. Now the biggest speed gain (about 2.5s) will be in improving matrix multiplication over the UCF, as opposed to speeding up the comparisons (about 1.5s)

One takeaway message: if you are working in a simply-laced type for CoxeterGroup, use ZZ.

tscrim commented 8 years ago

comment:13

See #16116 for a discussion on speeding up matrices over (universal) cyclotomic fields.

nthiery commented 8 years ago

comment:14

Replying to @tscrim:

One takeaway message: if you are working in a simply-laced type for CoxeterGroup, use ZZ.

Or more generally a crystalographic type, I assume?

Thanks for the hard work btw!

nthiery commented 8 years ago

comment:15

Also: you may want to add a datapoint on #9285 after this.

tscrim commented 8 years ago

comment:16

Replying to @nthiery:

Replying to @tscrim:

One takeaway message: if you are working in a simply-laced type for CoxeterGroup, use ZZ.

Or more generally a crystalographic type, I assume?

No, it only works for simply-laced types as the cos(\pi / m) needs to be rational:

sage: W = CoxeterGroup(['B',2])
sage: W.gens()
(
[           -1 E(8) - E(8)^3]  [            1             0]
[            0             1], [E(8) - E(8)^3            -1]
)

I also added my datapoint for #9285, but it seems infeasible at this point using CoxeterGroup. The majority of the time is spent looking through the descents, which has to be done for keeping the constant memory iteration.

         16341032 function calls in 8.912 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   155520    2.187    0.000    3.494    0.000 group_element.py:308(_mul_)
   574775    1.657    0.000    3.515    0.000 coxeter_group.py:483(has_right_descent)
  2997405    0.621    0.000    0.621    0.000 coxeter_group.py:513(<genexpr>)
   574775    0.434    0.000    4.002    0.000 coxeter_groups.py:782(has_descent)
  1356910    0.418    0.000    0.568    0.000 coxeter_group.py:294(index_set)
   155520    0.357    0.000    0.357    0.000 matrix_space.py:142(__classcall__)
   155520    0.343    0.000    0.356    0.000 group_element.py:241(__init__)
   574775    0.298    0.000    0.881    0.000 {all}
   155520    0.254    0.000    0.294    0.000 {method '__copy__' of 'sage.matrix.matrix_integer_dense.Matrix_integer_dense' objects}
   155520    0.241    0.000    3.781    0.000 coxeter_groups.py:1498(apply_simple_reflection_right)
   155520    0.224    0.000    2.264    0.000 coxeter_groups.py:860(first_descent)
   103679    0.210    0.000    8.548    0.000 coxeter_groups.py:282(succ)
    51841    0.198    0.000    8.841    0.000 backtrack.py:139(search_forest_iterator)
  1356910    0.150    0.000    0.150    0.000 coxeter_matrix.py:779(index_set)
   155520    0.145    0.000    0.457    0.000 matrix_space.py:1247(matrix)
    51840    0.143    0.000    2.223    0.000 coxeter_groups.py:888(descents)
   574775    0.107    0.000    0.107    0.000 {method 'index' of 'tuple' objects}
   574775    0.106    0.000    0.106    0.000 {range}
  1667950    0.104    0.000    0.104    0.000 {method 'parent' of 'sage.structure.element.Element' objects}
   574775    0.076    0.000    0.076    0.000 group_element.py:282(matrix)
   155520    0.075    0.000    0.532    0.000 matrix_space.py:423(__call__)
   730298    0.072    0.000    0.072    0.000 {isinstance}
        1    0.071    0.071    8.912    8.912 <string>:1(<module>)
   155520    0.070    0.000    3.851    0.000 coxeter_groups.py:1526(apply_simple_reflection)

Moreover, a non-trivial amount of time is spent gathering the information to run the check for descents:

Timer unit: 1e-06 s

Total time: 4.06995 s
File: /home/travis/sage/local/lib/python2.7/site-packages/sage/groups/matrix_gps/coxeter_group.py
Function: has_right_descent at line 483

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
...
   509    574775       805826      1.4     19.8              i = self.parent().index_set().index(i)
   510    574775       668970      1.2     16.4              n = len(self.parent().index_set())
   511    574775       436368      0.8     10.7              M = self.matrix()
   512    574775       314255      0.5      7.7              zero = M.base_ring().zero()
   513    574775      1844527      3.2     45.3              return all(M[j,i] <= zero for j in range(n))

So there is definite room for optimization, but it would likely lead to some code duplication. I think Florent's parallelization of SearchForest and its move to the cython file will help a fair amount. Similarly, we could cythonize MatrixGroupElement_generic and its superclass MatrixGroupElement_base. We also get a bit of a penalty for the indirection (I implemented has_right_descent instead of has_descent).

However, this data will be rendered obsolete in a moment because I will be pushing a specialized version of descents and first_descent.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8509534 Implementing custon first_descent() and descent() methods for CoxeterGroup.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 1129ab7 to 8509534

tscrim commented 8 years ago

comment:18

So that dropped my %prun iteration down to ~6.5 seconds (see #9285 for true timings for E₆ and E₇). However, there is a somewhat disturbing amount of overhead with recomputing initializing data (nearly 1s):

Timer unit: 1e-06 s

Total time: 2.32224 s
File: /home/travis/sage/local/lib/python2.7/site-packages/sage/groups/matrix_gps/coxeter_group.py
Function: first_descent at line 483

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   483                                                   def first_descent(self, side = 'right', index_set=None, positive=False):
   484                                                       """
...
   491                                                       """
   492    155520       177504      1.1      7.6              M = self.matrix()
   493    155520       106485      0.7      4.6              if side != 'right':
   494                                                           M = ~M
   495    155520       244268      1.6     10.5              I = self.parent().index_set()
   496    155520       115970      0.7      5.0              n = len(I)
   497    155520       133044      0.9      5.7              zero = M.base_ring().zero()
   498    155520        98536      0.6      4.2              if index_set is None:
   499    155520       145872      0.9      6.3                  index_set = range(n)
   500                                                       else:
   501                                                           index_set = [I.index(i) for i in index_set]
   502    155520        96996      0.6      4.2              if positive:
   503                                                           for i in index_set:
   504                                                               if any(M[j,i] > zero for j in range(n)):
   505                                                                   return I[i]
   506                                                       else:
   507    263735       183335      0.7      7.9                  for i in index_set:
   508    263735       902137      3.4     38.8                      if all(M[j,i] <= zero for j in range(n)):
   509    155520       118092      0.8      5.1                          return I[i]
   510                                                       return None

So if we really wanted to go a bit crazy with things, we should implement some kind of custom iterator that subclasses SearchForest, as well as avoid converting to/from the index set and use {0,1,...,n}. Although I would say the next step would be the cythonization of the matrix group element classes if we wanted to try and squeeze more out of it right now.

fchapoton commented 8 years ago

comment:19

Travis, thanks for your work. Have you looked at #11010 recently ? Do you think that this ticket will help to improve the situation of #11010 ?

stumpc5 commented 8 years ago

comment:20

I would also like to get a final version of #11187, even if GAP3 needs to be installed manually. I will try to get some comparisons to the stuff you did here, thanks for all that! (This will indeed also get #11010 to become much faster.)

stumpc5 commented 8 years ago

comment:21

(Indeed, I just wanted to start some comparisons, but ran into the problem fixed in #19795. I now have to first rebuild Sage to 7.0.beta2 which will take a another day or two before I can start doing the testing...)

tscrim commented 8 years ago

comment:22

@fchapoton I haven't looked too closely at the changes to #11010, but I do suspect it will improve things there.

@stumpc5 In case you didn't catch it, Jeoren had mentioned on sage-release that you will need to run make distclean && make for the version bump to 7.0.beta2 (which is why I haven't bumped yet from 7.0.beta1).

@all At some point we should all sit down and port everything that we need from GAP3 to GAP4/sage/separate-library. Although that could likely be quite a project for any one particular package...

stumpc5 commented 8 years ago

comment:23

Jeoren had mentioned on sage-release that you will need to run make distclean && make for the version bump to 7.0.beta2

Yep, realized that after compiling for a few hours.

At some point we should all sit down and port everything that we need from GAP3 to GAP4/sage/separate-library. Although that could likely be quite a project for any one particular package...

This seems like an infeasible project for anyone not having many months spare time doing nothing but this while sitting next to someone knowing about the needs of the particular packages. (In short: this seems like an infeasible project.) Nevertheless, it needs to be started at some point anyway...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2fdaef4 Forgot the doctests for first_descent.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 8509534 to 2fdaef4

tscrim commented 8 years ago

Dependencies: #19870

tscrim commented 8 years ago

comment:25

The cythonization of matrix group elements is now #19870. I pulled the changes to the matrix group elements to #19870, so that is why it is set as a dependency (but I did not merge that branch in yet). However, it is simple enough to swap the dependencies if this gets a positive review first.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 2fdaef4 to 2e9535a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`0e04e0d`	`Merge branch 'develop' into public/combinat/speedup_coxeter_weyl_matrix_groups-19821`
`83c6b8d`	`Merge branch 'develop' into public/combinat/speedup_coxeter_weyl_matrix_groups-19821`
`2e9535a`	`Make the generators of CoxeterGroup dense.`

tscrim commented 8 years ago

comment:27

I have now merged in #19870. I also made the generators to CoxeterGroup be dense matrices for the following reason. The one() element is dense because the matrix_space() is dense. By making the generators dense, it means the generators are not converted to dense matrices upon multiplication, resulting in over a 2x speedup to iteration and sparse matrix multiplication is 3x slower in rank 6. Moreover, on average there are 11/18 non-zero entries in E₆, so IMO they are not really sparse matrices.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`989b6ea`	`Adding changes from #19821.`
`6358d53`	`Cythonized matrix_gp/group_element.py and simplified the class structure.`
`65ce465`	`Fixing modform_hecketriangle due to changes.`
`1110926`	`Fixing last doctests.`
`3efe9b6`	`Merge branch 'develop' into public/groups/cythonize_matrix_group_element-19870`
`29f7253`	`Special case for dense matrices over ZZ and making sure the inverse is in the base ring.`
`9582f28`	`Merge branch 'public/combinat/speedup_coxeter_weyl_matrix_groups-19821' into public/groups/cythonize_matrix_group_element-19870`
`525c753`	`Merge branch 'public/groups/cythonize_matrix_group_element-19870' into public/combinat/speedup_coxeter_weyl_matrix_groups-19821`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 2e9535a to 525c753

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`8cac13c`	`Merge branch 'public/combinat/speedup_coxeter_weyl_matrix_groups-19821' of trac.sagemath.org:sage into public/combinat/speedup_coxeter_weyl_matrix_groups-19821`
`5333b0f`	`Implementing reviewer comments.`
`274f662`	`Merge branch 'public/groups/cythonize_matrix_group_element-19870' of trac.sagemath.org:sage into public/combinat/speedup_coxeter_weyl_matrix_groups-19821`
`9250e99`	`Fixing doctest and adding is_one.`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 525c753 to 9250e99

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 9250e99 to bea6b3c

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

bea6b3c Even better is_one by avoiding unnecessary coercions for matrices.

tscrim commented 8 years ago

comment:31

I fixed a trivial failing doctest and implemented a custom is_one(), which is much faster (~30 microseconds to a little over 1) with the changes to the matrices (in particular, avoiding the coercion calls).

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`f38620a`	`Check identity rather than equality from coset_representative.`
`df18b13`	`Merge branch 'public/groups/cythonize_matrix_group_element-19870' of trac.sagemath.org:sage into public/groups/cythonize_matrix_group_element-19870`
`0ce02a6`	`Merge branch 'public/groups/cythonize_matrix_group_element-19870' into public/combinat/speedup_coxeter_weyl_matrix_groups-19821`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from bea6b3c to 0ce02a6

fchapoton commented 8 years ago

Changed dependencies from #19870 to none

fchapoton commented 8 years ago

Reviewer: Frédéric Chapoton

fchapoton commented 8 years ago

comment:34

ok, looks good enough to me. Let it be

vbraun commented 8 years ago

Changed branch from public/combinat/speedup_coxeter_weyl_matrix_groups-19821 to 0ce02a6

sagemath / sage

Increase speed for Coxeter groups, Weyl groups, and quantum Bruhat graph #19821