Parallel computation of number of solutions in dancing links

seblabbe commented 9 years ago

The following computation takes a lot of time:

sage: from sage.games.quantumino import QuantuminoSolver
sage: QuantuminoSolver(0).number_of_solutions()  # long time (several days)

but we can make it faster by doing the computation in parallel... This ticket does this (directly in the dancing links code).

Component: combinatorics

Author: Sébastien Labbé

Branch/Commit: 29ff986

Reviewer: Vincent Delecroix

Issue created by migration from https://trac.sagemath.org/ticket/18987

videlec commented 9 years ago

comment:36

Note that the patchbot was not able to build the doc.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from dc3797b to 1e01ba2

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`ca84fcf`	`Trac #18987: Parallel computation of the nb of solutions for tilings`
`3d4402b`	`Trac #18987: Tilings of polyominos modulo 180 degrees rotations`
`65f6512`	`Trac #18987: Add a transparent gray box to QuantuminoState.show3d`
`555f0a9`	`Trac #18987: doc + optim in dancing_links`
`481d96e`	`Trac #18987: rename orthogonal_transformation function`
`5f66e1d`	`Trac #18987: Improve explanation of the modpi argument`
`d359115`	`Trac #18987: Using cosets for when modpi=True`
`1e01ba2`	`Trac #18987: Change pentaminos to there canonical form`

seblabbe commented 9 years ago

comment:38

There was a TESTS:: with text following. Fixed that (and squashed it the the last commit). Also added the prefix Trac #18987: to all commit messages.

seblabbe commented 9 years ago

comment:39

There is one failing doctest on the patchbot that is not failing on my machine...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2133ecd Trac #18987: fixing intermittent doctest failure

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 1e01ba2 to 2133ecd

videlec commented 9 years ago

comment:42

Are you sure ncube_isometry_group is worth a @cached_function?

In ncube_isometry_group_modpi why are you building MatrixGroup?

In

P_coset = set(frozenset((m.matrix() * self).canonical() for m in coset) for coset in L)
return set(next(iter(s)) for s in P_coset)

You are building a lot of images to consider matrix x polyomino to use just one at the end. Why not

return set((L[0].matrix() * self).canonical() for coset in L)

And if you were not using matrix groups you can even do

return set((L[0] * self).canonical() for coset in L)

seblabbe commented 9 years ago

comment:43

You are using L instead of coset I think. Can you edit your previous comment to make sure I understand what you mean?

seblabbe commented 9 years ago

comment:44

Replying to @videlec:

Are you sure ncube_isometry_group is worth a @cached_function?

Calling this function takes 2.8s on my machine. And it is called once for each polyomino. That is 16 times for the Quantumino puzzle. With the cache, I gain about 40s to construct the rows to give to the dlx solver.

In ncube_isometry_group_modpi why are you building MatrixGroup?

Because otherwise, this

G = ncube_isometry_group(n, orientation_preserving)
H = [h for h in G if all(i==j for (i,j) in h.nonzero_positions())]
left_cosets = set(tuple(sorted(h*g for h in H)) for g in G)

throws a TypeError: mutable matrices are unhashable and I find it more fun to read like this instead of the .set_immutable() on every matrices.

seblabbe commented 9 years ago

comment:45

You are building a lot of images to consider matrix x polyomino to use just one at the end. Why not

I know but I need all of them. That is what I understood yesterday. Let me try to explain. In general it is okay to take only one matrix in the coset. The problem comes when the polyomino is invariant under some of the 24 orientation preserving isometries of the cube. For example, consider:

sage: from sage.games.quantumino import pentaminos                                  
sage: from sage.combinat.tiling import ncube_isometry_group                         
sage: p = pentaminos[7]                                                             
sage: m = ncube_isometry_group(3)[-3]                                               
sage: p                                                                             
Polyomino: [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 2, 1), (1, 0, 0)], Color: orange   
sage: m                                                                             
[ 0  0  1]                                                                          
[ 0 -1  0]                                                                          
[ 1  0  0]                                                                          
sage: (m*p).canonical() == p                                                        
True

The polyomino p has 12 distinct rotation images instead of 24. And among the 12 ways of placing that polyomino into a box, there are 3 distinct ways up to rotation of the box keeping the box invariant. To compute this, we need to consider the whole coset:

sage: cosets = ncube_isometry_group_modpi(3)
sage: P_coset = set(frozenset((m.matrix() * p).canonical() for m in coset) for coset in cosets)
sage: len(P_coset)                         
3   
sage: len(set(next(iter(s)) for s in P_coset))
3

Otherwise, you obtain too many polyominos (see below).

sage: set((coset[0].matrix() * p).canonical() for coset in cosets)                 
{Polyomino: [(0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 2, 0), (1, 2, 1)], Color: orange,
 Polyomino: [(0, 1, 2), (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 2)], Color: orange,
 Polyomino: [(0, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0), (2, 1, 1)], Color: orange,
 Polyomino: [(0, 1, 0), (0, 1, 1), (1, 1, 1), (2, 0, 1), (2, 1, 1)], Color: orange,
 Polyomino: [(0, 2, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 2, 0)], Color: orange}

videlec commented 9 years ago

comment:46

Replying to @seblabbe:

Replying to @videlec:

Are you sure ncube_isometry_group is worth a @cached_function?

Calling this function takes 2.8s on my machine. And it is called once for each polyomino. That is 16 times for the Quantumino puzzle. With the cache, I gain about 40s to construct the rows to give to the dlx solver.

Perhaps the way it is implemented is wrong. For example

sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)]
1 loops, best of 3: 1.27 ms per loop

and with the signs

sage: from itertools import product
sage: %timeit M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
1 loops, best of 3: 2.9 ms per loop
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
100 loops, best of 3: 18.3 ms per loop

So it is likely to be ~20ms.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

0d68ece Trac #18987: removed a cached_function

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 2133ecd to 0d68ece

seblabbe commented 9 years ago

comment:48

Replying to @videlec:

Are you sure ncube_isometry_group is worth a @cached_function?

In fact, you make me realize that there is already caching involved in WeylGroup. Therefore caching that function is not so necessary. Without caching, I get:

sage: from sage.combinat.tiling import ncube_isometry_group
sage: time L = ncube_isometry_group(4)             
CPU times: user 1.14 s, sys: 19.7 ms, total: 1.16 s
Wall time: 1.3 s                                   
sage: time L = ncube_isometry_group(4)             
CPU times: user 358 ms, sys: 4.01 ms, total: 362 ms
Wall time: 448 ms

sage: from itertools import product
sage: %timeit M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
1 loops, best of 3: 2.9 ms per loop
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
100 loops, best of 3: 18.3 ms per loop

So it is likely to be ~20ms.

Note that you can't use timeit above since there is caching involved in SymmetricGroup.

For comparison, your solution on my machine gives the following timings:

sage: from itertools import product   
sage: # first call                                 
sage: time M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)] 
CPU times: user 11.4 ms, sys: 92 µs, total: 11.5 ms                    
Wall time: 11.9 ms                                                     
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]    
CPU times: user 191 ms, sys: 25.2 ms, total: 216 ms                    
Wall time: 667 ms                                                      
sage: # second call
sage: time M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
CPU times: user 11.7 ms, sys: 1.34 ms, total: 13 ms                   
Wall time: 16.6 ms                                                    
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]   
CPU times: user 108 ms, sys: 1.92 ms, total: 110 ms                   
Wall time: 114 ms

So it seems using WeylGroup(['B',4]) is about 4 times slower than your solution. But I don't know if I will change it. I mean that improvement could be done directly in WeylGroup code...

videlec commented 9 years ago

comment:49

Replying to @seblabbe:

sage: from itertools import product
sage: %timeit M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
1 loops, best of 3: 2.9 ms per loop
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: %timeit S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
100 loops, best of 3: 18.3 ms per loop

So it is likely to be ~20ms.

Note that you can't use timeit above since there is caching involved in SymmetricGroup.

Where?! Beyond the construction of the group (SymmetricGroup(4) is SymmetricGroup(4) gives True) nothing is cached.

seblabbe commented 9 years ago

comment:50

Where?! Beyond the construction of the group (SymmetricGroup(4) is SymmetricGroup(4) gives True) nothing is cached.

Don't you get that the first execution of

sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]

is slower than the second? And that the second takes about the same time than the third, the fourth, etc. ? Like me:

sage: from itertools import product                                
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 184 ms, sys: 18 ms, total: 202 ms                  
Wall time: 710 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 109 ms, sys: 2.05 ms, total: 111 ms                
Wall time: 166 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 109 ms, sys: 2.6 ms, total: 111 ms                 
Wall time: 113 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 110 ms, sys: 2.12 ms, total: 112 ms                
Wall time: 169 ms                                                  
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 108 ms, sys: 2.44 ms, total: 111 ms                
Wall time: 114 ms

videlec commented 9 years ago

comment:51

Yes for the timing. But this does not implies that something is cached. Actually, the reason is because gap is launched (I do not know why)

sage: S = SymmetricGroup(4)
sage: time gap(3)
CPU times: user 16 ms, sys: 12 ms, total: 28 ms
Wall time: 295 ms
3
sage: from itertools import product
sage: M = [diagonal_matrix(p) for p in product([1,-1], repeat=4)]
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 56 ms, sys: 0 ns, total: 56 ms
Wall time: 56.9 ms
sage: time S = [m*s.matrix() for m in M for s in SymmetricGroup(4)]
CPU times: user 56 ms, sys: 0 ns, total: 56 ms
Wall time: 53 ms

seblabbe commented 9 years ago

comment:52

caching ncube_isometry_group gives:

sage: from sage.games.quantumino import QuantuminoSolver 
sage: q = QuantuminoSolver(0)                            
sage: t = q.tiling_solver()                              
sage: time rows = t.rows()                               
CPU times: user 18.8 s, sys: 168 ms, total: 19 s         
Wall time: 19 s

not caching ncube_isometry_group gives:

sage: from sage.games.quantumino import QuantuminoSolver  
sage: q = QuantuminoSolver(0)                             
sage: t = q.tiling_solver()                               
sage: time rows = t.rows()                                
CPU times: user 19.4 s, sys: 288 ms, total: 19.7 s        
Wall time: 20 s

So I confirm that I don't lose relatively much time by not caching.

seblabbe commented 9 years ago

comment:53

Replying to @videlec:

Perhaps the way it is implemented is wrong. For example
sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)]
1 loops, best of 3: 1.27 ms per loop

I removed the cached_method. Do you want me to replace the code based on WeylGroup(['B',n]) by the SymmetricGroup + diagonal_matrix of signs code you propose? I think that this improvement should be done in WeylGroup in another ticket.

Also, for timing improvements in that file tiling.py, there is a another more important thing to spend time on because a lot of time is spent creating vectors from tuple (because coordinates are stored as tuple). This should be done in another ticket.

Needs review!

videlec commented 9 years ago

comment:54

Replying to @seblabbe:

Replying to @videlec:
Perhaps the way it is implemented is wrong. For example
sage: %timeit P = [s.matrix() for s in SymmetricGroup(4)]
1 loops, best of 3: 1.27 ms per loop
I removed the cached_method. Do you want me to replace the code based on WeylGroup(['B',n]) by the SymmetricGroup + diagonal_matrix of signs code you propose? I think that this improvement should be done in WeylGroup in another ticket.

Nope. I also the think that the WeylGroup code has to be improved.

Also, for timing improvements in that file tiling.py, there is a another more important thing to spend time on because a lot of time is spent creating vectors from tuple (because coordinates are stored as tuple). This should be done in another ticket.

Did you notice that

sage: t = map(vector, t)

is much slower than

sage: V = FreeModule(ZZ,12)
sage: t = map(V,t)

(I measure a factor x8 in __sub__ and __add__ for example)

Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module). Why aren't you doing that?

videlec commented 9 years ago

comment:55

I still do not understand why you are parallelizing the polyomino solver code and not the DLX one. Isn't your strategy exactly equivalent to this one: pick a column, for each row that does have a 1 in this column remove all the columns occuppied by the piece and launch an independent process?

seblabbe commented 9 years ago

comment:56

Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module). Why aren't you doing that?

Because I think I wanted to use the most basic hashable container (tuple) for the need. I realized only recently that it was slow because there are many matrix operations and additions involved. So indeed, storing vectors would be better.
Because it is not the purpose of this ticket.

seblabbe commented 9 years ago

comment:57

Replying to @videlec:

I still do not understand why you are parallelizing the polyomino solver code and not the DLX one. Isn't your strategy exactly equivalent to this one: pick a column, for each row that does have a 1 in this column remove all the columns occuppied by the piece and launch an independent process?

Okay, I will work on a new commit for that.

videlec commented 9 years ago

comment:58

Replying to @seblabbe:

Moreover, everything would be faster if you would store integer vectors instead of tuples (and an attribute self._free_module). Why aren't you doing that?

Because I think I wanted to use the most basic hashable container (tuple) for the need. I realized only recently that it was slow because there are many matrix operations and additions involved. So indeed, storing vectors would be better.

Because it is not the purpose of this ticket.

I opened #19036

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

df65587 Trac #18987: Moved the paralel computation of nb of sol in dancing links module

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 0d68ece to df65587

seblabbe commented 9 years ago

comment:60

(I will not have access to a machine with Sage for the next 7 days).

seblabbe commented 9 years ago

comment:61

Ok, Vincent. I am now back and available to do the follow up on this ticket.

With the new proposal you made, this ticket now consist of two independant things. Do you prefer me to split the code into two tickets (easier to review) or is it ok like this?

Sébastien

seblabbe commented 9 years ago

comment:62

I finally decided to split this ticket into two to ease the review. The second part is now available at #19107.

seblabbe commented 9 years ago

Description changed:

--- 
+++ 
@@ -2,7 +2,7 @@

sage: from sage.games.quantumino import QuantuminoSolver -sage: QuantuminoSolver(0).number_of_solutions() # long time (about 30 days) +sage: QuantuminoSolver(0).number_of_solutions() # long time (several days)


 but we can make it faster by doing the computation in parallel... This ticket does this.

seblabbe commented 9 years ago

Description changed:

--- 
+++ 
@@ -5,6 +5,4 @@
 sage: QuantuminoSolver(0).number_of_solutions()  # long time (several days)

-but we can make it faster by doing the computation in parallel... This ticket does this.

-It also simplify the computation by avoiding to compute 4 times the same solution (obtained by rotation of angle pi of the 5x8x2 box). +but we can make it faster by doing the computation in parallel... This ticket does this (directly in the dancing links code).

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from df65587 to c016126

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`8db8f72`	`Trac #18987: Number of solutions method for dancing links`
`9d25f7c`	`Trac #18987: doc + optim in dancing_links`
`c016126`	`Trac #18987: parallel computations in dancing links`

videlec commented 9 years ago

comment:65

(solving) independant (cases) -> independent
I do not understand this comment

If possible, a good choice of column gives a partition
of solutions where each part has about the same number
of solutions.

could you make a complete sentence for the OUTPUT section instead of dict, row number -> list of rows
this is not quite accurate

After the split each subproblem has the same number of columns and
rows and the same solutions as above::

it is the union of solutions of the subproblems that form the solution of the initial one.

I do not see the point of having number_of_solutions_iterator and number_of_solutions. You should just get rid of the first one and allow parallelization in the second.
Why not also make available parallelization for getting the list of solutions?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

29ff986 Trac #18987: Fixed reviewer comment 65

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from c016126 to 29ff986

seblabbe commented 9 years ago

comment:67

Thanks for your review! I fixed the first comments.

I do not see the point of having number_of_solutions_iterator and number_of_solutions. You should just get rid of the first one and allow parallelization in the second.

number_of_solutions_iterator is now _number_of_solutions_iterator
number_of_solutions now allows parallel computation
I kept _number_of_solutions_iterator because for the problem I am currently looking at, number_of_solutions takes days while _number_of_solutions_iterator yield something once every hours. So it allows me to follow the computation, making sure it is not stuck and evaluate the duration left to do. I prefer this way rather than removing method _number_of_solutions_iterator and adding some verbose thing in number_of_solutions.

Why not also make available parallelization for getting the list of solutions?

Because I don't know how or if it is possible to use the parallel decorator to merge iterators.

videlec commented 9 years ago

comment:68

Salut Sebastien,

Thanks for your patience. I am sure that the choice of splitting is suboptimal but at least the design is much cleaner than in the first version.

Vincent

vbraun commented 9 years ago

Changed branch from public/18987 to 29ff986

sagemath / sage

Parallel computation of number of solutions in dancing links #18987

-but we can make it faster by doing the computation in parallel... This ticket does this.