sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.33k stars 453 forks source link

Parallel map reduce on SearchForest #13580

Closed hivert closed 8 years ago

hivert commented 11 years ago

Implement a map reduce algorithm in parallel on large sets described by a SearchForest. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.

CC: @sagetrac-sage-combinat @nathanncohen @jdemeyer @seblabbe

Component: combinatorics

Keywords: map-reduce, days57, days77

Author: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen

Branch: 134c1fa

Reviewer: Sébastien Labbé, Jean-Baptiste Priez

Issue created by migration from https://trac.sagemath.org/ticket/13580

seblabbe commented 11 years ago
comment:1

Salut Florent et Nathann,

I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...

Also, Florent wrote on sage-devel in October 2012 that

   I'm also in the process of finalizing a patch which do parallel and even
   distributed map-reduce on recursively enumerated sets (currently badly named
   SearchForest, I'll change the name in my patch, ticket #13580, patch on
   Sage-Combinat queue [1]).
    trac_13580-map_reduce-fh.patch #+experimental
    map_reduce_improved_loop-fh.patch #+experimental
    map_reduce_condition-fh.patch #+experimental
    trac_13580-map_reduce-old-fh.patch 
hivert commented 10 years ago

Branch: u/hivert/13580/map_reduce

hivert commented 10 years ago

New commits:

693c672Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests.
hivert commented 10 years ago

Commit: 693c672

hivert commented 10 years ago

Changed keywords from map-reduce to map-reduce, days57

nthiery commented 9 years ago

Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

addb17a13580: Trivial rest fix
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 693c672 to addb17a

hivert commented 9 years ago

Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

92e6e68Merge branch 't/13580/map_reduce' into 13580
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from addb17a to 92e6e68

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

734c748#13580 Fixed test after merging #6637
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 92e6e68 to 734c748

seblabbe commented 9 years ago
comment:13

I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

f234f60#13580 : improved documentation
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 734c748 to f234f60

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from f234f60 to 7a33037

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

171be75Merge branch 'master' into t/13580/13580/map_reduce
bfdf33eFixed naming convention
7a33037Work in progress on the DOC
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 7a33037 to 168ceae

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

168ceaeAdded architecture picture for map_reduce
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

366fc17More Doc
6c12752Tested timeout option
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 168ceae to 6c12752

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

493bb52Done the doc of Map/Reduce
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 6c12752 to 493bb52

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2534f12Moved map/reduce to sage/parallel
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 493bb52 to 2534f12

hivert commented 8 years ago

Description changed:

--- 
+++ 
@@ -1,2 +1,2 @@
-Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`.
+Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing. 
seblabbe commented 8 years ago
comment:21

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

hivert commented 8 years ago
comment:22

Replying to @seblabbe:

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 2534f12 to e5b7477

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e5b7477Merge 6.10.rc1 + fixed conflict
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from e5b7477 to 82fd1e4

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

82fd1e4Fixed links according to deprecation/rebase
hivert commented 8 years ago
comment:25

Replying to @hivert:

Replying to @seblabbe:

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.

Done ! I was actually based on 6.9.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 82fd1e4 to 68b6530

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

68b6530Removed TODO in doc
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b93ba7dFixed a link + doctest
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 68b6530 to b93ba7d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from b93ba7d to 5c7720d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5c7720dFixed doctest continuations and raise statements
hivert commented 8 years ago
comment:30
Replying to @sagetrac-git:
5c7720d Fixed doctest continuations and raise statements

Patchbot was complaining about old style multiline doctests and exception raises. Fixed and reuped. Should be Ok now.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

cb9c011Fixed another multiline doctest
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 5c7720d to cb9c011

seblabbe commented 8 years ago
comment:32

I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:

First line of /src/sage/parallel/map_reduce.py : Paralell

Later in the same file I think: parameters tu the

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from cb9c011 to 14b086b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

14b086bFixed Sebastien's Typos
hivert commented 8 years ago
comment:34

Replying to @seblabbe:

I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:

Fixed ! I'm afraid you'll find more ! By the way, removed distributed from the title since I don't do distributed computation anymore. I had a prototype which allowed to launch those computation on a cluster of machines, but it induces a huge performance loss do I dropped It. maybe we will work on this with Jeroen...

seblabbe commented 8 years ago
comment:35

Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:

  1. It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
  2. Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.

I know I am asking you to change your style by asking you this, but don't you agree with me or can you provide some counter-arguments to mine? I give an example below:

diff --git a/src/sage/combinat/backtrack.py b/src/sage/combinat/backtrack.py
index 9dee057..ac1243a 100644
--- a/src/sage/combinat/backtrack.py
+++ b/src/sage/combinat/backtrack.py
@@ -695,17 +695,19 @@ class SearchForest(Parent):
                    reduce_function = None,
                    reduce_init = None):
         r"""
-        Apply en Map Reduce algorithm on ``self``
+        Apply a Map Reduce algorithm on ``self``

         EXAMPLES::

-            sage: F = RecursivelyEnumeratedSet( [([i],i, i) for i in range(1,10)],
-            ....:     lambda (list, sum, last):
-            ....:         [(list + [i], sum + i, i) for i in range(1,last)],
-            ....:     structure='forest', enumeration='depth')
+            sage: seeds = [([i],i, i) for i in range(1,10)]
+            sage: succ = lambda (list, sum, last): 
+            ....:               [(list + [i], sum + i, i) for i in range(1,last)]
+            sage: F = RecursivelyEnumeratedSet(seeds, succ, 
+            ....:                       structure='forest', enumeration='depth')
             sage: y = var('y')
-            sage: F.map_reduce(
-            ....:     lambda (li, sum, _): y**sum, lambda x,y: x + y,  0 )
+            sage: map_function = lambda (li, sum, _): y**sum
+            sage: reduce_function = lambda x,y: x + y
+            sage: F.map_reduce(map_function, reduce_function, 0)
             y^45 + y^44 + y^43 + 2*y^42 + 2*y^41 + 3*y^40 + 4*y^39 + 5*y^38 + 6*y^37 + 8*y^36 + 9*y^35 

         .. SEEALSO:: :mod:`sage.parallel.map_reduce`

I noticed another typo on the same line as the one parameters tu the : On -> One

hivert commented 8 years ago
comment:36

Replying to @seblabbe:

Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:

  1. It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.

I'm tempted to answer that I don't have this problem with emacs ;-). Of course, for the people using a two letters editor...

  1. Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.

I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.

hivert commented 8 years ago
comment:37

Replying to @hivert:

I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.

Actually, I'm not that much convinced. It's not clear for me that

      sage: map_function = lambda x: 1
      sage: reduce_function = lambda x,y: x+y
      sage: reduce_init = 0
      sage: S.map_reduce(map_function, reduce_function, reduce_init)
      131071

is much better than

      sage: S.map_reduce(
      ....:   map_function = lambda x: 1,
      ....:   reduce_function = lambda x,y: x+y,
      ....:   reduce_init = 0 )

In this second version which is the style I adopted in map_reduce.py, we don't repeat the name twice, which is better in my opinion.

A third possibility is

      sage: S.map_reduce(lambda x: 1, lambda x,y: x+y, 0)

It is the one used through the whole backtrack.py files. Unless for very short example, I don't find it very readable. But since it's not adding new code but changing old one, which needs even more cleanup and doc improvement, I'd rather keep it for another ticket. More precisly, I'd rather do that during #16351.

What do you think ?