Parallel map reduce on SearchForest

hivert commented 11 years ago

Implement a map reduce algorithm in parallel on large sets described by a SearchForest. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.

CC: @sagetrac-sage-combinat @nathanncohen @jdemeyer @seblabbe

Component: combinatorics

Keywords: map-reduce, days57, days77

Author: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen

Branch: 134c1fa

Reviewer: Sébastien Labbé, Jean-Baptiste Priez

Issue created by migration from https://trac.sagemath.org/ticket/13580

seblabbe commented 11 years ago

comment:1

Salut Florent et Nathann,

I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...

Will SearchForest survive or not?
Does it replace SearchForest?
Does it improve SearchForest?
Does it use SearchForest?
Or is it used by SearchForest?

Also, Florent wrote on sage-devel in October 2012 that

   I'm also in the process of finalizing a patch which do parallel and even
   distributed map-reduce on recursively enumerated sets (currently badly named
   SearchForest, I'll change the name in my patch, ticket #13580, patch on
   Sage-Combinat queue [1]).

I do not see in the cited patch that the name of SearchForest is changed.
What would be a better name for SearchForest ?
What patch is the more recent? the "old" one or the "experimental" one?

    trac_13580-map_reduce-fh.patch #+experimental
    map_reduce_improved_loop-fh.patch #+experimental
    map_reduce_condition-fh.patch #+experimental
    trac_13580-map_reduce-old-fh.patch

hivert commented 10 years ago

Branch: u/hivert/13580/map_reduce

hivert commented 10 years ago

New commits:

693c672 Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests.

hivert commented 10 years ago

Commit: 693c672

hivert commented 10 years ago

Changed keywords from map-reduce to map-reduce, days57

nthiery commented 9 years ago

Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

addb17a 13580: Trivial rest fix

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 693c672 to addb17a

hivert commented 9 years ago

Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

92e6e68 Merge branch 't/13580/map_reduce' into 13580

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from addb17a to 92e6e68

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

734c748 #13580 Fixed test after merging #6637

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 92e6e68 to 734c748

seblabbe commented 9 years ago

comment:13

I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

f234f60 #13580 : improved documentation

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 734c748 to f234f60

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from f234f60 to 7a33037

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`171be75`	`Merge branch 'master' into t/13580/13580/map_reduce`
`bfdf33e`	`Fixed naming convention`
`7a33037`	`Work in progress on the DOC`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 7a33037 to 168ceae

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

168ceae Added architecture picture for map_reduce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`366fc17`	`More Doc`
`6c12752`	`Tested timeout option`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 168ceae to 6c12752

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

493bb52 Done the doc of Map/Reduce

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 6c12752 to 493bb52

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

2534f12 Moved map/reduce to sage/parallel

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 493bb52 to 2534f12

hivert commented 8 years ago

Description changed:

--- 
+++ 
@@ -1,2 +1,2 @@
-Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`.
+Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.

seblabbe commented 8 years ago

comment:21

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

hivert commented 8 years ago

comment:22

Replying to @seblabbe:

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 2534f12 to e5b7477

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e5b7477 Merge 6.10.rc1 + fixed conflict

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from e5b7477 to 82fd1e4

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

82fd1e4 Fixed links according to deprecation/rebase

hivert commented 8 years ago

comment:25

Replying to @hivert:

Replying to @seblabbe:

The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.

Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.

Done ! I was actually based on 6.9.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 82fd1e4 to 68b6530

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

68b6530 Removed TODO in doc

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b93ba7d Fixed a link + doctest

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 68b6530 to b93ba7d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from b93ba7d to 5c7720d

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

5c7720d Fixed doctest continuations and raise statements

hivert commented 8 years ago

comment:30

Replying to @sagetrac-git:
5c7720d	`Fixed doctest continuations and raise statements`

Patchbot was complaining about old style multiline doctests and exception raises. Fixed and reuped. Should be Ok now.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

cb9c011 Fixed another multiline doctest

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from 5c7720d to cb9c011

seblabbe commented 8 years ago

comment:32

I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:

First line of /src/sage/parallel/map_reduce.py : Paralell

Later in the same file I think: parameters tu the

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Changed commit from cb9c011 to 14b086b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 8 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

14b086b Fixed Sebastien's Typos

hivert commented 8 years ago

comment:34

Replying to @seblabbe:

I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:

Fixed ! I'm afraid you'll find more ! By the way, removed distributed from the title since I don't do distributed computation anymore. I had a prototype which allowed to launch those computation on a cluster of machines, but it induces a huge performance loss do I dropped It. maybe we will work on this with Jeroen...

seblabbe commented 8 years ago

comment:35

Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:

It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.

I know I am asking you to change your style by asking you this, but don't you agree with me or can you provide some counter-arguments to mine? I give an example below:

diff --git a/src/sage/combinat/backtrack.py b/src/sage/combinat/backtrack.py
index 9dee057..ac1243a 100644
--- a/src/sage/combinat/backtrack.py
+++ b/src/sage/combinat/backtrack.py
@@ -695,17 +695,19 @@ class SearchForest(Parent):
                    reduce_function = None,
                    reduce_init = None):
         r"""
-        Apply en Map Reduce algorithm on ``self``
+        Apply a Map Reduce algorithm on ``self``

         EXAMPLES::

-            sage: F = RecursivelyEnumeratedSet( [([i],i, i) for i in range(1,10)],
-            ....:     lambda (list, sum, last):
-            ....:         [(list + [i], sum + i, i) for i in range(1,last)],
-            ....:     structure='forest', enumeration='depth')
+            sage: seeds = [([i],i, i) for i in range(1,10)]
+            sage: succ = lambda (list, sum, last): 
+            ....:               [(list + [i], sum + i, i) for i in range(1,last)]
+            sage: F = RecursivelyEnumeratedSet(seeds, succ, 
+            ....:                       structure='forest', enumeration='depth')
             sage: y = var('y')
-            sage: F.map_reduce(
-            ....:     lambda (li, sum, _): y**sum, lambda x,y: x + y,  0 )
+            sage: map_function = lambda (li, sum, _): y**sum
+            sage: reduce_function = lambda x,y: x + y
+            sage: F.map_reduce(map_function, reduce_function, 0)
             y^45 + y^44 + y^43 + 2*y^42 + 2*y^41 + 3*y^40 + 4*y^39 + 5*y^38 + 6*y^37 + 8*y^36 + 9*y^35 

         .. SEEALSO:: :mod:`sage.parallel.map_reduce`

I noticed another typo on the same line as the one parameters tu the : On -> One

hivert commented 8 years ago

comment:36

Replying to @seblabbe:

Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:

It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.

I'm tempted to answer that I don't have this problem with emacs ;-). Of course, for the people using a two letters editor...

Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.

I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.

hivert commented 8 years ago

comment:37

Replying to @hivert:

I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.

Actually, I'm not that much convinced. It's not clear for me that

      sage: map_function = lambda x: 1
      sage: reduce_function = lambda x,y: x+y
      sage: reduce_init = 0
      sage: S.map_reduce(map_function, reduce_function, reduce_init)
      131071

is much better than

      sage: S.map_reduce(
      ....:   map_function = lambda x: 1,
      ....:   reduce_function = lambda x,y: x+y,
      ....:   reduce_init = 0 )

In this second version which is the style I adopted in map_reduce.py, we don't repeat the name twice, which is better in my opinion.

A third possibility is

      sage: S.map_reduce(lambda x: 1, lambda x,y: x+y, 0)

It is the one used through the whole backtrack.py files. Unless for very short example, I don't find it very readable. But since it's not adding new code but changing old one, which needs even more cleanup and doc improvement, I'd rather keep it for another ticket. More precisly, I'd rather do that during #16351.

What do you think ?

sagemath / sage

Parallel map reduce on SearchForest #13580