Closed hivert closed 8 years ago
Salut Florent et Nathann,
I am starting to think/work on #6637... I have some questions. Mainly, I would like to know what is this ticket, because the above one liner in the description does not say much...
Also, Florent wrote on sage-devel in October 2012 that
I'm also in the process of finalizing a patch which do parallel and even
distributed map-reduce on recursively enumerated sets (currently badly named
SearchForest, I'll change the name in my patch, ticket #13580, patch on
Sage-Combinat queue [1]).
trac_13580-map_reduce-fh.patch #+experimental
map_reduce_improved_loop-fh.patch #+experimental
map_reduce_condition-fh.patch #+experimental
trac_13580-map_reduce-old-fh.patch
Branch: u/hivert/13580/map_reduce
New commits:
693c672 | Imported code from trac_13580-map_reduce-fh.patch + fixed multiline doctests. |
Changed keywords from map-reduce to map-reduce, days57
Changed branch from u/hivert/13580/map_reduce to u/nthiery/13580/map_reduce
Branch pushed to git repo; I updated commit sha1. New commits:
addb17a | 13580: Trivial rest fix |
Changed branch from u/nthiery/13580/map_reduce to u/hivert/13580/map_reduce
Branch pushed to git repo; I updated commit sha1. New commits:
92e6e68 | Merge branch 't/13580/map_reduce' into 13580 |
Branch pushed to git repo; I updated commit sha1. New commits:
734c748 | #13580 Fixed test after merging #6637 |
I saw the following typo in map reduce file while looking at the previous commit: "As an example, ee compute"
Branch pushed to git repo; I updated commit sha1. New commits:
f234f60 | #13580 : improved documentation |
Branch pushed to git repo; I updated commit sha1. New commits:
168ceae | Added architecture picture for map_reduce |
Branch pushed to git repo; I updated commit sha1. New commits:
493bb52 | Done the doc of Map/Reduce |
Branch pushed to git repo; I updated commit sha1. New commits:
2534f12 | Moved map/reduce to sage/parallel |
Description changed:
---
+++
@@ -1,2 +1,2 @@
-Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`.
+Implement a map reduce algorithm in parallel on large sets described by a `SearchForest`. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Replying to @seblabbe:
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.
Branch pushed to git repo; I updated commit sha1. New commits:
e5b7477 | Merge 6.10.rc1 + fixed conflict |
Branch pushed to git repo; I updated commit sha1. New commits:
82fd1e4 | Fixed links according to deprecation/rebase |
Replying to @hivert:
Replying to @seblabbe:
The branch currently have conflicts with development version of Sage (because the link is red in the description of the ticket above). The branch seems to be based on 6.2.beta6 which is old and may explain the presence of a conflict.
Yes ! there is a trivial conflict. Thank you for pointing it. I'm fixing it, testing and re-uploading.
Done ! I was actually based on 6.9.
Branch pushed to git repo; I updated commit sha1. New commits:
68b6530 | Removed TODO in doc |
Branch pushed to git repo; I updated commit sha1. New commits:
b93ba7d | Fixed a link + doctest |
Branch pushed to git repo; I updated commit sha1. New commits:
5c7720d | Fixed doctest continuations and raise statements |
Replying to @sagetrac-git: | ||
---|---|---|
5c7720d | Fixed doctest continuations and raise statements |
Patchbot was complaining about old style multiline doctests and exception raises. Fixed and reuped. Should be Ok now.
Branch pushed to git repo; I updated commit sha1. New commits:
cb9c011 | Fixed another multiline doctest |
I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:
First line of /src/sage/parallel/map_reduce.py
: Paralell
Later in the same file I think: parameters tu the
Branch pushed to git repo; I updated commit sha1. New commits:
14b086b | Fixed Sebastien's Typos |
Replying to @seblabbe:
I just looked at the code in the browser. Looks good. I will do real code review later this week hopefully. Quickly, I saw two typos:
Fixed ! I'm afraid you'll find more ! By the way, removed distributed from the title since I don't do distributed computation anymore. I had a prototype which allowed to launch those computation on a cluster of machines, but it induces a huge performance loss do I dropped It. maybe we will work on this with Jeroen...
Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:
I know I am asking you to change your style by asking you this, but don't you agree with me or can you provide some counter-arguments to mine? I give an example below:
diff --git a/src/sage/combinat/backtrack.py b/src/sage/combinat/backtrack.py
index 9dee057..ac1243a 100644
--- a/src/sage/combinat/backtrack.py
+++ b/src/sage/combinat/backtrack.py
@@ -695,17 +695,19 @@ class SearchForest(Parent):
reduce_function = None,
reduce_init = None):
r"""
- Apply en Map Reduce algorithm on ``self``
+ Apply a Map Reduce algorithm on ``self``
EXAMPLES::
- sage: F = RecursivelyEnumeratedSet( [([i],i, i) for i in range(1,10)],
- ....: lambda (list, sum, last):
- ....: [(list + [i], sum + i, i) for i in range(1,last)],
- ....: structure='forest', enumeration='depth')
+ sage: seeds = [([i],i, i) for i in range(1,10)]
+ sage: succ = lambda (list, sum, last):
+ ....: [(list + [i], sum + i, i) for i in range(1,last)]
+ sage: F = RecursivelyEnumeratedSet(seeds, succ,
+ ....: structure='forest', enumeration='depth')
sage: y = var('y')
- sage: F.map_reduce(
- ....: lambda (li, sum, _): y**sum, lambda x,y: x + y, 0 )
+ sage: map_function = lambda (li, sum, _): y**sum
+ sage: reduce_function = lambda x,y: x + y
+ sage: F.map_reduce(map_function, reduce_function, 0)
y^45 + y^44 + y^43 + 2*y^42 + 2*y^41 + 3*y^40 + 4*y^39 + 5*y^38 + 6*y^37 + 8*y^36 + 9*y^35
.. SEEALSO:: :mod:`sage.parallel.map_reduce`
I noticed another typo on the same line as the one parameters tu the
: On
-> One
Replying to @seblabbe:
Dear Florent, I know that you like multiline doctests, but for two reasons I prefer to avoid them as much as possible in doctests:
- It is not fun for the user who wants to adapt an example after a copy paste in the console. The up arrow brings the multiline all at once and it is not fun to change the value 10 to a value 25 let say.
I'm tempted to answer that I don't have this problem with emacs ;-). Of course, for the people using a two letters editor...
- Using variables to store the input before calling the method gives lot of information easily to the user: what are the argument names, in what order should they be used.
I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.
Replying to @hivert:
I'm much more convinced by this argument. I'm watching an exam tomorrow evening. I'll change my code accordingly.
Actually, I'm not that much convinced. It's not clear for me that
sage: map_function = lambda x: 1
sage: reduce_function = lambda x,y: x+y
sage: reduce_init = 0
sage: S.map_reduce(map_function, reduce_function, reduce_init)
131071
is much better than
sage: S.map_reduce(
....: map_function = lambda x: 1,
....: reduce_function = lambda x,y: x+y,
....: reduce_init = 0 )
In this second version which is the style I adopted in map_reduce.py
, we don't repeat the name twice, which is better in my opinion.
A third possibility is
sage: S.map_reduce(lambda x: 1, lambda x,y: x+y, 0)
It is the one used through the whole backtrack.py
files. Unless for very short example, I don't find it very readable. But since it's not adding new code but changing old one, which needs even more cleanup and doc improvement, I'd rather keep it for another ticket. More precisly, I'd rather do that during #16351.
What do you think ?
Implement a map reduce algorithm in parallel on large sets described by a
SearchForest
. We use a work-stealing algorithm (see https://en.wikipedia.org/wiki/Work_stealing) based on Python's multiprocessing.CC: @sagetrac-sage-combinat @nathanncohen @jdemeyer @seblabbe
Component: combinatorics
Keywords: map-reduce, days57, days77
Author: Florent Hivert, Jean-Baptiste Priez, Nathann Cohen
Branch:
134c1fa
Reviewer: Sébastien Labbé, Jean-Baptiste Priez
Issue created by migration from https://trac.sagemath.org/ticket/13580