Closed nbruin closed 9 years ago
Author: Travis Scrimshaw
In order to get the behavior to match, I had to call sorted(set(S))
on the base set S
when using itertools
. On using the output from GAP, I had to do map(tuple, eval(ans))
because that returned lists (or sometimes strings). I've given the choice to the user to decide on which algorithm. Here are some timings (with GAP already running):
sage: %timeit unordered_tuples([1,2,3],2,algorithm='itertools')
10000 loops, best of 3: 31.9 µs per loop
sage: %timeit unordered_tuples([1,2,3],2,algorithm='gap')
1000 loops, best of 3: 1.65 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%([1,2,3],ZZ(2)) ))
1000 loops, best of 3: 1.65 ms per loop
sage: %timeit unordered_tuples([1,2],6)
10000 loops, best of 3: 31.4 µs per loop
sage: %timeit unordered_tuples([1,2],6,algorithm='gap')
100 loops, best of 3: 2.33 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%([1,2],ZZ(6)) ))
100 loops, best of 3: 2.06 ms per loop
sage: %timeit unordered_tuples(range(30), 6)
1 loops, best of 3: 590 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%(range(30),ZZ(6)) ))
# I got bored after a minute and killed it
So even with everything, there is a major speedup. Now once this gets replaced with libgap (#16719), we'll have to rerun timings.
New commits:
373e585 | Implemented unordered_tuples using itertools. |
Please justify the removal of the docstring with result ['aa', 'ab', 'ac', 'bb', 'bc', 'cc']
.
It's an unexpected format, in particular, the results aren't tuples. One can argue from the POV of ducktyping, it's okay, but I feel the inconsistency and the fact that it would fail a more explicit isinstance
check the behavior should be standardized for all inputs. (Side note, I feel that should be a needs_info
.)
Hello,
Why not a deprecation? It is nowhere used in Sage code and it would be good to advertise users that it is infinitely better to use itertools directly.
Vincent
Because using the itertools
directly for a multiset gives multiple copies of the same tuple
sage: list(itertools.combinations_with_replacement([1,1], 2))
[(1, 1), (1, 1), (1, 1)]
which is a different behavior (this should just return a single tuple of [(1, 1)]
).
Needs a merge/rebase.
Nathann
Branch pushed to git repo; I updated commit sha1. New commits:
ca8ec66 | Merge branch 'public/combinat/rewrite_unordered_tuples-13982' of trac.sagemath.org:sage into public/combinat/rewrite_unordered_tuples-13982 |
Done.
What about tuples
vs itertools.product
sage: S = [1,2]
sage: %timeit tuples(S,3)
10000 loops, best of 3: 26.9 µs per loop
sage: %timeit list(product(S,repeat=3))
100000 loops, best of 3: 1.74 µs per loop
And the function number_of_tuples
should not wrap anything, the answer is just len(S)^k
.
Branch pushed to git repo; I updated commit sha1. New commits:
3c0c50a | Implemented faster versions of tuples. |
I've added that as well. Note the output order has changed (and the fact that it now returns honest tuples).
Hello,
Much cleaner now! What do you think of deprecations (not necessarily in this ticket)? Do we really want to keep the interface to GAP for those two iterators?
Vincent
I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing. At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)
Replying to @tscrim:
I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing.
In the present case itertools
is always faster and I think people should use itertools directly.
At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)
Yes it does. This is one more function in the global namespace, one more file to maintain.
Yes it does. This is one more function in the global namespace, one more file to maintain.
Plus let us only keep the code that we think worthy of being used.
Nathann
It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in every case that the itertools implementation is faster.
It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in every case that the itertools implementation is faster.
This seems to be a very bad political idea. It is very likely that even though nobody ever uses this implementation because it is obviously slower than itertools, nobody will ever be able to give a formal proof that it is slower in all cases.
Nathann
Replying to @tscrim:
It's not adding anything more to the global namespace.
All of tuples
, number_of_tuples
, unordered_tuples
and number_of_unordered_tuples
are in the global namespace. I do not claim that your branch adds something to the namespace but that it was already useless noise.
@nathanncohen Even if you could show it is always slower (large sets can behave very differently, so it's not obvious to me), it makes it better testing because they are independent algorithms (although it is highly unlikely one of the algorithms has a bug).
@videlec Sorry, I interpreted the "this" as my branch.
The "native" implementation of tuple
is recursive, and with the current implementation it passes the string 'native'
to itself a zillion times. Am I seeing ghosts or could this be an inefficiency? (I would implement the native algorithm, if anywhere, then in a separate method tuples_native
, which would recurse upon itself, and which I would then reference one time from tuples
.)
On number_of_tuples
, is there really any chance for a gap call to beat return ZZ( len(set(S)) )**k
? I imagine the set
call could take a while if S
consists of complex things, but then again the doc of unordered_tuples
warns against passing complex things to gap, which I assume should also apply to number_of_tuples
. But I am too lazy to do timings...
Sorry for the badly named merge commit (I started writing my next console command before the pull was finished, and what I typed ended up in the merge message instead; I thought I had cleaned it). If you are fine with my changes, feel free to set this to positive review! (I have dealt with the first paragraph of my previous post; the second paragraph has become two TODOs, but they are very low-priority.)
The merge message is fine with me. I'm happy with your changes except I think we should remove the todo because having the Gap version allows us to do better testing by comparing the output from the algorithms. However if you really want to leave those in there, then you can go ahead and set a positive review.
Reviewer: Vincent Delecroix, Darij Grinberg
Thanks -- feel free to remove the TODOs; I would do so myself but currently I'm not at my home or work computer.
Branch pushed to git repo; I updated commit sha1. New commits:
697b1ed | Removed todo messages. |
PDF docs don't bulid
Branch pushed to git repo; I updated commit sha1. New commits:
8676829 | Fixing pdf build. |
This should fix the pdf build...
Changed branch from public/combinat/rewrite_unordered_tuples-13982 to 8676829
With Python 2.7 there's a native implementation in python's itertools:
Given that the current doc of
unordered_tuples
mentions that it wraps a GAP routine and that this is undesirable, replacing it (or deprecating it!) may be a good idea.CC: @tscrim
Component: combinatorics
Author: Travis Scrimshaw
Branch/Commit:
8676829
Reviewer: Vincent Delecroix, Darij Grinberg
Issue created by migration from https://trac.sagemath.org/ticket/13982