rewrite sage.combinat.combinat.unordered_tuples using itertools.combinations_with_replacement

nbruin commented 11 years ago

With Python 2.7 there's a native implementation in python's itertools:

sage: list(itertools.combinations_with_replacement([1,2,3,4],2))
[(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]
sage: sage.combinat.combinat.unordered_tuples([1,2,3,4],2)
[[1, 1], [1, 2], [1, 3], [1, 4], [2, 2], [2, 3], [2, 4], [3, 3], [3, 4], [4, 4]]

Given that the current doc of unordered_tuples mentions that it wraps a GAP routine and that this is undesirable, replacing it (or deprecating it!) may be a good idea.

CC: @tscrim

Component: combinatorics

Author: Travis Scrimshaw

Branch/Commit: 8676829

Reviewer: Vincent Delecroix, Darij Grinberg

Issue created by migration from https://trac.sagemath.org/ticket/13982

tscrim commented 10 years ago

Commit: 373e585

tscrim commented 10 years ago

Author: Travis Scrimshaw

tscrim commented 10 years ago

Branch: public/combinat/rewrite_unordered_tuples-13982

tscrim commented 10 years ago

comment:6

In order to get the behavior to match, I had to call sorted(set(S)) on the base set S when using itertools. On using the output from GAP, I had to do map(tuple, eval(ans)) because that returned lists (or sometimes strings). I've given the choice to the user to decide on which algorithm. Here are some timings (with GAP already running):

sage: %timeit unordered_tuples([1,2,3],2,algorithm='itertools')
10000 loops, best of 3: 31.9 µs per loop
sage: %timeit unordered_tuples([1,2,3],2,algorithm='gap')
1000 loops, best of 3: 1.65 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%([1,2,3],ZZ(2)) ))
1000 loops, best of 3: 1.65 ms per loop

sage: %timeit unordered_tuples([1,2],6)
10000 loops, best of 3: 31.4 µs per loop
sage: %timeit unordered_tuples([1,2],6,algorithm='gap')
100 loops, best of 3: 2.33 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%([1,2],ZZ(6)) ))
100 loops, best of 3: 2.06 ms per loop

sage: %timeit unordered_tuples(range(30), 6)
1 loops, best of 3: 590 ms per loop
sage: %timeit eval(gap.eval( "UnorderedTuples(%s,%s)"%(range(30),ZZ(6)) ))
# I got bored after a minute and killed it

So even with everything, there is a major speedup. Now once this gets replaced with libgap (#16719), we'll have to rerun timings.

New commits:

373e585 Implemented unordered_tuples using itertools.

rwst commented 10 years ago

comment:7

Please justify the removal of the docstring with result ['aa', 'ab', 'ac', 'bb', 'bc', 'cc'].

tscrim commented 10 years ago

comment:8

It's an unexpected format, in particular, the results aren't tuples. One can argue from the POV of ducktyping, it's okay, but I feel the inconsistency and the fact that it would fail a more explicit isinstance check the behavior should be standardized for all inputs. (Side note, I feel that should be a needs_info.)

videlec commented 10 years ago

comment:9

Hello,

Why not a deprecation? It is nowhere used in Sage code and it would be good to advertise users that it is infinitely better to use itertools directly.

Vincent

tscrim commented 10 years ago

comment:10

Because using the itertools directly for a multiset gives multiple copies of the same tuple

sage: list(itertools.combinations_with_replacement([1,1], 2))
[(1, 1), (1, 1), (1, 1)]

which is a different behavior (this should just return a single tuple of [(1, 1)]).

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:11

Needs a merge/rebase.

Nathann

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

ca8ec66 Merge branch 'public/combinat/rewrite_unordered_tuples-13982' of trac.sagemath.org:sage into public/combinat/rewrite_unordered_tuples-13982

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 373e585 to ca8ec66

tscrim commented 9 years ago

comment:13

Done.

videlec commented 9 years ago

comment:14

What about tuples vs itertools.product

sage: S = [1,2]
sage: %timeit tuples(S,3)
10000 loops, best of 3: 26.9 µs per loop
sage: %timeit list(product(S,repeat=3))
100000 loops, best of 3: 1.74 µs per loop

And the function number_of_tuples should not wrap anything, the answer is just len(S)^k.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

3c0c50a Implemented faster versions of tuples.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from ca8ec66 to 3c0c50a

tscrim commented 9 years ago

comment:16

I've added that as well. Note the output order has changed (and the fact that it now returns honest tuples).

videlec commented 9 years ago

comment:17

Hello,

Much cleaner now! What do you think of deprecations (not necessarily in this ticket)? Do we really want to keep the interface to GAP for those two iterators?

Vincent

tscrim commented 9 years ago

comment:18

I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing. At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)

videlec commented 9 years ago

comment:19

Replying to @tscrim:

I think it is good to have access to multiple implementations since onemight be faster in certain situations and it makes for easier testing.

In the present case itertools is always faster and I think people should use itertools directly.

At the very least, it doesn't do any harm being there IMO. (There's also a president for doing this with other such functions.)

Yes it does. This is one more function in the global namespace, one more file to maintain.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:20

Yes it does. This is one more function in the global namespace, one more file to maintain.

Plus let us only keep the code that we think worthy of being used.

Nathann

tscrim commented 9 years ago

comment:21

It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in every case that the itertools implementation is faster.

6bdad4c1-1e26-4f2f-a442-a01a2292c181 commented 9 years ago

comment:22

It's not adding anything more to the global namespace. Plus it is a separate (possibly different) algorithm, and you have to prove that in every case that the itertools implementation is faster.

This seems to be a very bad political idea. It is very likely that even though nobody ever uses this implementation because it is obviously slower than itertools, nobody will ever be able to give a formal proof that it is slower in all cases.

Nathann

videlec commented 9 years ago

comment:23

Replying to @tscrim:

It's not adding anything more to the global namespace.

All of tuples, number_of_tuples, unordered_tuples and number_of_unordered_tuples are in the global namespace. I do not claim that your branch adds something to the namespace but that it was already useless noise.

tscrim commented 9 years ago

comment:24

@nathanncohen Even if you could show it is always slower (large sets can behave very differently, so it's not obvious to me), it makes it better testing because they are independent algorithms (although it is highly unlikely one of the algorithms has a bug).

@videlec Sorry, I interpreted the "this" as my branch.

darijgr commented 9 years ago

comment:25

The "native" implementation of tuple is recursive, and with the current implementation it passes the string 'native' to itself a zillion times. Am I seeing ghosts or could this be an inefficiency? (I would implement the native algorithm, if anywhere, then in a separate method tuples_native, which would recurse upon itself, and which I would then reference one time from tuples.)

On number_of_tuples, is there really any chance for a gap call to beat return ZZ( len(set(S)) )**k ? I imagine the set call could take a while if S consists of complex things, but then again the doc of unordered_tuples warns against passing complex things to gap, which I assume should also apply to number_of_tuples. But I am too lazy to do timings...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`256c54f`	`Merge branches 'develop' and`
`64c8ab7`	`review changes and add analogous change to number_of_unordered_tuples`
`79cb892`	`let's not scare people with symbolics`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 3c0c50a to 79cb892

darijgr commented 9 years ago

comment:27

Sorry for the badly named merge commit (I started writing my next console command before the pull was finished, and what I typed ended up in the merge message instead; I thought I had cleaned it). If you are fine with my changes, feel free to set this to positive review! (I have dealt with the first paragraph of my previous post; the second paragraph has become two TODOs, but they are very low-priority.)

tscrim commented 9 years ago

comment:28

The merge message is fine with me. I'm happy with your changes except I think we should remove the todo because having the Gap version allows us to do better testing by comparing the output from the algorithms. However if you really want to leave those in there, then you can go ahead and set a positive review.

tscrim commented 9 years ago

Reviewer: Vincent Delecroix, Darij Grinberg

darijgr commented 9 years ago

comment:29

Thanks -- feel free to remove the TODOs; I would do so myself but currently I'm not at my home or work computer.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

697b1ed Removed todo messages.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 79cb892 to 697b1ed

vbraun commented 9 years ago

comment:32

PDF docs don't bulid

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8676829 Fixing pdf build.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 697b1ed to 8676829

tscrim commented 9 years ago

comment:34

This should fix the pdf build...

vbraun commented 9 years ago

Changed branch from public/combinat/rewrite_unordered_tuples-13982 to 8676829

sagemath / sage

rewrite sage.combinat.combinat.unordered_tuples using itertools.combinations_with_replacement #13982