python / cpython

The Python programming language
https://www.python.org
Other
63.47k stars 30.39k forks source link

Search not needed in combinations_with_replacement #62130

Closed tim-one closed 11 years ago

tim-one commented 11 years ago
BPO 17930
Nosy @tim-one, @rhettinger, @terryjreedy, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['extension-modules', 'easy', 'performance'] title = 'Search not needed in combinations_with_replacement' updated_at = user = 'https://github.com/tim-one' ``` bugs.python.org fields: ```python activity = actor = 'tim.peters' assignee = 'rhettinger' closed = True closed_date = closer = 'tim.peters' components = ['Extension Modules'] creation = creator = 'tim.peters' dependencies = [] files = [] hgrepos = [] issue_num = 17930 keywords = ['easy'] message_count = 5.0 messages = ['188686', '188687', '188735', '188897', '196815'] nosy_count = 4.0 nosy_names = ['tim.peters', 'rhettinger', 'terry.reedy', 'serhiy.storchaka'] pr_nums = [] priority = 'low' resolution = 'rejected' stage = 'resolved' status = 'closed' superseder = None type = 'performance' url = 'https://bugs.python.org/issue17930' versions = ['Python 2.7', 'Python 3.5'] ```

tim-one commented 11 years ago

Each time thru, CWR searches for the rightmost position not containing the maximum index. But this is wholly determined by what happened the last time thru - search isn't really needed. Here's Python code:

def cwr2(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    j = r-1 if n > 1 else -1
    while j >= 0:
        newval = indices[j] + 1
        indices[j:] = [newval] * (r - j)
        yield tuple(pool[i] for i in indices)
        j = r-1 if newval < n-1 else j-1

There j is the rightmost position not containing the maximum index. A little thought suffices to see that the next j is either r-1 (if newval is not the maximum index) or j-1 (if newval is the maximum index: since the indices vector is non-decreasing, if indices[j] was r-2 then indices[j-1] is also at most r-2).

I don't much care if this goes in, but Raymond should find it amusing so assigning it to him ;-)

tim-one commented 11 years ago

Oops! Last part should read

"since the indices vector is non-decreasing, if indices[j] was n-2 then indices[j-1] is also at most n-2"

That is, the instances of "r-2" in the original should have been "n-2".

tim-one commented 11 years ago

There's another savings to be had when an index becomes the maximum: in that case, all the indices to its right are already at the maximum, so no need to overwrite them. This isn't as big a savings as skipping the search, but still buys about 10% more in the Python code. Like so:

    def cwr3(iterable, r):
        pool = tuple(iterable)
        n = len(pool)
        if n == 0 and r:
            return
        indices = [0] * r
        yield tuple(pool[i] for i in indices)
        rmax, nmax = r-1, n-1
        j = rmax if n > 1 else -1
        while j >= 0:
            newval = indices[j] + 1
            if newval < nmax:
                indices[j:] = [newval] * (r - j)
                j = rmax
            else:
                assert newval == nmax
                indices[j] = newval
                j -= 1
            yield tuple(pool[i] for i in indices)
rhettinger commented 11 years ago

Thanks Tim :-)

tim-one commented 11 years ago

I'm closing this. While it makes a big difference for a cwr coded in Python, it turn out to be minor in C. The extra complications (more state to remember and update across next() invocations) isn't worth the minor speedup in C.