pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/
Other
4.7k stars 264 forks source link

Behavior of `isdistinct` #89

Closed eriknw closed 10 years ago

eriknw commented 11 years ago

itertoolz.isdistinct does not accept iterators, and it does not short-circuit. Both of these issues can be resolved as follows:

def isdistinct(seq):
    seen = set()
    for item in seq:
        if item in seen:
            return False
        seen.add(item)
    return True

Here are benchmarks when all elements are distinct:

l1 = range(1)
l10 = range(10)
l100 = range(100)
l1000 = range(1000)

%timeit isdistinct(l1)
Old: 1000000 loops, best of 3: 968 ns per loop
New: 1000000 loops, best of 3: 1.02 µs per loop

%timeit isdistinct(l10)
Old: 100000 loops, best of 3: 1.86 µs per loop
New: 100000 loops, best of 3: 3.77 µs per loop

%timeit isdistinct(l100)
Old: 100000 loops, best of 3: 8.49 µs per loop
New: 10000 loops, best of 3: 30.1 µs per loop

%timeit isdistinct(l1000)
Old: 10000 loops, best of 3: 65.9 µs per loop
New: 1000 loops, best of 3: 283 µs per loop

Thoughts? Can you think of a better way? Should we allow the current method to be chosen (such as if seq is expected to be distinct nearly all the time)?

mrocklin commented 11 years ago

Probably not a big deal in either way. I'd suggest either going with what you have now or using the old method if isinstance(seq, [list, tuple, set]). I'm ambivalent between the two.

mrocklin commented 10 years ago

94 was merged. Closing this.