itertoolz.isdistinct does not accept iterators, and it does not short-circuit. Both of these issues can be resolved as follows:
def isdistinct(seq):
seen = set()
for item in seq:
if item in seen:
return False
seen.add(item)
return True
Here are benchmarks when all elements are distinct:
l1 = range(1)
l10 = range(10)
l100 = range(100)
l1000 = range(1000)
%timeit isdistinct(l1)
Old: 1000000 loops, best of 3: 968 ns per loop
New: 1000000 loops, best of 3: 1.02 µs per loop
%timeit isdistinct(l10)
Old: 100000 loops, best of 3: 1.86 µs per loop
New: 100000 loops, best of 3: 3.77 µs per loop
%timeit isdistinct(l100)
Old: 100000 loops, best of 3: 8.49 µs per loop
New: 10000 loops, best of 3: 30.1 µs per loop
%timeit isdistinct(l1000)
Old: 10000 loops, best of 3: 65.9 µs per loop
New: 1000 loops, best of 3: 283 µs per loop
Thoughts? Can you think of a better way? Should we allow the current method to be chosen (such as if seq is expected to be distinct nearly all the time)?
Probably not a big deal in either way. I'd suggest either going with what you have now or using the old method if isinstance(seq, [list, tuple, set]). I'm ambivalent between the two.
itertoolz.isdistinct
does not accept iterators, and it does not short-circuit. Both of these issues can be resolved as follows:Here are benchmarks when all elements are distinct:
Thoughts? Can you think of a better way? Should we allow the current method to be chosen (such as if
seq
is expected to be distinct nearly all the time)?