quantifiedcode / python-anti-patterns

An open collection of Python anti-patterns and worst practices.
https://quantifiedcode.github.io/python-anti-patterns

using key in list for performance needs more qualification #70

Open · tacaswell opened this issue 9 years ago

tacaswell commented 9 years ago

http://docs.quantifiedcode.com/python-anti-patterns/performance/using_key_in_list_to_check_if_key_is_contained_in_a_list.html

def f_a():
    my_list = list(range(500))    # build the list

    bool(3 in my_list)            # one membership test (linear scan)

def f_b():
    my_set = set(range(500))      # build the set

    bool(3 in my_set)             # one membership test (hash lookup)

gives

In [30]: %timeit f_a()
100000 loops, best of 3: 7.7 µs per loop

In [31]: %timeit f_b()
100000 loops, best of 3: 17.5 µs per loop

It is only worth paying the cost of creating a set (O(n log n)) to get the O(log n) lookup speed if you are going to be doing the lookup many times.
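For example, a rough sketch of the repeated-lookup case (the repetition count and the looked-up value here are arbitrary choices):

def f_a_repeated():
    my_list = list(range(500))
    for _ in range(1000):
        bool(499 in my_list)    # repeated linear scan, worst-case element

def f_b_repeated():
    my_set = set(range(500))    # pay the conversion cost once
    for _ in range(1000):
        bool(499 in my_set)     # repeated cheap hash lookup

With enough repetitions, f_b_repeated should win by a wide margin, because the one-time conversion cost is amortized across all the lookups.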

vogelsgesang commented 9 years ago

Yes, you are right, we need an additional discussion of the pros and cons of lists vs. sets. I will write it within the next week. It would be perfect if you could be available as a reviewer :)

By the way: I just tried to reproduce your numbers with a slightly different testcase:

def f_a():
    my_list = list()
    for i in range(500):
        my_list.append(i)    # lists grow with append(), not add()

    bool(3 in my_list)

def f_b():
    my_set = set()
    for i in range(500):
        my_set.add(i)

    bool(3 in my_set)

%timeit f_a()
# 10000 loops, best of 3: 27.6 µs per loop

%timeit f_b()
# 10000 loops, best of 3: 33 µs per loop

This time the difference in execution speed is not that big...

tacaswell commented 9 years ago

That makes sense though: creating a set by adding elements is still O(n log n), whereas creating a list by appending is (worst case) O(n**2) (Python over-allocates to avoid doing a full copy every time a new element is appended, so it isn't as bad as it could be).
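(The over-allocation is easy to see with sys.getsizeof; a quick sketch, where the exact byte counts vary by Python version and platform:)

import sys

my_list = []
last = sys.getsizeof(my_list)
for i in range(64):
    my_list.append(i)
    size = sys.getsizeof(my_list)
    if size != last:    # the size jumps in steps, not on every append
        print(len(my_list), size)
        last = size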

Yes, ping me in the PR.

[edit because I am wrong]

coady commented 9 years ago

Python sets are implemented with hash tables, so their creation is still O(n). Although there's more constant-time overhead for sets vs. lists - as well as for adding/appending vs. pre-allocating - it's all asymptotically O(n).

And the timing examples are combining the creation time with just 1 membership test. Naturally if you already have a list, and are only testing membership once, then there's no point in converting to a set.

But I agree there's a general anti-pattern here; it just needs more context, such as repeated membership testing in a loop. In mentoring Pythoneers, I have definitely seen the pattern of over list-ifying, instead of using the right data structure. It seems to stem from not trusting duck-typing and iteration as a protocol.
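For example, the kind of loop the article could show (the names and sizes here are made up for illustration):

haystack = list(range(10000))
needles = range(5000)

# Anti-pattern: every `in` is an O(n) scan, so the whole loop is O(n * m).
slow = [n for n in needles if n in haystack]

# Better: one O(n) conversion up front, then O(1) membership tests.
haystack_set = set(haystack)
fast = [n for n in needles if n in haystack_set]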

tacaswell commented 9 years ago

Yeah, I am wrong. Should not post comments before coffee.

A bit more context was all I was asking for.

vogelsgesang commented 9 years ago

Ok, we will provide more context in the article.

It would be great to have some links concerning the time complexities. I only know https://wiki.python.org/moin/TimeComplexity, but this page does not contain any information about the time needed to create a list/set. Do you know of any good links to include in the article?

rdkr commented 8 years ago

I found this issue having just read and tested this anti-pattern, so I agree some further clarification would be good. Perhaps simply change the title to 'repeatedly using key in list', with an explanation of the complexities and practicalities below (one O(n) set creation followed by O(1) checks, vs. an O(n) scan for every check on a list)?
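A sketch along these lines could back that explanation with numbers (the break-even point depends on the container size and the interpreter, so the values are illustrative only):

import timeit

n = 500
setup = "data = list(range({}))".format(n)

for k in (1, 10, 100):    # number of membership tests per run
    t_list = timeit.timeit(
        "for _ in range({}): {} in data".format(k, n - 1),
        setup=setup, number=1000)
    t_set = timeit.timeit(
        "s = set(data)\nfor _ in range({}): {} in s".format(k, n - 1),
        setup=setup, number=1000)
    print(k, t_list, t_set)

With k = 1 the list should come out ahead (no conversion cost); as k grows, the set version overtakes it.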