Open tacaswell opened 9 years ago
Yes, you are right, we need an additional of the pros and cons of lists vs. keys. I will write it within the next week. Would be perfect if you could be available as a reviewer :)
By the way: I just tried to reproduce your numbers with a slightly different testcase:
def f_a():
my_list = list()
for i in range(500):
my_list.add(i)
bool(3 in my_list)
def f_b():
my_set = set()
for i in range(500):
my_set.add(i)
bool(3 in my_set)
%timeit f_a()
# 10000 loops, best of 3: 27.6 µs per loop
%timeit f_b()
# 10000 loops, best of 3: 33 µs per loop
This time the difference in execution speed is not that big...
That makes sense though, creating a set by appending is still O(n log n)
, whereas creating a list by appending is (worst case) O(n**2)
(python over allocates so to avoid doing a fully copy every time a new element is appended so it isn't as bad as it could be).
Yes, ping me in the PR.
[edit because I am wrong]
Python sets are implemented with hash tables, so their creation is still O(n). Although there's more constant-time overhead for sets vs. lists - as well as for adding/appending vs. pre-allocating - it's all asymptotically O(n).
And the timing examples are combining the creation time with just 1 membership test. Naturally if you already have a list, and are only testing membership once, then there's no point in converting to a set.
But I agree there's a general anti-pattern here; it just needs more context, such as repeated membership testing in a loop. In mentoring Pythoneers, I have definitely seen the pattern of over list-ifying, instead of using the right data structure. It seems to stem from not trusting duck-typing and iteration as a protocol.
Yeah, I am wrong. Should not post comments before coffee.
A bit more context was all I was asking for.
Ok, we will provide more context in the article.
It would be great to have some links concerning the time complexities. I only know https://wiki.python.org/moin/TimeComplexity. But this page does not contain any informations about the time needed for creation of list
/set
. Do you know some good links to include in the article?
I found this issue having just read and tested this anti-pattern, so I agree some further clarification would be good. Perhaps simply changing the title to 'repeatedly using key in list', with an explanation of the complexities and practicalities below (O(n) set creation + O(1) checks vs O(n) checks)?
http://docs.quantifiedcode.com/python-anti-patterns/performance/using_key_in_list_to_check_if_key_is_contained_in_a_list.html
gives
It is only worth paying the cost of creating a set (
O(n log n)
) to get theO(log n)
loop up speed if you are going to be doing it many times.