Closed raylutz closed 6 months ago
tested with this code:
import timeit
# Sample list of keys
key_col = list(range(1000))
# Option 1: Dict comprehension with enumerated loop
def option1():
return {key: index for index, key in enumerate(key_col)}
# Option 2: Using a generator function
def with_index(iterable):
"""Like enumerate, but (val, i) tuples instead of (i, val)."""
for i, item in enumerate(iterable):
yield (item, i)
def option2():
return dict(with_index(key_col))
# Option 3: Using dict zip
def option3():
return dict(zip(key_col, range(len(key_col))))
# Measure execution time for each option
time_option1 = timeit.timeit(option1, number=10000)
time_option2 = timeit.timeit(option2, number=10000)
time_option3 = timeit.timeit(option3, number=10000)
# Print results
print("Option 1 execution time:", time_option1)
print("Option 2 execution time:", time_option2)
print("Option 3 execution time:", time_option3)
result:
Option 1 execution time: 0.7322188999969512 Option 2 execution time: 1.1231585000059567 Option 3 execution time: 0.4843974999967031
Conclusion: Change to dict-zip-range approach. This appears to be faster because the algorithm uses "invisible indexes" that are created from the range() statement and then combing with the keys without ever resolving those to any python identifiers.
Option 1 is 60.13% slower than Option 3. Option 2 is 119.56% slower than Option 3.
Key dictionary construction is performed for both the hd and kd. The kd may be very large. Therefore, it will be helpful for this to be as fast as possible.
Options: