Key dictionary construction

raylutz commented 6 months ago

Key dictionary construction is performed for both the hd and kd. The kd may be very large. Therefore, it will be helpful for this to be as fast as possible.

Options:

dict comprehension with enumerated loop:

        kd = {key: index for index, key in enumerate(key_col)}

use generator function

        kd = dict(with_index(key_col))

def with_index(iterable):
    """Like enumerate, but (val, i) tuples instead of (i, val)."""
    for i, item in enumerate(iterable):
        yield (item, i)

dict zip

kd = dict(zip(key_col, range(len(key_col))))

raylutz commented 6 months ago

tested with this code:

import timeit

# Sample list of keys
key_col = list(range(1000))

# Option 1: Dict comprehension with enumerated loop
def option1():
    return {key: index for index, key in enumerate(key_col)}

# Option 2: Using a generator function
def with_index(iterable):
    """Like enumerate, but (val, i) tuples instead of (i, val)."""
    for i, item in enumerate(iterable):
        yield (item, i)

def option2():
    return dict(with_index(key_col))

# Option 3: Using dict zip
def option3():
    return dict(zip(key_col, range(len(key_col))))

# Measure execution time for each option
time_option1 = timeit.timeit(option1, number=10000)
time_option2 = timeit.timeit(option2, number=10000)
time_option3 = timeit.timeit(option3, number=10000)

# Print results
print("Option 1 execution time:", time_option1)
print("Option 2 execution time:", time_option2)
print("Option 3 execution time:", time_option3)

result:

Option 1 execution time: 0.7322188999969512 Option 2 execution time: 1.1231585000059567 Option 3 execution time: 0.4843974999967031

raylutz commented 6 months ago

Conclusion: Change to dict-zip-range approach. This appears to be faster because the algorithm uses "invisible indexes" that are created from the range() statement and then combing with the keys without ever resolving those to any python identifiers.

Option 1 is 60.13% slower than Option 3. Option 2 is 119.56% slower than Option 3.

raylutz commented 6 months ago

fixed in https://github.com/raylutz/daffodil/commit/5d461b905a28c5633da5fe647609c155d24a7ea3

raylutz / daffodil

Key dictionary construction #6