raylutz / daffodil

Python Daffodil: 2-D data arrays with mixed types, lightweight package, can be faster than Pandas
MIT License
7 stars 2 forks source link

Use dictionary containment check instead of -1 checking #7

Closed raylutz closed 6 months ago

raylutz commented 6 months ago

When appending, if the key already exists, the entry is updated, otherwise, it is appended. In other places, we are building a list of indexes which correspond to the keys.

It was thought that this implementation would be better:

                idx = keydict.get(gkey, -1)
                if idx >= 0:
                    idxs.append(idx)
                elif not silent_error:
                    raise KeyError                    

instead of

                if gkey in keydict:
                    idxs.append(keydict[gkey])
                elif not silent_error:
                    raise KeyError

because it appears that the dict need only be searched one time, and the idx found there used, without searching again. However, in testing, we found that the latter was about 10% faster. It is better, in general, to avoid actually resolving any indexes to actual variables because that is time consuming, and it is faster to use invisible values that are the result of keydict[gkey].

Another option is to use try/except:

                try:
                    idxs.append(keydict[gkey])
                except KeyError:
                    if not silent_error:
                        raise
raylutz commented 6 months ago

Execution time for function 1: 10.801304700027686 Execution time for function 2: 9.092454200028442 Execution time for function 3: 9.054714599973522

Interestingly, doing the try/except in the loop is a tiny bit faster, but not that much different from checking in advance, which is about 10% faster than using the -1 index trick.

raylutz commented 6 months ago

fixed in https://github.com/raylutz/daffodil/commit/5d461b905a28c5633da5fe647609c155d24a7ea3