keithwill / TrieHard

A playground for building an efficient key and key prefix lookup collection.
MIT License
12 stars 1 forks source link

Consider SortedDictionary IPrefixLookup Wrapper #7

Closed keithwill closed 11 months ago

keithwill commented 1 year ago

The standard class library Dictionary has unreasonably good performance for get and set operations directly on a key, as hash based approaches are superior for this type of operation. The hashing makes the key useless for sorted / ordered searches however. The comparison for Dictionary is unfair in both directions. A Dictionary might be three times faster than a trie for an exact key lookup, but will be more than multiple orders of magnitude slower for key prefix search. The hashing of keys attempts to avoid collisions when bucketing entries, and creates an ordering that is practically random, which is a worst case when attempting to get back ordered search results.

Since we can't fairly compare to a Dictionary, I'm curious what performance characteristics the SortedDictionary would have compared to the tries implemented in this library. Its probably the closest mutable collection that Microsoft offers that can be used for prefix searching.

keithwill commented 1 year ago

SortedSet also exists and might make sense to test too. For example, a SortedSet<KeyValuePair<TKey, TValue>>.

keithwill commented 11 months ago

I reviewed both classes in the base .NET library, and neither offers a simple way to perform a starts-with style search without materializing the entire list of keys to a collection that can be binary searched for each prefix lookup operation. The comparison wouldn't be fair, it would perform similarly to the ListPrefixLookup we already have.