laysakura / trie-rs

Memory efficient trie (prefix tree) library based on LOUDS
https://crates.io/crates/trie-rs
Apache License 2.0
90 stars 10 forks source link

Trie as facade for map::Trie #22

Closed shanecelis closed 5 months ago

shanecelis commented 5 months ago

I apologize; this one's a doozy. I've been working happily with trie-rs::map::Trie but I uncovered some bugs depending on what order entries were inserted, they would overwrite the values of preceding entries. This bothered me enough that I decided to try swapping from crate::Trie being the "root" data structure to crate::map::Trie being the root. Each struct remains in their expected place.

Iterators

I got a PR into louds-rs that adds iterators that I wanted to use in trie-rs. I've added variants of the search methods ending in _ref() since they return references instead of clones.

Decloned

I removed the dependency on Clone for Label and Value. Some methods do require Clone but it's now on a method-by-method basis and it's not essential to the library's use.

Mutable Values

Values in map::Trie can be mutated with exact_match_mut().

Breaking changes

The API has been kept such that it should be source compatible with any crate::Trie uses. However, there are breaking changes for crate::map::Trie and the builders now consume self on build().

Incremental Search

I added an incremental search so that a user can service interactive applications as performantly as possible. It's accessed with trie.inc_search(). There are tests and the docs show an example of its usage.

Docs

I updated the readme with an example of mutating values and incremental search.

Performance

I ran the benchmarks with --release and backported a few benchmarks to master to confirm there weren't any regressions. Much of it remains the same, predictive_search() appears to be two times faster. And the *_big_output() tests run with a query that has a huge number of matches, 4220 to be exact but only consumes 100 of them, which is a very contrived way to show off lazy iterator performance.

Current

[5825661] Trie::build() 10000 items
                        time:   [4.3962 ms 4.4093 ms 4.4253 ms]
[5825661] Trie::exact_match() 100 times
                        time:   [3.5043 ms 3.5600 ms 3.6154 ms]
[5825661] Trie::predictive_search() 100 times
                        time:   [11.629 ms 11.654 ms 11.693 ms]
[5825661] Trie::predictive_search_big_output()
                        time:   [111.87 ms 112.11 ms 112.35 ms]
[5825661] Trie::common_prefix_search() 100 times
                        time:   [3.5102 ms 3.5244 ms 3.5333 ms]

New

[9229648] Trie::build() 10000 items
                        time:   [3.0010 ms 3.0112 ms 3.0271 ms]
[9229648] Trie::exact_match() 100 times
                        time:   [3.0662 ms 3.1059 ms 3.1328 ms]
[9229648] Trie::predictive_search() 100 times
                        time:   [5.5695 ms 5.6354 ms 5.7003 ms]
[9229648] Trie::predictive_search_ref() 100 times
                        time:   [5.5139 ms 5.5504 ms 5.6199 ms]
[9229648] Trie::predictive_search_big_output()
                        time:   [41.849 ms 41.958 ms 42.026 ms]
[9229648] Trie::predictive_search_ref_big_output()
                        time:   [662.31 us 668.07 us 673.85 us] // microseconds
[9229648] Trie::common_prefix_search() 100 times
                        time:   [3.1777 ms 3.1972 ms 3.2197 ms]
[9229648] Trie::common_prefix_search_ref() 100 times
                        time:   [3.1445 ms 3.1656 ms 3.1860 ms]
shanecelis commented 5 months ago

Doh. I forgot to rebase onto master before submitting PR. Rebased and force pushed to map-as-root branch.

shanecelis commented 5 months ago

I'm gonna work on this one a little more and try to minimize the additional methods. Instead of adding *_ref() variants, maybe I'll have another module called declone or something that has declone::{Trie, TrieBuilder}.

shanecelis commented 5 months ago

I'm withdrawing this one. I've got to workshop it some more.