michaelsproul / rust_radix_trie

Fast generic radix trie implemented in Rust
https://docs.rs/radix_trie/
MIT License
188 stars 33 forks source link

Extremely poor memory efficiency (many times worse than Vec even with near-ideal conditions) #68

Closed dralley closed 1 year ago

dralley commented 2 years ago

I want to use radix_trie to improve the memory efficiency of storing a large number of Unix file paths. Here's a sample of some of the input.

/usr/share/applications/cadence.desktop
/usr/share/applications/catarina.desktop
/usr/share/applications/catia.desktop
/usr/share/applications/claudia-launcher.desktop
/usr/share/applications/claudia.desktop
/usr/share/cadence
/usr/share/cadence/icons
/usr/share/cadence/icons/claudia-hicolor
/usr/share/cadence/icons/claudia-hicolor/16x16
/usr/share/cadence/icons/claudia-hicolor/16x16/apps
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/arctican_plugins.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/ardour.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/aria.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/arpage.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/azr3.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_BME700.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_aks.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_arp2600.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_axxe.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_bassmaker.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_bit99.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_bitone.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_cs80.png
/usr/share/cadence/icons/claudia-hicolor/16x16/apps/bristol_dx.png

This ought to be a near-ideal case for a Radix Trie because each path is near certain to share a prefix (a fairly long prefix, usually) with other paths. But that's not what I see when I run an experiment -- the Radix Trie requires 4x more memory to store the strings than a normal Vec. 3.73 gigabytes vs 925 megabytes.

Experiment code here: https://github.com/dralley/radix_trie_test

[dalley@thinkpad radix_trie_test]$ /usr/bin/time -v target/release/radix_test 
7223777
    Command being timed: "target/release/radix_test"
    User time (seconds): 6.24
    System time (seconds): 1.00
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.27
        ... snip ...
    Maximum resident set size (kbytes): 3734400
        ... snip ...
[dalley@thinkpad radix_trie_test]$ /usr/bin/time -v target/release/vec_test 
8568961
    Command being timed: "target/release/vec_test"
    User time (seconds): 0.97
    System time (seconds): 0.34
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.32
        ... snip ...
    Maximum resident set size (kbytes): 924672
    ... snip ...
dralley commented 2 years ago

Update, partially tempering my own expectations:

One issue is that each node stores a full copy of the key - that has already been noted previously in other issues.

But one reason which is not really an issue is that standard radix tries are not really optimized for compactness and it's not entirely realistic to expect memory savings from a standard radix trie, because each node has to hold N pointers even if few of them are used. So to actually save memory, realistically it wouldn't be likely without an adaptive radix trie.