Closed LucaCappelletti94 closed 4 months ago
Will mem_dbg
let us compare the size of the dictionary versus the size of the trie like this issue aims to?
Yes, it should easily be able to do that. If you scroll down in the readme of this other project, you will find mem_dbg examples comparing trie, a simple Vec and an RCL. Next up I will add an RCL compressed with Huffman.
Hi Luca,
This mem_dbg stuff is cool. I wrote a little exploratory code to get a feel for it that I'd like to turn into a test.
#[cfg(feature = "mem_dbg")]
#[test]
fn memsize() {
use std::{env, io::{BufReader, BufRead}, fs::{File}};
use mem_dbg::*;
let mut builder = TrieBuilder::new();
let repo_root = env::var("CARGO_MANIFEST_DIR").expect("CARGO_MANIFEST_DIR environment variable must be set.");
let edict2_path = format!("{}/benches/edict.furigana", repo_root);
println!("Reading dictionary file from: {}", edict2_path);
let mut n_words = 0;
for result in BufReader::new(File::open(edict2_path).unwrap()).lines() {
let l = result.unwrap();
builder.push(l);
n_words += 1;
}
println!("Read {} words.", n_words);
let trie = builder.build();
let size = trie.mem_size(SizeFlags::default());
eprintln!("Trie size {size}");
let uncompressed: Vec<String> = trie.predictive_search("").collect();
eprintln!("Uncompressed size {}", uncompressed.mem_size(SizeFlags::default()));
}
The file itself is 3,717,355 bytes. When I run the above
$ cargo t --features mem_dbg memsize -- --nocapture
...
Read 185536 words.
Trie size 4174393
Uncompressed size 7984707
I'm surprised the trie is bigger than the dictionary file itself. I would have expected it to be smaller. Fortunately, it is half the size of the "uncompressed" version of the contents.
Collecting the uncompressed version takes a while, so it should probably not be kept as a unit test. Maybe it could be a benchmark somehow.
I'm intent on merging this. Can you update the Changelog?
Sure, on it.
Should I remove the travis badge? It is no longer working since travis became a paid services.
Updated changelog - feel free to publish. I would suggest to remove the travis stuff before.
Thanks for the PR.
Updated version of pull request #25
Once merged, this pull request resolves the issues #23 and #24