sile / patricia_tree

A memory-efficient patricia tree implementation written in Rust
MIT License
111 stars 17 forks source link

Unable to get StringPatriciaMap to work with utf-8 keys #35

Closed jannisbecker closed 9 months ago

jannisbecker commented 9 months ago

Hello!

I'm trying to build a japanese dictionary app with the StringPatriciaMap as a searchable index for searching the dictionary quickly.

Given the following mini example, this does not work:

    let mut map = StringPatriciaMap::<u8>::new();
    map.insert("インターポール", 1);
    map.insert("インターポル", 2);
    map.insert("インターリーブ", 3);
    map.insert("インターン", 4);

    assert_eq!(map.get("インターポール"), Some(&1)); //works
    assert_eq!(map.get("インターポル"), Some(&2)); //doesn't work

map.get will return Some if I search for the first key, but None if I search for the second or any other key.

This might just be a misunderstanding on how this data structure works, so please enlighten me if that's the case.

sile commented 9 months ago

Thank you for reporting this issue. I'm not certain, but it appears to be a bug within this crate. I will conduct a more thorough investigation when I have the time, likely within a week.

jannisbecker commented 9 months ago

I tested a bit more and it seems that it works as expected with the regular PatriciaMap:

#[test]
fn issue35() {
    // Insert as bytes.
    let mut map = PatriciaMap::new();
    map.insert("インターポール", ());
    map.insert("インターポル", ());
    map.insert("インターポ", ());

    assert_eq!(map.len(), map.iter().count());
    assert_eq!(map.get("インターポール"), Some(&())); //first entry
    assert_eq!(map.get("インターポル"), Some(&())); //second entry
    assert_eq!(map.get("インターポ"), Some(&())); //third entry
    assert_eq!(map.get("インターン"), None); //something else

    // make sure prefix search correctly finds both keys
    let prefixes: Vec<&[u8]> = map
        .common_prefixes("インターポール".as_bytes())
        .into_iter()
        .map(|(key, _)| key)
        .collect();

    assert_eq!(
        prefixes,
        vec!["インターポ".as_bytes(), "インターポール".as_bytes()]
    )
}

however if I switch to a StringPatriciaMap while keeping the test logic the same, the second entry cannot be found:

#[test]
fn issue35_string() {
    // Insert as strings.
    let mut map = StringPatriciaMap::new();
    map.insert("インターポール", ());
    map.insert("インターポル", ());
    map.insert("インターポ", ());

    assert_eq!(map.len(), map.iter().count());
    assert_eq!(map.get("インターポール"), Some(&())); //first entry
    assert_eq!(map.get("インターポル"), Some(&())); //second entry (fails here!)
    assert_eq!(map.get("インターポ"), Some(&())); //third entry
    assert_eq!(map.get("インターン"), None); //something else

    // make sure prefix search correctly finds both keys
    let prefixes: Vec<&str> = map
        .common_prefixes("インターポール".as_bytes())
        .into_iter()
        .map(|(key, _)| key)
        .collect();

    assert_eq!(prefixes, vec!["インターポ", "インターポール"])
}
sile commented 9 months ago

Fixed this issue and just released v0.7.0. Thanks again for reporting this bug!