lerouxrgd / ngt-rs

Rust wrappers for NGT approximate nearest neighbor search
Apache License 2.0
36 stars 6 forks source link

Error ngt_remove_index() only with normalized distance types #17

Closed cjrh closed 9 months ago

cjrh commented 10 months ago

Summary

I've come across a problem when calling index.remove(id: VecId), but it only happens with certain distance types:

The common theme seems to be that these are normalized?

This is the error that is produced:

Error: Capi : ngt_remove_index() : Error: /.../ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

Reproducer

I've made a basic project with test cases:

ngtbug.zip

These are the versions in use:

[dependencies]
anyhow = "1.0.75"
ngt = "0.6.1"
uuid = { version = "1.5.0", features = ["v4"] }

and in the lock file, ngt-sys is at 2.1.3.

This is test case, to explain what is happening:

#[cfg(test)]
mod tests {
    use anyhow::Result;
    use ngt::{NgtIndex, NgtDistance, NgtProperties, EPSILON};

    fn run_test(dist: NgtDistance) -> Result<()> {
        // Create a new index
        let prop = NgtProperties::<f32>::dimension(3)?
            .distance_type(dist)?;

        // Temp dir for tests
        let temp_dir_p = std::env::temp_dir();
        let temp_dir = temp_dir_p.to_string_lossy();
        let u = uuid::Uuid::new_v4();
        let u = u.as_simple();
        let index_path = format!("{temp_dir}/{u}");
        std::fs::remove_dir_all(&index_path).ok();

        let mut index = NgtIndex::create(&index_path, prop)?;

        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1)?;
        let id2 = index.insert(vec2)?;
        println!("id1: {}", id1);
        println!("id2: {}", id2);

        // Actually build the index (not yet persisted on disk)
        // This is required in order to be able to search vectors
        index.build(2)?;
        index.persist()?;

        // Perform a vector search (with 1 result)
        let res = index.search(&[1.1, 2.1, 3.1], 1, EPSILON)?;
        assert_eq!(res[0].id, id1);

        // PROBLEM HERE
        index.remove(id1)?;

        // Cleanup - remove the index path
        std::fs::remove_dir_all(&index_path)?;

        Ok(())
    }

    #[test]
    fn test01_L1() -> Result<()> {
        run_test(NgtDistance::L1)
    }

    #[test]
    fn test02_L2() -> Result<()> {
        run_test(NgtDistance::L2)
    }

    #[test]
    fn test03_Angle() -> Result<()> {
        run_test(NgtDistance::Angle)
    }

    #[test]
    fn test04_Hamming() -> Result<()> {
        run_test(NgtDistance::Hamming)
    }

    #[test]
    fn test05_Cosine() -> Result<()> {
        run_test(NgtDistance::Cosine)
    }

    #[test]
    fn test06_NormalizedAngle() -> Result<()> {
        run_test(NgtDistance::NormalizedAngle)
    }

    #[test]
    fn test07_NormalizedCosine() -> Result<()> {
        run_test(NgtDistance::NormalizedCosine)
    }

    #[test]
    fn test08_Jaccard() -> Result<()> {
        run_test(NgtDistance::Jaccard)
    }

    #[test]
    fn test09_SparseJaccard() -> Result<()> {
        run_test(NgtDistance::SparseJaccard)
    }

    #[test]
    fn test10_NormalizedL2() -> Result<()> {
        run_test(NgtDistance::NormalizedL2)
    }

    #[test]
    fn test11_Poincare() -> Result<()> {
        run_test(NgtDistance::Poincare)
    }

    #[test]
    fn test12_Lorentz() -> Result<()> {
        run_test(NgtDistance::Lorentz)
    }
}

This is the output from the test run:

$ cargo test -- --test-threads=1
   Compiling ngtbug v0.1.0 (/home/caleb/tmp/ngtbug)

running 12 tests
test tests::test01_L1 ... ok
test tests::test02_L2 ... ok
test tests::test03_Angle ... ok
test tests::test04_Hamming ... ok
test tests::test05_Cosine ... ok
test tests::test06_NormalizedAngle ... FAILED
test tests::test07_NormalizedCosine ... FAILED
test tests::test08_Jaccard ... ok
test tests::test09_SparseJaccard ... FAILED
test tests::test10_NormalizedL2 ... FAILED
test tests::test11_Poincare ... ok
test tests::test12_Lorentz ... FAILED

failures:

---- tests::test06_NormalizedAngle stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test07_NormalizedCosine stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test09_SparseJaccard stdout ----
Error: Capi : ngt_insert_index_as_float() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/ObjectRepository.h:allocatePersistentObject:345: ObjectSpace::allocatePersistentObject: Fatal error! The dimensionality is invalid. The specified dimensionality=3. The specified object=2.

---- tests::test10_NormalizedL2 stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test12_Lorentz stdout ----
id1: 1
id2: 2
thread 'tests::test12_Lorentz' panicked at src/main.rs:40:23:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    tests::test06_NormalizedAngle
    tests::test07_NormalizedCosine
    tests::test09_SparseJaccard
    tests::test10_NormalizedL2
    tests::test12_Lorentz

test result: FAILED. 7 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

The failures in SparseJaccard and Lorentz are interesting but unrelated to my issue. I need to use NormalizedCosine for my application.

Comments

It is possible this is an issue with the upstream NGT library. I am not sure but I decided to ask here first. It is also possible that I have missed some detail about these distance types are supposed to be used.

lerouxrgd commented 10 months ago

Thanks for the detailed report, I am indeed able to reproduce the errors you see. I also think it might be due to some bugs in the underlying NGT code and it would be worth reporting it there too. I will try to investigate NGT code.

cjrh commented 10 months ago

Ok I made a reproducer using ngt-py and made a corresponding issue there ☝🏼

wallies commented 9 months ago

@cjrh @lerouxrgd this looks to be fixed in the latest release https://github.com/yahoojapan/NGT/releases/tag/v2.1.5

lerouxrgd commented 9 months ago

GitHub automatically closed this issue when I merged the PR, however there seems to be an issue remaining for Lorenz and SparseJaccard distances. I have mentioned it in the related NGT issue.

lerouxrgd commented 9 months ago

As Lorentz and SparseJaccard seem a bit special, I have released 0.7.0 that fixes the issue for the other distances.