lerouxrgd / ngt-rs

Rust wrappers for NGT approximate nearest neighbor search
Apache License 2.0

Question: how to think about search `radius` when using `NormalizedCosine` distance type #18

Closed by cjrh 10 months ago

cjrh commented 10 months ago

Hi @lerouxrgd!

I hope this is an easy one to answer.

I have simple code that creates an index with the `NormalizedCosine` distance type, inserts two vectors, builds the index, and then searches it with a `radius`.

I am trying to understand how the radius numerically affects the search results. I am also asking whether the normalization is fully handled for me, or whether I need to normalize the search vector myself, for example.

Basic code, for discussion, looks something like this:

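        // Imports assumed to live at the crate root, like `NgtQuery` below
        use ngt::{NgtDistance, NgtIndex, NgtProperties};

        // Create a new index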
        let prop = NgtProperties::<f32>::dimension(3)?
            .creation_edge_size(10)?
            .search_edge_size(40)?
            .distance_type(NgtDistance::NormalizedCosine)?;

        let temp_dir_p = std::env::temp_dir();
        let temp_dir = temp_dir_p.to_string_lossy();
        let index_path = format!("{temp_dir}/ngttest");
        std::fs::remove_dir_all(&index_path).unwrap_or_else(|e| {
            println!("Got error removing dir: {}", e);
        });

        let _index = NgtIndex::create(&index_path, prop)?;

        // Open an existing index
        let mut index = NgtIndex::open(&index_path)?;

        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1)?;
        let id2 = index.insert(vec2)?;

        // Actually build the index (not yet persisted on disk)
        // This is required in order to be able to search vectors
        index.build(2)?;

        // Perform a search with a specific radius
        use ngt::NgtQuery;
        let query = NgtQuery::new(&[1.1, 2.1, 3.1])
            .size(10)
            .radius(0.004); // <-- How to set this?
        let res = index.search_query(query)?;
        println!("radius res {:?}", &res);
        assert_eq!(res.len(), 1);

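These are my questions:

1. The vectors I am adding to the index are not normalized. Is this correct?
2. My search vector in the code above is not normalized. Is that correct?
3. Is there a simple way I can reason about the quantitative value of the radius parameter for the NormalizedCosine distance type?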

lerouxrgd commented 10 months ago

Hello!

Sadly, I'm afraid I won't be of much help on this one. I haven't used the normalized distances in my own use cases; I made them available in this crate for completeness.

However I would also make the same assumptions as you, that is:

> The vectors I am adding to the index are not normalized. Is this correct?

I think this is correct.

> My search vector in the code above is not normalized. Is that correct?

I also think this is how it should be done.
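
As a library-independent illustration of why pre-normalizing should not matter for a cosine-based measure, here is a minimal sketch in plain Rust. The helpers `normalize` and `cosine_similarity` are hypothetical names written for this example and are not part of the `ngt` crate; the sketch only shows that cosine similarity ignores vector length, and it does not confirm what NGT does internally.

```rust
/// Scale a vector to unit L2 length (hypothetical helper, not an `ngt` API).
/// A zero vector would produce NaN values; that case is not handled here.
fn normalize(v: &[f64]) -> Vec<f64> {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

/// Cosine similarity of two equal-length vectors (hypothetical helper).
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Same vectors as in the snippet from the question.
    let stored = [1.0, 2.0, 3.0];
    let query = [1.1, 2.1, 3.1];

    // Similarity on the raw vectors versus on manually normalized copies.
    let raw = cosine_similarity(&stored, &query);
    let pre = cosine_similarity(&normalize(&stored), &normalize(&query));

    // The two agree up to floating-point rounding.
    assert!((raw - pre).abs() < 1e-12);
    println!("raw = {raw:.6}, pre-normalized = {pre:.6}");
}
```

If the index normalizes vectors itself, passing already-normalized vectors would therefore be redundant rather than harmful for this kind of measure.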

> Is there a simple way I can reason about the quantitative value of the radius parameter for the NormalizedCosine distance type?

I am not very knowledgeable about this one either, although I think you can find a graphical explanation of the radius parameter in this blog post. For more precise information, I would recommend getting in touch with NGT's original author.
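
One way to build intuition in the meantime is to compute the distances by hand. The sketch below assumes that the radius is compared against the conventional cosine distance, `1 - cos(angle)`; this thread does not confirm that NGT defines `NormalizedCosine` exactly this way, so treat it as an assumption to verify. The helper `cosine_distance` is a hypothetical name and not part of the `ngt` crate.

```rust
/// Conventional cosine distance, 1 - cos(angle between a and b).
/// ASSUMPTION: the search radius is compared against this quantity.
fn cosine_distance(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    // Query and stored vectors taken from the snippet in the question.
    let query = [1.1, 2.1, 3.1];
    let radius = 0.004;

    for stored in [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] {
        let d = cosine_distance(&stored, &query);
        // Convert the distance back to an angle for easier interpretation.
        let angle_deg = (1.0 - d).clamp(-1.0, 1.0).acos().to_degrees();
        println!(
            "{stored:?}: distance = {d:.6}, angle = {angle_deg:.2} deg, within radius: {}",
            d <= radius
        );
    }

    // The radius itself, expressed as a maximum angle to the query.
    let max_angle_deg = (1.0_f64 - radius).acos().to_degrees();
    println!("radius {radius} corresponds to about {max_angle_deg:.2} deg");
}
```

Under this assumption, the first vector is at a distance of about 0.00014 (an angle of roughly 1 degree) and the second at about 0.022 (roughly 12 degrees), so a radius of 0.004, which corresponds to about 5 degrees, would keep only the first. That would be consistent with the `assert_eq!(res.len(), 1)` in the question, but it is not a confirmation of NGT's internal definition.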

cjrh commented 10 months ago

Thanks!