unum-cloud / usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
https://unum-cloud.github.io/usearch/
Apache License 2.0
1.93k stars 109 forks source link

Removing items from the index leaves the index with an inconsistent size...and therefore state #355

Open spicymatt opened 4 months ago

spicymatt commented 4 months ago

Describe the bug

The following code (Swift) adds 500 items to the index, and then removes them. The count of items in the index after removal is 448 (consistently ... when we expect 0) Search for items does not seem to return any of the original items, but the size of the index is inconsistent.

If we try to add the items back by reserving the expected space (500), it crashes when we add() because of insufficient capacity. If we reserve more space (448 + 500) and add the 500, it works, and the final count is 500, as expected.

It seems that after a removal, the index has some nodes that are not properly removed.

We tested the compact() method after removal and it does not help...still 448 items in the index

Steps to reproduce

Here is the code:

func testSimpleRemove() async throws {
    // build the vectors
    let numberOfVectors:UInt32 = 500
    let vectorLength:UInt32 = 512
    var idList:[UInt64] = []
    let numberOfInsersions = 20

    var signatures: [USearchSignature] = []
    for index in 0..<numberOfVectors {
        var vector: [Float] = []
        for i in 0..<vectorLength {
            let randomValue = Float(index)
            vector.append(randomValue * Float(i))
        }
        signatures.append(USearchEntry(key: UInt64(index), values: vector))
    }
    let index = USearchIndex.make(metric: USearchMetric.l2sq, dimensions: vectorLength, connectivity: 8, quantization: USearchScalar.F32)
    var clusterSize = index.count
    XCTAssertEqual(clusterSize, 0)

    index.reserve(UInt32(signatures.count))
    clusterSize = index.count
    XCTAssertEqual(clusterSize, 0, "still empty")

    for signature in signatures {
        index.add(key: signature.us_id, vector: signature.us_vector)
    }
    clusterSize = index.count
    XCTAssertEqual(clusterSize, 500)
    var results = index.search(vector: signatures[10].us_vector as [Float], count: 10)
    XCTAssertEqual(results.0.first, 10)
    XCTAssertEqual(results.1.first, 0.0)
    print("Find :\(results)")

    for signature in signatures {
        index.remove(key: signature.us_id)
    }
    XCTAssertEqual(index.contains(key: 0), false)

    index.compact()
    clusterSize = index.count
    // ⚠️ this musn't fail -> it returns clusterSize = 448
    XCTAssertEqual(clusterSize, 0)

    // no item from initial set should be found
    for signature in signatures {
        let resultsAfterRemoval = index.search(vector: signature.us_vector as [Float], count: 10)
        XCTAssertTrue(resultsAfterRemoval.0.first ?? 0 > signatures.count,"no item from initial set should be found")
    }

    // add the items back --> Crash as the capacity is not sufficient
    index.reserve(UInt32(signatures.count + index.count) ) // we reserve the expected size
    clusterSize = index.count
    // ⚠️ this musn't fail -> it returns clusterSize = 448
    XCTAssertEqual(clusterSize, 0, "should be empty")

    for signature in signatures {
        index.add(key: signature.us_id, vector: signature.us_vector)
    }
    clusterSize = index.count
    XCTAssertEqual(clusterSize, 500)

    results = index.search(vector: signatures[10].us_vector as [Float], count: 10)
    XCTAssertEqual(results.0.first, 10)
    XCTAssertEqual(results.1.first, 0.0)
}

Expected behavior

I would expect the index.count to be 0 after a removal of everything.

USearch version

2.9

Operating System

macOS Sonoma

Hardware architecture

Arm

Which interface are you using?

Other bindings

Contact Details

Matthieu.kopp@gmail.com

Is there an existing issue for this?

Code of Conduct

spicymatt commented 3 months ago

hi, is there any progress on that task ? It seems that if you remove all vectors from the cluster, and then save the index, there is a crash. It is a side effect of this bug. Looks pretty important as an index with removed items is inconsistent

regards

Screenshot 2024-03-12 at 23 56 54
ashvardanian commented 3 months ago

This snippet looks interesting. I'll try to reproduce it in the C layer and debug there. Thanks for the patience 🤗