opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
149 stars 112 forks source link

Add a reference in the comment to confirm that the calculation logic is accurate #1789

Closed heemin32 closed 4 weeks ago

heemin32 commented 2 months ago

I think the logic here is not correct. For example, if the java round up the memory usage by 8 bytes, at least the logic should be

int vectorSize = vectorLength * FLOAT_BYTE_SIZE;
            if (vectorSize % JAVA_ROUNDING_NUMBER != 0) {
                vectorSize += (JAVA_ROUNDING_NUMBER - vectorSize % JAVA_ROUNDING_NUMBER);
            }
            int vectorsSize = numVectors * (vectorSize + JAVA_REFERENCE_SIZE);
            if (vectorsSize % JAVA_ROUNDING_NUMBER != 0) {
                vectorsSize += (JAVA_ROUNDING_NUMBER - vectorsSize % JAVA_ROUNDING_NUMBER);
            }
            return vectorsSize;

https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/util/KNNCodecUtil.java#L94

We need a comment and reference to document to show that the logic is accurate

jmazanec15 commented 1 month ago

@heemin32 what is potential impact of this bug? Does this need prioritization for 2.16?

heemin32 commented 1 month ago

Not super critical. The current_size_in_bytes and total_size_in_bytes in merge stats will be incorrect.

"graph_stats": {
    "merge": {
        "current": 0,
        "current_docs": 0,
        "current_size_in_bytes": 0,
        "total_size_in_bytes": 0,
        "total": 0,
        "total_time_in_millis": 0,
        "total_docs": 0
    },
    "refresh": {
        "total": 0,
        "total_time_in_millis": 0
    }
ryanbogan commented 1 month ago

Forgot to comment on this last week. I'll pick this up later this week, should be a minor change.