milvus-io / milvus-sdk-java

Java SDK for Milvus.
https://milvus.io
Apache License 2.0
380 stars 153 forks source link

Iterators and range_filter error (when no range filter is specified) #881

Open PiercarloSlavazza opened 4 months ago

PiercarloSlavazza commented 4 months ago

I started to use the search iterators.

I am using:

I create the SearchIteratorParam in the following way:

SearchIteratorParam.newBuilder()
                .withCollectionName(metadata.getCollectionName())
                .withMetricType(MetricType.IP)
                .withOutFields(Arrays.asList(
                        METADATA_FIELD_ID_COURT.getKeyId(),
                        METADATA_FIELD_ID_PDD_SETTING_JUDGEMENTS_YEARS.getKeyId(),
                        METADATA_FIELD_ID_SENTENCE_UUID.getKeyId(),
                        METADATA_FIELD_ID_TAGS.getKeyId(),
                        METADATA_FIELD_ID_TEXT_TYPE.getKeyId(),
                        METADATA_FIELD_ID_TEXT_UUID_PK.getKeyId(),
                        METADATA_FIELD_ID_YEAR.getKeyId()
                ))
                .withFloatVectors(singletonList(embeddings.getVector().stream().map(Double::floatValue).toList()))
                .withBatchSize(500L)
                .withVectorFieldName(METADATA_FIELD_ID_EMBEDDINGS.getKeyId())

and afterwards I experience the following weird behaviour:

This is very strange because:

  1. I used IP
  2. I do not set any radius/rangefilter parameter - so I suppose that they come from some weird defaults
  3. even if I switch on machine n. 2 to Milvus 2.3.3, I still get the error (maybe defaults are written somewhere in the index/database?)

Moreover, if I try to set radius/rangefilter so that the defaults do not kick in, such as in

                .withParams("""
                        {
                            "radius": 1.0,
                            "range_filter" : 1.0
                        }""")

I have only found this potentially related issue: https://github.com/milvus-io/milvus/issues/29305

Any hint?

Thank you.

PiercarloSlavazza commented 4 months ago

Tried Milvus 2.4.0 as well - still getting the error.

PiercarloSlavazza commented 4 months ago

Just in case someone will experience this very same issue, I managed to solve it by:

This wasy the iterators are working fine.

I assume that sdk 2.4.0 iterators have been tested with Milvus 2.4.0 - so I think that it is something on my side, but I wasn't able to track it down (apart going back to Milvus 2.3.3)…

yhmo commented 4 months ago

A valid range search param for L2 metric should be:

String params = "{\"radius\": 1.0, \"range_filter\": 0.5}";
SearchIteratorParam.Builder searchIteratorParamBuilder = SearchIteratorParam.newBuilder()
                ......
                .withParams(params)
                .withMetricType(MetricType.L2);

range_search drawio

PiercarloSlavazza commented 4 months ago

Thanks for your clarification, but my search does not use L2 - it actually uses .withMetricType(MetricType.IP) (see 1st comment).

yhmo commented 4 months ago

But you set "radius"=1.0 and "range_filter"=1.0, the two parameters cannot be equal, neither with IP nor L2.

With IP metric type, the value of radius should be less than range_filter.

String params = "{\"radius\": 0.5, \"range_filter\": 0.8}";
SearchIteratorParam.Builder searchIteratorParamBuilder = SearchIteratorParam.newBuilder()
                ......
                .withParams(params)
                .withMetricType(MetricType.IP);
PiercarloSlavazza commented 4 months ago

I see - thank you for pointing it.

However:

xiaofan-luan commented 4 months ago

I thought it is because you don't specify metrics when you create index. and by default it is L2. ignore the metrics when search, it only helps on verify the build index metrics

PiercarloSlavazza commented 4 months ago

Guys, I finally found the bugs.

There are - according to my investigations - two (unrelated) bugs:

Bug n.1

In statement that starts at this line: https://github.com/milvus-io/milvus-sdk-java/blob/bf981d037c487aadc35b2adeb072485331d63dd4/src/main/java/io/milvus/orm/iterator/SearchIterator.java#L221 the metric type is NOT initialized using the searchIteratorParam.getMetricType().

This affects ONLY the iterator.next calls subsequent to the first one: the SDK computes correctly the new params, BUT, given that the metric type is not re-initialized AND I am using IP which is NOT the default, the server thinks that the intended metric is L2 (the default), and therefore it complains that the range and radius paramters are inverted - which indeed are NOT, but only if the correct metric type is reported to the server.

Bug n. 2

This bug is actually unrelated to bug n.1 - it just happened to occur while I was trying hard to find a workaround to bug n. 1 (well, before I understood the very nature of that first issue, of course).

In this line: https://github.com/milvus-io/milvus-sdk-java/blob/bf981d037c487aadc35b2adeb072485331d63dd4/src/main/java/io/milvus/orm/iterator/SearchIterator.java#L143 the result of params.get(RADIUS) is cast to a Float - but this leads to a casting error because params.get(RADIUS) is read as a Double, which of course cannot be just cast to a Float. BTW the parameter is read as a Double because this is parsed via JacksonUtils.fromJson(searchIteratorParam.getParams(), new TypeReference<Map<String, Object>>(){}), and Jackson wil always read a real number as a Double in this case, and this behaviour is not - to my knowledge - configurable.

So, if you confirm my findings, I will report the Bug. n. 2 in another issue.

I would like to highlight that bug n. 1 is blocking for whoever wants to use iterators with IP metric in Java.

xiaofan-luan commented 4 months ago

Guys, I finally found the bugs.

There are - according to my investigations - two (unrelated) bugs:

Bug n.1

In statement that starts at this line:

https://github.com/milvus-io/milvus-sdk-java/blob/bf981d037c487aadc35b2adeb072485331d63dd4/src/main/java/io/milvus/orm/iterator/SearchIterator.java#L221

the metric type is NOT initialized using the searchIteratorParam.getMetricType(). This affects ONLY the iterator.next calls subsequent to the first one: the SDK computes correctly the new params, BUT, given that the metric type is not re-initialized AND I am using IP which is NOT the default, the server thinks that the intended metric is L2 (the default), and therefore it complains that the range and radius paramters are inverted - which indeed are NOT, but only if the correct metric type is reported to the server.

Bug n. 2

This bug is actually unrelated to bug n.1 - it just happened to occur while I was trying hard to find a workaround to bug n. 1 (well, before I understood the very nature of that first issue, of course).

In this line:

https://github.com/milvus-io/milvus-sdk-java/blob/bf981d037c487aadc35b2adeb072485331d63dd4/src/main/java/io/milvus/orm/iterator/SearchIterator.java#L143

the result of params.get(RADIUS) is cast to a Float - but this leads to a casting error because params.get(RADIUS) is read as a Double, which of course cannot be just cast to a Float. BTW the parameter is read as a Double because this is parsed via JacksonUtils.fromJson(searchIteratorParam.getParams(), new TypeReference<Map<String, Object>>(){}), and Jackson wil always read a real number as a Double in this case, and this behaviour is not - to my knowledge - configurable. So, if you confirm my findings, I will report the Bug. n. 2 in another issue.

I would like to highlight that bug n. 1 is blocking for whoever wants to use iterators with IP metric in Java.

Thanks for reporting the bugs. We will fix bug 1 as soon as possible and maybe you want to take bug 2? Questions: what's the reason of cast could causing a problem?

PiercarloSlavazza commented 4 months ago

Hi @xiaofan-luan ,

We will fix bug 1 as soon as possible

Actually I already attached a PR fixing bug n. 1 - the fix was trivial, but I am a bit in the rush and wasn't even able to run the unit tests, let alone adding a new one… so I would like to ask if someone could please add the related unit test, thank you.

and maybe you want to take bug 2?

I will do the fix - ditto for the unit tests.

what's the reason of cast could causing a problem?

I am not sure I am getting your point… reason of what, exactly? The cast from Double to Float is forbidden by the VM, if done as it has been done in the code base (the root of all evil is of course in first place having used a Map of Objects… imho choices like this nullify the whole point of having a type system, and in the end lead to "undetectable" bugs like this one) - the way to go is Double.floatValue(), so in this case ((Double) params.get(RADIUS)).floatValue().

xiaofan-luan commented 4 months ago

pr looks good to me, please fix the DCO so I can merge the PR, thanks for the contribution.

yhmo commented 4 months ago

I just tried the IteratorExample.java, I got the exception about the cast:

Exception in thread "main" java.lang.RuntimeException: class java.lang.Double cannot be cast to class java.lang.Float (java.lang.Double and java.lang.Float are in module java.base of loader 'bootstrap')
    at io.milvus.CommonUtils.handleResponseStatus(CommonUtils.java:34)
    at io.milvus.IteratorExample.getSearchIterator(IteratorExample.java:308)

I agree the executeNextSearch() should pass the metricType to SearchParam, it is a bug. Wait @lentitude2tk to condirm.

xiaofan-luan commented 4 months ago

Hi @xiaofan-luan ,

We will fix bug 1 as soon as possible

Actually I already attached a PR fixing bug n. 1 - the fix was trivial, but I am a bit in the rush and wasn't even able to run the unit tests, let alone adding a new one… so I would like to ask if someone could please add the related unit test, thank you.

and maybe you want to take bug 2?

I will do the fix - ditto for the unit tests.

what's the reason of cast could causing a problem?

I am not sure I am getting your point… reason of what, exactly? The cast from Double to Float is forbidden by the VM, if done as it has been done in the code base (the root of all evil is of course in first place having used a Map of Objects… imho choices like this nullify the whole point of having a type system, and in the end lead to "undetectable" bugs like this one) - the way to go is Double.floatValue(), so in this case ((Double) params.get(RADIUS)).floatValue().

Thanks for the clarification . I think the reason is not cast double to int, but to convert a object to primitive directly. make sense to me

PiercarloSlavazza commented 4 months ago

I think the reason is not cast double to int, but to convert a object to primitive directly. make sense to me

You are wrong - look at this code from JShell - no primitives here, and the error message is very clear:

jshell> (Float) new Double(0.1)
|  Warning:
|  Double(double) in java.lang.Double has been deprecated and marked for removal
|  (Float) new Double(0.1)
|          ^-------------^
|  Error:
|  incompatible types: java.lang.Double cannot be converted to java.lang.Float
|  (Float) new Double(0.1)
|          ^-------------^
yhmo commented 4 months ago

@PiercarloSlavazza Just confirm, the metricType for search() has been deprecated from v2.3 on the server side. Metric type is determined by the CreateIndexParam when you call createIndex(). So, we can pass None to SearchParam, just let the server get the metric type from index.

So, if you want to use MetricType.IP, you can specify it to the index:

R<RpcStatus> response = milvusClient.createIndex(CreateIndexParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withFieldName(VECTOR_FIELD)
                .withIndexName(INDEX_NAME)
                .withIndexType(INDEX_TYPE)
                .withMetricType(MetricType.IP)
                .withExtraParam(INDEX_PARAM)
                .withSyncMode(Boolean.TRUE)
                .build());

And we don't need to change the code of SearchIterator.

PiercarloSlavazza commented 4 months ago

Hi,

yhmo commented 4 months ago

@PiercarloSlavazza The pr has been merged. Could you cherry-pick it to the 2.3 branch and master branch?

PiercarloSlavazza commented 4 months ago

@yhmo

yhmo commented 4 months ago

The code logic is a bit different between 2.3 and 2.4 and cause the conflict. I have merged the pr to 2.3 branch.