milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [Pagination] Search pagination with partition got inaccurate results sometimes #19367

Closed: NicoYuan1986 closed this issue 1 year ago

NicoYuan1986 commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 71d4a32
- Deployment mode (standalone or cluster): standalone
- SDK version (e.g. pymilvus v2.0.0rc2): 2.2.0.dev32
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

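[Pagination] Search pagination with partitions sometimes returns inaccurate results: the IDs returned by a search with `offset` do not always match the slice `[offset:]` of a search with `limit + offset`.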

E       AssertionError: assert [2719, 599, 1...6, 2837, 1054] == [2719, 599, 1...55, 2221, ...]
E         At index 319 diff: 635 != 834
E         Full diff:
E           [
E         +  2719, 599, 1241, 1342, 1255, 2221, 31, 554, 440, 1605, 1855, 2512, 2532, 2994, 330, 1768, 1699, 1427, 349, 1204, 829, 2490, 1362, 2607, 2961, 823, 2012, 185, 591, 94, 2459, 132, 2406, 2181, 1902, 1746, 2944, 1410, 618, 2213, 1820, 1609, 2923, 2681, 2272, 1527, 636, 2236, 2418, 2773, 2947, 2332, 2026, 2665, 1330, 474, 2426, 401, 2168, 537, 2845, 2314, 2864, 397, 567, 1980, 634, 1165, 1364, 2553, 2, 2465, 2126, 2183, 2129, 849, 319, 189, 72, 1576, 838, 852, 1567, 324, 1852, 656, 2604, 2428, 1395, 2218, 763, 1136, 2230, 2765, 60...
E         
E         ...Full output truncated (1002 lines hidden), use '-vv' to show

Expected Behavior

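The test passes: the search with `offset` returns the same IDs as the search with `limit + offset`, sliced from `offset`.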

Steps To Reproduce

Run:

    @pytest.mark.tags(CaseLabel.L1)
    def test_search_pagination_with_partition(self, offset, auto_id, _async):
        """
        target: test search pagination with partition
        method: create connection, collection, insert data and search
        expected: searched successfully
        """
        # 1. initialize with data
        collection_w, _, _, insert_ids = self.init_collection_general(prefix, True,
                                                                      partition_num=1,
                                                                      auto_id=auto_id)[0:4]
        vectors = [[random.random() for _ in range(default_dim)] for _ in range(default_nq)]
        collection_w.load()
        # 2. search through partitions
        par = collection_w.partitions
        limit = 1000
        search_param = {"metric_type": "L2", "params": {"nprobe": 10}, "offset": offset}
        search_res = collection_w.search(vectors[:default_nq], default_search_field,
                                         search_param, limit, default_search_exp,
                                         [par[0].name, par[1].name], _async=_async,
                                         check_task=CheckTasks.check_search_results,
                                         check_items={"nq": default_nq,
                                                      "ids": insert_ids,
                                                      "limit": limit,
                                                      "_async": _async})[0]
        # 3. search through partitions with offset+limit
        res = collection_w.search(vectors[:default_nq], default_search_field, default_search_params,
                                  limit + offset, default_search_exp,
                                  [par[0].name, par[1].name], _async=_async)[0]
        if _async:
            search_res.done()
            search_res = search_res.result()
            res.done()
            res = res.result()
        assert search_res[0].ids == res[0].ids[offset:]
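The property this test asserts can be sketched without a running Milvus instance. The block below is only an illustration under stated assumptions: it uses an exact in-memory L2 search, the helper names (`l2_sq`, `partition_topk`, `paginated_search`) are hypothetical, and it does not describe Milvus internals or the root cause of this bug. It shows one way to keep pagination consistent across partitions: take `offset + limit` candidates from each partition, merge them, and only then drop the first `offset` hits.

```python
import heapq
import random


def l2_sq(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition_topk(partition, query, k):
    """Exact top-k (distance, id) pairs within one partition, nearest first."""
    return heapq.nsmallest(
        k, ((l2_sq(vec, query), pk) for pk, vec in partition.items())
    )


def paginated_search(partitions, query, limit, offset=0):
    """Merge offset + limit candidates per partition, then skip the first `offset`."""
    k = offset + limit
    merged = heapq.nsmallest(
        k, (hit for p in partitions for hit in partition_topk(p, query, k))
    )
    return [pk for _, pk in merged[offset:]]


# Hypothetical data: two partitions of random vectors with disjoint ids.
rng = random.Random(0)
dim = 8
partitions = [
    {pk: [rng.random() for _ in range(dim)] for pk in range(0, 1500)},
    {pk: [rng.random() for _ in range(dim)] for pk in range(1500, 3000)},
]
query = [rng.random() for _ in range(dim)]
limit, offset = 100, 10

page = paginated_search(partitions, query, limit, offset)
full = paginated_search(partitions, query, limit + offset)

# Same invariant as the test case above.
assert page == full[offset:]
```

In this sketch the invariant holds by construction, because both calls rank the same `offset + limit` nearest candidates. With an approximate index, or if each partition contributed fewer than `offset + limit` candidates, the two result lists could diverge.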

Milvus Log

No response

Anything else?

No response

NicoYuan1986 commented 1 year ago

This issue is similar to #19366. The case also fails when limit=100.

XuanYang-cn commented 1 year ago

PR #19401 will fix this too.

xiaofan-luan commented 1 year ago

/assign @NicoYuan1986 Could you verify it?

NicoYuan1986 commented 1 year ago

> /assign @NicoYuan1986 Could you verify it?

Yes!

XuanYang-cn commented 1 year ago

/unassign
/assign @NicoYuan1986

NicoYuan1986 commented 1 year ago

Fixed.