milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: search results are fewer than topk even when using nprobe=nlist #32618

Closed: yanliang567 closed this issue 4 months ago

yanliang567 commented 4 months ago

Is there an existing issue for this?

Environment

- Milvus version: master-20240425-f06509bf-amd64
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):

Current Behavior

search results are fewer than topk when topk is 1000

Expected Behavior

the number of search results equals topk

Steps To Reproduce

1. create a collection
2. insert 10000 entities and flush
3. build an index (IVF_FLAT, IVF_SQ8, or HNSW)
4. load the collection
5. search with topk=1000 and high search params (nprobe=nlist for IVF* indexes)

Milvus Log

Reproduction code:

        # create
        name = cf.gen_unique_str(prefix)
        t0 = time.time()
        collection_w = self.init_collection_wrap(name=name, active_trace=True)
        tt = time.time() - t0
        assert collection_w.name == name

        # insert
        for _ in range(5):
            data = cf.gen_default_list_data()
            t0 = time.time()
            _, res = collection_w.insert(data)
            tt = time.time() - t0
            log.info(f"assert insert: {tt}")
            assert res

        # flush
        t0 = time.time()
        _, check_result = collection_w.flush(timeout=180)
        assert check_result
        assert collection_w.num_entities == len(data[0]) * 5
        tt = time.time() - t0
        entities = collection_w.num_entities
        log.info(f"assert flush: {tt}, entities: {entities}")

        # index
        # index_params = {"index_type": "HNSW", "params": {"M":32, "efConstruction": 360}, "metric_type": "L2"}
        index_params = {"index_type": "IVF_FLAT", "params": {"nlist": 64}, "metric_type": "L2"}
        t0 = time.time()
        index, _ = collection_w.create_index(field_name=ct.default_float_vec_field_name,
                                             index_params=index_params,
                                             index_name=cf.gen_unique_str())
        index, _ = collection_w.create_index(field_name=ct.default_string_field_name,
                                             index_params={},
                                             index_name=cf.gen_unique_str())
        tt = time.time() - t0
        log.info(f"assert index: {tt}")
        assert len(collection_w.indexes) == 2

        entities = collection_w.num_entities
        log.info(f"assert create collection: {tt}, init_entities: {entities}")

        # load
        collection_w.load()

        # search
        search_vectors = cf.gen_vectors(1, ct.default_dim)
        # search_params = {"metric_type": "L2", "params": {"ef": 2000}}
        search_params = {"metric_type": "L2", "params": {"nprobe": 64}}
        t0 = time.time()
        res_1, _ = collection_w.search(data=search_vectors,
                                       anns_field=ct.default_float_vec_field_name,
                                       param=search_params, limit=1000)
        tt = time.time() - t0
        log.info(f"assert search: {tt}")
        assert len(res_1[0]) == 1000

Anything else?

No response

yanliang567 commented 4 months ago

/assign @liliu-z /unassign

yanliang567 commented 4 months ago

it reproduces even with the FLAT index :(

xiaofan-luan commented 4 months ago

Could this be related to the interim index?

  1. For some of the index types, we should not use the interim index.
  2. The interim index has a fixed nprobe, but it seems it should be configurable?
  3. Maybe rethink the implementation of the interim index?
liliu-z commented 4 months ago

Replying to @xiaofan-luan's points above:

  1. The interim index is disabled for FLAT.
  2. The interim index's params are tunable through milvus.yaml (see the config sketch below).
  3. Yes, we are on it.
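
For reference, a minimal sketch of the relevant milvus.yaml section is below. The exact key names and defaults are assumptions on my part, so verify them against the milvus.yaml that ships with your Milvus version:

# Sketch only: the interimIndex key names and defaults are assumptions,
# check them against the milvus.yaml of your release.
queryNode:
  segcore:
    interimIndex:
      enableIndex: true   # build a temporary index on segments that are not yet fully indexed
      nlist: 128          # nlist used for the interim IVF-style index
      nprobe: 16          # fixed nprobe used when searching the interim index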

Will take a look at this issue

liliu-z commented 4 months ago

/assign @zhengbuqian

zhengbuqian commented 4 months ago

The data were generated by 5 calls to cf.gen_default_list_data(). This method generates identical int/float field data, so the pk fields of all 5 inserts were identical: 0 to 1999.

Modifying the insertion part of the script as follows makes the test case pass:

        ttl = 10000
        batch = 200
        # insert in 50 batches of 200, each starting at a new offset so the pks never repeat
        for x in range(ttl // batch):
            data = cf.gen_default_list_data(nb=batch, start=batch * x)
            t0 = time.time()
            _, res = collection_w.insert(data)
            tt = time.time() - t0
            log.info(f"assert insert: {tt}")
            assert res
zhengbuqian commented 4 months ago

That being said, we should still have sufficient rows for a top-1000 query; still looking.

zhengbuqian commented 4 months ago

In the original script, we inserted all 10k rows in 5 batches and then called flush, so all 10k rows ended up in the same sealed segment. Only the 2k rows in the last batch were valid; the previous 8k rows were invalidated by the duplicate primary keys.

I checked at search time: the only knowhere index of the only sealed segment contains all 10k rows, but the bitset passed in did not filter out a single row.

So knowhere returned the top 1k of all 10k rows, and Milvus then removed some of those 1k results because they belong to the first 8k (overwritten) rows.

Milvus failed to properly set up the bitset before sending the request to knowhere.

The other issue is that insert should fail on duplicate primary keys; overwriting should only happen for upsert.
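
To make the effect concrete, here is a rough, self-contained simulation in plain Python (not Milvus internals; the sizes just mirror the numbers above): the index returns the top 1000 over all 10k rows, and dropping the hits that point at overwritten (duplicate-pk) rows leaves far fewer than 1000 results.

import random

NB_PER_BATCH = 2000   # pk range 0..1999 reused by each of the 5 inserts
BATCHES = 5
TOTAL = NB_PER_BATCH * BATCHES   # 10k rows in the sealed segment
TOPK = 1000

# Only the last insert "wins" for each pk, so the first 8k rows are overwritten.
valid_rows = set(range(TOTAL - NB_PER_BATCH, TOTAL))

# Pretend the segment index returned topk row offsets over all 10k rows,
# because the bitset did not mask out the overwritten rows.
index_hits = random.sample(range(TOTAL), TOPK)

# Reduction then drops hits that point at overwritten rows, shrinking the result set.
survivors = [row for row in index_hits if row in valid_rows]
print(f"{len(index_hits)} hits from the index, "
      f"{len(survivors)} left after dropping overwritten rows")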

zhengbuqian commented 4 months ago

/assign @yanliang567

zhengbuqian commented 4 months ago

/unassign

zhengbuqian commented 4 months ago

Chatted with @liliu-z: it is currently expected behavior to get fewer than k results when duplicate pks are present.

liliu-z commented 4 months ago

Looks like the root cause is duplicated pks. We have two ways to treat this:

  1. If duplicated pks are expected, then we should not reduce on them.
  2. If duplicated pks are not expected, we need to mark this as a bug until it gets fixed.

/unassign

yanliang567 commented 4 months ago

Mmm...let me check my script today.

yanliang567 commented 4 months ago

My mistake in the test script. :(