milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: test_search_pagination_group_by failed #32428

Open longjiquan opened 4 months ago

longjiquan commented 4 months ago

Is there an existing issue for this?

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-04-18T07:24:08.291Z] 2024-04-18 06:32:32.838895: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

[2024-04-18T07:24:20.470Z]

[2024-04-18T07:24:20.470Z]

[2024-04-18T07:24:20.470Z] =================================== FAILURES ===================================

[2024-04-18T07:24:20.470Z] __ TestSearchGroupBy.test_search_pagination_group_by ___

[2024-04-18T07:24:20.470Z] [gw4] linux -- Python 3.8.17 /usr/local/bin/python3

[2024-04-18T07:24:20.470Z]

[2024-04-18T07:24:20.470Z] self = <test_search.TestSearchGroupBy object at 0x7f31b9e97eb0>

[2024-04-18T07:24:20.470Z]

[2024-04-18T07:24:20.470Z]     @pytest.mark.tags(CaseLabel.L1)

[2024-04-18T07:24:20.470Z]     # @pytest.mark.xfail(reason="issue #30828")

[2024-04-18T07:24:20.470Z]     def test_search_pagination_group_by(self):

[2024-04-18T07:24:20.470Z]         """

[2024-04-18T07:24:20.470Z]         target: test search pagination with group by

[2024-04-18T07:24:20.470Z]         method: 1. create a collection with data

[2024-04-18T07:24:20.470Z]                 2. create index HNSW

[2024-04-18T07:24:20.470Z]                 3. search with groupby and pagination

[2024-04-18T07:24:20.470Z]                 4. search with groupby and limits=pages*page_rounds

[2024-04-18T07:24:20.470Z]         verify: search with groupby and pagination returns correct results

[2024-04-18T07:24:20.470Z]         """

[2024-04-18T07:24:20.470Z]         # 1. create a collection

[2024-04-18T07:24:20.470Z]         metric = "COSINE"

[2024-04-18T07:24:20.470Z]         collection_w = self.init_collection_general(prefix, auto_id=True, insert_data=False, is_index=False,

[2024-04-18T07:24:20.470Z]                                                     is_all_data_type=True, with_json=False)[0]

[2024-04-18T07:24:20.470Z]         # insert with the same values for scalar fields

[2024-04-18T07:24:20.470Z]         for _ in range(50):

[2024-04-18T07:24:20.470Z]             data = cf.gen_dataframe_all_data_type(nb=100, auto_id=True, with_json=False)

[2024-04-18T07:24:20.470Z]             collection_w.insert(data)

[2024-04-18T07:24:20.470Z]

[2024-04-18T07:24:20.470Z]         collection_w.flush()

[2024-04-18T07:24:20.470Z]         _index = {"index_type": "HNSW", "metric_type": metric, "params": {"M": 16, "efConstruction": 128}}

[2024-04-18T07:24:20.470Z]         collection_w.create_index(ct.default_float_vec_field_name, index_params=_index)

[2024-04-18T07:24:20.470Z]         collection_w.load()

[2024-04-18T07:24:20.470Z]         # 2. search pagination with offset

[2024-04-18T07:24:20.470Z]         limit = 10

[2024-04-18T07:24:20.470Z]         page_rounds = 3

[2024-04-18T07:24:20.470Z]         search_param = {"metric_type": metric}

[2024-04-18T07:24:20.470Z]         grpby_field = ct.default_string_field_name

[2024-04-18T07:24:20.470Z]         search_vectors = cf.gen_vectors(1, dim=ct.default_dim)

[2024-04-18T07:24:20.470Z]         all_pages_ids = []

[2024-04-18T07:24:20.470Z]         all_pages_grpby_field_values = []

[2024-04-18T07:24:20.470Z]         for r in range(page_rounds):

[2024-04-18T07:24:20.470Z]             page_res = collection_w.search(search_vectors, anns_field=default_search_field,

[2024-04-18T07:24:20.470Z]                                            param=search_param, limit=limit, offset=limit * r,

[2024-04-18T07:24:20.470Z]                                            expr=default_search_exp, group_by_field=grpby_field,

[2024-04-18T07:24:20.470Z]                                            output_fields=["*"],

[2024-04-18T07:24:20.470Z]                                            check_task=CheckTasks.check_search_results,

[2024-04-18T07:24:20.470Z]                                            check_items={"nq": 1, "limit": limit},

[2024-04-18T07:24:20.470Z]                                            )[0]

[2024-04-18T07:24:20.471Z]             for j in range(limit):

[2024-04-18T07:24:20.471Z]                 all_pages_grpby_field_values.append(page_res[0][j].get(grpby_field))

[2024-04-18T07:24:20.471Z]             all_pages_ids += page_res[0].ids

[2024-04-18T07:24:20.471Z]         hit_rate = round(len(set(all_pages_grpby_field_values)) / len(all_pages_grpby_field_values), 3)

[2024-04-18T07:24:20.471Z] >       assert hit_rate > 0.8

[2024-04-18T07:24:20.471Z] E       assert 0.767 > 0.8

[2024-04-18T07:24:20.471Z]

[2024-04-18T07:24:20.471Z] testcases/test_search.py:10328: AssertionError
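
To make the failing number easier to read, here is a minimal standalone sketch of the `hit_rate` arithmetic from the assertion above. It does not talk to Milvus. The helper names and the sample values are hypothetical, and it assumes every page returned exactly `limit` hits (which `check_items` in the test enforces), so 3 pages of 10 give 30 group-by values. Under that assumption, 0.767 corresponds to 23 distinct values out of 30, meaning 7 hits repeated a group already seen on some page.

```python
from collections import Counter


def group_by_hit_rate(values):
    """Share of distinct group-by values, rounded the same way as in the test."""
    return round(len(set(values)) / len(values), 3)


def repeated_groups(values):
    """Group-by values that appear more than once across all pages."""
    return {v: n for v, n in Counter(values).items() if n > 1}


# Hypothetical data: 30 values (3 pages x 10 hits), only 23 of them distinct.
values = [f"g{i}" for i in range(23)] + [f"g{i}" for i in range(7)]

rate = group_by_hit_rate(values)
print(rate)                          # 0.767, the value reported in the log
print(rate > 0.8)                    # False, so the assertion fails
print(len(repeated_groups(values)))  # 7 groups were returned twice

# Smallest number of distinct values (out of 30) that satisfies "> 0.8".
min_distinct = next(k for k in range(1, 31) if round(k / 30, 3) > 0.8)
print(min_distinct)                  # 25, because 24/30 is exactly 0.8
```

Because the comparison is strict, 24 distinct groups out of 30 would still fail; the test passes only with 25 or more. Logging the output of something like `repeated_groups` in the test would show which group-by values were repeated across pages.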

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

longjiquan commented 4 months ago

https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20HA%20CI/detail/PR-32346/5/pipeline/

yanliang567 commented 4 months ago

I think @MrPresent-Han is working on the improvements