milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: query should return the latest inserted entity if there are duplicate primary keys #33883

Open yanliang567 opened 2 weeks ago

yanliang567 commented 2 weeks ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4.4
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.4.3

Current Behavior

Query returns the oldest entity when several entities share the same primary key.

Expected Behavior

Query returns the most recently inserted entity.
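
To make the expectation concrete, here is a minimal pure-Python sketch of "latest insert wins" semantics. It does not touch Milvus or pymilvus. `Row`, `latest_by_pk`, and the `seq` insertion counter are hypothetical names used only for illustration, and this is a model of the expected result, not a description of how Milvus resolves duplicates internally.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    pk: int       # primary key (duplicates are allowed in this model)
    seq: int      # hypothetical insertion counter; larger means newer
    value: float  # stands in for the float field of the entity


def latest_by_pk(rows: List[Row]) -> Dict[int, Row]:
    """Keep, for each primary key, the row with the largest seq."""
    latest: Dict[int, Row] = {}
    for row in rows:
        current = latest.get(row.pk)
        if current is None or row.seq > current.seq:
            latest[row.pk] = row
    return latest


# Mirror the reproduction: 10 rounds of 200 rows, all with pk == 0.
nb, rounds = 200, 10
rows = [Row(pk=0, seq=i, value=float(i)) for i in range(nb * rounds)]

result = latest_by_pk(rows)
print(len(result), result[0].value)
```

Under this model only one row survives for `pk == 0`, and its value is `(rounds * nb - 1) * 1.0`, which is what the test in this issue asserts.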

Steps To Reproduce

1. create a collection
2. insert 2000 entities with dup_id=0
3. query with expr='id==0'

run the test below

Milvus Log

No response

Anything else?

    @pytest.mark.tags(CaseLabel.L1)
    def test_query_to_get_latest_entity_with_dup_ids(self):
        """
        target: test query to get latest entity with duplicate primary keys
        method: 1.create collection and insert dup primary key = 0
                2.query with expr=dup_id
        expected: return the latest entity
        """
        collection_w = self.init_collection_wrap(name=cf.gen_unique_str(prefix))
        nb = 200
        rounds = 10
        for i in range(rounds):
            df = cf.gen_default_dataframe_data(nb=nb, start=i * nb)
            df[ct.default_int64_field_name] = 0
            collection_w.insert(df)
        collection_w.create_index(ct.default_float_vec_field_name, index_params=ct.default_index)
        collection_w.load()
        expr = f'{ct.default_int64_field_name} == 0'
        res = collection_w.query(expr=expr, output_fields=[ct.default_int64_field_name, ct.default_float_field_name])[0]
        assert len(res) == 1 and res[0][ct.default_float_field_name] == (rounds * nb - 1) * 1.0

yanliang567 commented 2 weeks ago

/assign @tedxu
/unassign

smellthemoon commented 2 weeks ago

I will take a look. /assign