Closed yanliang567 closed 6 months ago
/assign @liliu-z /unassign
It reproduces even with a FLAT index :(
this could be related to interim index?
- For some index types, we should not use the interim index.
- The interim index has a fixed nprobe, but it seems this needs to be adjustable through some config?
- Maybe rethink the implementation of the interim index?
Will take a look at this issue
/assign @zhengbuqian
The data were generated by 5 calls to cf.gen_default_list_data(). This method generates identical int/float field data on every call, so the pk fields of all 5 inserts were identical: 0 to 1999.
modifying the insertion part of the script as follows can help the test case to pass:
ttl = 10000
batch = 200
for x in range(ttl // batch):
    # offset each batch so primary keys do not repeat across inserts
    data = cf.gen_default_list_data(nb=batch, start=batch * x)
    t0 = time.time()
    _, res = collection_w.insert(data)
    tt = time.time() - t0
    log.info(f"assert insert: {tt}")
    assert res
That said, even with the duplicates there are 2k distinct pks, so we should still have enough rows for a top-1000 query. Still looking.
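The duplicate-pk effect can be checked without a Milvus instance. The sketch below is only an illustration: `gen_pks` is a hypothetical stand-in for the pk column that the data generator produces, assumed to be a plain range starting at `start`.

```python
def gen_pks(nb, start=0):
    """Hypothetical stand-in for the generated pk column: start .. start + nb - 1."""
    return list(range(start, start + nb))


def count_unique(pks):
    """Number of distinct primary keys in a list."""
    return len(set(pks))


# Original script: 5 calls with the default start, 2,000 rows each.
original = [pk for _ in range(5) for pk in gen_pks(nb=2000)]

# Modified script: 50 batches of 200 rows, each batch offset by `start`.
ttl, batch = 10000, 200
modified = [
    pk
    for x in range(ttl // batch)
    for pk in gen_pks(nb=batch, start=batch * x)
]

print(len(original), count_unique(original))  # 10000 rows, 2000 distinct pks
print(len(modified), count_unique(modified))  # 10000 rows, 10000 distinct pks
```

Both variants insert 10k rows, but only the offset version yields 10k distinct primary keys.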
In the original script, we inserted all 10k rows in 5 batches and then called flush, so all 10k rows ended up in the same sealed segment. Only the 2k rows in the last batch were valid; the previous 8k rows were invalid due to duplicate primary keys.
I checked at search time: the single knowhere index of the single sealed segment contains 10k rows, but the bitset passed in did not filter out a single row.
So knowhere returned the top-1k of all 10k rows, and then Milvus removed some of those 1k results because they belong to the first 8k rows.
Milvus failed to properly set up the bitset before sending to knowhere.
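A toy model of the behavior described above, to show why filtering after the top-k search returns fewer than k rows while filtering through the bitset does not. This is not knowhere or Milvus code: the `search` function, the random distances, and the convention that a set bit means "skip this row" are all assumptions made for the sketch.

```python
import heapq
import random


def search(distances, k, bitset=None):
    """Return ids of the k nearest rows; rows whose bit is set are skipped."""
    candidates = (
        (d, i)
        for i, d in enumerate(distances)
        if bitset is None or not bitset[i]
    )
    return [i for _, i in heapq.nsmallest(k, candidates)]


rng = random.Random(0)
n_rows, n_valid, k = 10000, 2000, 1000
distances = [rng.random() for _ in range(n_rows)]

# Rows superseded by a later duplicate pk: the first 8k of the 10k rows.
invalid = [i < n_rows - n_valid for i in range(n_rows)]

# As described in the thread: nothing is filtered during search,
# and invalid rows are dropped from the top-k afterwards.
raw = search(distances, k)
post_filtered = [i for i in raw if not invalid[i]]

# With the bitset applied during search, only valid rows compete for the top-k.
pre_filtered = search(distances, k, bitset=invalid)

print(len(post_filtered), len(pre_filtered))
```

In this model the post-filtered result is shorter than k because most of the top-1k candidates come from the 8k invalid rows, while the pre-filtered search still finds k results among the 2k valid rows.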
The other issue is: insert should fail on duplicate primary key, overwrite should happen only for upsert.
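To make the proposed semantics concrete, here is a minimal in-memory sketch. `ToyStore` is a hypothetical class written for this illustration; it is not the Milvus API and does not describe how Milvus currently behaves.

```python
class ToyStore:
    """Hypothetical in-memory model of the proposed insert/upsert semantics."""

    def __init__(self):
        self.rows = {}

    def insert(self, pk, value):
        # Proposed: reject a primary key that already exists.
        if pk in self.rows:
            raise KeyError(f"duplicate primary key: {pk}")
        self.rows[pk] = value

    def upsert(self, pk, value):
        # Overwrite is allowed only through upsert.
        self.rows[pk] = value


store = ToyStore()
store.insert(1, "a")
store.upsert(1, "b")  # overwrites the existing row

try:
    store.insert(1, "c")  # rejected under the proposed semantics
    rejected = False
except KeyError:
    rejected = True

print(rejected, store.rows)
```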
/assign @yanliang567
/unassign
Chatted with @liliu-z: returning fewer than k results when there are duplicate pks is the current expected behavior.
Looks like the root cause is duplicated pk. For this, we have two ways to treat it:
/unassign
Mmm...let me check my script today.
My mistake in the test script. :(
Is there an existing issue for this?
Environment
Current Behavior
The number of search results is less than topk when topk is 1000
Expected Behavior
The number of search results equals topk
Steps To Reproduce
Milvus Log
reproduce code
Anything else?
No response