milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [bulk load] Search with exception when bulk load just completed #16607

Closed yanliang567 closed 2 years ago

yanliang567 commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: master-20220421-a6a3b69d
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2):  2.1.0.dev29

Current Behavior

[2022-04-23 15:51:21 - DEBUG - ci_test]: (api_request)  : [get_bulk_load_state] args: [11, 30.0, 'default'], kwargs: {} (api_request.py:55)
[2022-04-23 15:51:21 - DEBUG - ci_test]: (api_response) :  Bulk load state:
- taskID    : 11,
- state     : BulkLoadPersisted,
- row_count : 10000,
- infos     : {'files': 'col_cust_float_vectors_int_scalar_4d_10000.json', 'failed_reason': ''},
  (api_request.py:27)
[2022-04-23 15:51:21 - INFO - ci_test]: bulk load state:True in 1.0987582206726074 (test_import.py:167)
[2022-04-23 15:51:21 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.5426245149717431, 0.3201991519554591, 0.4152912178133261, 0.6561740190268204]], 'vectors', {'metric_type': 'L2', 'params': {'nprobe': 16}}, 1, None, None, None, 20, -1], kwargs: {} (api_request.py:55)
[2022-04-23 15:51:21 - ERROR - pymilvus.decorators]: RPC error: [_execute_search_requests], <MilvusException: (code=-1, message=Unsupported ids type)>, <Time:{'RPC start': '2022-04-23 15:51:21.459287', 'RPC error': '2022-04-23 15:51:21.756022'}> (decorators.py:73)
[2022-04-23 15:51:21 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=-1, message=Unsupported ids type)>, <Time:{'RPC start': '2022-04-23 15:51:21.416789', 'RPC error': '2022-04-23 15:51:21.756569'}> (decorators.py:73)

Expected Behavior

Search succeeds once bulk load has completed.

Note: with 100 imported entities the issue does not reproduce, so I suspect the query node is still in the loading process on the backend, while the error message is quite confusing.

Steps To Reproduce

1. create collection 
2. create index and load collection
3. bulk load data (10000 entities)
4. search immediately after bulk load completed
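Until the gap discussed below is closed server-side, a client-side workaround for step 4 is to not search exactly once, but to retry until the search stops failing or a timeout expires. A minimal, library-agnostic sketch (the `do_search` callable is a hypothetical stand-in for `Collection.search`; it is not part of pymilvus):

```python
import time

def search_with_retry(do_search, timeout=30.0, interval=1.0):
    """Retry do_search() until it returns a non-empty result or timeout expires.

    do_search is a zero-arg callable standing in for Collection.search;
    it may raise while the bulk-loaded segments are not yet queryable
    (e.g. the "Unsupported ids type" RPC error in the log above).
    """
    deadline = time.monotonic() + timeout
    last_exc = None
    while time.monotonic() < deadline:
        try:
            result = do_search()
            if result:  # segments loaded, hits returned
                return result
        except Exception as exc:
            last_exc = exc  # remember why the last attempt failed
        time.sleep(interval)
    raise TimeoutError(f"data not searchable within {timeout}s") from last_exc
```

This only papers over the symptom; the thread's actual ask is a reliable "data is searchable" signal from the server.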

Anything else?

No response

yanliang567 commented 2 years ago

/assign @yhmo /unassign

yanliang567 commented 2 years ago

Checking get_query_segment_info immediately after bulk load completes returns []. It returns the correct loading state if a 10s sleep is added.
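The fixed 10s sleep can be replaced by polling until the segment info reports the expected row count. A sketch, assuming a `get_segments` callable that wraps `utility.get_query_segment_info(collection_name)` and returns a list of dicts with a `num_rows` key (a simplification of the real return type):

```python
import time

def wait_for_segments(get_segments, expected_rows, timeout=30.0, interval=0.5):
    """Poll get_segments() until the reported row count reaches expected_rows.

    Right after bulk load, get_segments() may return [] (as observed above)
    while the query nodes are still loading the new segments.
    """
    deadline = time.monotonic() + timeout
    loaded = 0
    while time.monotonic() < deadline:
        segments = get_segments()  # [] while query nodes are still loading
        loaded = sum(seg["num_rows"] for seg in segments)
        if loaded >= expected_rows:
            return segments
        time.sleep(interval)
    raise TimeoutError(f"only {loaded} of {expected_rows} rows queryable after {timeout}s")
```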

soothing-rain commented 2 years ago

Makes sense. BulkLoadPersisted only indicates things are done on the datanode/datacoord side, but there might be a gap between BulkLoadPersisted and "bulk load data is searchable". I will take a look at this.

yanliang567 commented 2 years ago

But the question becomes: how can a user know the data is searchable? (and likewise for "index_built_completed")

yanliang567 commented 2 years ago

Currently, if the bulk load task is in the persisted state, it means the data has been imported completely. But whether it is ready for search is still a black box to users.

xiaofan-luan commented 2 years ago

@soothing-rain is this still a valid issue?

yanliang567 commented 2 years ago

@xiaofan-luan yes; data_queryable is not working well for now, so users are not able to know when the imported data is ready for searching.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

xiaofan-luan commented 2 years ago

keep it. Should bulkload respect consistency?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.