milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.04k stars, 2.88k forks

[Bug]: The full-text search function allows inserting empty strings, but it does not support using empty strings as search data during search operations. #37022

Open zhuwenxing opened 2 hours ago

zhuwenxing commented 2 hours ago

Is there an existing issue for this?

Environment

- Milvus version: 346510e-dev
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc101
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

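Inserting empty strings into a full-text search field succeeds, but searching with empty strings as the search data fails with the error "fail to search on QueryNode 1: can't build BM25 IDF for data not varchar". In my opinion, when a search string produces zero tokens after tokenization, it would be better to return an empty result directly instead of raising an error. The log below shows the failing search, where the search data is ['', ''].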
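To illustrate the suggested behavior, here is a minimal client-side sketch. It is not how Milvus or pymilvus implements anything. `count_tokens` is a hypothetical stand-in for the server-side analyzer (a plain regex word match), and `search_fn` is a hypothetical callable that takes a list of query strings and returns one hit list per query. The idea is only that queries with zero tokens get an empty result and never reach the backend.

```python
import re
from typing import Any, Callable, List, Sequence


def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: counts word-like tokens.

    This is an assumption for illustration, not the Milvus analyzer.
    """
    return len(re.findall(r"\w+", text))


def search_skipping_empty(
    queries: Sequence[str],
    search_fn: Callable[[List[str]], List[List[Any]]],
) -> List[List[Any]]:
    """Return one hit list per query; zero-token queries get an empty list.

    `search_fn` is a hypothetical wrapper around the real search call. It is
    invoked only with the queries that contain at least one token.
    """
    # Remember the original position of each query that has tokens.
    non_empty = [(i, q) for i, q in enumerate(queries) if count_tokens(q) > 0]

    # Start with an empty result for every query.
    results: List[List[Any]] = [[] for _ in queries]

    if non_empty:
        hits = search_fn([q for _, q in non_empty])
        # Put each backend result back at its original position.
        for (i, _), hit_list in zip(non_empty, hits):
            results[i] = hit_list
    return results
```

With the search data from the log, `search_skipping_empty(['', ''], search_fn)` returns `[[], []]` without calling `search_fn` at all. Whether this check belongs in the SDK, the proxy, or the QueryNode is left open here.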

[2024-10-21 16:09:01 - INFO - ci_test]: dataframe
        id       word                                           sentence                                          paragraph                                               text                                                emb
0        0     career                                      them at some.  stage us ok office sit rate. think cold marria...  election new risk along. start admit parent be...  [0.4934216178138785, 0.39034774645470205, 0.40...
1        1       same                       whose idea expect party far.  nothing water bank full close. drop strong fiv...  could popular world clearly lot. method star o...  [0.16044665878800168, 0.7060933017329347, 0.95...
2        2    mention  try offer citizen because discuss station arti...  three order rather network fund none. owner co...  month something their. all side focus once onl...  [0.3457558758390712, 0.236007238316436, 0.5572...
3        3  necessary        discuss share month establish they account.  day financial red ahead watch design. notice r...  special moment fire loss best pick. mr full pl...  [0.9124762384965902, 0.9715284102187233, 0.337...
4        4       that                       account guess live continue.  worry page night design. discussion will road ...  field full include five middle goal. specific ...  [0.18660411943766408, 0.6210301154146072, 0.01...
...    ...        ...                                                ...                                                ...                                                ...                                                ...
4995  4995    mention  method finish show present of money everything...  none keep stage at him others herself enjoy. c...  say traditional view term. per admit ability e...  [0.2784927986209893, 0.6326771858929588, 0.457...
4996  4996  agreement                       continue probably per class.  season structure pull defense concern pay figu...  happen what guess and personal year three. fou...  [0.8610058541963198, 0.3990729934883801, 0.009...
4997  4997       game       mrs trial choice evening economy first drug.  word value nation past race have happen. toget...  force go along represent skin. meet threat fly...  [0.41181370519675986, 0.45152238537780975, 0.8...
4998  4998       read                    mean image western detail also.  agent night skill our boy. down real power ite...  themselves writer themselves list realize appr...  [0.43636499668438733, 0.5333805669754456, 0.79...
4999  4999    country  type conference become career value sense scor...  hundred matter tend ground anyone guy now baby...  pass adult effect school while benefit east he...  [0.8620305090004367, 0.7880380167528181, 0.926...

[5000 rows x 6 columns] (test_full_text_search.py:2874)
[2024-10-21 16:09:02 - INFO - ci_test]: Analyze document cost time: 0.0665740966796875 (common_func.py:169)
[2024-10-21 16:09:28 - INFO - ci_test]: search data: ['', ''] (test_full_text_search.py:2915)
[2024-10-21 16:09:29 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=65535, message=fail to search on QueryNode 1: can't build BM25 IDF for data not varchar)>, <Time:{'RPC start': '2024-10-21 16:09:28.160243', 'RPC error': '2024-10-21 16:09:29.053421'}> (decorators.py:140)
[2024-10-21 16:09:29 - ERROR - ci_test]: Traceback (most recent call last):
  File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
    res = func(*args, **_kwargs)
  File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 63, in api_request
    return func(*arg, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 801, in search
    resp = conn.search(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 805, in search
    return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 746, in _execute_search
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 735, in _execute_search
    check_status(response.status)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/utils.py", line 63, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=fail to search on QueryNode 1: can't build BM25 IDF for data not varchar)>
 (api_request.py:45)

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

zhuwenxing commented 2 hours ago

/assign @zhengbuqian