[Bug]: When full-text search is enabled, an "Insert missed an field `text_sparse_emb` to collection without set nullable==true or set default_value" error occurs during insertion. #36860
- Milvus version: 5ec4163-dev
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc95
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
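`text_sparse_emb` is the output field of the BM25 function (it is computed from `text`), so it should not have to be supplied in the inserted data. The insert is nevertheless rejected on the client side: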
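To make the expected behavior concrete, here is a simplified, hypothetical model of the client-side check that raises the error (`_parse_row_request` in `pymilvus/client/prepare.py`, per the traceback below). It is not the pymilvus source: the field dictionaries, the `is_function_output` flag, and `missing_required_fields` are assumptions for illustration only. With function outputs not exempted, a row shaped like the test data is reported as missing `text_sparse_emb`; exempting them leaves nothing missing.

```python
from typing import Any, Dict, List, Set

# Hypothetical field descriptors modeled on the test schema in this issue.
# The "is_function_output" key is an assumption for illustration, not a
# claim about the exact metadata pymilvus uses.
FIELDS: List[Dict[str, Any]] = [
    {"name": "id"},
    {"name": "word"},
    {"name": "sentence"},
    {"name": "paragraph"},
    {"name": "text"},
    {"name": "emb"},
    {"name": "text_sparse_emb", "is_function_output": True},
]


def missing_required_fields(
    row: Dict[str, Any],
    fields: List[Dict[str, Any]],
    skip_function_outputs: bool,
) -> Set[str]:
    """Return the names of fields absent from `row` that would be rejected."""
    missing: Set[str] = set()
    for field in fields:
        if field["name"] in row:
            continue
        # Nullable fields and fields with a default value may be omitted.
        if field.get("nullable") or "default_value" in field:
            continue
        # Expected behavior: function outputs (BM25 here) are filled server-side.
        if skip_function_outputs and field.get("is_function_output"):
            continue
        missing.add(field["name"])
    return missing


# A row shaped like the test data: no "text_sparse_emb" key.
row = {"id": 0, "word": "career", "sentence": "...", "paragraph": "...",
       "text": "...", "emb": [0.1] * 128}

print(missing_required_fields(row, FIELDS, skip_function_outputs=False))
# -> {'text_sparse_emb'}  (observed: DataNotMatchException)
print(missing_required_fields(row, FIELDS, skip_function_outputs=True))
# -> set()  (expected: insert accepted)
```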
[2024-10-14 20:37:59 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:233)
[2024-10-14 20:37:59 - INFO - ci_test]: [setup_class] Start setup class... (client_base.py:40)
[2024-10-14 20:37:59 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:46)
[2024-10-14 20:37:59 - INFO - ci_test]: pymilvus version: 2.5.0rc95 (client_base.py:47)
[2024-10-14 20:37:59 - INFO - ci_test]: [setup_method] Start setup test case test_full_text_search_default. (client_base.py:49)
-------------------------------- live log call ---------------------------------
[2024-10-14 20:37:59 - INFO - ci_test]: server version: 5ec4163-dev (client_base.py:165)
[2024-10-14 20:37:59 - INFO - ci_test]: dataframe
id word sentence paragraph text emb
0 0 career them at some. stage us ok office sit rate. think cold marria... election new risk along. start admit parent be... [0.38130239080312345, 0.4129195604198841, 0.94...
1 1 same whose idea expect party far. nothing water bank full close. drop strong fiv... could popular world clearly lot. method star o... [0.4998943407873816, 0.9610945003226037, 0.609...
2 2 mention try offer citizen because discuss station arti... three order rather network fund none. owner co... month something their. all side focus once onl... [0.625669641698085, 0.7938970243204132, 0.5171...
3 3 necessary discuss share month establish they account. day financial red ahead watch design. notice r... special moment fire loss best pick. mr full pl... [0.8634757822550295, 0.1427588550215242, 0.572...
4 4 that account guess live continue. worry page night design. discussion will road ... field full include five middle goal. specific ... [0.6197244834058141, 0.5557701233091245, 0.193...
... ... ... ... ... ... ...
4995 4995 mention method finish show present of money everything... none keep stage at him others herself enjoy. c... say traditional view term. per admit ability e... [0.7785692289302282, 0.7533609272719551, 0.788...
4996 4996 agreement continue probably per class. season structure pull defense concern pay figu... happen what guess and personal year three. fou... [0.1843558041570924, 0.04934656676373783, 0.26...
4997 4997 game mrs trial choice evening economy first drug. word value nation past race have happen. toget... force go along represent skin. meet threat fly... [0.31889645245729215, 0.18197038708099578, 0.9...
4998 4998 read mean image western detail also. agent night skill our boy. down real power ite... themselves writer themselves list realize appr... [0.37054949583512764, 0.3374745492036224, 0.19...
4999 4999 country type conference become career value sense scor... hundred matter tend ground anyone guy now baby... pass adult effect school while benefit east he... [0.05222852868069561, 0.6383223384677691, 0.85...
[5000 rows x 6 columns] (test_full_text_search.py:1465)
[2024-10-14 20:38:00 - INFO - ci_test]: Analyze document cost time: 0.20666980743408203 (common_func.py:340)
[2024-10-14 20:38:00 - ERROR - pymilvus.decorators]: RPC error: [insert_rows], <DataNotMatchException: (code=1, message=Insert missed an field `text_sparse_emb` to collection without set nullable==true or set default_value)>, <Time:{'RPC start': '2024-10-14 20:38:00.579745', 'RPC error': '2024-10-14 20:38:00.592640'}> (decorators.py:140)
[2024-10-14 20:38:00 - ERROR - ci_test]: Traceback (most recent call last):
File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
res = func(*args, **_kwargs)
File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 63, in api_request
return func(*arg, **kwargs)
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 507, in insert
return conn.insert_rows(
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 141, in handler
raise e from e
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 137, in handler
return func(*args, **kwargs)
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 176, in handler
return func(self, *args, **kwargs)
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 116, in handler
raise e from e
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 86, in handler
return func(*args, **kwargs)
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 493, in insert_rows
request = self._prepare_row_insert_request(
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 519, in _prepare_row_insert_request
return Prepare.row_insert_param(
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 583, in row_insert_param
return cls._parse_row_request(request, fields_info, enable_dynamic, entities)
File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 443, in _parse_row_request
raise DataNotMatchException(
pymilvus.exceptions.DataNotMatchException: <DataNotMatchException: (code=1, message=Insert missed an field `text_sparse_emb` to collection without set nullable==true or set default_value)>
(api_request.py:45)
[2024-10-14 20:38:00 - ERROR - ci_test]: (api_response) : <DataNotMatchException: (code=1, message=Insert missed an field `text_sparse_emb` to collection without set nullable==true or set default_value)> (api_request.py:46)
FAILED
testcases/test_full_text_search.py:1375 (TestSearchWithFullTextSearch.test_full_text_search_default[default-None-SPARSE_INVERTED_INDEX-True-True-0])
self = <test_full_text_search.TestSearchWithFullTextSearch object at 0x136912220>
tokenizer = 'default', expr = None, enable_inverted_index = True
enable_partition_key = True, empty_percent = 0
index_type = 'SPARSE_INVERTED_INDEX'
@pytest.mark.tags(CaseLabel.L0)
@pytest.mark.parametrize("empty_percent", [0])
@pytest.mark.parametrize("enable_partition_key", [True])
@pytest.mark.parametrize("enable_inverted_index", [True])
@pytest.mark.parametrize("index_type", ["SPARSE_INVERTED_INDEX"])
@pytest.mark.parametrize("expr", [None, "text_match", "id_range"])
@pytest.mark.parametrize("tokenizer", ["default"])
def test_full_text_search_default(
self, tokenizer, expr, enable_inverted_index, enable_partition_key, empty_percent, index_type
):
"""
target: test full text search
method: 1. enable full text search and insert data with varchar
2. search with text
3. verify the result
expected: full text search successfully and result is correct
"""
tokenizer_params = {
"tokenizer": tokenizer,
}
dim = 128
fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
FieldSchema(
name="word",
dtype=DataType.VARCHAR,
max_length=65535,
enable_tokenizer=True,
tokenizer_params=tokenizer_params,
is_partition_key=enable_partition_key,
),
FieldSchema(
name="sentence",
dtype=DataType.VARCHAR,
max_length=65535,
enable_tokenizer=True,
tokenizer_params=tokenizer_params,
),
FieldSchema(
name="paragraph",
dtype=DataType.VARCHAR,
max_length=65535,
enable_tokenizer=True,
tokenizer_params=tokenizer_params,
),
FieldSchema(
name="text",
dtype=DataType.VARCHAR,
max_length=65535,
enable_tokenizer=True,
enable_match=True,
tokenizer_params=tokenizer_params,
),
FieldSchema(name="emb", dtype=DataType.FLOAT_VECTOR, dim=dim),
FieldSchema(name="text_sparse_emb", dtype=DataType.SPARSE_FLOAT_VECTOR),
]
schema = CollectionSchema(fields=fields, description="test collection")
bm25_function = Function(
name="text_bm25_emb",
function_type=FunctionType.BM25,
input_field_names=["text"],
output_field_names=["text_sparse_emb"],
params={},
)
schema.add_function(bm25_function)
data_size = 5000
collection_w = self.init_collection_wrap(
name=cf.gen_unique_str(prefix), schema=schema
)
fake = fake_en
if tokenizer == "jieba":
language = "zh"
fake = fake_zh
else:
language = "en"
data = [
{
"id": i,
"word": fake.word().lower() if random.random() >= empty_percent else "",
"sentence": fake.sentence().lower() if random.random() >= empty_percent else "",
"paragraph": fake.paragraph().lower() if random.random() >= empty_percent else "",
"text": fake.text().lower() if random.random() >= empty_percent else "",
"emb": [random.random() for _ in range(dim)],
}
for i in range(data_size)
]
df = pd.DataFrame(data)
corpus = df["text"].to_list()
log.info(f"dataframe\n{df}")
texts = df["text"].to_list()
word_freq = cf.analyze_documents(texts, language=language)
tokens = list(word_freq.keys())
if len(tokens) == 0:
log.info(f"empty tokens, add a dummy token")
tokens = ["dummy"]
batch_size = 5000
for i in range(0, len(df), batch_size):
> collection_w.insert(
data[i : i + batch_size]
if i + batch_size < len(df)
else data[i : len(df)]
)
test_full_text_search.py:1474:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../utils/wrapper.py:33: in inner_wrapper
res, result = func(*args, **kwargs)
../base/collection_wrapper.py:130: in insert
check_result = ResponseChecker(res, func_name, check_task, check_items, check,
../check/func_check.py:34: in run
result = self.assert_succ(self.succ, True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <check.func_check.ResponseChecker object at 0x136912460>, actual = False
expect = True
def assert_succ(self, actual, expect):
> assert actual is expect, f"Response of API {self.func_name} expect {expect}, but got {actual}"
E AssertionError: Response of API insert expect True, but got False
../check/func_check.py:116: AssertionError
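Expected Behavior
The insert succeeds without `text_sparse_emb` in the rows, because Milvus generates that field from `text` through the BM25 function.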
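Steps To Reproduce
Run `testcases/test_full_text_search.py::TestSearchWithFullTextSearch::test_full_text_search_default` from `tests/python_client` against Milvus 5ec4163-dev with pymilvus 2.5.0rc95. The test code is included in the pytest output above.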
Milvus Log
No response
Anything else?
No response