[Bug]: When full-text search is enabled (the schema contains a BM25 function) and dynamic fields are also enabled, inserting valid data fails with a ParamError.

Is there an existing issue for this?

[X] I have searched the existing issues

Environment

- Milvus version: 20750c0-dev
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc96
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024-10-18 14:24:32 - INFO - ci_test]: ################################################################################ (conftest.py:232)
[2024-10-18 14:24:32 - INFO - ci_test]: [initialize_milvus] Log cleaned up, start testing... (conftest.py:233)
[2024-10-18 14:24:32 - INFO - ci_test]: [setup_class] Start setup class... (client_base.py:40)
[2024-10-18 14:24:32 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:46)
[2024-10-18 14:24:32 - INFO - ci_test]: pymilvus version: 2.5.0rc96 (client_base.py:47)
[2024-10-18 14:24:32 - INFO - ci_test]: [setup_method] Start setup test case test_insert_for_full_text_search_enable_dynamic_field. (client_base.py:49)
-------------------------------- live log call ---------------------------------
[2024-10-18 14:24:32 - INFO - ci_test]: server version: 20750c0-dev (client_base.py:165)
[2024-10-18 14:24:34 - ERROR - pymilvus.decorators]: RPC error: [insert_rows], <ParamError: (code=The data fields number is not match with schema., message=)>, <Time:{'RPC start': '2024-10-18 14:24:33.552853', 'RPC error': '2024-10-18 14:24:34.595584'}> (decorators.py:140)
[2024-10-18 14:24:34 - ERROR - ci_test]: Traceback (most recent call last):
  File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
    res = func(*args, **_kwargs)
  File "/Users/zilliz/workspace/milvus/tests/python_client/utils/api_request.py", line 63, in api_request
    return func(*arg, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 507, in insert
    return conn.insert_rows(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 493, in insert_rows
    request = self._prepare_row_insert_request(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 519, in _prepare_row_insert_request
    return Prepare.row_insert_param(
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 587, in row_insert_param
    return cls._parse_row_request(request, fields_info, enable_dynamic, entities)
  File "/Users/zilliz/opt/anaconda3/envs/full_text_search/lib/python3.8/site-packages/pymilvus/client/prepare.py", line 481, in _parse_row_request
    raise ParamError(ExceptionsMessage.FieldsNumInconsistent)
pymilvus.exceptions.ParamError: <ParamError: (code=The data fields number is not match with schema., message=)>
 (api_request.py:45)
[2024-10-18 14:24:34 - ERROR - ci_test]: (api_response) : <ParamError: (code=The data fields number is not match with schema., message=)> (api_request.py:46)
FAILED
testcases/test_full_text_search.py:518 (TestInsertWithFullTextSearch.test_insert_for_full_text_search_enable_dynamic_field[default-en-False-True])
self = <test_full_text_search.TestInsertWithFullTextSearch object at 0x12e371be0>
tokenizer = 'default', text_lang = 'en', nullable = False
enable_dynamic_field = True

    @pytest.mark.tags(CaseLabel.L0)
    @pytest.mark.parametrize("enable_dynamic_field", [True])
    @pytest.mark.parametrize("nullable", [False])
    @pytest.mark.parametrize("text_lang", ["en"])
    @pytest.mark.parametrize("tokenizer", ["default"])
    def test_insert_for_full_text_search_enable_dynamic_field(self, tokenizer, text_lang, nullable, enable_dynamic_field):
        """
        target: test full text search
        method: 1. enable full text search and insert data with varchar
                2. search with text
                3. verify the result
        expected: full text search successfully and result is correct
        """
        tokenizer_params = {
            "tokenizer": tokenizer,
        }
        dim = 128
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
            FieldSchema(
                name="word",
                dtype=DataType.VARCHAR,
                max_length=65535,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
                is_partition_key=True,
            ),
            FieldSchema(
                name="sentence",
                dtype=DataType.VARCHAR,
                max_length=65535,
                nullable=nullable,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
            ),
            FieldSchema(
                name="paragraph",
                dtype=DataType.VARCHAR,
                max_length=65535,
                nullable=nullable,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
            ),
            FieldSchema(
                name="text",
                dtype=DataType.VARCHAR,
                max_length=65535,
                enable_tokenizer=True,
                tokenizer_params=tokenizer_params,
            ),
            FieldSchema(name="emb", dtype=DataType.FLOAT_VECTOR, dim=dim),
            FieldSchema(name="text_sparse_emb", dtype=DataType.SPARSE_FLOAT_VECTOR),
        ]
        schema = CollectionSchema(fields=fields, description="test collection", enable_dynamic_field=enable_dynamic_field)
        bm25_function = Function(
            name="text_bm25_emb",
            function_type=FunctionType.BM25,
            input_field_names=["text"],
            output_field_names=["text_sparse_emb"],
            params={},
        )
        schema.add_function(bm25_function)
        data_size = 5000
        collection_w = self.init_collection_wrap(
            name=cf.gen_unique_str(prefix), schema=schema
        )
        fake = fake_en
        if text_lang == "zh":
            fake = fake_zh
        elif text_lang == "de":
            fake = Faker("de_DE")
        elif text_lang == "hybrid":
            fake = Faker()

        if nullable:
            data = [
                {
                    "id": i,
                    "word": fake.word().lower(),
                    "sentence": fake.sentence().lower() if random.random() < 0.5 else None,
                    "paragraph": fake.paragraph().lower() if random.random() < 0.5 else None,
                    "text": fake.text().lower(),  # function input should not be None
                    "emb": [random.random() for _ in range(dim)],
                    f"dynamic_field_{i}": f"dynamic_value_{i}"
                }
                for i in range(data_size)
            ]
        else:
            data = [
                {
                    "id": i,
                    "word": fake.word().lower(),
                    "sentence": fake.sentence().lower(),
                    "paragraph": fake.paragraph().lower(),
                    "text": fake.text().lower(),
                    "emb": [random.random() for _ in range(dim)],
                    f"dynamic_field_{i}": f"dynamic_value_{i}"
                }
                for i in range(data_size)
            ]
        if text_lang == "hybrid":
            hybrid_data = []
            for i in range(data_size):
                fake = random.choice([fake_en, fake_zh, Faker("de_DE")])
                tmp = {
                    "id": i,
                    "word": fake.word().lower(),
                    "sentence": fake.sentence().lower(),
                    "paragraph": fake.paragraph().lower(),
                    "text": fake.text().lower(),
                    "emb": [random.random() for _ in range(dim)],
                    f"dynamic_field_{i}": f"dynamic_value_{i}"
                }
                hybrid_data.append(tmp)
            data = hybrid_data + data
        # df = pd.DataFrame(data)
        # log.info(f"dataframe\n{df}")
        batch_size = 5000
        for i in range(0, len(data), batch_size):
>           collection_w.insert(
                data[i: i + batch_size]
                if i + batch_size < len(data)
                else data[i: len(data)]
            )

test_full_text_search.py:638:

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

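This looks like a problem with pymilvus's client-side check rather than with the server: the ParamError is raised in _parse_row_request (pymilvus/client/prepare.py, line 481) while the insert request is still being built, before it is sent to the server. Each row omits text_sparse_emb, which is the output field of the BM25 function, and adds one dynamic key, so the field-count check presumably does not account for function output fields when dynamic fields are enabled.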
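
As a rough illustration of the kind of check that would accept this data, the sketch below validates a row by field name instead of by field count. It is not pymilvus's actual code: check_row_fields, its parameters, and SCHEMA_FIELDS are hypothetical names, and it assumes that function output fields such as text_sparse_emb are filled in by the server and should not be required from the caller.

```python
from typing import Any, Dict, List, Set


def check_row_fields(
    row: Dict[str, Any],
    schema_fields: List[str],
    function_output_fields: Set[str],
    enable_dynamic: bool,
) -> List[str]:
    """Hypothetical row check (an illustration, not pymilvus's implementation).

    Returns the keys that would be stored as dynamic fields.
    """
    # Assumption: the caller must supply every schema field except those
    # produced by a function (e.g. the BM25 sparse-vector output).
    required = [f for f in schema_fields if f not in function_output_fields]

    missing = [f for f in required if f not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Keys that are not in the schema are only legal with dynamic fields on.
    extra = [k for k in row if k not in schema_fields]
    if extra and not enable_dynamic:
        raise ValueError(f"unexpected fields: {extra}")
    return extra


# Field names taken from the schema in the failing test above.
SCHEMA_FIELDS = ["id", "word", "sentence", "paragraph", "text", "emb", "text_sparse_emb"]

row = {
    "id": 0,
    "word": "hello",
    "sentence": "hello world.",
    "paragraph": "hello world. this is a paragraph.",
    "text": "hello world. this is the bm25 input.",
    "emb": [0.0] * 128,
    "dynamic_field_0": "dynamic_value_0",
}

dynamic_keys = check_row_fields(row, SCHEMA_FIELDS, {"text_sparse_emb"}, enable_dynamic=True)
print(dynamic_keys)
```

With a name-based check like this, the row from the test is accepted when dynamic fields are enabled, and the same row is rejected when they are disabled, because dynamic_field_0 is not part of the schema.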

milvus-io / milvus

[Bug]: When full-text search is enabled (or the schema contains the BM25 function), and dynamic fields are also enabled, inserting correct data will still result in an error. #36986
