zhuwenxing opened this issue 2 weeks ago
Environment

- Milvus version: zhengbuqian-doc-in-restful-d174d05-20241010
- Deployment mode (standalone or cluster):
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior

The collection is created successfully even though every tokenized VARCHAR field requests an unsupported tokenizer, so the negative test fails. Live log from the CI run (long schema dumps are truncated as in the original output):

```
-------------------------------- live log call ---------------------------------
[2024-10-10 19:28:13 - DEBUG - ci_test]: (api_request)  : [Connections.has_connection] args: ['default'], kwargs: {} (api_request.py:62)
[2024-10-10 19:28:13 - DEBUG - ci_test]: (api_response) : False (api_request.py:37)
[2024-10-10 19:28:13 - DEBUG - ci_test]: (api_request)  : [Connections.connect] args: ['default', '', '', 'default', ''], kwargs: {'host': '10.104.4.62', 'port': 19530} (api_request.py:62)
[2024-10-10 19:28:13 - DEBUG - ci_test]: (api_response) : None (api_request.py:37)
[2024-10-10 19:28:13 - DEBUG - ci_test]: (api_request)  : [Collection] args: ['full_text_search_collection_smwEFsbH', {'auto_id': False, 'description': 'test collection', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'word', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': ......, kwargs: {'consistency_level': 'Strong'} (api_request.py:62)
[2024-10-10 19:28:14 - DEBUG - ci_test]: (api_response) : <Collection>:
-------------
<name>: full_text_search_collection_smwEFsbH
<description>: test collection
<schema>: {'auto_id': False, 'description': 'test collection', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'word', 'de...... (api_request.py:37)
[2024-10-10 19:28:14 - DEBUG - ci_test]: (api_request)  : [Collection.describe] args: [180], kwargs: {} (api_request.py:62)
[2024-10-10 19:28:14 - DEBUG - ci_test]: (api_response) : {'collection_name': 'full_text_search_collection_smwEFsbH', 'auto_id': False, 'num_shards': 1, 'description': 'test collection', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'word', 'descriptio...... (api_request.py:37)
[2024-10-10 19:28:14 - INFO - ci_test]: collection describe {'collection_name': 'full_text_search_collection_smwEFsbH', 'auto_id': False, 'num_shards': 1, 'description': 'test collection', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'word', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535, 'enable_tokenizer': 'true', 'tokenizer_params': '{"tokenizer":"unsupported"}'}, 'is_partition_key': True}, {'field_id': 102, 'name': 'sentence', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535, 'enable_tokenizer': 'true', 'tokenizer_params': '{"tokenizer":"unsupported"}'}}, {'field_id': 103, 'name': 'paragraph', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535, 'enable_tokenizer': 'true', 'tokenizer_params': '{"tokenizer":"unsupported"}'}}, {'field_id': 104, 'name': 'text', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 65535, 'enable_tokenizer': 'true', 'tokenizer_params': '{"tokenizer":"unsupported"}'}}, {'field_id': 105, 'name': 'emb', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'field_id': 106, 'name': 'text_sparse_emb', 'description': '', 'type': <DataType.SPARSE_FLOAT_VECTOR: 104>, 'params': {}, 'is_function_output': True}, {'field_id': 107, 'name': 'paragraph_sparse_emb', 'description': '', 'type': <DataType.SPARSE_FLOAT_VECTOR: 104>, 'params': {}, 'is_function_output': True}], 'functions': [{'name': 'text_bm25_emb', 'id': 100, 'description': '', 'type': <FunctionType.BM25: 1>, 'params': {}, 'input_field_names': ['text'], 'input_field_ids': [104], 'output_field_names': ['text_sparse_emb'], 'output_field_ids': [106]}, {'name': 'paragraph_bm25_emb', 'id': 101, 'description': '', 'type': <FunctionType.BM25: 1>, 'params': {}, 'input_field_names': ['paragraph'], 'input_field_ids': [103], 'output_field_names': ['paragraph_sparse_emb'], 'output_field_ids': [107]}], 'aliases': [], 'collection_id': 453124184567726198, 'consistency_level': 0, 'properties': {}, 'num_partitions': 16, 'enable_dynamic_field': False} (test_full_text_search.py:158)
```

The failing test:

```python
@pytest.mark.tags(CaseLabel.L0)
@pytest.mark.parametrize("tokenizer", ["unsupported"])
def test_create_collection_for_full_text_search_with_unsupported_tokenizer(self, tokenizer):
    tokenizer_params = {
        "tokenizer": tokenizer,
    }
    dim = 128
    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
        FieldSchema(
            name="word",
            dtype=DataType.VARCHAR,
            max_length=65535,
            enable_tokenizer=True,
            tokenizer_params=tokenizer_params,
            is_partition_key=True,
        ),
        FieldSchema(
            name="sentence",
            dtype=DataType.VARCHAR,
            max_length=65535,
            enable_tokenizer=True,
            tokenizer_params=tokenizer_params,
        ),
        FieldSchema(
            name="paragraph",
            dtype=DataType.VARCHAR,
            max_length=65535,
            enable_tokenizer=True,
            tokenizer_params=tokenizer_params,
        ),
        FieldSchema(
            name="text",
            dtype=DataType.VARCHAR,
            max_length=65535,
            enable_tokenizer=True,
            tokenizer_params=tokenizer_params,
        ),
        FieldSchema(name="emb", dtype=DataType.FLOAT_VECTOR, dim=dim),
        FieldSchema(name="text_sparse_emb", dtype=DataType.SPARSE_FLOAT_VECTOR),
        FieldSchema(name="paragraph_sparse_emb", dtype=DataType.SPARSE_FLOAT_VECTOR),
    ]
    schema = CollectionSchema(fields=fields, description="test collection")
    text_fields = ["text", "paragraph"]
    for field in text_fields:
        bm25_function = Function(
            name=f"{field}_bm25_emb",
            function_type=FunctionType.BM25,
            input_field_names=[field],
            output_field_names=[f"{field}_sparse_emb"],
            params={},
        )
        schema.add_function(bm25_function)
    collection_w = self.init_collection_wrap(
        name=cf.gen_unique_str(prefix), schema=schema
    )
    res, result = collection_w.describe()
    log.info(f"collection describe {res}")
    assert not result, "create collection with unsupported tokenizer should be failed"
```

The `describe()` call reports success, so the assertion fails:

```
FAILED testcases/test_full_text_search.py:100 (TestCreateCollectionWithFullTextSearchNegative.test_create_collection_for_full_text_search_with_unsupported_tokenizer[unsupported])
self = <test_full_text_search.TestCreateCollectionWithFullTextSearchNegative object at 0x13275c940>
tokenizer = 'unsupported'
>       assert not result, "create collection with unsupported tokenizer should be failed"
E       AssertionError: create collection with unsupported tokenizer should be failed
E       assert not True
```
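The check the negative test expects can be sketched as follows: a server that validates `tokenizer_params` up front would reject the schema instead of creating the collection. This is an illustrative sketch only; `SUPPORTED_TOKENIZERS` and `validate_tokenizer_params` are assumed names, not Milvus's actual implementation or its real supported-tokenizer set.

```python
# Hypothetical validation mirroring what the test expects the server to do.
# SUPPORTED_TOKENIZERS is an assumed set, for illustration only.
SUPPORTED_TOKENIZERS = {"standard", "jieba"}

def validate_tokenizer_params(tokenizer_params: dict) -> None:
    """Reject a schema that requests a tokenizer the server does not know."""
    tokenizer = tokenizer_params.get("tokenizer")
    if tokenizer not in SUPPORTED_TOKENIZERS:
        raise ValueError(f"unsupported tokenizer: {tokenizer!r}")

# The parametrized value from the test should be rejected:
try:
    validate_tokenizer_params({"tokenizer": "unsupported"})
    rejected = False
except ValueError:
    rejected = True
```

With a check like this in place, collection creation would fail for this schema and `collection_w.describe()` would never report success, so the assertion above would pass.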
Expected Behavior

Creating a collection with an unsupported tokenizer should fail.
Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response
/assign @zhengbuqian
@aoiasd is working on this
/assign @aoiasd
This issue is caused by the tokenizer params not being correctly validated and used, so it is set as a critical issue.
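Concretely, the missing validation would have to parse the `tokenizer_params` type param, which the `describe()` output above shows is stored as a JSON string, and reject unknown tokenizer names at CreateCollection time. A rough Python sketch under assumed names (Milvus's server is written in Go; `KNOWN_TOKENIZERS` and `check_varchar_field` are illustrative, not real Milvus code):

```python
import json

# Assumed supported set, for illustration only.
KNOWN_TOKENIZERS = {"standard", "jieba"}

def check_varchar_field(type_params: dict) -> None:
    """Validate a VARCHAR field's tokenizer settings before creating the collection."""
    if type_params.get("enable_tokenizer") != "true":
        return  # field is not tokenized; nothing to check
    # tokenizer_params is stored as a JSON string in the field's type params
    params = json.loads(type_params.get("tokenizer_params", "{}"))
    tokenizer = params.get("tokenizer")
    if tokenizer is not None and tokenizer not in KNOWN_TOKENIZERS:
        raise ValueError(f"create collection failed: unsupported tokenizer {tokenizer!r}")

# The 'word' field's params from the log above would now be rejected:
word_params = {
    "max_length": 65535,
    "enable_tokenizer": "true",
    "tokenizer_params": '{"tokenizer":"unsupported"}',
}
try:
    check_varchar_field(word_params)
    accepted = True
except ValueError:
    accepted = False
```

Running this check against each VARCHAR field at schema-creation time would make the collection creation fail up front instead of silently accepting the bad params.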