milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Milvus panic when text match function used analyzer_params #36047

Closed. zhuwenxing closed this issue 1 month ago

zhuwenxing commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: longjiquan-text-match-492a38f-20240906
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

When analyzer_params is set on a field, Milvus panics. Without analyzer_params, the same schema works well.

        from pymilvus import CollectionSchema, DataType, FieldSchema

        dim = 128  # assumed value; the report does not state the vector dimension

        # Analyzer settings that trigger the panic once data is inserted
        analyzer_params = {
            "analyzer": "stop",
            "stop_words": ["an", "the"],
            "case_insensitive": False,
        }
        # Every VARCHAR field enables text match and shares the same analyzer_params
        default_fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
            FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=65535, enable_match=True, analyzer_params=analyzer_params),
            FieldSchema(name="overview", dtype=DataType.VARCHAR, max_length=65535, enable_match=True, analyzer_params=analyzer_params),
            FieldSchema(name="genres", dtype=DataType.VARCHAR, max_length=65535, enable_match=True, analyzer_params=analyzer_params),
            FieldSchema(name="producer", dtype=DataType.VARCHAR, max_length=65535, enable_match=True, analyzer_params=analyzer_params),
            FieldSchema(name="cast", dtype=DataType.VARCHAR, max_length=65535, enable_match=True, analyzer_params=analyzer_params),
            FieldSchema(name="emb", dtype=DataType.FLOAT_VECTOR, dim=dim)
        ]
        default_schema = CollectionSchema(fields=default_fields, description="test collection")

[2024/09/06 06:18:43.536 +00:00] [ERROR] [delegator/delegator_data.go:116] ["failed to create new segment"] [collectionID=452353708370097380] [channel=by-dev-rootcoord-dml_10_452353708370097380v0] [replicaID=452353708528500747] [segmentID=452353708370097445] [error="[json.exception.type_error.302] type must be string, but is boolean"] [stack="github.com/milvus-io/milvus/internal/querynodev2/delegator.(*shardDelegator).ProcessInsert\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator_data.go:116\ngithub.com/milvus-io/milvus/internal/querynodev2/pipeline.(*insertNode).Operate\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/pipeline/insert_node.go:110\ngithub.com/milvus-io/milvus/internal/util/pipeline.(*pipeline).process\n\t/go/src/github.com/milvus-io/milvus/internal/util/pipeline/pipeline.go:91\ngithub.com/milvus-io/milvus/internal/util/pipeline.(*streamPipeline).work\n\t/go/src/github.com/milvus-io/milvus/internal/util/pipeline/stream_pipeline.go:67"]
panic: [json.exception.type_error.302] type must be string, but is boolean

goroutine 525743 [running]:
panic({0x6727a80?, 0xc0034e7da0?})
    /usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc001cd5830 sp=0xc001cd5780 pc=0x213eccc
github.com/milvus-io/milvus/internal/querynodev2/delegator.(*shardDelegator).ProcessInsert(0xc0013f8a80, 0xc00449d290?)
    /go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator_data.go:119 +0x1185 fp=0xc001cd5d80 sp=0xc001cd5830 pc=0x5c6d0e5
github.com/milvus-io/milvus/internal/querynodev2/pipeline.(*insertNode).Operate(0xc001bd74c0, {0x626f500?, 0xc001138740})
    /go/src/github.com/milvus-io/milvus/internal/querynodev2/pipeline/insert_node.go:110 +0x42c fp=0xc001cd5e88 sp=0xc001cd5d80 pc=0x5c96f0c
github.com/milvus-io/milvus/internal/util/pipeline.(*pipeline).process(0xc0039f6f50?)
    /go/src/github.com/milvus-io/milvus/internal/util/pipeline/pipeline.go:91 +0x83 fp=0xc001cd5ed8 sp=0xc001cd5e88 pc=0x5c91d43
github.com/milvus-io/milvus/internal/util/pipeline.(*streamPipeline).work(0xc0060b9480)
    /go/src/github.com/milvus-io/milvus/internal/util/pipeline/stream_pipeline.go:67 +0xf7 fp=0xc001cd5fc8 sp=0xc001cd5ed8 pc=0x5c91eb7
github.com/milvus-io/milvus/internal/util/pipeline.(*streamPipeline).Start.func1.1()
    /go/src/github.com/milvus-io/milvus/internal/util/pipeline/stream_pipeline.go:129 +0x25 fp=0xc001cd5fe0 sp=0xc001cd5fc8 pc=0x5c930c5
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc001cd5fe8 sp=0xc001cd5fe0 pc=0x2178521
created by github.com/milvus-io/milvus/internal/util/pipeline.(*streamPipeline).Start.func1 in goroutine 525737
    /go/src/github.com/milvus-io/milvus/internal/util/pipeline/stream_pipeline.go:129 +0x96
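
The panic message has the form of an nlohmann/json `type_error.302`, which that library raises when a value is read as a string but holds another type. The only boolean in the reported `analyzer_params` is `case_insensitive`, so that key is a plausible trigger, although the report does not confirm it. The following is a minimal Python analogue of such a strict read, not the Milvus code; `json_type_name` and `get_string` are hypothetical helper names.

```python
def json_type_name(value):
    """Return a JSON-style type name for a Python value (hypothetical helper)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def get_string(value):
    """Strict read: return the value only if it is a string, otherwise raise."""
    if not isinstance(value, str):
        raise TypeError(f"type must be string, but is {json_type_name(value)}")
    return value


# Parameters copied from the report.
analyzer_params = {
    "analyzer": "stop",
    "stop_words": ["an", "the"],
    "case_insensitive": False,
}

# Assumption: the value of "case_insensitive" is read as a string somewhere.
try:
    get_string(analyzer_params["case_insensitive"])
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

In this sketch the failure surfaces as an ordinary exception that a caller can handle, which is the behavior the report asks for in place of a process-wide panic.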

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

log.log

Anything else?

No response

longjiquan commented 1 month ago

Only the jieba tokenizer is available for now.

zhuwenxing commented 1 month ago

This issue is about error handling, not about whether the stop analyzer is supported. Unsupported parameters should not cause a panic.

zhuwenxing commented 1 month ago

/assign @longjiquan

longjiquan commented 1 month ago

Already fixed. You will now get an exception when you create a collection with invalid tokenizer parameters.

/unassign
/assign @zhuwenxing
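
The fix described in this last comment rejects invalid parameters with an exception when the collection is created, before any insert can reach the query node. A rough sketch of that fail-fast pattern follows. It is not the actual Milvus implementation: `SUPPORTED_ANALYZERS`, `InvalidAnalyzerParams`, and `validate_analyzer_params` are hypothetical names, and the `"analyzer"` key and the single supported value `"jieba"` are assumptions drawn from the report and the comments above.

```python
from typing import Any, Mapping

# Assumption: the thread only says that jieba is available for now.
SUPPORTED_ANALYZERS = frozenset({"jieba"})


class InvalidAnalyzerParams(ValueError):
    """Raised when analyzer parameters are rejected at schema-creation time."""


def validate_analyzer_params(params: Mapping[str, Any]) -> None:
    """Raise InvalidAnalyzerParams if the analyzer name is missing or unsupported."""
    name = params.get("analyzer")
    if not isinstance(name, str):
        raise InvalidAnalyzerParams(
            f"'analyzer' must be a string, got {type(name).__name__}"
        )
    if name not in SUPPORTED_ANALYZERS:
        raise InvalidAnalyzerParams(
            f"unsupported analyzer {name!r}; supported: {sorted(SUPPORTED_ANALYZERS)}"
        )


# Parameters copied from the report; the sketch rejects them up front.
reported_params = {
    "analyzer": "stop",
    "stop_words": ["an", "the"],
    "case_insensitive": False,
}

try:
    validate_analyzer_params(reported_params)
    rejected = False
except InvalidAnalyzerParams as exc:
    rejected = True
    print(f"collection creation refused: {exc}")
```

Running the check once at collection creation keeps the failure close to the user's mistake, instead of surfacing later on the insert path.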