milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][pulsarv3] insert 50000 128dim data timeout #37929

Closed wangting0128 closed 6 hours ago

wangting0128 commented 9 hours ago

Is there an existing issue for this?

Environment

- Milvus version: master-20241122-06d73cf2-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc124
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-9dhlb-wt-3
test case name: test_ivf_flat_search_filter_cluster

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
fouramf-9dhlb-wt-3-95-2001-etcd-0                                 1/1     Running     0               3m52s   10.104.24.19    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-etcd-1                                 1/1     Running     0               3m52s   10.104.20.82    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-etcd-2                                 1/1     Running     0               3m52s   10.104.19.230   4am-node28   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-datanode-b9758796-24pql         1/1     Running     2 (3m19s ago)   3m52s   10.104.24.7     4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-indexnode-697d7b67dd-p2796      1/1     Running     1 (3m40s ago)   3m52s   10.104.14.125   4am-node18   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-mixcoord-68c56fd49f-w5bv8       1/1     Running     2 (3m30s ago)   3m52s   10.104.20.73    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-proxy-6dd7764997-h6kxd          1/1     Running     2 (3m29s ago)   3m52s   10.104.5.155    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-milvus-querynode-7fcd84758b-bn26z      1/1     Running     2 (3m30s ago)   3m52s   10.104.16.22    4am-node21   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-0                                1/1     Running     0               3m52s   10.104.24.17    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-1                                1/1     Running     0               3m52s   10.104.20.79    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-2                                1/1     Running     0               3m51s   10.104.19.231   4am-node28   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-minio-3                                1/1     Running     0               3m51s   10.104.34.43    4am-node37   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-0                      1/1     Running     0               3m52s   10.104.24.18    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-1                      1/1     Running     0               3m52s   10.104.20.81    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-2                      1/1     Running     0               3m51s   10.104.19.232   4am-node28   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-bookie-init-cwz8s             0/1     Completed   0               3m52s   10.104.5.154    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-broker-0                      1/1     Running     0               3m52s   10.104.6.224    4am-node13   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-broker-1                      1/1     Running     0               3m52s   10.104.5.157    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-proxy-0                       1/1     Running     0               3m51s   10.104.5.159    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-proxy-1                       1/1     Running     0               3m51s   10.104.6.225    4am-node13   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-pulsar-init-xlg2t             0/1     Completed   0               3m52s   10.104.24.6     4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-recovery-0                    1/1     Running     1 (3m ago)      3m52s   10.104.5.156    4am-node12   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-zookeeper-0                   1/1     Running     0               3m52s   10.104.24.16    4am-node29   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-zookeeper-1                   1/1     Running     0               3m52s   10.104.20.80    4am-node22   <none>           <none>
fouramf-9dhlb-wt-3-95-2001-pulsarv3-zookeeper-2                   1/1     Running     0               3m52s   10.104.19.229   4am-node28   <none>           <none>
[Screenshot 2024-11-22 14:18:16]

client log:

[2024-11-22 06:10:49,618 -  INFO - fouram]: [Base] Connection params: {'alias': 'default', 'host': 'fouramf-9dhlb-wt-3-95-2001-milvus.qa-milvus.svc.cluster.local', 'port': '19530', 'uri': '', 'secure': False, 'user': '', 'password': '', 'token': '', 'db_name': ''} (base.py:240)
[2024-11-22 06:10:49,637 -  INFO - fouram]: [Base] Start clean all collections [] (base.py:289)
[2024-11-22 06:10:49,639 -  INFO - fouram]: [Base] Create collection fouram_gFuUAM3g (base.py:273)
[2024-11-22 06:10:49,759 -  INFO - fouram]: [Base] Collection schema: 
{'auto_id': False,
 'description': '',
 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}},
            {'name': 'int64_1', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_2', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'float_1', 'description': '', 'type': <DataType.FLOAT: 10>},
            {'name': 'double_1', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'varchar_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 256}}],
 'enable_dynamic_field': False} (base.py:329)
[2024-11-22 06:10:49,759 -  INFO - fouram]: [CommonCases] Prepare collection fouram_gFuUAM3g done. (common_cases.py:77)
[2024-11-22 06:10:49,764 -  INFO - fouram]: [Base] Collection:fouram_gFuUAM3g is not building index (base.py:491)
[2024-11-22 06:10:49,764 -  INFO - fouram]: [Base] Start release collection fouram_gFuUAM3g (base.py:324)
[2024-11-22 06:10:49,784 -  INFO - fouram]: [Base] Clean all index done. (base.py:515)
[2024-11-22 06:10:49,785 -  INFO - fouram]: [Base] Start build index of IVF_FLAT for field:float_vector collection:fouram_gFuUAM3g, params:{'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 2048}}, kwargs:{} (base.py:469)
[2024-11-22 06:10:50,313 -  INFO - fouram]: [Time] Index run in 0.5285s (api_request.py:49)
[2024-11-22 06:10:50,313 -  INFO - fouram]: [CommonCases] RT of build index IVF_FLAT: 0.5285s (common_cases.py:162)
[2024-11-22 06:10:50,313 -  INFO - fouram]: [CommonCases] Prepare index IVF_FLAT done. (common_cases.py:164)
[2024-11-22 06:10:50,314 -  INFO - fouram]: [CommonCases] No scalar and vector fields need to be indexed. (common_cases.py:183)
[2024-11-22 06:10:50,315 -  INFO - fouram]: [Base] Index params of fouram_gFuUAM3g:[{'float_vector': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 2048}}}] (base.py:488)
[2024-11-22 06:10:50,316 -  INFO - fouram]: [Base] Start inserting 50000000 vectors to collection fouram_gFuUAM3g (base.py:383)
[2024-11-22 06:10:50,390 -  INFO - fouram]: [Base] Start inserting, ids: 0 - 49999, data size: 50,000,000 (base.py:363)
[2024-11-22 06:11:22,060 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=65535, message=message send timeout: TimeoutError)>, <Time:{'RPC start': '2024-11-22 06:10:50.790432', 'RPC error': '2024-11-22 06:11:22.060552'}> (decorators.py:140)
[2024-11-22 06:11:22,062 - ERROR - fouram]: (api_response) : [Collection.insert] <MilvusException: (code=65535, message=message send timeout: TimeoutError)>, [requestId: 85bb41d2-a898-11ef-b68d-7a0719bd6d08] (api_request.py:57)
[2024-11-22 06:11:22,062 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=65535, message=message send timeout: TimeoutError)> (func_check.py:106)

Expected Behavior

No response

Steps To Reproduce

1. create a collection with fields: 'id' (primary key), 'float_vector' (128-dim), 'int64_1', 'int64_2', 'float_1', 'double_1', 'varchar_1'
2. build an IVF_FLAT index on 'float_vector'
3. insert the first batch of 50,000 rows (ids 0 - 49999) <- timeout
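
For a sense of scale, the raw size of one 50,000-row batch can be estimated from the schema in the client log. This is only a rough sketch: the per-field widths are the usual fixed widths for these types, `varchar_1` is variable-length and is left as a guess (`avg_varchar_bytes` is a hypothetical parameter), and encoding overhead and any splitting Milvus does before publishing to the message queue are not modeled.

```python
# Rough size estimate for one insert batch, based on the schema in this issue.
# Assumption: fixed-width fields use their usual widths; varchar_1 is
# variable-length (max_length 256), so its average size is passed in as a guess.

FIXED_FIELD_BYTES = {
    "id": 8,                  # INT64 primary key
    "float_vector": 128 * 4,  # FLOAT_VECTOR, dim=128, 4 bytes per float32
    "int64_1": 8,             # INT64
    "int64_2": 8,             # INT64
    "float_1": 4,             # FLOAT
    "double_1": 8,            # DOUBLE
}


def estimate_batch_bytes(rows: int, avg_varchar_bytes: int = 0) -> int:
    """Return an approximate raw payload size in bytes for `rows` rows."""
    per_row = sum(FIXED_FIELD_BYTES.values()) + avg_varchar_bytes
    return rows * per_row


batch = estimate_batch_bytes(50_000)
print(f"{batch} bytes (~{batch / 2**20:.1f} MiB) before varchar data and encoding overhead")
```

Even without the varchar column, the fixed-width fields alone come to about 26 MiB per batch, so a single insert request is well above a frame limit of a few MiB unless it is split into smaller messages.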

Milvus Log

No response

Anything else?

server config:

{
     "queryNode": {
          "resources": {
               "limits": {
                    "cpu": "16.0",
                    "memory": "64Gi"
               },
               "requests": {
                    "cpu": "9.0",
                    "memory": "33Gi"
               }
          }
     },
     "indexNode": {
          "resources": {
               "limits": {
                    "cpu": "16.0",
                    "memory": "20Gi"
               },
               "requests": {
                    "cpu": "9.0",
                    "memory": "11Gi"
               }
          },
          "replicas": 1
     },
     "dataNode": {
          "resources": {
               "limits": {
                    "cpu": "2.0",
                    "memory": "4Gi"
               },
               "requests": {
                    "cpu": "2.0",
                    "memory": "3Gi"
               }
          },
          "replicas": 1
     },
     "cluster": {
          "enabled": true
     },
     "pulsar": {
          "enabled": false
     },
     "kafka": {},
     "minio": {
          "metrics": {
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "etcd": {
          "metrics": {
               "enabled": true,
               "podMonitor": {
                    "enabled": true
               }
          },
          "image": {
               "tag": "3.5.16-r1"
          }
     },
     "metrics": {
          "serviceMonitor": {
               "enabled": true
          }
     },
     "log": {
          "level": "debug"
     },
     "pulsarv3": {
          "enabled": true,
          "broker": {
               "podMonitor": {
                    "enabled": true
               }
          },
          "bookkeeper": {
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "image": {
          "all": {
               "repository": "harbor.milvus.io/milvus/milvus",
               "tag": "master-20241122-06d73cf2-amd64"
          }
     }
}

client config:

{
     "dataset_params": {
          "metric_type": "L2",
          "dim": 128,
          "dataset_name": "sift",
          "dataset_size": 50000000,
          "ni_per": 50000,
          "req_run_counts": 10
     },
     "collection_params": {
          "other_fields": [
               "int64_1",
               "int64_2",
               "float_1",
               "double_1",
               "varchar_1"
          ],
          "shards_num": 2
     },
     "search_params": {
          "expr": [
               {
                    "float_1": {
                         "GT": -1,
                         "LT": 5000000
                    }
               },
               {
                    "float_1": {
                         "GT": -1,
                         "LT": 25000000
                    }
               },
               {
                    "float_1": {
                         "GT": -1,
                         "LT": 45000000
                    }
               }
          ],
          "top_k": [
               1,
               10,
               100,
               1000
          ],
          "nq": [
               1,
               10,
               100,
               200,
               500,
               1000,
               1200
          ],
          "search_param": {
               "nprobe": [
                    8,
                    32
               ]
          }
     },
     "index_params": {
          "index_type": "IVF_FLAT",
          "index_param": {
               "nlist": 2048
          }
     }
}

LoveEachDay commented 8 hours ago

@wangting0128 The newly deployed Pulsar cluster failed to apply the nettyMaxFrameSizeBytes configuration change, which triggers an error when a message is too large:

2024-11-22T06:27:58,292+0000 [bookie-io-8-87] ERROR org.apache.bookkeeper.proto.BookieRequestHandler - Unhandled exception occurred in I/O thread or handler on [id: 0xd23eddeb, L:/10.104.20.81:3181 - R:/10.104.5.157:49740]
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 5253120: 5346562 - discarded
    at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:507) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:493) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.LengthFieldBasedFrameDecoder.exceededFrameLength(LengthFieldBasedFrameDecoder.java:377) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:423) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:333) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[io.netty-netty-handler-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]

We'll change the default nettyMaxFrameSizeBytes in the next release of the milvus-helm chart.
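
To make the numbers in the bookie error concrete, here is a small sketch that pulls the limit and the rejected frame size out of the `TooLongFrameException` message and compares them. The helper name `parse_too_long_frame` is hypothetical; the log line is copied from the trace above.

```python
import re

# Message text copied from the bookie log in this issue.
LOG_LINE = (
    "io.netty.handler.codec.TooLongFrameException: "
    "Adjusted frame length exceeds 5253120: 5346562 - discarded"
)


def parse_too_long_frame(line: str):
    """Return (limit, actual) in bytes from a TooLongFrameException message, or None."""
    match = re.search(r"Adjusted frame length exceeds (\d+): (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


limit, actual = parse_too_long_frame(LOG_LINE)
overage = actual - limit

# The limit in the log equals 5 MiB + 10 KiB.
print(f"limit={limit} (5 MiB + 10 KiB: {limit == 5 * 2**20 + 10 * 2**10})")
print(f"actual={actual}, over the limit by {overage} bytes")
```

The rejected frame is 93,442 bytes over the limit, which suggests the message was only slightly larger than the bookie's configured frame size rather than orders of magnitude larger.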

wangting0128 commented 6 hours ago

verification passed

argo task: fouramf-n59lz
test case name: test_ivf_flat_search_filter_cluster