milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] Query Node disconnected from etcd and restarted multiple times #30926

Open wangting0128 opened 2 months ago

wangting0128 commented 2 months ago

Is there an existing issue for this?

Environment

- Milvus version: master-20240228-095cdbed-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc36
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: inverted-corn-1709136000
test case name: test_inverted_locust_hnsw_diskann_dml_dql_cluster

server:

[2024-02-28 20:09:25,648 -  INFO - fouram]: [Base] Deploy initial state: 
I0228 16:10:08.885065     428 request.go:665] Waited for 1.162395988s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/batch/v1beta1?timeout=32s
NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-136000-8-75-4542-etcd-0                             1/1     Running       0               7m3s    10.104.25.55    4am-node30   <none>           <none>
inverted-corn-136000-8-75-4542-etcd-1                             1/1     Running       0               7m3s    10.104.27.163   4am-node31   <none>           <none>
inverted-corn-136000-8-75-4542-etcd-2                             1/1     Running       0               7m2s    10.104.31.130   4am-node34   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-datacoord-5d8d86cc588bnq8   1/1     Running       0               7m3s    10.104.12.59    4am-node17   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-datanode-7df6ff895-ds2h8    1/1     Running       1 (2m2s ago)    7m3s    10.104.12.60    4am-node17   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexcoord-64b67b845j6sgz   1/1     Running       0               7m3s    10.104.23.237   4am-node27   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-5v67d   1/1     Running       0               7m3s    10.104.14.197   4am-node18   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-jltrr   1/1     Running       0               7m3s    10.104.34.37    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-svmnb   1/1     Running       0               7m3s    10.104.24.101   4am-node29   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-vrvl8   1/1     Running       0               7m3s    10.104.4.153    4am-node11   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8      1/1     Running       1 (2m32s ago)   7m3s    10.104.5.219    4am-node12   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-querycoord-6c7f585db2t9qm   1/1     Running       1 (2m2s ago)    7m3s    10.104.14.196   4am-node18   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74   1/1     Running       0               7m2s    10.104.25.51    4am-node30   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z   1/1     Running       0               7m3s    10.104.5.220    4am-node12   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-rootcoord-766dcd65f4nkzcw   1/1     Running       1 (2m2s ago)    7m3s    10.104.9.82     4am-node14   <none>           <none>
inverted-corn-136000-8-75-4542-minio-0                            1/1     Running       0               7m3s    10.104.34.52    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-minio-1                            1/1     Running       0               7m3s    10.104.29.226   4am-node35   <none>           <none>
inverted-corn-136000-8-75-4542-minio-2                            1/1     Running       0               7m3s    10.104.26.33    4am-node32   <none>           <none>
inverted-corn-136000-8-75-4542-minio-3                            1/1     Running       0               7m2s    10.104.31.131   4am-node34   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-0                    1/1     Running       0               7m3s    10.104.28.173   4am-node33   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-1                    1/1     Running       0               7m2s    10.104.26.34    4am-node32   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-2                    1/1     Running       0               7m2s    10.104.23.9     4am-node27   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-init-x5qrg           0/1     Completed     0               7m3s    10.104.34.35    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-broker-0                    1/1     Running       0               7m3s    10.104.30.110   4am-node38   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-proxy-0                     1/1     Running       0               7m3s    10.104.23.241   4am-node27   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-pulsar-init-bfggc           0/1     Completed     0               7m3s    10.104.34.34    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-recovery-0                  1/1     Running       0               7m3s    10.104.9.83     4am-node14   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-0                 1/1     Running       0               7m3s    10.104.34.53    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-1                 1/1     Running       0               4m55s   10.104.19.176   4am-node28   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-2                 1/1     Running       0               4m18s   10.104.29.228   4am-node35   <none>           <none> (base.py:257)
[2024-02-28 20:09:25,648 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'STATUS|inverted-corn-136000-8-75-4542-milvus|inverted-corn-136000-8-75-4542-minio|inverted-corn-136000-8-75-4542-etcd|inverted-corn-136000-8-75-4542-pulsar|inverted-corn-136000-8-75-4542-kafka|inverted-corn-136000-8-75-4542-log|inverted-corn-136000-8-75-4542-tikv'  (util_cmd.py:14)
[2024-02-28 20:09:35,668 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-136000-8-75-4542): 
 I0228 20:09:26.900864     539 request.go:665] Waited for 1.165144189s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/autoscaling/v2beta1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-136000-8-75-4542-etcd-0                             1/1     Running            0               4h6m    10.104.25.55    4am-node30   <none>           <none>
inverted-corn-136000-8-75-4542-etcd-1                             1/1     Running            0               4h6m    10.104.27.163   4am-node31   <none>           <none>
inverted-corn-136000-8-75-4542-etcd-2                             1/1     Running            0               4h6m    10.104.31.130   4am-node34   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-datacoord-5d8d86cc588bnq8   1/1     Running            0               4h6m    10.104.12.59    4am-node17   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-datanode-7df6ff895-ds2h8    1/1     Running            1 (4h1m ago)    4h6m    10.104.12.60    4am-node17   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexcoord-64b67b845j6sgz   1/1     Running            0               4h6m    10.104.23.237   4am-node27   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-5v67d   1/1     Running            0               4h6m    10.104.14.197   4am-node18   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-jltrr   1/1     Running            0               4h6m    10.104.34.37    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-svmnb   1/1     Running            0               4h6m    10.104.24.101   4am-node29   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-indexnode-b558f58d6-vrvl8   1/1     Running            0               4h6m    10.104.4.153    4am-node11   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8      1/1     Running            1 (4h1m ago)    4h6m    10.104.5.219    4am-node12   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-querycoord-6c7f585db2t9qm   1/1     Running            1 (4h1m ago)    4h6m    10.104.14.196   4am-node18   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74   1/1     Running            2 (100m ago)    4h6m    10.104.25.51    4am-node30   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z   1/1     Running            0               4h6m    10.104.5.220    4am-node12   <none>           <none>
inverted-corn-136000-8-75-4542-milvus-rootcoord-766dcd65f4nkzcw   1/1     Running            1 (4h1m ago)    4h6m    10.104.9.82     4am-node14   <none>           <none>
inverted-corn-136000-8-75-4542-minio-0                            1/1     Running            0               4h6m    10.104.34.52    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-minio-1                            1/1     Running            0               4h6m    10.104.29.226   4am-node35   <none>           <none>
inverted-corn-136000-8-75-4542-minio-2                            1/1     Running            0               4h6m    10.104.26.33    4am-node32   <none>           <none>
inverted-corn-136000-8-75-4542-minio-3                            1/1     Running            0               4h6m    10.104.31.131   4am-node34   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-0                    1/1     Running            0               4h6m    10.104.28.173   4am-node33   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-1                    1/1     Running            0               4h6m    10.104.26.34    4am-node32   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-2                    1/1     Running            0               4h6m    10.104.23.9     4am-node27   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-bookie-init-x5qrg           0/1     Completed          0               4h6m    10.104.34.35    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-broker-0                    1/1     Running            0               4h6m    10.104.30.110   4am-node38   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-proxy-0                     1/1     Running            0               4h6m    10.104.23.241   4am-node27   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-pulsar-init-bfggc           0/1     Completed          0               4h6m    10.104.34.34    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-recovery-0                  1/1     Running            0               4h6m    10.104.9.83     4am-node14   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-0                 1/1     Running            0               4h6m    10.104.34.53    4am-node37   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-1                 1/1     Running            0               4h4m    10.104.19.176   4am-node28   <none>           <none>
inverted-corn-136000-8-75-4542-pulsar-zookeeper-2                 1/1     Running            0               4h3m    10.104.29.228   4am-node35   <none>           <none> 
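
The two pod listings above differ mainly in the RESTARTS column. A minimal sketch of how that comparison could be automated is shown here; the helper names `parse_restarts` and `restarted_pods` are hypothetical, and the sample rows are a shortened copy of three pods from the listings (columns after AGE dropped).

```python
def parse_restarts(kubectl_output):
    """Map pod name -> restart count from `kubectl get pods -o wide` text."""
    restarts = {}
    for line in kubectl_output.splitlines():
        cols = line.split()
        # Pod rows look like: NAME READY(n/m) STATUS RESTARTS ...
        # Header rows and interleaved log lines fail these checks and are skipped.
        if len(cols) >= 4 and "/" in cols[1] and cols[3].isdigit():
            restarts[cols[0]] = int(cols[3])
    return restarts


def restarted_pods(before, after):
    """Return {pod: number of new restarts} for pods whose count increased."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


# Shortened rows copied from the "Deploy initial state" listing.
INITIAL = """\
NAME                                                              READY   STATUS    RESTARTS        AGE
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8      1/1     Running   1 (2m32s ago)   7m3s
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74   1/1     Running   0               7m2s
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z   1/1     Running   0               7m3s
"""

# Shortened rows copied from the listing taken after the test finished.
FINAL = """\
NAME                                                              READY   STATUS    RESTARTS        AGE
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8      1/1     Running   1 (4h1m ago)    4h6m
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74   1/1     Running   2 (100m ago)    4h6m
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z   1/1     Running   0               4h6m
"""

changed = restarted_pods(parse_restarts(INITIAL), parse_restarts(FINAL))
for pod, extra in changed.items():
    print(f"{pod}: +{extra} restarts")
```

On these sample rows only `inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74` is reported, with 2 new restarts; the proxy's single restart predates the first listing and is not flagged.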

{pod=~"inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74"}

GC"=292] ["new GOGC"=200] [gc-pause=69.319µs] [gc-pause-end=1709141449374069875]
2024-02-29 01:30:49.379 [2024/02/28 17:30:49.379 +00:00] [DEBUG] [segments/collection.go:188] ["collection ref decrement"] [collectionID=448039877626298924] [refCount=234]
2024-02-29 01:30:49.380 [2024/02/28 17:30:49.380 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Server #5] grpc: Server.processUnaryRPC failed to write status: connection error: desc = \"transport is closing\""]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=2417463163909405775] [error="etcdserver: requested lease not found"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [ERROR] [querynodev2/server.go:171] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=8] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:171"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [querynodev2/server.go:415] ["Query node stop..."]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [querynodev2/server.go:418] ["session fail to go stopping state"] [error="this session has disconnected"] [errorVerbose="this session has disconnected\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/internal/util/sessionutil.(*Session).GoingStop\n  | \t/go/src/github.com/milvus-io/milvus/internal/util/sessionutil/session_util.go:661\n  | github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Stop.func1\n  | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:416\n  | sync.(*Once).doSlow\n  | \t/usr/local/go/src/sync/once.go:74\n  | sync.(*Once).Do\n  | \t/usr/local/go/src/sync/once.go:65\n  | github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Stop\n  | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:414\n  | github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n  | \t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:172\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) this session has disconnected\nError types: (1) *withstack.withStack (2) *errutil.leafError"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [tasks/concurrent_safe_scheduler.go:122] ["receiveChan closed, processing remaining request"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [tasks/concurrent_safe_scheduler.go:129] ["all task put into exeChan, schedule worker exit"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [tasks/concurrent_safe_scheduler.go:217] ["scheduler execChan closed, worker exit"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [DEBUG] [pipeline/stream_pipeline.go:56] ["stream pipeline input closed"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:210] ["stop working"] [pchannel=by-dev-rootcoord-dml_0] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:200] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:200] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_0] [signal=pause] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:164] ["closed target"] [vchannel=by-dev-rootcoord-dml_0_448039877626298924v0] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=terminate] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgdispatcher/dispatcher.go:177] ["get signal"] [pchannel=by-dev-rootcoord-dml_0] [signal=terminate] [isMain=true]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [msgstream/mq_msgstream.go:216] ["start to close mq msg stream"] ["producer num"=0] ["consumer num"=1]
2024-02-29 01:30:49.659 [2024/02/28 17:30:49.659 +00:00] [INFO] [msgdispatcher/dispatcher.go:200] ["handle signal done"] [pchannel=by-dev-rootcoord-dml_0] [signal=terminate] [isMain=true]
2024-02-29 01:30:49.659 [2024/02/28 17:30:49.659 +00:00] [INFO] 
Screenshot 2024-02-29 14 27 29
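
To pull the relevant lines out of a full query node log such as the attachment below, something like the following could be used. It is a rough sketch that assumes every line follows the bracketed layout seen in the excerpt above (`[timestamp] [LEVEL] [file:line] ["message"] ...`); the regex and the helper name `warnings_and_errors` are illustrative, not part of Milvus.

```python
import re

# Matches: [2024/02/28 17:30:49.636 +00:00] [WARN] [pkg/file.go:123] ["message"]
LOG_LINE = re.compile(
    r'\[(?P<ts>\d{4}/\d{2}/\d{2} [\d:.]+ [+-]\d{2}:\d{2})\] '
    r'\[(?P<level>[A-Z]+)\] '
    r'\[(?P<src>[^\]]+)\] '
    r'\["(?P<msg>(?:[^"\\]|\\.)*)"\]'
)


def warnings_and_errors(text):
    """Return (timestamp, level, source, message) for WARN and ERROR lines."""
    hits = []
    for line in text.splitlines():
        m = LOG_LINE.search(line)
        if m and m.group("level") in ("WARN", "ERROR"):
            hits.append((m.group("ts"), m.group("level"), m.group("src"), m.group("msg")))
    return hits


# Four lines copied verbatim from the excerpt above.
SAMPLE = r'''
2024-02-29 01:30:49.379 [2024/02/28 17:30:49.379 +00:00] [DEBUG] [segments/collection.go:188] ["collection ref decrement"] [collectionID=448039877626298924] [refCount=234]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=2417463163909405775] [error="etcdserver: requested lease not found"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
2024-02-29 01:30:49.636 [2024/02/28 17:30:49.636 +00:00] [INFO] [querynodev2/server.go:415] ["Query node stop..."]
'''

hits = warnings_and_errors(SAMPLE)
for ts, level, src, msg in hits:
    print(ts, level, src, msg)
```

Filtering by level keeps the `keepAliveOnce` failure and the session shutdown warning while dropping the DEBUG and INFO noise around them.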

inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-22x74.txt

inverted-corn-136000-8-75-4542-milvus-datacoord-5d8d86cc588bnq8.txt
inverted-corn-136000-8-75-4542-milvus-querynode-89bc557c6-g979z.txt
inverted-corn-136000-8-75-4542-etcd-.*.txt
inverted-corn-136000-8-75-4542-milvus-rootcoord-766dcd65f4nkzcw.txt
inverted-corn-136000-8-75-4542-milvus-datanode-7df6ff895-ds2h8.txt
inverted-corn-136000-8-75-4542-milvus-querycoord-6c7f585db2t9qm.txt
inverted-corn-136000-8-75-4542-milvus-proxy-79c7cbf7c7-zdsn8.txt

client pod name: inverted-corn-1709136000-127259320
client log: client.log.zip

Expected Behavior

No response

Steps To Reproduce

concurrent test and calculation of RT and QPS

        :purpose:  `vector: memory and disk index`
            verify concurrent DML & DQL scenario which has 4 float_vector fields & 16 scalar fields

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 200dim,
                'float_vector_3': 200dim,
                'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1', 'bool_1',
                'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
            2. build indexes:
                HNSW: 'float_vector'
                DISKANN_IP: 'float_vector_1'
                HNSW: 'float_vector_2'
                DISKANN_L2: 'float_vector_3'
                scalar_default_index: 'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1'
                scalar_INVERTED_index: 'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
            3. insert 5 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - load
                - search
                - hybrid_search
                - query
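
The field and index layout from steps 1 and 2 can be written down as plain data to check it against the stated purpose (4 float_vector fields and 16 scalar fields). This is only an illustrative sketch, not the test framework's own representation: the variable names are made up here, and the index names follow the `index_type` values in the report data (`HNSW`, `DISKANN`, `INVERTED`).

```python
# Vector fields with the dimension and index named in the test steps.
VECTOR_FIELDS = {
    "float_vector":   {"dim": 128, "index": "HNSW"},
    "float_vector_1": {"dim": 128, "index": "DISKANN", "metric": "IP"},
    "float_vector_2": {"dim": 200, "index": "HNSW"},
    "float_vector_3": {"dim": 200, "index": "DISKANN", "metric": "L2"},
}

# Scalar fields come in two groups, suffixed _1 and _2.
SCALAR_TYPES = ["int8", "int16", "int32", "int64", "double", "float", "varchar", "bool"]
scalar_fields = [f"{t}_{group}" for group in (1, 2) for t in SCALAR_TYPES]

# Step 2: default scalar index on the _1 group except bool_1,
# INVERTED index on the whole _2 group.
default_indexed = [f for f in scalar_fields if f.endswith("_1") and f != "bool_1"]
inverted_indexed = [f for f in scalar_fields if f.endswith("_2")]

unindexed = sorted(set(scalar_fields) - set(default_indexed) - set(inverted_indexed))

print("vector fields:", len(VECTOR_FIELDS))
print("scalar fields:", len(scalar_fields))
print("scalar fields without an index:", unindexed)
```

Laid out this way, the counts match the stated purpose, and `bool_1` turns out to be the only scalar field that step 2 leaves without an index.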

Milvus Log

No response

Anything else?

test result:

[2024-02-28 20:08:38,030 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-02-28 20:08:38,031 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     delete                                                                          8653     0(0.00%) |    154       4   20575     21 |    0.80        0.00 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     flush                                                                           8590     0(0.00%) |   7883     169   70211   6300 |    0.80        0.00 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     hybrid_search                                                                   8608 6950(80.74%) |    298       3   33631      9 |    0.80        0.64 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     insert                                                                          8565     0(0.00%) |    478      51   24871    340 |    0.79        0.00 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     load                                                                            8660    27(0.31%) |  15331      10  300005   2900 |    0.80        0.00 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     query                                                                           8562 6991(81.65%) |    331       1   32205      7 |    0.79        0.65 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: grpc     search                                                                          8580 6925(80.71%) |    449      81   34062    110 |    0.79        0.64 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]:          Aggregated                                                                     60218 20893(34.70%) |   3573       1  300005    210 |    5.58        1.93 (stats.py:789)
[2024-02-28 20:08:38,032 -  INFO - fouram]:  (stats.py:790)
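
The failure percentages in the table above can be recomputed from the `# reqs` and `# fails` columns. The sketch below does this on a whitespace-condensed copy of the seven per-request rows; the regex and the 50% threshold are arbitrary choices for illustration.

```python
import re

# Per-request rows from the locust summary above, reduced to type, name, reqs, fails.
STATS = """\
grpc delete 8653 0(0.00%)
grpc flush 8590 0(0.00%)
grpc hybrid_search 8608 6950(80.74%)
grpc insert 8565 0(0.00%)
grpc load 8660 27(0.31%)
grpc query 8562 6991(81.65%)
grpc search 8580 6925(80.71%)
"""

ROW = re.compile(r"grpc\s+(?P<name>\w+)\s+(?P<reqs>\d+)\s+(?P<fails>\d+)\(")


def failure_rates(text):
    """Return {request name: failure percentage rounded to 2 decimals}."""
    rates = {}
    for m in ROW.finditer(text):
        reqs, fails = int(m["reqs"]), int(m["fails"])
        rates[m["name"]] = round(100.0 * fails / reqs, 2) if reqs else 0.0
    return rates


rates = failure_rates(STATS)
mostly_failing = sorted(name for name, pct in rates.items() if pct > 50)
print(rates)
print("more than half of requests failed:", mostly_failing)
```

The recomputed values agree with the percentages locust printed, and the three request types above the threshold are `hybrid_search`, `query`, and `search`, while `insert`, `delete`, and `flush` report no failures.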
[2024-02-28 20:08:38,036 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_8c16m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '16.0',
                                                              'memory': '64Gi'},
                                                   'requests': {'cpu': '9.0',
                                                                'memory': '33Gi'}},
                                     'replicas': 2},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '16Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '9Gi'}},
                                     'replicas': 4},
                       'dataNode': {'resources': {'limits': {'cpu': '8.0',
                                                             'memory': '16Gi'},
                                                  'requests': {'cpu': '5.0',
                                                               'memory': '9Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'master-20240228-095cdbed-amd64'}}},
            'host': 'inverted-corn-136000-8-75-4542-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_hnsw_diskann_dml_dql_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'int8_1': {},
                                                                      'int16_1': {},
                                                                      'int32_1': {},
                                                                      'int64_1': {},
                                                                      'double_1': {},
                                                                      'float_1': {},
                                                                      'varchar_1': {},
                                                                      'int8_2': {'index_type': 'INVERTED'},
                                                                      'int16_2': {'index_type': 'INVERTED'},
                                                                      'int32_2': {'index_type': 'INVERTED'},
                                                                      'int64_2': {'index_type': 'INVERTED'},
                                                                      'double_2': {'index_type': 'INVERTED'},
                                                                      'float_2': {'index_type': 'INVERTED'},
                                                                      'varchar_2': {'index_type': 'INVERTED'},
                                                                      'bool_2': {'index_type': 'INVERTED'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'DISKANN',
                                                                                         'index_param': {},
                                                                                         'metric_type': 'IP'},
                                                                      'float_vector_2': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_3': {'index_type': 'DISKANN',
                                                                                         'index_param': {},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'float_vector_1': {'params': {'dim': 128},
                                                                                          'other_params': {'dataset': 'sift',
                                                                                                           'dim': 128}},
                                                                       'float_vector_2': {'params': {'dim': 200},
                                                                                          'other_params': {'dataset': 'text2img',
                                                                                                           'dim': 200}},
                                                                       'float_vector_3': {'params': {'dim': 200},
                                                                                          'other_params': {'dataset': 'text2img',
                                                                                                           'dim': 200}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 5000},
                                 'collection_params': {'other_fields': ['float_vector_1',
                                                                        'float_vector_2',
                                                                        'float_vector_3',
                                                                        'int8_1',
                                                                        'int16_1',
                                                                        'int32_1',
                                                                        'int64_1',
                                                                        'double_1',
                                                                        'float_1',
                                                                        'varchar_1',
                                                                        'bool_1',
                                                                        'int8_2',
                                                                        'int16_2',
                                                                        'int32_2',
                                                                        'int64_2',
                                                                        'double_2',
                                                                        'float_2',
                                                                        'varchar_2',
                                                                        'bool_2'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 8,
                                                                  'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 30,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 5000000}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'load',
                                                       'weight': 1,
                                                       'params': {'replica_number': 1,
                                                                  'timeout': 300}},
                                                      {'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 1,
                                                                  'search_param': {'ef': 64},
                                                                  'expr': 'int64_1 '
                                                                          '> '
                                                                          '-1 '
                                                                          '&& '
                                                                          'id '
                                                                          '> '
                                                                          '-1',
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 180,
                                                                  'random_data': True}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 10,
                                                                  'reqs': [{'search_param': {'ef': 1280},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'int64_1 '
                                                                                    '< '
                                                                                    '100000 '
                                                                                    '&& '
                                                                                    'float_2 '
                                                                                    '> '
                                                                                    '10.0',
                                                                            'top_k': 1000},
                                                                           {'search_param': {'search_list': 30},
                                                                            'anns_field': 'float_vector_1',
                                                                            'expr': 'varchar_1 '
                                                                                    'like '
                                                                                    '"0%" '
                                                                                    '&& '
                                                                                    'bool_2 '
                                                                                    '== '
                                                                                    'True'},
                                                                           {'search_param': {'ef': 1024},
                                                                            'anns_field': 'float_vector_2',
                                                                            'expr': 'int8_1 '
                                                                                    '< '
                                                                                    '64 '
                                                                                    '&& '
                                                                                    'bool_1 '
                                                                                    '== '
                                                                                    'False',
                                                                            'top_k': 1009},
                                                                           {'search_param': {'search_list': 40},
                                                                            'anns_field': 'float_vector_3',
                                                                            'expr': 'int8_2 '
                                                                                    '> '
                                                                                    '64 '
                                                                                    '|| '
                                                                                    'double_2 '
                                                                                    '> '
                                                                                    '1000000.0'}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'int64_1 '
                                                                          '> '
                                                                          '-1 '
                                                                          '&&  '
                                                                          'int64_2 '
                                                                          '> '
                                                                          '-1 '
                                                                          '&& ',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 180,
                                                                  'random_data': True,
                                                                  'random_count': 20,
                                                                  'random_range': [2500000.0,
                                                                                   5000000],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64'}}]},
            'run_id': 2024022861921862,
            'datetime': '2024-02-28 16:03:12.352658',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 936.4863,
                                      'float_vector_1': {'RT': 828.8714},
                                      'float_vector_2': {'RT': 214.7156},
                                      'float_vector_3': {'RT': 135.2635},
                                      'int8_1': {'RT': 0.5383},
                                      'int16_1': {'RT': 0.5459},
                                      'int32_1': {'RT': 0.54},
                                      'int64_1': {'RT': 0.5391},
                                      'double_1': {'RT': 0.5368},
                                      'float_1': {'RT': 0.7412},
                                      'varchar_1': {'RT': 0.7463},
                                      'int8_2': {'RT': 0.639},
                                      'int16_2': {'RT': 0.6818},
                                      'int32_2': {'RT': 0.5278},
                                      'int64_2': {'RT': 0.525},
                                      'double_2': {'RT': 0.5441},
                                      'float_2': {'RT': 0.5439},
                                      'varchar_2': {'RT': 0.547},
                                      'bool_2': {'RT': 0.631}},
                            'insert': {'total_time': 901.6265,
                                       'VPS': 5545.5335,
                                       'batch_time': 0.9016,
                                       'batch': 5000},
                            'flush': {'RT': 3.5434},
                            'load': {'RT': 32.7634},
                            'Locust': {'Aggregated': {'Requests': 60218,
                                                      'Fails': 20893,
                                                      'RPS': 5.58,
                                                      'fail_s': 0.35,
                                                      'RT_max': 300005.24,
                                                      'RT_avg': 3573.59,
                                                      'TP50': 210.0,
                                                      'TP99': 50000.0},
                                       'delete': {'Requests': 8653,
                                                  'Fails': 0,
                                                  'RPS': 0.8,
                                                  'fail_s': 0.0,
                                                  'RT_max': 20575.91,
                                                  'RT_avg': 154.5,
                                                  'TP50': 21,
                                                  'TP99': 1700.0},
                                       'flush': {'Requests': 8590,
                                                 'Fails': 0,
                                                 'RPS': 0.8,
                                                 'fail_s': 0.0,
                                                 'RT_max': 70211.92,
                                                 'RT_avg': 7883.45,
                                                 'TP50': 6300.0,
                                                 'TP99': 44000.0},
                                       'hybrid_search': {'Requests': 8608,
                                                         'Fails': 6950,
                                                         'RPS': 0.8,
                                                         'fail_s': 0.81,
                                                         'RT_max': 33631.02,
                                                         'RT_avg': 298.54,
                                                         'TP50': 9,
                                                         'TP99': 2500.0},
                                       'insert': {'Requests': 8565,
                                                  'Fails': 0,
                                                  'RPS': 0.79,
                                                  'fail_s': 0.0,
                                                  'RT_max': 24871.13,
                                                  'RT_avg': 478.59,
                                                  'TP50': 340.0,
                                                  'TP99': 2300.0},
                                       'load': {'Requests': 8660,
                                                'Fails': 27,
                                                'RPS': 0.8,
                                                'fail_s': 0.0,
                                                'RT_max': 300005.24,
                                                'RT_avg': 15331.16,
                                                'TP50': 2900.0,
                                                'TP99': 242000.0},
                                       'query': {'Requests': 8562,
                                                 'Fails': 6991,
                                                 'RPS': 0.79,
                                                 'fail_s': 0.82,
                                                 'RT_max': 32205.63,
                                                 'RT_avg': 331.84,
                                                 'TP50': 8,
                                                 'TP99': 3700.0},
                                       'search': {'Requests': 8580,
                                                  'Fails': 6925,
                                                  'RPS': 0.79,
                                                  'fail_s': 0.81,
                                                  'RT_max': 34062.22,
                                                  'RT_avg': 449.97,
                                                  'TP50': 110.0,
                                                  'TP99': 3400.0}}}}} 
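
The per-task counts in the `Locust` section above suggest that failures are concentrated in the read path (`search`, `hybrid_search`, `query`), while `insert`, `delete`, and `flush` report none. A minimal sketch for checking this from the reported numbers follows; the helper names and the 0.5 threshold are illustrative assumptions and are not part of the fouram tooling.

```python
# Per-task request/failure counts copied from the 'Locust' section of the report above.
LOCUST = {
    "delete": {"Requests": 8653, "Fails": 0},
    "flush": {"Requests": 8590, "Fails": 0},
    "hybrid_search": {"Requests": 8608, "Fails": 6950},
    "insert": {"Requests": 8565, "Fails": 0},
    "load": {"Requests": 8660, "Fails": 27},
    "query": {"Requests": 8562, "Fails": 6991},
    "search": {"Requests": 8580, "Fails": 6925},
}
AGGREGATED = {"Requests": 60218, "Fails": 20893}


def failure_rates(stats):
    """Map each task name to Fails / Requests (0.0 when there were no requests)."""
    return {
        name: (s["Fails"] / s["Requests"] if s["Requests"] else 0.0)
        for name, s in stats.items()
    }


def mostly_failing(stats, threshold=0.5):
    """Sorted names of tasks whose failure rate exceeds the (arbitrary) threshold."""
    return sorted(n for n, r in failure_rates(stats).items() if r > threshold)


if __name__ == "__main__":
    for name, rate in sorted(failure_rates(LOCUST).items()):
        print(f"{name:14s} {rate:6.1%}")
    print("mostly failing:", mostly_failing(LOCUST))
```

The per-task counts also sum to the `Aggregated` row, so the report is internally consistent.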
wangting0128 commented 2 months ago

Issue #30915 may have the same root cause.

wangting0128 commented 2 months ago

dataNode panic: fail to allocate ID

argo task: inverted-corn-c79tm

test case name: test_inverted_locust_varchar_dml_dql_cluster

server:

[2024-02-29 10:05:24,154 -  INFO - fouram]: [Base] Deploy initial state: 
I0229 03:25:11.753714     401 request.go:665] Waited for 1.174198326s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/node.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-c79tm-5-49-7343-etcd-0                              1/1     Running            0                 8m27s   10.104.24.237   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-etcd-1                              1/1     Running            0                 8m27s   10.104.16.15    4am-node21   <none>           <none>
inverted-corn-c79tm-5-49-7343-etcd-2                              1/1     Running            0                 8m26s   10.104.30.216   4am-node38   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-datacoord-5c45566ddd-f295q   1/1     Running            0                 8m27s   10.104.20.29    4am-node22   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv    1/1     Running            1 (3m56s ago)     8m27s   10.104.26.144   4am-node32   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-indexcoord-5455756f77nfj5t   1/1     Running            0                 8m27s   10.104.23.168   4am-node27   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-indexnode-7c6cf7486b-dbnfs   1/1     Running            0                 8m27s   10.104.20.31    4am-node22   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-proxy-7d854b565-pz4mz        1/1     Running            1 (3m56s ago)     8m27s   10.104.12.237   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-querycoord-c88bdbb74-b8tmt   1/1     Running            1 (3m57s ago)     8m27s   10.104.20.30    4am-node22   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-querynode-6bb9b6d87b-fhsz9   1/1     Running            0                 8m27s   10.104.23.166   4am-node27   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-rootcoord-8445bc65d8-hbhpr   1/1     Running            1 (3m57s ago)     8m27s   10.104.23.167   4am-node27   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-0                             1/1     Running            0                 8m27s   10.104.26.146   4am-node32   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-1                             1/1     Running            0                 8m27s   10.104.24.238   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-2                             1/1     Running            0                 8m27s   10.104.29.75    4am-node35   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-3                             1/1     Running            0                 8m27s   10.104.21.137   4am-node24   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-0                     1/1     Running            0                 8m27s   10.104.34.176   4am-node37   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-1                     1/1     Running            0                 8m27s   10.104.25.242   4am-node30   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-2                     1/1     Running            0                 8m26s   10.104.24.244   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-init-zhxgz            0/1     Completed          0                 8m27s   10.104.12.239   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-broker-0                     1/1     Running            0                 8m27s   10.104.15.211   4am-node20   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-proxy-0                      1/1     Running            0                 8m27s   10.104.5.162    4am-node12   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-pulsar-init-sv6xd            0/1     Completed          0                 8m27s   10.104.12.238   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-recovery-0                   1/1     Running            0                 8m27s   10.104.12.236   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-0                  1/1     Running            0                 8m27s   10.104.24.236   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-1                  1/1     Running            0                 7m47s   10.104.34.203   4am-node37   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-2                  1/1     Running            0                 6m6s    10.104.16.55    4am-node21   <none>           <none> (base.py:257)
[2024-02-29 10:05:24,155 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'STATUS|inverted-corn-c79tm-5-49-7343-milvus|inverted-corn-c79tm-5-49-7343-minio|inverted-corn-c79tm-5-49-7343-etcd|inverted-corn-c79tm-5-49-7343-pulsar|inverted-corn-c79tm-5-49-7343-kafka|inverted-corn-c79tm-5-49-7343-log|inverted-corn-c79tm-5-49-7343-tikv'  (util_cmd.py:14)
[2024-02-29 10:05:33,788 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-c79tm-5-49-7343): 
 I0229 10:05:25.402578     511 request.go:665] Waited for 1.164400108s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/metrics.k8s.io/v1beta1?timeout=32s
NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-c79tm-5-49-7343-etcd-0                              1/1     Running       0               6h48m   10.104.24.237   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-etcd-1                              1/1     Running       0               6h48m   10.104.16.15    4am-node21   <none>           <none>
inverted-corn-c79tm-5-49-7343-etcd-2                              1/1     Running       0               6h48m   10.104.30.216   4am-node38   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-datacoord-5c45566ddd-f295q   1/1     Running       0               6h48m   10.104.20.29    4am-node22   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv    1/1     Running       2 (58m ago)     6h48m   10.104.26.144   4am-node32   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-indexcoord-5455756f77nfj5t   1/1     Running       0               6h48m   10.104.23.168   4am-node27   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-indexnode-7c6cf7486b-dbnfs   1/1     Running       0               6h48m   10.104.20.31    4am-node22   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-proxy-7d854b565-pz4mz        1/1     Running       1 (6h44m ago)   6h48m   10.104.12.237   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-querycoord-c88bdbb74-b8tmt   1/1     Running       1 (6h44m ago)   6h48m   10.104.20.30    4am-node22   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-querynode-6bb9b6d87b-fhsz9   1/1     Running       0               6h48m   10.104.23.166   4am-node27   <none>           <none>
inverted-corn-c79tm-5-49-7343-milvus-rootcoord-8445bc65d8-hbhpr   1/1     Running       1 (6h44m ago)   6h48m   10.104.23.167   4am-node27   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-0                             1/1     Running       0               6h48m   10.104.26.146   4am-node32   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-1                             1/1     Running       0               6h48m   10.104.24.238   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-2                             1/1     Running       0               6h48m   10.104.29.75    4am-node35   <none>           <none>
inverted-corn-c79tm-5-49-7343-minio-3                             1/1     Running       0               6h48m   10.104.21.137   4am-node24   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-0                     1/1     Running       0               6h48m   10.104.34.176   4am-node37   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-1                     1/1     Running       0               6h48m   10.104.25.242   4am-node30   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-2                     1/1     Running       0               6h48m   10.104.24.244   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-bookie-init-zhxgz            0/1     Completed     0               6h48m   10.104.12.239   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-broker-0                     1/1     Running       0               6h48m   10.104.15.211   4am-node20   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-proxy-0                      1/1     Running       0               6h48m   10.104.5.162    4am-node12   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-pulsar-init-sv6xd            0/1     Completed     0               6h48m   10.104.12.238   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-recovery-0                   1/1     Running       0               6h48m   10.104.12.236   4am-node17   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-0                  1/1     Running       0               6h48m   10.104.24.236   4am-node29   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-1                  1/1     Running       0               6h48m   10.104.34.203   4am-node37   <none>           <none>
inverted-corn-c79tm-5-49-7343-pulsar-zookeeper-2                  1/1     Running       0               6h46m   10.104.16.55    4am-node21   <none>           <none> 
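
To spot restarted pods in a listing like the one above, the `RESTARTS` column can be parsed with a regular expression. This is a minimal sketch, assuming the `kubectl get pods -o wide` layout shown here; the rows below are copied from the listing with the trailing IP/NODE columns trimmed, and `restarted_pods` is a hypothetical helper name.

```python
import re

# Rows copied from the pod listing above (columns after AGE trimmed).
POD_ROWS = """\
NAME                                                              READY   STATUS        RESTARTS        AGE
inverted-corn-c79tm-5-49-7343-milvus-datacoord-5c45566ddd-f295q   1/1     Running       0               6h48m
inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv    1/1     Running       2 (58m ago)     6h48m
inverted-corn-c79tm-5-49-7343-milvus-proxy-7d854b565-pz4mz        1/1     Running       1 (6h44m ago)   6h48m
inverted-corn-c79tm-5-49-7343-milvus-querynode-6bb9b6d87b-fhsz9   1/1     Running       0               6h48m
"""

# NAME, READY (n/m), STATUS, RESTARTS with an optional "(<time> ago)" suffix, AGE.
ROW_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*) ago\))?\s+(?P<age>\S+)"
)


def restarted_pods(text):
    """Return {pod_name: (restart_count, time_since_last_restart)} for restarts > 0."""
    result = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m and int(m.group("restarts")) > 0:
            result[m.group("name")] = (int(m.group("restarts")), m.group("last"))
    return result


if __name__ == "__main__":
    for pod, (count, last) in restarted_pods(POD_ROWS).items():
        print(f"{pod}: {count} restart(s), last {last} ago")
```

The header row is skipped automatically because its second column does not match the `n/m` READY pattern.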

inverted-corn-c79tm-5-49-7343-milvus-datanode-54f56c4cfb-4bwbv.panic.txt

Screenshot 2024-02-29 19 15 17 Screenshot 2024-02-29 19 16 11

client pod name: inverted-corn-c79tm-505297513

client log: client.log

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `varchar: different max_length`
            verify a concurrent DML & DQL scenario with 3 VARCHAR scalar fields, each with an INVERTED index

        :test steps:
            1. create collection with fields:
                'float_vector': 3dim,
                'varchar_1': max_length=256, varchar_filled=True
                'varchar_2': max_length=32768, varchar_filled=True
                'varchar_3': max_length=65535, varchar_filled=True
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'varchar_1', 'varchar_2', 'varchar_3'
            3. insert 5 million rows
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - load
                - search
                - hybrid_search
                - query

test result:

[2024-02-29 10:05:15,361 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-02-29 10:05:15,361 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     delete                                                                          2619 1199(45.78%) |   6356       6   18595   6100 |    0.73        0.33 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     flush                                                                           2654     0(0.00%) |  20717    3729  332757  15000 |    0.74        0.00 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     hybrid_search                                                                   2615     0(0.00%) |   6815    2152   12416   6600 |    0.73        0.00 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     insert                                                                          2619 1240(47.35%) |   6752      14   24199   6400 |    0.73        0.34 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     load                                                                            2643     2(0.08%) |  13294       8   38653  13000 |    0.73        0.00 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     query                                                                           2678     0(0.00%) |   8988       5   27468   8900 |    0.74        0.00 (stats.py:789)
[2024-02-29 10:05:15,362 -  INFO - fouram]: grpc     search                                                                          2642     0(0.00%) |   4989    2169    9200   5100 |    0.73        0.00 (stats.py:789)
[2024-02-29 10:05:15,362 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-29 10:05:15,362 -  INFO - fouram]:          Aggregated                                                                     18470 2441(13.22%) |   9720       5  332757   7600 |    5.13        0.68 (stats.py:789)
[2024-02-29 10:05:15,362 -  INFO - fouram]:  (stats.py:790)
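
The summary rows above can also be parsed directly from the client log, for example to list the tasks that reported failures and to cross-check the printed percentages. This is a rough sketch, assuming the row layout shown in this log; the sample rows are copied from above with runs of whitespace shortened, and `parse_rows` and `failing` are hypothetical helper names.

```python
import re

# Four rows copied from the Locust summary above (padding shortened).
LOG = """\
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     delete     2619 1199(45.78%) |   6356       6   18595   6100 |    0.73        0.33 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     insert     2619 1240(47.35%) |   6752      14   24199   6400 |    0.73        0.34 (stats.py:789)
[2024-02-29 10:05:15,361 -  INFO - fouram]: grpc     load       2643     2(0.08%) |  13294       8   38653  13000 |    0.73        0.00 (stats.py:789)
[2024-02-29 10:05:15,362 -  INFO - fouram]: grpc     search     2642     0(0.00%) |   4989    2169    9200   5100 |    0.73        0.00 (stats.py:789)
"""

# Type, Name, "# reqs", "# fails(pct%)", then Avg / Min / Max / Med after the first pipe.
ROW_RE = re.compile(
    r"grpc\s+(?P<name>\S+)\s+(?P<reqs>\d+)\s+(?P<fails>\d+)\((?P<pct>\d+\.\d+)%\)"
    r"\s+\|\s+(?P<avg>\d+)\s+(?P<min>\d+)\s+(?P<max>\d+)\s+(?P<med>\d+)"
)


def parse_rows(log):
    """Parse each matching summary row into a dict of numeric fields."""
    rows = []
    for line in log.splitlines():
        m = ROW_RE.search(line)
        if not m:
            continue  # separator lines, headers, and the Aggregated row are skipped
        row = {k: int(v) for k, v in m.groupdict().items() if k not in ("name", "pct")}
        row["name"] = m.group("name")
        row["pct"] = float(m.group("pct"))
        rows.append(row)
    return rows


def failing(rows):
    """Names of tasks that reported at least one failure, in log order."""
    return [r["name"] for r in rows if r["fails"] > 0]


if __name__ == "__main__":
    print("tasks with failures:", failing(parse_rows(LOG)))
```

In this run the failures fall on `delete` and `insert` (plus two `load` requests), which would be consistent with the dataNode panic reported above, although the log alone does not establish the cause.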
[2024-02-29 10:05:15,364 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_2c4m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '8',
                                                              'memory': '32Gi'},
                                                   'requests': {'cpu': '8',
                                                                'memory': '32Gi'}},
                                     'replicas': 1},
                       'indexNode': {'resources': {'limits': {'cpu': '4.0',
                                                              'memory': '16Gi'},
                                                   'requests': {'cpu': '3.0',
                                                                'memory': '9Gi'}},
                                     'replicas': 1},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '4Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '3Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'master-20240229-50a78b68-amd64'}}},
            'host': 'inverted-corn-c79tm-5-49-7343-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_varchar_dml_dql_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 3,
                                                    'scalars_index': {'varchar_1': {'index_type': 'INVERTED'},
                                                                      'varchar_2': {'index_type': 'INVERTED'},
                                                                      'varchar_3': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'varchar_1': {'params': {'max_length': 256},
                                                                                     'other_params': {'varchar_filled': True}},
                                                                       'varchar_2': {'params': {'max_length': 32768},
                                                                                     'other_params': {'varchar_filled': True}},
                                                                       'varchar_3': {'params': {'max_length': 65535},
                                                                                     'other_params': {'varchar_filled': True}}},
                                                    'dataset_name': 'local',
                                                    'dataset_size': 300000,
                                                    'ni_per': 50},
                                 'collection_params': {'other_fields': ['varchar_1',
                                                                        'varchar_2',
                                                                        'varchar_3'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 50,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 30,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 300000}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 10,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 600}},
                                                      {'type': 'load',
                                                       'weight': 1,
                                                       'params': {'replica_number': 1,
                                                                  'timeout': 30}},
                                                      {'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 1,
                                                                  'search_param': {'nprobe': 32},
                                                                  'expr': 'varchar_1 '
                                                                          'like '
                                                                          '"a%" '
                                                                          '&& '
                                                                          'varchar_2 '
                                                                          'like '
                                                                          '"A%" '
                                                                          '&& '
                                                                          'varchar_3 '
                                                                          'like '
                                                                          '"0%" '
                                                                          '&& '
                                                                          'id '
                                                                          '> 0',
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 10,
                                                                  'reqs': [{'search_param': {'nprobe': 16},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'varchar_1 '
                                                                                    'like '
                                                                                    '"0%"',
                                                                            'top_k': 2000},
                                                                           {'search_param': {'nprobe': 128},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'varchar_2 '
                                                                                    'like '
                                                                                    '"9%"'}],
                                                                  'rerank': {'WeightedRanker': [0.5,
                                                                                                0.5]},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'varchar_3 '
                                                                          'like '
                                                                          '"a%" '
                                                                          '&& ',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True,
                                                                  'random_count': 20,
                                                                  'random_range': [0,
                                                                                   150000.0],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64'}}]},
            'run_id': 2024022966095562,
            'datetime': '2024-02-29 03:16:49.427461',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 2674.4206,
                                      'varchar_1': {'RT': 2423.3369},
                                      'varchar_2': {'RT': 2770.9254},
                                      'varchar_3': {'RT': 2398.6752}},
                            'insert': {'total_time': 802.7125,
                                       'VPS': 373.7328,
                                       'batch_time': 0.1338,
                                       'batch': 50},
                            'flush': {'RT': 3.0556},
                            'load': {'RT': 67.3992},
                            'Locust': {'Aggregated': {'Requests': 18470,
                                                      'Fails': 2441,
                                                      'RPS': 5.13,
                                                      'fail_s': 0.13,
                                                      'RT_max': 332757.81,
                                                      'RT_avg': 9720.22,
                                                      'TP50': 7600.0,
                                                      'TP99': 27000.0},
                                       'delete': {'Requests': 2619,
                                                  'Fails': 1199,
                                                  'RPS': 0.73,
                                                  'fail_s': 0.46,
                                                  'RT_max': 18595.15,
                                                  'RT_avg': 6356.37,
                                                  'TP50': 6100.0,
                                                  'TP99': 15000.0},
                                       'flush': {'Requests': 2654,
                                                 'Fails': 0,
                                                 'RPS': 0.74,
                                                 'fail_s': 0.0,
                                                 'RT_max': 332757.81,
                                                 'RT_avg': 20717.97,
                                                 'TP50': 15000.0,
                                                 'TP99': 308000.0},
                                       'hybrid_search': {'Requests': 2615,
                                                         'Fails': 0,
                                                         'RPS': 0.73,
                                                         'fail_s': 0.0,
                                                         'RT_max': 12416.05,
                                                         'RT_avg': 6815.72,
                                                         'TP50': 6600.0,
                                                         'TP99': 11000.0},
                                       'insert': {'Requests': 2619,
                                                  'Fails': 1240,
                                                  'RPS': 0.73,
                                                  'fail_s': 0.47,
                                                  'RT_max': 24199.24,
                                                  'RT_avg': 6752.85,
                                                  'TP50': 6400.0,
                                                  'TP99': 15000.0},
                                       'load': {'Requests': 2643,
                                                'Fails': 2,
                                                'RPS': 0.73,
                                                'fail_s': 0.0,
                                                'RT_max': 38653.31,
                                                'RT_avg': 13294.36,
                                                'TP50': 13000.0,
                                                'TP99': 27000.0},
                                       'query': {'Requests': 2678,
                                                 'Fails': 0,
                                                 'RPS': 0.74,
                                                 'fail_s': 0.0,
                                                 'RT_max': 27468.3,
                                                 'RT_avg': 8988.8,
                                                 'TP50': 8900.0,
                                                 'TP99': 18000.0},
                                       'search': {'Requests': 2642,
                                                  'Fails': 0,
                                                  'RPS': 0.73,
                                                  'fail_s': 0.0,
                                                  'RT_max': 9200.37,
                                                  'RT_avg': 4989.36,
                                                  'TP50': 5100.0,
                                                  'TP99': 7800.0}}}}}
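
The report above lists raw request and failure counts per task but no failure percentages. A minimal sketch for deriving them follows. The counts are copied from the `Locust` section of the report; the helper name `failure_ratios` is made up for illustration and is not part of the fouram framework.

```python
# Per-task (requests, fails) copied from the 'Locust' section of the report above.
locust_counts = {
    "delete": (2619, 1199),
    "flush": (2654, 0),
    "hybrid_search": (2615, 0),
    "insert": (2619, 1240),
    "load": (2643, 2),
    "query": (2678, 0),
    "search": (2642, 0),
}


def failure_ratios(counts):
    """Return {task: fails / requests} for every task (hypothetical helper)."""
    return {name: fails / reqs for name, (reqs, fails) in counts.items()}


ratios = failure_ratios(locust_counts)
total_requests = sum(reqs for reqs, _ in locust_counts.values())
total_fails = sum(fails for _, fails in locust_counts.values())

# Print tasks from highest to lowest failure ratio, then the aggregate.
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {ratio:6.1%}")
print(f"{'aggregated':14s} {total_fails / total_requests:6.1%}")
```

The per-task sums match the `Aggregated` row (18470 requests, 2441 fails). Nearly all failures fall on `insert` and `delete`, each failing a little under half of its requests, while the DQL tasks report no failures.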
wangting0128 commented 1 month ago

Reproduced again

argo task: inverted-corn-1709395200 test case name: test_inverted_locust_partition_key_dml_standalone

server:

[2024-03-02 19:30:39,854 -  INFO - fouram]: [Base] Deploy initial state: 
I0302 16:07:20.142749     421 request.go:665] Waited for 1.169485874s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/authorization.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-195200-2-25-7555-etcd-0                             1/1     Running            0                 5m6s    10.104.27.102   4am-node31   <none>           <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77   1/1     Running            1 (113s ago)      5m6s    10.104.25.22    4am-node30   <none>           <none>
inverted-corn-195200-2-25-7555-minio-65dc6bf765-k258b             1/1     Running            0                 5m6s    10.104.27.101   4am-node31   <none>           <none> (base.py:257)
[2024-03-02 19:30:39,854 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'STATUS|inverted-corn-195200-2-25-7555-milvus|inverted-corn-195200-2-25-7555-minio|inverted-corn-195200-2-25-7555-etcd|inverted-corn-195200-2-25-7555-pulsar|inverted-corn-195200-2-25-7555-kafka|inverted-corn-195200-2-25-7555-log|inverted-corn-195200-2-25-7555-tikv'  (util_cmd.py:14)
[2024-03-02 19:30:50,072 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-195200-2-25-7555): 
 I0302 19:30:41.119839     538 request.go:665] Waited for 1.151439776s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/apiextensions.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-195200-2-25-7555-etcd-0                             1/1     Running            0                 3h28m   10.104.27.102   4am-node31   <none>           <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77   1/1     Running            4 (132m ago)      3h28m   10.104.25.22    4am-node30   <none>           <none>
inverted-corn-195200-2-25-7555-minio-65dc6bf765-k258b             1/1     Running            0                 3h28m   10.104.27.101   4am-node31   <none>           <none>
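
To compare the two pod listings above without reading them by eye, the RESTARTS column can be extracted with a small parser. This is only a sketch: the regular expression and the helper names `restart_counts` and `restart_deltas` are assumptions written for this comment, and the column layout is assumed to match the `kubectl get pods -o wide` output pasted here.

```python
import re

# One row of `kubectl get pods -o wide`. RESTARTS is either "0" or
# "4 (132m ago)", so a plain whitespace split would shift the later columns.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]+) ago\))?\s+(?P<age>\S+)"
)


def restart_counts(text):
    """Map pod name -> restart count; header and log lines are skipped."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[match.group("name")] = int(match.group("restarts"))
    return counts


def restart_deltas(before, after):
    """Return pods whose restart count grew between two listings."""
    old, new = restart_counts(before), restart_counts(after)
    return {
        name: count - old.get(name, 0)
        for name, count in new.items()
        if count > old.get(name, 0)
    }


# Rows copied from the initial and final pod listings in this comment.
initial = """
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-195200-2-25-7555-etcd-0                             1/1     Running            0                 5m6s    10.104.27.102   4am-node31   <none>           <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77   1/1     Running            1 (113s ago)      5m6s    10.104.25.22    4am-node30   <none>           <none>
"""
final = """
inverted-corn-195200-2-25-7555-etcd-0                             1/1     Running            0                 3h28m   10.104.27.102   4am-node31   <none>           <none>
inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77   1/1     Running            4 (132m ago)      3h28m   10.104.25.22    4am-node30   <none>           <none>
"""

deltas = restart_deltas(initial, final)
print(deltas)
```

For the rows shown, only the standalone pod changes: it goes from 1 restart at deploy time to 4 by the end of the run, while etcd stays at 0.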

inverted-corn-195200-2-25-7555-milvus-standalone-799d9c86cnnk77.log

Screenshot 2024-03-04 10 58 51, Screenshot 2024-03-04 10 59 54, Screenshot 2024-03-04 10 59 20

client pod name: inverted-corn-1709395200-4115883251 client logs: client.log

Screenshot 2024-03-04 11 25 49, Screenshot 2024-03-04 11 26 36

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `partition_key: scalar enable partition_key(num_partitions=128)`
            verify concurrent DML scenario which
            scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1': is_partition_key
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - release

test result:

[2024-03-02 19:30:16,988 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-02 19:30:16,990 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-03-02 19:30:16,990 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-02 19:30:16,990 -  INFO - fouram]: grpc     delete                                                                          8934     0(0.00%) |     16       1     357      4 |    0.83        0.00 (stats.py:789)
[2024-03-02 19:30:16,990 -  INFO - fouram]: grpc     flush                                                                           8795    20(0.23%) |  19229     509  180789  14000 |    0.81        0.00 (stats.py:789)
[2024-03-02 19:30:16,991 -  INFO - fouram]: grpc     insert                                                                          8978     0(0.00%) |   5149      23  136817   3800 |    0.83        0.00 (stats.py:789)
[2024-03-02 19:30:16,991 -  INFO - fouram]: grpc     release                                                                         8978     0(0.00%) |     15       1     814      3 |    0.83        0.00 (stats.py:789)
[2024-03-02 19:30:16,992 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-02 19:30:16,992 -  INFO - fouram]:          Aggregated                                                                     35685    20(0.06%) |   6042       1  180789     71 |    3.31        0.00 (stats.py:789)
[2024-03-02 19:30:16,992 -  INFO - fouram]:  (stats.py:790)
[2024-03-02 19:30:16,994 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '5.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'master-20240302-d98a5e44-amd64'}}},
            'host': 'inverted-corn-195200-2-25-7555-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2,
                                                       'num_partitions': 128},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 180,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 0}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'release',
                                                       'weight': 1,
                                                       'params': {'timeout': 30}}]},
            'run_id': 2024030253418613,
            'datetime': '2024-03-02 16:02:21.467025',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 920.1162,
                                      'id': {'RT': 1.0264},
                                      'int64_1': {'RT': 1.0201}},
                            'insert': {'total_time': 357.4444,
                                       'VPS': 13988.1895,
                                       'batch_time': 3.5744,
                                       'batch': 50000},
                            'flush': {'RT': 12.9492},
                            'load': {'RT': 9.7292},
                            'Locust': {'Aggregated': {'Requests': 35685,
                                                      'Fails': 20,
                                                      'RPS': 3.31,
                                                      'fail_s': 0.0,
                                                      'RT_max': 180789.72,
                                                      'RT_avg': 6042.71,
                                                      'TP50': 71,
                                                      'TP99': 56000.0},
                                       'delete': {'Requests': 8934,
                                                  'Fails': 0,
                                                  'RPS': 0.83,
                                                  'fail_s': 0.0,
                                                  'RT_max': 357.07,
                                                  'RT_avg': 16.47,
                                                  'TP50': 4,
                                                  'TP99': 110.0},
                                       'flush': {'Requests': 8795,
                                                 'Fails': 20,
                                                 'RPS': 0.81,
                                                 'fail_s': 0.0,
                                                 'RT_max': 180789.72,
                                                 'RT_avg': 19229.44,
                                                 'TP50': 14000.0,
                                                 'TP99': 76000.0},
                                       'insert': {'Requests': 8978,
                                                  'Fails': 0,
                                                  'RPS': 0.83,
                                                  'fail_s': 0.0,
                                                  'RT_max': 136817.72,
                                                  'RT_avg': 5149.17,
                                                  'TP50': 3800.0,
                                                  'TP99': 25000.0},
                                       'release': {'Requests': 8978,
                                                   'Fails': 0,
                                                   'RPS': 0.83,
                                                   'fail_s': 0.0,
                                                   'RT_max': 814.22,
                                                   'RT_avg': 15.02,
                                                   'TP50': 3,
                                                   'TP99': 110.0}}}}} 
wangting0128 commented 1 month ago

Same error, different scenario

argo task: multi-vector-corn-1709560800 test case name: test_hybrid_search_locust_shard1_float_dql_hnsw_standalone image: master-20240304-52540fec-amd64

server:

NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1709560800-51-etcd-0                            1/1     Running       0               3m1s    10.104.16.103   4am-node21   <none>           <none>
multi-vector-corn-1709560800-51-milvus-standalone-58ff988fwwlgw   1/1     Running       0               3m1s    10.104.26.113   4am-node32   <none>           <none>
multi-vector-corn-1709560800-51-minio-6d6d88568d-lfk7n            1/1     Running       0               3m1s    10.104.26.112   4am-node32   <none>           <none> 
NAME                                                              READY   STATUS             RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1709560800-51-etcd-0                            1/1     Running            0               12h     10.104.16.103   4am-node21   <none>           <none>
multi-vector-corn-1709560800-51-milvus-standalone-58ff988fwwlgw   1/1     Running            6 (10h ago)     12h     10.104.26.113   4am-node32   <none>           <none>
multi-vector-corn-1709560800-51-minio-6d6d88568d-lfk7n            1/1     Running            0               12h     10.104.26.112   4am-node32   <none>           <none>
Screenshot 2024-03-05 10 49 22, Screenshot 2024-03-05 10 49 08

client pod name: multi-vector-corn-1709560800-1714564187 client logs: client.log

Screenshot 2024-03-05 10 56 55

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `shard_num=1, float_vector DQL`
            verify concurrent DQL scenario which has 4 float_vector fields(HNSW) and 60 scalar fields

        :test steps:
            1. create collection with fields:
                'float_vector': 32768dim,
                'float_vector_1': 32768dim,
                'float_vector_2': 32768dim,
                'float_vector_3': 32768dim,
                all scalar fields: varchar max_length=10, array max_capacity=7
            2. build indexes:
                HNSW: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
                default_scalar_index: 'int64_1'
                INVERTED: 'id', 'bool_3'
            3. insert 100k data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - hybrid_search

test result:

'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_16c64m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
                                                               'memory': '64Gi'},
                                                    'requests': {'cpu': '9.0',
                                                                 'memory': '33Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'master-20240304-52540fec-amd64'}}},
            'host': 'multi-vector-corn-1709560800-51-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_hybrid_search_locust_shard1_float_dql_hnsw_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 32768,
                                                    'max_length': 10,
                                                    'scalars_index': {'int64_1': {},
                                                                      'id': {'index_type': 'INVERTED'},
                                                                      'bool_3': {'index_type': 'INVERTED'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_2': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_3': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
                                                                       'array_int16_1': {'params': {'max_capacity': 7}},
                                                                       'array_int32_1': {'params': {'max_capacity': 7}},
                                                                       'array_int64_1': {'params': {'max_capacity': 7}},
                                                                       'array_double_1': {'params': {'max_capacity': 7}},
                                                                       'array_float_1': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_1': {'params': {'max_capacity': 7}},
                                                                       'array_bool_1': {'params': {'max_capacity': 7}},
                                                                       'array_int8_2': {'params': {'max_capacity': 7}},
                                                                       'array_int16_2': {'params': {'max_capacity': 7}},
                                                                       'array_int32_2': {'params': {'max_capacity': 7}},
                                                                       'array_int64_2': {'params': {'max_capacity': 7}},
                                                                       'array_double_2': {'params': {'max_capacity': 7}},
                                                                       'array_float_2': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_2': {'params': {'max_capacity': 7}},
                                                                       'array_bool_2': {'params': {'max_capacity': 7}},
                                                                       'array_int8_3': {'params': {'max_capacity': 7}},
                                                                       'array_int16_3': {'params': {'max_capacity': 7}},
                                                                       'array_int32_3': {'params': {'max_capacity': 7}},
                                                                       'array_int64_3': {'params': {'max_capacity': 7}},
                                                                       'array_double_3': {'params': {'max_capacity': 7}},
                                                                       'array_float_3': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_3': {'params': {'max_capacity': 7}},
                                                                       'array_bool_3': {'params': {'max_capacity': 7}}},
                                                    'dataset_name': 'local',
                                                    'dataset_size': 100000,
                                                    'ni_per': 100},
                                 'collection_params': {'other_fields': ['float_vector_1',
                                                                        'float_vector_2',
                                                                        'float_vector_3',
                                                                        'int8_1',
                                                                        'int16_1',
                                                                        'int32_1',
                                                                        'int64_1',
                                                                        'double_1',
                                                                        'float_1',
                                                                        'varchar_1',
                                                                        'bool_1',
                                                                        'json_1',
                                                                        'array_int8_1',
                                                                        'array_int16_1',
                                                                        'array_int32_1',
                                                                        'array_int64_1',
                                                                        'array_double_1',
                                                                        'array_float_1',
                                                                        'array_varchar_1',
                                                                        'array_bool_1',
                                                                        'int8_2',
                                                                        'int16_2',
                                                                        'int32_2',
                                                                        'int64_2',
                                                                        'double_2',
                                                                        'float_2',
                                                                        'varchar_2',
                                                                        'bool_2',
                                                                        'json_2',
                                                                        'array_int8_2',
                                                                        'array_int16_2',
                                                                        'array_int32_2',
                                                                        'array_int64_2',
                                                                        'array_double_2',
                                                                        'array_float_2',
                                                                        'array_varchar_2',
                                                                        'array_bool_2',
                                                                        'int8_3',
                                                                        'int16_3',
                                                                        'int32_3',
                                                                        'int64_3',
                                                                        'double_3',
                                                                        'float_3',
                                                                        'varchar_3',
                                                                        'bool_3',
                                                                        'json_3',
                                                                        'array_int8_3',
                                                                        'array_int16_3',
                                                                        'array_int32_3',
                                                                        'array_int64_3',
                                                                        'array_double_3',
                                                                        'array_float_3',
                                                                        'array_varchar_3',
                                                                        'array_bool_3',
                                                                        'varchar_tail_1',
                                                                        'varchar_tail_2',
                                                                        'varchar_tail_3',
                                                                        'varchar_tail_4',
                                                                        'varchar_tail_5',
                                                                        'varchar_tail_6',
                                                                        'varchar_tail_7',
                                                                        'varchar_tail_8'],
                                                       'shards_num': 1},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 8,
                                                                  'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 1,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 100,
                                                                  'reqs': [{'search_param': {'ef': 128},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'id '
                                                                                    '> '
                                                                                    '10000',
                                                                            'top_k': 10},
                                                                           {'search_param': {'ef': 64},
                                                                            'anns_field': 'float_vector_1',
                                                                            'expr': 'int64_1 '
                                                                                    '<= '
                                                                                    '90000',
                                                                            'top_k': 50},
                                                                           {'search_param': {'ef': 1024},
                                                                            'anns_field': 'float_vector_2',
                                                                            'expr': 'array_length(array_int8_2) '
                                                                                    '== '
                                                                                    '7',
                                                                            'top_k': 1000},
                                                                           {'search_param': {'ef': 20000},
                                                                            'anns_field': 'float_vector_3',
                                                                            'expr': 'bool_3 '
                                                                                    '== '
                                                                                    'True',
                                                                            'top_k': 16384}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['float_vector'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}}]},
            'run_id': 2024030409416393,
            'datetime': '2024-03-04 14:02:21.023607',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 174.534,
                                      'float_vector_1': {'RT': 30.9736},
                                      'float_vector_2': {'RT': 7.6267},
                                      'float_vector_3': {'RT': 8.1377},
                                      'int64_1': {'RT': 1.0257},
                                      'id': {'RT': 0.5196},
                                      'bool_3': {'RT': 0.5178}},
                            'insert': {'total_time': 3698.0592,
                                       'VPS': 27.0412,
                                       'batch_time': 3.6981,
                                       'batch': 100},
                            'flush': {'RT': 2.5311},
                            'load': {'RT': 66.3822},
                            'Locust': {'Aggregated': {'Requests': 3015,
                                                      'Fails': 3,
                                                      'RPS': 0.84,
                                                      'fail_s': 0.0,
                                                      'RT_max': 62713.68,
                                                      'RT_avg': 927.07,
                                                      'TP50': 840.0,
                                                      'TP99': 1100.0},
                                       'hybrid_search': {'Requests': 3015,
                                                         'Fails': 3,
                                                         'RPS': 0.84,
                                                         'fail_s': 0.0,
                                                         'RT_max': 62713.68,
                                                         'RT_avg': 927.07,
                                                         'TP50': 840.0,
                                                         'TP99': 1100.0}}}}} 
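The insert figures in this report can be cross-checked against the dataset parameters. The sketch below assumes that `VPS` means rows inserted per second over `total_time` and that `batch_time` is the mean time per `ni_per`-row batch. Those definitions are inferred from the numbers rather than taken from the benchmark code, and the function names are illustrative only.

```python
# Values copied from the report above.
DATASET_SIZE = 100000        # dataset_params['dataset_size']
NI_PER = 100                 # dataset_params['ni_per'], rows per insert batch
TOTAL_INSERT_TIME = 3698.0592  # result['insert']['total_time'], in seconds
REQUESTS, FAILS = 3015, 3    # Locust 'Aggregated' counters


def insert_stats(rows, batch_rows, total_seconds):
    """Derive throughput and mean batch time (assumed definitions, see lead-in)."""
    batches = rows // batch_rows
    return {
        "VPS": round(rows / total_seconds, 4),
        "batch_time": round(total_seconds / batches, 4),
    }


def failure_ratio(fails, requests):
    """Fraction of requests that failed."""
    return fails / requests


stats = insert_stats(DATASET_SIZE, NI_PER, TOTAL_INSERT_TIME)
print(stats)
print(f"failure ratio: {failure_ratio(FAILS, REQUESTS):.4%}")
```

Under these assumptions the derived values agree with the reported `VPS` of 27.0412 and `batch_time` of 3.6981, and the three failed requests amount to roughly 0.1% of the run.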
xiaofan-luan commented 1 month ago

@wangting0128 it seems that a node crashed in all of your cases. Did you check why the nodes crashed?

wangting0128 commented 1 month ago

> @wangting0128 it seems that a node crashed in all of your cases. Did you check why the nodes crashed?

I have checked the reason why the node was restarted. From the log, I can see that the node restarted due to the disconnection between the node and etcd.
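For readers unfamiliar with lease-based sessions, here is a toy model of how a late heartbeat turns into a "requested lease not found" error. It is only a sketch: `ToyLeaseServer` and `first_lost_heartbeat` are hypothetical names, the 10-second TTL is an arbitrary example value, and none of this is the etcd client API or the Milvus session code. It also says nothing about why a heartbeat would arrive late in this run.

```python
class ToyLeaseServer:
    """In-memory stand-in for a lease service (hypothetical, not the etcd API)."""

    def __init__(self):
        self._leases = {}   # lease_id -> (expires_at, ttl)
        self._next_id = 1

    def grant(self, ttl, now):
        """Create a lease that expires `ttl` seconds after `now`."""
        lease_id = self._next_id
        self._next_id += 1
        self._leases[lease_id] = (now + ttl, ttl)
        return lease_id

    def keep_alive_once(self, lease_id, now):
        """Refresh a lease; fail if it is unknown or has already expired."""
        expires_at, ttl = self._leases.get(lease_id, (None, None))
        if expires_at is None or now >= expires_at:
            # An expired lease is gone for good; a late refresh cannot revive it.
            self._leases.pop(lease_id, None)
            raise LookupError("requested lease not found")
        self._leases[lease_id] = (now + ttl, ttl)
        return now + ttl


def first_lost_heartbeat(server, lease_id, heartbeat_times):
    """Return the time of the first failed heartbeat, or None if all succeed."""
    for now in heartbeat_times:
        try:
            server.keep_alive_once(lease_id, now)
        except LookupError:
            return now
    return None


server = ToyLeaseServer()
lease = server.grant(ttl=10, now=0)
print(first_lost_heartbeat(server, lease, [3, 6, 9]))   # heartbeats inside the TTL
print(first_lost_heartbeat(server, lease, [12, 40]))    # a long gap before t=40
```

In this model a component whose heartbeat fails has no way to recover the old lease, which fits the "process will exit" messages in the logs.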

[2024/03/04 14:32:44.505 +00:00] [WARN] [rootcoord/root_coord.go:1595] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 4"]
[2024/03/04 14:32:44.505 +00:00] [WARN] [proxy/proxy.go:370] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 4"]
[2024/03/04 14:32:44.653 +00:00] [INFO] [gc/gc_tuner.go:90] ["GC Tune done"] ["previous GOGC"=200] ["heapuse "=351] ["total memory"=2038] ["next GC"=1001] ["new GOGC"=200] [gc-pause=91.755µs] [gc-pause-end=1709562764652807392]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=indexcoord] [LeaseID=218862229534671527] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=218862229534671595] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=proxy] [LeaseID=218862229534671579] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=indexnode] [LeaseID=218862229534671464] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=datanode] [LeaseID=218862229534671554] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [ERROR] [proxy/proxy.go:170] ["Proxy disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/proxy.(*Proxy).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/proxy.go:170"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querycoord] [LeaseID=218862229534671561] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=datacoord] [LeaseID=218862229534671530] [error="etcdserver: requested lease not found"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:878] ["connection lost detected, shuting down"]
[2024/03/04 14:32:44.664 +00:00] [ERROR] [querycoordv2/server.go:152] ["QueryCoord disconnected from etcd, process will exit"] [serverID=4] [stack="github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/server.go:152"]
[2024/03/04 14:32:44.664 +00:00] [ERROR] [querynodev2/server.go:170] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:170"]

[2024/03/04 14:32:44.664 +00:00] [ERROR] [datanode/data_node.go:200] ["Data Node disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/datanode.(*DataNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/data_node.go:200"]
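When triaging logs like these, it can help to list which components lost their lease and which ones announced an exit. The following is a minimal sketch that relies only on the message shapes visible in this excerpt; the helper names `lost_leases` and `exiting_components` are made up for illustration, and the regular expressions are not guaranteed to match other Milvus versions.

```python
import re

# Three lines copied verbatim from the excerpt above.
SAMPLE_LINES = [
    r'[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=querynode] [LeaseID=218862229534671595] [error="etcdserver: requested lease not found"]',
    r'[2024/03/04 14:32:44.664 +00:00] [WARN] [sessionutil/session_util.go:550] ["fail to retry keepAliveOnce"] [serverName=proxy] [LeaseID=218862229534671579] [error="etcdserver: requested lease not found"]',
    r'[2024/03/04 14:32:44.664 +00:00] [ERROR] [querynodev2/server.go:170] ["Query Node disconnected from etcd, process will exit"] ["Server Id"=4] [stack="github.com/milvus-io/milvus/internal/querynodev2.(*QueryNode).Register.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/server.go:170"]',
]

SERVER_RE = re.compile(r"\[serverName=(\w+)\]")
LEASE_RE = re.compile(r"\[LeaseID=(\d+)\]")
EXIT_RE = re.compile(r'\["([^"]+) disconnected from etcd, process will exit"\]')


def lost_leases(lines):
    """Map component name -> lease ID for each failed keepalive retry."""
    result = {}
    for line in lines:
        if "fail to retry keepAliveOnce" not in line:
            continue
        server = SERVER_RE.search(line)
        lease = LEASE_RE.search(line)
        if server and lease:
            result[server.group(1)] = int(lease.group(1))
    return result


def exiting_components(lines):
    """List components that logged a 'process will exit' message, in order."""
    return [m.group(1) for m in map(EXIT_RE.search, lines) if m]


print(lost_leases(SAMPLE_LINES))
print(exiting_components(SAMPLE_LINES))
```

Run over the full excerpt, the same helpers would show every component losing its lease at the same timestamp, 14:32:44.664.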
  |   | 2024-03-04 14:33:04.704 | (no unique labels) | [2024/03/04 14:33:04.704 +00:00] [INFO] [roles/roles.go:304] ["starting running Milvus components"] |  
  |   | 2024-03-04 14:33:04.704 | (no unique labels) | [2024/03/04 14:33:04.704 +00:00] [INFO] [roles/roles.go:167] ["Enable Jemalloc"] ["Jemalloc Path"=/milvus/lib/libjemalloc.so] |  
  |   | 2024-03-04 14:33:04.719 | (no unique labels) | [2024/03/04 14:33:04.718 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=FileSource] |  
  |   | 2024-03-04 14:33:04.719 | (no unique labels) | [2024/03/04 14:33:04.719 +00:00] [DEBUG] [config/etcd_source.go:50] ["init etcd source"] [etcdInfo="{\"UseEmbed\":false,\"UseSSL\":false,\"Endpoints\":[\"multi-vector-corn-1709560800-51-etcd:2379\"],\"KeyPrefix\":\"by-dev\",\"CertFile\":\"/path/to/etcd-client.pem\",\"KeyFile\":\"/path/to/etcd-client-key.pem\",\"CaCertFile\":\"/path/to/ca.pem\",\"MinVersion\":\"1.3\",\"RefreshInterval\":5000000000}"] |  
  |   | 2024-03-04 14:33:04.719 | (no unique labels) | [2024/03/04 14:33:04.719 +00:00] [INFO] [etcd/etcd_util.go:47] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[multi-vector-corn-1709560800-51-etcd:2379]"] [minVersion=1.3] |  
  |   | 2024-03-04 14:33:04.720 | (no unique labels) | [2024/03/04 14:33:04.719 +00:00] [DEBUG] [config/etcd_source.go:86] ["etcd refreshConfigurations"] [prefix=by-dev/config] [endpoints="[multi-vector-corn-1709560800-51-etcd:2379]"] |  
  |   | 2024-03-04 14:33:04.723 | (no unique labels) | [2024/03/04 14:33:04.723 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=EtcdSource] |  
  |   | 2024-03-04 14:33:04.724 | (no unique labels) | [2024/03/04 14:33:04.724 +00:00] [DEBUG] [config/refresher.go:67] ["start refreshing configurations"] [source=FileSource]
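
To find the disconnect events in a longer log export, the lines can be filtered programmatically. The following is a minimal sketch that assumes the layout seen in the lines above (`[timestamp] [LEVEL] [file:line] ["message"] ...`). The names `LOG_RE`, `parse_line` and `find_etcd_disconnects` are hypothetical helpers, not part of Milvus.

```python
import re

# Assumed layout, inferred from the log lines quoted in this issue:
# [timestamp] [LEVEL] [file:line] ["message"] [key=value]...
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] \["(?P<msg>[^"]*)"\]'
)


def parse_line(line):
    """Return the timestamp, level, source and message of a log line, or None."""
    m = LOG_RE.search(line)
    return m.groupdict() if m else None


def find_etcd_disconnects(lines):
    """Collect ERROR records whose message mentions an etcd disconnect."""
    hits = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "ERROR" and "disconnected from etcd" in rec["msg"]:
            hits.append(rec)
    return hits


# Two lines copied from the log excerpt above.
sample = [
    '[2024/03/04 14:32:44.664 +00:00] [ERROR] [datanode/data_node.go:200] '
    '["Data Node disconnected from etcd, process will exit"] ["Server Id"=4]',
    '[2024/03/04 14:33:04.704 +00:00] [INFO] [roles/roles.go:304] '
    '["starting running Milvus components"]',
]

hits = find_etcd_disconnects(sample)
for rec in hits:
    print(rec["ts"], rec["src"], rec["msg"])
```

Because `re.search` is used, the same helper also works on the table-style lines that carry a `| timestamp | stdout |` prefix.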

However, etcd monitoring and logs show no abnormalities.

Screenshot 2024-03-05 12 06 29 Screenshot 2024-03-05 12 09 19
xiaofan-luan commented 1 month ago

More like CPU full on the node side.

Might be throttled by K8s?

wangting0128 commented 1 month ago

More like CPU full on the node side.

Might be throttled by K8s?

According to monitoring, the pod's CPU and memory usage were not high before or after the node restart.

'standalone': {'resources': {'limits': {'cpu': '16.0',
                                        'memory': '64Gi'},
                             'requests': {'cpu': '9.0',
                                          'memory': '33Gi'}}}
Screenshot 2024-03-05 13 21 29
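
One way to check the K8s throttling hypothesis directly, rather than from usage graphs, is to read the CFS throttling counters from the container's `cpu.stat` file (typically `/sys/fs/cgroup/cpu.stat` on cgroup v2, or `/sys/fs/cgroup/cpu/cpu.stat` on cgroup v1). The sketch below only parses the text of such a file. The sample values are hypothetical and were not taken from this cluster, and `parse_cpu_stat` and `throttled_ratio` are illustrative names.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of CFS periods in which the container was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical sample, NOT measured on the pods in this issue.
sample = """nr_periods 1200
nr_throttled 300
throttled_usec 4500000
"""

stats = parse_cpu_stat(sample)
ratio = throttled_ratio(stats)
print(f"throttled in {ratio:.0%} of periods")
```

A ratio that rises around the restart timestamps would support the throttling explanation even when average CPU usage looks low, because throttling is applied per scheduling period.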
xiaofan-luan commented 1 month ago

it's already 16 and you required 16

wangting0128 commented 1 month ago

More like CPU full on the node side.

Might be throttled by K8s?

The node hosting the pod shows no abnormal monitoring metrics at the time of the pod restart.

Screenshot 2024-03-05 13 26 09
wangting0128 commented 1 month ago

it's already 16 and you required 16

The pod restarted at 14:33; the CPU usage at that time was about 2.5 cores. image

wangting0128 commented 1 month ago

it's already 16 and you required 16

The pod restarted at 14:33; the CPU usage at that time was about 2.5 cores. image

pod restart times:

  1. 2024-03-04 14:08:41.520 | stderr | Welcome to use Milvus!
  2. 2024-03-04 14:09:51.702 | stderr | Welcome to use Milvus!
  3. 2024-03-04 14:27:36.993 | stderr | Welcome to use Milvus!
  4. 2024-03-04 14:33:04.703 | stderr | Welcome to use Milvus!
  5. 2024-03-04 14:45:25.132 | stderr | Welcome to use Milvus!

image
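
To see whether the restarts follow a regular pattern, the gaps between the five "Welcome to use Milvus!" timestamps listed above can be computed. This is a small sketch using only the standard library; `restart_gaps` is an illustrative helper name.

```python
from datetime import datetime

# Startup banner timestamps copied from the list above.
restarts = [
    "2024-03-04 14:08:41.520",
    "2024-03-04 14:09:51.702",
    "2024-03-04 14:27:36.993",
    "2024-03-04 14:33:04.703",
    "2024-03-04 14:45:25.132",
]


def restart_gaps(timestamps):
    """Return the seconds elapsed between consecutive restart timestamps."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = restart_gaps(restarts)
for start, gap in zip(restarts, gaps):
    print(f"{start} -> next restart after {gap:.3f}s")
```

The gaps range from about 70 seconds to almost 18 minutes, so the restarts are not evenly spaced.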
wangting0128 commented 1 month ago

Proxy disconnected from etcd. Please help check whether it is the same problem @longjiquan

argo task: inverted-corn-1711036800 test case name: test_inverted_locust_partition_key_dml_standalone

server:

[2024-03-21 19:24:33,265 -  INFO - fouram]: [Base] Deploy initial state: 
I0321 16:08:44.054900     406 request.go:665] Waited for 1.167310868s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/autoscaling/v1?timeout=32s
NAME                                                              READY   STATUS              RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-136800-2-57-7793-etcd-0                             1/1     Running             0                2m45s   10.104.30.115   4am-node38   <none>           <none>
inverted-corn-136800-2-57-7793-milvus-standalone-6778dd748s2tm9   1/1     Running             0                2m45s   10.104.25.106   4am-node30   <none>           <none>
inverted-corn-136800-2-57-7793-minio-cf8955d87-b75ss              1/1     Running             0                2m45s   10.104.30.114   4am-node38   <none>           <none> (base.py:257)
[2024-03-21 19:24:33,265 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|inverted-corn-136800-2-57-7793-milvus|inverted-corn-136800-2-57-7793-minio|inverted-corn-136800-2-57-7793-etcd|inverted-corn-136800-2-57-7793-pulsar|inverted-corn-136800-2-57-7793-zookeeper|inverted-corn-136800-2-57-7793-kafka|inverted-corn-136800-2-57-7793-log|inverted-corn-136800-2-57-7793-tikv'  (util_cmd.py:14)
[2024-03-21 19:24:43,534 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-136800-2-57-7793): 
 I0321 19:24:34.592913     506 request.go:665] Waited for 1.16829448s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/discovery.k8s.io/v1beta1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-136800-2-57-7793-etcd-0                             1/1     Running            0                 3h18m   10.104.30.115   4am-node38   <none>           <none>
inverted-corn-136800-2-57-7793-milvus-standalone-6778dd748s2tm9   1/1     Running            1 (94m ago)       3h18m   10.104.25.106   4am-node30   <none>           <none>
inverted-corn-136800-2-57-7793-minio-cf8955d87-b75ss              1/1     Running            0                 3h18m   10.104.30.114   4am-node38   <none>           <none> 
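
Restarted pods can be picked out of `kubectl get pods -o wide` output like the listing above without reading it by eye. The sketch below assumes columns are separated by at least two spaces, which holds for the output shown here (the RESTARTS value `1 (94m ago)` contains single spaces only). `restarted_pods` is a hypothetical helper.

```python
import re


def restarted_pods(kubectl_output):
    """Map pod name to restart count for every pod with RESTARTS > 0."""
    result = {}
    for line in kubectl_output.splitlines():
        # Columns are assumed to be separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 4 or fields[0] == "NAME":
            continue
        m = re.match(r"\d+", fields[3])  # RESTARTS looks like "0" or "1 (94m ago)"
        if m and int(m.group()) > 0:
            result[fields[0]] = int(m.group())
    return result


# Trimmed copy of the pod listing above (trailing columns omitted).
sample = """\
NAME                                                              READY   STATUS    RESTARTS       AGE
inverted-corn-136800-2-57-7793-etcd-0                             1/1     Running   0              3h18m
inverted-corn-136800-2-57-7793-milvus-standalone-6778dd748s2tm9   1/1     Running   1 (94m ago)    3h18m
inverted-corn-136800-2-57-7793-minio-cf8955d87-b75ss              1/1     Running   0              3h18m
"""

found = restarted_pods(sample)
print(found)
```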

milvus_restart_log.txt 5ebceaad-6daf-4b42-b1e2-d02c420d0651

client pod name: inverted-corn-1711036800-2185322252 client log: client.log

Error reporting time range: 2024-03-21 17:49:49,202 ~ 2024-03-21 17:53:14,309
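
As a rough cross-check, the restart time implied by `1 (94m ago)` in the pod listing can be compared with the client's error window. The sketch assumes the listing was captured at about the `[CliClient]` log timestamp (2024-03-21 19:24:43,534) and that the age is exactly 94 minutes. kubectl rounds ages, so the estimate may be off by up to a minute. The helper names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp format used by the fouram client log


def estimate_restart(observed_at, age):
    """Subtract the reported pod age since last restart from the observation time."""
    return datetime.strptime(observed_at, FMT) - age


def in_window(t, start, end):
    """Check whether t falls inside the [start, end] error window."""
    return datetime.strptime(start, FMT) <= t <= datetime.strptime(end, FMT)


# Assumption: the pod listing was captured at roughly this log timestamp.
restart = estimate_restart("2024-03-21 19:24:43,534", timedelta(minutes=94))
overlaps = in_window(restart, "2024-03-21 17:49:49,202", "2024-03-21 17:53:14,309")
print(restart, overlaps)
```

Under these assumptions the estimated restart falls inside the client error window, which is consistent with the client errors being caused by the standalone pod restarting.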

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `partition_key: scalar enable partition_key(num_partitions=128)`
            verify concurrent DML scenario which
            scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1': is_partition_key
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - release

test result:

[2024-03-21 19:24:08,772 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-21 19:24:08,772 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-03-21 19:24:08,772 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-21 19:24:08,772 -  INFO - fouram]: grpc     delete                                                                         28833     0(0.00%) |     53       1     513     35 |    2.67        0.00 (stats.py:789)
[2024-03-21 19:24:08,773 -  INFO - fouram]: grpc     flush                                                                          28469    20(0.07%) |   7183      54  273026   6400 |    2.64        0.00 (stats.py:789)
[2024-03-21 19:24:08,773 -  INFO - fouram]: grpc     insert                                                                         28700     0(0.00%) |    287      21   12349    110 |    2.66        0.00 (stats.py:789)
[2024-03-21 19:24:08,773 -  INFO - fouram]: grpc     release                                                                        28477     0(0.00%) |     52       0     648     34 |    2.64        0.00 (stats.py:789)
[2024-03-21 19:24:08,773 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-21 19:24:08,773 -  INFO - fouram]:          Aggregated                                                                    114479    20(0.02%) |   1885       0  273026     90 |   10.60        0.00 (stats.py:789)
[2024-03-21 19:24:08,773 -  INFO - fouram]:  (stats.py:790)
[2024-03-21 19:24:08,776 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '5.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240321-47868e9d-amd64'}}},
            'host': 'inverted-corn-136800-2-57-7793-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2,
                                                       'num_partitions': 128},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 180,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 0}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'release',
                                                       'weight': 1,
                                                       'params': {'timeout': 30}}]},
            'run_id': 2024032171652783,
            'datetime': '2024-03-21 16:06:05.342750',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 628.9663,
                                      'id': {'RT': 1.0116},
                                      'int64_1': {'RT': 1.0109}},
                            'insert': {'total_time': 167.189,
                                       'VPS': 29906.2737,
                                       'batch_time': 1.6719,
                                       'batch': 50000},
                            'flush': {'RT': 9.6348},
                            'load': {'RT': 7.5956},
                            'Locust': {'Aggregated': {'Requests': 114479,
                                                      'Fails': 20,
                                                      'RPS': 10.6,
                                                      'fail_s': 0.0,
                                                      'RT_max': 273026.89,
                                                      'RT_avg': 1885.18,
                                                      'TP50': 90,
                                                      'TP99': 12000.0},
                                       'delete': {'Requests': 28833,
                                                  'Fails': 0,
                                                  'RPS': 2.67,
                                                  'fail_s': 0.0,
                                                  'RT_max': 513.1,
                                                  'RT_avg': 53.43,
                                                  'TP50': 35,
                                                  'TP99': 260.0},
                                       'flush': {'Requests': 28469,
                                                 'Fails': 20,
                                                 'RPS': 2.64,
                                                 'fail_s': 0.0,
                                                 'RT_max': 273026.89,
                                                 'RT_avg': 7183.95,
                                                 'TP50': 6400.0,
                                                 'TP99': 20000.0},
                                       'insert': {'Requests': 28700,
                                                  'Fails': 0,
                                                  'RPS': 2.66,
                                                  'fail_s': 0.0,
                                                  'RT_max': 12349.54,
                                                  'RT_avg': 287.56,
                                                  'TP50': 110.0,
                                                  'TP99': 4200.0},
                                       'release': {'Requests': 28477,
                                                   'Fails': 0,
                                                   'RPS': 2.64,
                                                   'fail_s': 0.0,
                                                   'RT_max': 648.41,
                                                   'RT_avg': 52.66,
                                                   'TP50': 34,
                                                   'TP99': 250.0}}}}}
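
The failure percentages printed in the locust table can be re-derived from the `Requests` and `Fails` fields of the report above. This is a minimal sketch: the dictionary copies only those two fields from the report, and `failure_pct` is an illustrative helper.

```python
# Requests and Fails copied from the 'Locust' section of the report above.
locust = {
    "delete": {"Requests": 28833, "Fails": 0},
    "flush": {"Requests": 28469, "Fails": 20},
    "insert": {"Requests": 28700, "Fails": 0},
    "release": {"Requests": 28477, "Fails": 0},
}


def failure_pct(stats):
    """Failure rate as a percentage, rounded to two decimals like the locust table."""
    return round(100 * stats["Fails"] / stats["Requests"], 2)


total = {
    "Requests": sum(s["Requests"] for s in locust.values()),
    "Fails": sum(s["Fails"] for s in locust.values()),
}

for name, stats in locust.items():
    print(f"{name:8s} {failure_pct(stats):.2f}%")
print(f"{'total':8s} {failure_pct(total):.2f}%")
```

All 20 failures come from `flush`, and the per-task request counts sum to the aggregated 114479.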
wangting0128 commented 1 month ago

Data Node disconnected from etcd

argo task: inverted-corn-1711123200 test case name: test_inverted_locust_partitions_dml_dql_standalone

server:

[2024-03-22 19:18:59,549 -  INFO - fouram]: [Base] Deploy initial state: 
I0322 16:11:02.401993     419 request.go:665] Waited for 1.16397117s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/apiregistration.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-123200-3-30-1407-etcd-0                             1/1     Running            0                2m27s   10.104.18.60    4am-node25   <none>           <none>
inverted-corn-123200-3-30-1407-milvus-standalone-75548cd9frtbjr   1/1     Running            0                2m27s   10.104.26.117   4am-node32   <none>           <none>
inverted-corn-123200-3-30-1407-minio-787555bd4d-cwxwr             1/1     Running            0                2m27s   10.104.15.217   4am-node20   <none>           <none> (base.py:257)
[2024-03-22 19:18:59,549 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|inverted-corn-123200-3-30-1407-milvus|inverted-corn-123200-3-30-1407-minio|inverted-corn-123200-3-30-1407-etcd|inverted-corn-123200-3-30-1407-pulsar|inverted-corn-123200-3-30-1407-zookeeper|inverted-corn-123200-3-30-1407-kafka|inverted-corn-123200-3-30-1407-log|inverted-corn-123200-3-30-1407-tikv'  (util_cmd.py:14)
[2024-03-22 19:19:09,580 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-123200-3-30-1407): 
 I0322 19:19:00.807332     550 request.go:665] Waited for 1.172468491s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/storage.k8s.io/v1beta1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-123200-3-30-1407-etcd-0                             1/1     Running            0                3h10m   10.104.18.60    4am-node25   <none>           <none>
inverted-corn-123200-3-30-1407-milvus-standalone-75548cd9frtbjr   1/1     Running            1 (2m40s ago)    3h10m   10.104.26.117   4am-node32   <none>           <none>
inverted-corn-123200-3-30-1407-minio-787555bd4d-cwxwr             1/1     Running            0                3h10m   10.104.15.217   4am-node20   <none>           <none> 

image milvus_restart.log

client pod name: inverted-corn-1711123200-249811846 client log: clien.log

Error reporting time range: 2024-03-22 19:15:17,679 ~ 2024-03-22 19:18:55,522

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `partition: collection has many partitions`
            verify concurrent DML & DQL scenario which
            scalar `id`(pk) & `int64_1` created INVERTED index and collection has 10 partitions

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1'
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million data to 10 partitions
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - load
                - search
                - hybrid_search
                - query

test result:

[2024-03-22 19:17:18,025 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-22 19:17:18,025 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     delete                                                                          8929     4(0.04%) |    635       1   54603    100 |    0.83        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     flush                                                                           8968     0(0.00%) |   6977     236   28195   6300 |    0.83        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     hybrid_search                                                                   9001     8(0.09%) |   6247     266   61033   5800 |    0.84        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     insert                                                                          9013     1(0.01%) |    771      15   58594    190 |    0.84        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     load                                                                            8850     4(0.05%) |   1273       3   60003    380 |    0.82        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     query                                                                           9138     0(0.00%) |   4086      96   59228   3600 |    0.85        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: grpc     search                                                                          9023     0(0.00%) |   3684     573   14209   3300 |    0.84        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]:          Aggregated                                                                     62922    17(0.03%) |   3389       1   61033   2800 |    5.84        0.00 (stats.py:789)
[2024-03-22 19:17:18,026 -  INFO - fouram]:  (stats.py:790)
[2024-03-22 19:17:18,029 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_16c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '9.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240322-99774548-amd64'}}},
            'host': 'inverted-corn-123200-3-30-1407-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partitions_dml_dql_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'extra_partitions': {'partitions': 10,
                                                                         'data_repeated': False},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 30,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 5000000}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'load',
                                                       'weight': 1,
                                                       'params': {'replica_number': 1,
                                                                  'timeout': 30}},
                                                      {'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 10,
                                                                  'search_param': {'nprobe': 16},
                                                                  'expr': None,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 180,
                                                                  'random_data': True}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 10,
                                                                  'reqs': [{'search_param': {'nprobe': 16},
                                                                            'anns_field': 'float_vector',
                                                                            'top_k': 2000},
                                                                           {'search_param': {'nprobe': 32},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'int64_1 '
                                                                                    '> '
                                                                                    '-1 '
                                                                                    '&& '
                                                                                    'id '
                                                                                    '> '
                                                                                    '-1'},
                                                                           {'search_param': {'nprobe': 64},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'id '
                                                                                    '> '
                                                                                    '10',
                                                                            'top_k': 60}],
                                                                  'rerank': {'WeightedRanker': [0.3,
                                                                                                0.4,
                                                                                                0.3]},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'int64_1 '
                                                                          '> '
                                                                          '-1 '
                                                                          '&&',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 180,
                                                                  'random_data': True,
                                                                  'random_count': 20,
                                                                  'random_range': [0,
                                                                                   1000000.0],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64'}}]},
            'run_id': 2024032237213185,
            'datetime': '2024-03-22 16:08:41.486026',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 99.931,
                                      'id': {'RT': 1.0171},
                                      'int64_1': {'RT': 1.011}},
                            'insert': {'total_time': 175.9617,
                                       'VPS': 28679.9859,
                                       'batch_time': 1.7596,
                                       'batch': 50000.0},
                            'flush': {'RT': 3.1012},
                            'load': {'RT': 5.5419},
                            'Locust': {'Aggregated': {'Requests': 62922,
                                                      'Fails': 17,
                                                      'RPS': 5.84,
                                                      'fail_s': 0.0,
                                                      'RT_max': 61033.45,
                                                      'RT_avg': 3389.73,
                                                      'TP50': 2800.0,
                                                      'TP99': 14000.0},
                                       'delete': {'Requests': 8929,
                                                  'Fails': 4,
                                                  'RPS': 0.83,
                                                  'fail_s': 0.0,
                                                  'RT_max': 54603.54,
                                                  'RT_avg': 635.08,
                                                  'TP50': 100.0,
                                                  'TP99': 4900.0},
                                       'flush': {'Requests': 8968,
                                                 'Fails': 0,
                                                 'RPS': 0.83,
                                                 'fail_s': 0.0,
                                                 'RT_max': 28195.54,
                                                 'RT_avg': 6977.64,
                                                 'TP50': 6300.0,
                                                 'TP99': 18000.0},
                                       'hybrid_search': {'Requests': 9001,
                                                         'Fails': 8,
                                                         'RPS': 0.84,
                                                         'fail_s': 0.0,
                                                         'RT_max': 61033.45,
                                                         'RT_avg': 6247.61,
                                                         'TP50': 5800.0,
                                                         'TP99': 14000.0},
                                       'insert': {'Requests': 9013,
                                                  'Fails': 1,
                                                  'RPS': 0.84,
                                                  'fail_s': 0.0,
                                                  'RT_max': 58594.34,
                                                  'RT_avg': 771.37,
                                                  'TP50': 190.0,
                                                  'TP99': 5100.0},
                                       'load': {'Requests': 8850,
                                                'Fails': 4,
                                                'RPS': 0.82,
                                                'fail_s': 0.0,
                                                'RT_max': 60003.05,
                                                'RT_avg': 1273.14,
                                                'TP50': 380.0,
                                                'TP99': 8400.0},
                                       'query': {'Requests': 9138,
                                                 'Fails': 0,
                                                 'RPS': 0.85,
                                                 'fail_s': 0.0,
                                                 'RT_max': 59228.6,
                                                 'RT_avg': 4086.94,
                                                 'TP50': 3600.0,
                                                 'TP99': 12000.0},
                                       'search': {'Requests': 9023,
                                                  'Fails': 0,
                                                  'RPS': 0.84,
                                                  'fail_s': 0.0,
                                                  'RT_max': 14209.91,
                                                  'RT_avg': 3684.13,
                                                  'TP50': 3300.0,
                                                  'TP99': 9600.0}}}}}
wangting0128 commented 1 month ago

Different scenario, same error.

argo task: multi-vector-corn-2vswm test case name: test_hybrid_search_locust_shard1_float_dql_diskann_standalone

server:

NAME                                                              READY   STATUS                            RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-2vswm-2s1-etcd-0                                1/1     Running                           0                 26h     10.104.29.93    4am-node35   <none>           <none>
multi-vector-corn-2vswm-2s1-milvus-standalone-6f744444f-bdnrp     1/1     Running                           3 (11h ago)       26h     10.104.25.60    4am-node30   <none>           <none>
multi-vector-corn-2vswm-2s1-minio-785d495c47-wp26r                1/1     Running                           0                 26h     10.104.29.85    4am-node35   <none>           <none>

image

milvus_restart.log

Screenshot 2024-03-26 15 16 06

client pod name: multi-vector-corn-2vswm-1562047643 client log: client.log

get_index_state kept failing with connection errors, from:

[2024-03-25 19:24:58,016 - WARNING - fouram]: [get_index_state] retry:4, cost: 0.27s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.255.95.86:19530: Failed to connect to remote host: Connection refused> (decorators.py:100)

to:

[2024-03-25 19:59:17,355 - WARNING - fouram]: [get_index_state] retry:49, cost: 3.00s, reason: <_MultiThreadedRendezvous: StatusCode.UNAVAILABLE, failed to connect to all addresses; last error: UNKNOWN: ipv4:10.255.95.86:19530: Failed to connect to remote host: Connection refused> (decorators.py:100)
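The two log lines above bracket the period during which the client could not reach the server. A minimal sketch of how that window can be computed from the fouram log timestamps; the helper name `parse_log_ts` is hypothetical, and the sample lines are abbreviated copies of the two log entries:

```python
from datetime import datetime

# Timestamp layout used by the client log lines, e.g. "2024-03-25 19:24:58,016"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line: str) -> datetime:
    """Extract the leading timestamp from a '[<timestamp> - LEVEL - name]: ...' line."""
    raw = line[1:line.index(" - ")]
    return datetime.strptime(raw, LOG_TS_FORMAT)


# Abbreviated copies of the first and last retry lines quoted in this comment.
first_line = "[2024-03-25 19:24:58,016 - WARNING - fouram]: [get_index_state] retry:4"
last_line = "[2024-03-25 19:59:17,355 - WARNING - fouram]: [get_index_state] retry:49"

window = parse_log_ts(last_line) - parse_log_ts(first_line)
print(f"server unreachable for at least {window}")
```

This only measures the span between the two quoted retries, so the real outage may have been longer.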

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `shard_num=1, float_vector DQL`
            verify concurrent DQL scenario which has 4 float_vector fields(DISKANN) and 60 scalar fields

        :test steps:
            1. create collection with fields:
                'float_vector': 2048dim,
                'float_vector_1': 2048dim,
                'float_vector_2': 2048dim,
                'float_vector_3': 2048dim,
                all scalar fields: varchar max_length=10, array max_capacity=7
            2. build indexes:
                DISKANN: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
                default_scalar_index: 'int64_1'
                INVERTED: 'id', 'bool_3'
            3. insert 100k data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - hybrid_search

test result:

[2024-03-25 23:25:08,505 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-03-25 23:25:08,505 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-03-25 23:25:08,505 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-25 23:25:08,505 -  INFO - fouram]: grpc     hybrid_search                                                                   1710     4(0.23%) |  41864   23354   60005  41000 |    0.48        0.00 (stats.py:789)
[2024-03-25 23:25:08,505 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-03-25 23:25:08,505 -  INFO - fouram]:          Aggregated                                                                      1710     4(0.23%) |  41864   23354   60005  41000 |    0.48        0.00 (stats.py:789)
[2024-03-25 23:25:08,506 -  INFO - fouram]:  (stats.py:790)
[2024-03-25 23:25:08,511 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_16c64m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
                                                               'memory': '64Gi'},
                                                    'requests': {'cpu': '9.0',
                                                                 'memory': '33Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240325-6e0baa47-amd64'}}},
            'host': 'multi-vector-corn-2vswm-2s1-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_hybrid_search_locust_shard1_float_dql_diskann_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 2048,
                                                    'max_length': 10,
                                                    'scalars_index': {'int64_1': {},
                                                                      'id': {'index_type': 'INVERTED'},
                                                                      'bool_3': {'index_type': 'INVERTED'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'DISKANN',
                                                                                         'index_param': {},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_2': {'index_type': 'DISKANN',
                                                                                         'index_param': {},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_3': {'index_type': 'DISKANN',
                                                                                         'index_param': {},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
                                                                       'array_int16_1': {'params': {'max_capacity': 7}},
                                                                       'array_int32_1': {'params': {'max_capacity': 7}},
                                                                       'array_int64_1': {'params': {'max_capacity': 7}},
                                                                       'array_double_1': {'params': {'max_capacity': 7}},
                                                                       'array_float_1': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_1': {'params': {'max_capacity': 7}},
                                                                       'array_bool_1': {'params': {'max_capacity': 7}},
                                                                       'array_int8_2': {'params': {'max_capacity': 7}},
                                                                       'array_int16_2': {'params': {'max_capacity': 7}},
                                                                       'array_int32_2': {'params': {'max_capacity': 7}},
                                                                       'array_int64_2': {'params': {'max_capacity': 7}},
                                                                       'array_double_2': {'params': {'max_capacity': 7}},
                                                                       'array_float_2': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_2': {'params': {'max_capacity': 7}},
                                                                       'array_bool_2': {'params': {'max_capacity': 7}},
                                                                       'array_int8_3': {'params': {'max_capacity': 7}},
                                                                       'array_int16_3': {'params': {'max_capacity': 7}},
                                                                       'array_int32_3': {'params': {'max_capacity': 7}},
                                                                       'array_int64_3': {'params': {'max_capacity': 7}},
                                                                       'array_double_3': {'params': {'max_capacity': 7}},
                                                                       'array_float_3': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_3': {'params': {'max_capacity': 7}},
                                                                       'array_bool_3': {'params': {'max_capacity': 7}}},
                                                    'dataset_name': 'local',
                                                    'dataset_size': 1500000,
                                                    'ni_per': 100},
                                 'collection_params': {'other_fields': ['float_vector_1',
                                                                        'float_vector_2',
                                                                        'float_vector_3',
                                                                        'int8_1',
                                                                        'int16_1',
                                                                        'int32_1',
                                                                        'int64_1',
                                                                        'double_1',
                                                                        'float_1',
                                                                        'varchar_1',
                                                                        'bool_1',
                                                                        'json_1',
                                                                        'array_int8_1',
                                                                        'array_int16_1',
                                                                        'array_int32_1',
                                                                        'array_int64_1',
                                                                        'array_double_1',
                                                                        'array_float_1',
                                                                        'array_varchar_1',
                                                                        'array_bool_1',
                                                                        'int8_2',
                                                                        'int16_2',
                                                                        'int32_2',
                                                                        'int64_2',
                                                                        'double_2',
                                                                        'float_2',
                                                                        'varchar_2',
                                                                        'bool_2',
                                                                        'json_2',
                                                                        'array_int8_2',
                                                                        'array_int16_2',
                                                                        'array_int32_2',
                                                                        'array_int64_2',
                                                                        'array_double_2',
                                                                        'array_float_2',
                                                                        'array_varchar_2',
                                                                        'array_bool_2',
                                                                        'int8_3',
                                                                        'int16_3',
                                                                        'int32_3',
                                                                        'int64_3',
                                                                        'double_3',
                                                                        'float_3',
                                                                        'varchar_3',
                                                                        'bool_3',
                                                                        'json_3',
                                                                        'array_int8_3',
                                                                        'array_int16_3',
                                                                        'array_int32_3',
                                                                        'array_int64_3',
                                                                        'array_double_3',
                                                                        'array_float_3',
                                                                        'array_varchar_3',
                                                                        'array_bool_3',
                                                                        'varchar_tail_1',
                                                                        'varchar_tail_2',
                                                                        'varchar_tail_3',
                                                                        'varchar_tail_4',
                                                                        'varchar_tail_5',
                                                                        'varchar_tail_6',
                                                                        'varchar_tail_7',
                                                                        'varchar_tail_8'],
                                                       'shards_num': 1},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'DISKANN',
                                                  'index_param': {}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 100,
                                                                  'reqs': [{'search_param': {'search_list': 30},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'id '
                                                                                    '> '
                                                                                    '150000',
                                                                            'top_k': 10},
                                                                           {'search_param': {'search_list': 100},
                                                                            'anns_field': 'float_vector_1',
                                                                            'expr': 'int64_1 '
                                                                                    '<= '
                                                                                    '1350000',
                                                                            'top_k': 50},
                                                                           {'search_param': {'search_list': 1500},
                                                                            'anns_field': 'float_vector_2',
                                                                            'expr': 'array_length(array_int8_2) '
                                                                                    '== '
                                                                                    '7',
                                                                            'top_k': 1000},
                                                                           {'search_param': {'search_list': 20000},
                                                                            'anns_field': 'float_vector_3',
                                                                            'expr': 'bool_3 '
                                                                                    '== '
                                                                                    'True',
                                                                            'top_k': 16384}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['float_vector'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}}]},
            'run_id': 2024032505807040,
            'datetime': '2024-03-25 04:23:00.273429',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 14723.9804,
                                      'float_vector_1': {'RT': 17423.1928},
                                      'float_vector_2': {'RT': 20895.6376},
                                      'float_vector_3': {'RT': 1779.0977},
                                      'int64_1': {'RT': 1.0256},
                                      'id': {'RT': 1.0153},
                                      'bool_3': {'RT': 1.0149}},
                            'insert': {'total_time': 4502.0788,
                                       'VPS': 333.1794,
                                       'batch_time': 0.3001,
                                       'batch': 100},
                            'flush': {'RT': 3.5202},
                            'load': {'RT': 101.2388},
                            'Locust': {'Aggregated': {'Requests': 1710,
                                                      'Fails': 4,
                                                      'RPS': 0.48,
                                                      'fail_s': 0.0,
                                                      'RT_max': 60005.87,
                                                      'RT_avg': 41864.82,
                                                      'TP50': 41000.0,
                                                      'TP99': 56000.0},
                                       'hybrid_search': {'Requests': 1710,
                                                         'Fails': 4,
                                                         'RPS': 0.48,
                                                         'fail_s': 0.0,
                                                         'RT_max': 60005.87,
                                                         'RT_avg': 41864.82,
                                                         'TP50': 41000.0,
                                                         'TP99': 56000.0}}}}}
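As a sanity check on the report above, the failure rate and throughput can be recomputed from the raw numbers. This is only a sketch: the function names are hypothetical, and the throughput estimate assumes a closed loop in which each of the 20 concurrent users sends its next request as soon as the previous one returns.

```python
def failure_rate_pct(requests: int, fails: int) -> float:
    """Percentage of failed requests."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return 100.0 * fails / requests


def closed_loop_rps(concurrency: int, avg_rt_ms: float) -> float:
    """Approximate throughput when every user waits for its previous response."""
    return concurrency / (avg_rt_ms / 1000.0)


# Values copied from the report: Locust stats, concurrent_number, and task timeout.
stats = {"Requests": 1710, "Fails": 4, "RT_avg": 41864.82, "RT_max": 60005.87}
concurrent_number = 20
timeout_ms = 60 * 1000

rate = failure_rate_pct(stats["Requests"], stats["Fails"])
rps = closed_loop_rps(concurrent_number, stats["RT_avg"])
hit_timeout = stats["RT_max"] >= timeout_ms

print(f"failure rate: {rate:.2f}%")
print(f"estimated RPS: {rps:.2f}")
print(f"slowest request reached the client timeout: {hit_timeout}")
```

Both figures agree with the Locust table (0.23% and 0.48 req/s), and the maximum RT sits right at the 60 s timeout, which suggests (but does not prove) that the failed requests were client-side timeouts.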

@longjiquan

longjiquan commented 1 month ago

I noticed that the goroutine and OS thread counts are much higher than on normal instances: heLB6hNcvM

For comparison, these are the goroutine and OS thread counts of normal instances: FQBQZgqJWq

longjiquan commented 1 month ago

The querynode in cluster mode also hit this issue, so index building may not be the root cause.

xiaofan-luan commented 1 month ago

The goroutine count might be fine. Any idea where the OS threads are being created?
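One generic way to start answering this from inside the pod is to read the Linux `/proc` interface: the `Threads:` field of `/proc/<pid>/status` gives the OS thread count, and `/proc/<pid>/task/*/comm` gives per-thread names, which can hint at which component spawned them. The sketch below is an OS-level check, not Milvus tooling; the function names are hypothetical, and the demo inspects the current Python process as a stand-in for the Milvus PID.

```python
import os
from collections import Counter


def thread_count_from_status(status_text: str) -> int:
    """Return the 'Threads:' value from the content of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def thread_names(pid: int) -> Counter:
    """Count the threads of a process by name, read from /proc/<pid>/task/*/comm."""
    names = Counter()
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/comm") as fh:
                names[fh.read().strip()] += 1
        except OSError:
            continue  # the thread exited while we were reading
    return names


# Demo on the current process; only runs where /proc is available (Linux).
if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    pid = os.getpid()  # stand-in: replace with the Milvus process PID
    with open(f"/proc/{pid}/status") as fh:
        print("OS threads:", thread_count_from_status(fh.read()))
    print("most common thread names:", thread_names(pid).most_common(5))
```

Sampling these values periodically would show whether the thread count grows steadily or jumps at a particular phase of the test.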

wangting0128 commented 3 weeks ago

Root Coord disconnected from etcd

argo task: inverted-corn-1712332800 test case name: test_inverted_locust_partition_key_dml_standalone

server:

NAME                                                              READY   STATUS                            RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-132800-2-24-5637-etcd-0                             1/1     Running                           0                 3m39s   10.104.33.66    4am-node36   <none>           <none>
inverted-corn-132800-2-24-5637-milvus-standalone-5ff4877b7vbcf5   1/1     Running                           0                 3m39s   10.104.28.109   4am-node33   <none>           <none>
inverted-corn-132800-2-24-5637-minio-7d877f7cb4-994sk             1/1     Running                           0                 3m39s   10.104.33.65    4am-node36   <none>           <none> (base.py:257)
[2024-04-05 19:34:45,691 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|inverted-corn-132800-2-24-5637-milvus|inverted-corn-132800-2-24-5637-minio|inverted-corn-132800-2-24-5637-etcd|inverted-corn-132800-2-24-5637-pulsar|inverted-corn-132800-2-24-5637-zookeeper|inverted-corn-132800-2-24-5637-kafka|inverted-corn-132800-2-24-5637-log|inverted-corn-132800-2-24-5637-tikv'  (util_cmd.py:14)
[2024-04-05 19:34:55,683 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-132800-2-24-5637): 
 I0405 19:34:46.951623     566 request.go:665] Waited for 1.161921833s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/eventtracker.litmuschaos.io/v1?timeout=32s
NAME                                                              READY   STATUS                            RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-132800-2-24-5637-etcd-0                             1/1     Running                           0                 3h31m   10.104.33.66    4am-node36   <none>           <none>
inverted-corn-132800-2-24-5637-milvus-standalone-5ff4877b7vbcf5   0/1     CrashLoopBackOff                  8 (2m47s ago)     3h31m   10.104.28.109   4am-node33   <none>           <none>
inverted-corn-132800-2-24-5637-minio-7d877f7cb4-994sk             1/1     Running                           0                 3h31m   10.104.33.65    4am-node36   <none>           <none> 

image

client log: client failed to connect to milvus image

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `partition_key: scalar enable partition_key(num_partitions=128)`
            verify concurrent DML scenario which
            scalar `id`(pk) & `int64_1` created INVERTED index and enable partition_key on `int64_1` field

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1': is_partition_key
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million data <- connect failed
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - release

test result:

[2024-04-05 19:30:55,016 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-05 19:30:55,016 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-05 19:30:55,016 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]: grpc     delete                                                                         26860    61(0.23%) |     90       0   30851      6 |    2.49        0.01 (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]: grpc     flush                                                                          26790    67(0.25%) |   7165     266  279927   6300 |    2.48        0.01 (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]: grpc     insert                                                                         26797    47(0.18%) |    659      20  181076    130 |    2.49        0.00 (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]: grpc     release                                                                        26961    50(0.19%) |     76       0   30714      5 |    2.50        0.00 (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]:          Aggregated                                                                    107408   225(0.21%) |   1993       0  279927     58 |    9.96        0.02 (stats.py:789)
[2024-04-05 19:30:55,017 -  INFO - fouram]:  (stats.py:790)
[2024-04-05 19:30:55,020 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '5.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240405-7d721ae7-amd64'}}},
            'host': 'inverted-corn-132800-2-24-5637-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2,
                                                       'num_partitions': 128},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 180,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 0}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'release',
                                                       'weight': 1,
                                                       'params': {'timeout': 30}}]},
            'run_id': 2024040529862467,
            'datetime': '2024-04-05 16:03:06.898874',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 746.6375,
                                      'id': {'RT': 1.0163},
                                      'int64_1': {'RT': 1.011}},
                            'insert': {'total_time': 609.2848,
                                       'VPS': 8206.3429,
                                       'batch_time': 6.0928,
                                       'batch': 50000},
                            'flush': {'RT': 7.417},
                            'load': {'RT': 8.1251},
                            'Locust': {'Aggregated': {'Requests': 107408,
                                                      'Fails': 225,
                                                      'RPS': 9.96,
                                                      'fail_s': 0.0,
                                                      'RT_max': 279927.68,
                                                      'RT_avg': 1993.75,
                                                      'TP50': 58,
                                                      'TP99': 12000.0},
                                       'delete': {'Requests': 26860,
                                                  'Fails': 61,
                                                  'RPS': 2.49,
                                                  'fail_s': 0.0,
                                                  'RT_max': 30851.74,
                                                  'RT_avg': 90.57,
                                                  'TP50': 6,
                                                  'TP99': 130.0},
                                       'flush': {'Requests': 26790,
                                                 'Fails': 67,
                                                 'RPS': 2.48,
                                                 'fail_s': 0.0,
                                                 'RT_max': 279927.68,
                                                 'RT_avg': 7165.75,
                                                 'TP50': 6300.0,
                                                 'TP99': 14000.0},
                                       'insert': {'Requests': 26797,
                                                  'Fails': 47,
                                                  'RPS': 2.49,
                                                  'fail_s': 0.0,
                                                  'RT_max': 181076.98,
                                                  'RT_avg': 659.48,
                                                  'TP50': 130.0,
                                                  'TP99': 3300.0},
                                       'release': {'Requests': 26961,
                                                   'Fails': 50,
                                                   'RPS': 2.5,
                                                   'fail_s': 0.0,
                                                   'RT_max': 30714.13,
                                                   'RT_avg': 76.77,
                                                   'TP50': 5,
                                                   'TP99': 130.0}}}}} 
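
The failure percentages in the Locust table above can be cross-checked from the raw request and failure counts. The sketch below does that arithmetic in Go; `failPercent` is a name chosen here for illustration, and the numbers are copied from the table. The `fail_s` values in the report look like the same ratio expressed as a fraction and rounded to two decimals (225/107408 is about 0.002, which rounds to 0.0), but that is an inference from the numbers, not something the report states.

```go
package main

import (
	"fmt"
	"math"
)

// failPercent returns fails/requests as a percentage rounded to two decimals.
func failPercent(fails, requests int) float64 {
	if requests == 0 {
		return 0
	}
	return math.Round(float64(fails)/float64(requests)*10000) / 100
}

func main() {
	// Request and failure counts copied from the Locust final stats.
	rows := []struct {
		name        string
		fails, reqs int
	}{
		{"delete", 61, 26860},
		{"flush", 67, 26790},
		{"insert", 47, 26797},
		{"release", 50, 26961},
	}
	totalFails, totalReqs := 0, 0
	for _, r := range rows {
		totalFails += r.fails
		totalReqs += r.reqs
		fmt.Printf("%-8s %.2f%%\n", r.name, failPercent(r.fails, r.reqs))
	}
	fmt.Printf("%-8s %.2f%% (%d/%d)\n", "total", failPercent(totalFails, totalReqs), totalFails, totalReqs)
}
```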
wangting0128 commented 3 weeks ago

Query Node disconnected from etcd

argo task: multi-vector-comp-2-2lbmt

server: init stats

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-comp-2-2lbmt-etcd-0                                  1/1     Running                           0               6m15s   10.104.27.159   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-1                                  1/1     Running                           0               6m15s   10.104.17.19    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-2                                  1/1     Running                           0               6m14s   10.104.33.9     4am-node36   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datacoord-b856bbbbb-xdxcc        1/1     Running                           0               6m15s   10.104.31.174   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datanode-6b7955f57c-fxdgn        1/1     Running                           0               6m15s   10.104.31.177   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexcoord-7659494967-mzxvm      1/1     Running                           0               6m15s   10.104.31.178   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexnode-b9d668fc8-zzbqc        1/1     Running                           0               6m15s   10.104.19.3     4am-node28   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-proxy-dd7c87f67-zbsz4            1/1     Running                           1 (2m9s ago)    6m15s   10.104.31.175   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querycoord-7976f6f7c6-rtf57      1/1     Running                           0               6m15s   10.104.31.176   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querynode-6d8c845bd8-qnfjn       1/1     Running                           0               6m15s   10.104.26.77    4am-node32   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-rootcoord-5b97c8788b-v852j       1/1     Running                           0               6m15s   10.104.31.173   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-0                                 1/1     Running                           0               6m15s   10.104.18.143   4am-node25   <none>           <none>
multi-vector-comp-2-2lbmt-minio-1                                 1/1     Running                           0               6m15s   10.104.31.182   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-2                                 1/1     Running                           0               6m15s   10.104.27.162   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-minio-3                                 1/1     Running                           0               6m14s   10.104.32.210   4am-node39   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-0                         1/1     Running                           0               6m15s   10.104.34.219   4am-node37   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-1                         1/1     Running                           0               6m15s   10.104.15.173   4am-node20   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-2                         1/1     Running                           0               6m14s   10.104.23.197   4am-node27   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-init-g48qr                0/1     Completed                         0               6m15s   10.104.4.124    4am-node11   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-broker-0                         1/1     Running                           0               6m15s   10.104.5.121    4am-node12   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-proxy-0                          1/1     Running                           0               6m15s   10.104.6.98     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-pulsar-init-h25t9                0/1     Completed                         0               6m15s   10.104.6.96     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-recovery-0                       1/1     Running                           0               6m15s   10.104.9.210    4am-node14   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-0                      1/1     Running                           0               6m15s   10.104.31.181   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-1                      1/1     Running                           0               5m27s   10.104.17.21    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-2                      1/1     Running                           0               4m52s   10.104.27.165   4am-node31   <none>           <none>

after testing

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-comp-2-2lbmt-etcd-0                                  1/1     Running                           0               39m     10.104.27.159   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-1                                  1/1     Running                           0               39m     10.104.17.19    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-2                                  1/1     Running                           0               39m     10.104.33.9     4am-node36   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datacoord-b856bbbbb-xdxcc        1/1     Running                           0               39m     10.104.31.174   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datanode-6b7955f57c-fxdgn        1/1     Running                           0               39m     10.104.31.177   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexcoord-7659494967-mzxvm      1/1     Running                           0               39m     10.104.31.178   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexnode-b9d668fc8-zzbqc        1/1     Running                           0               39m     10.104.19.3     4am-node28   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-proxy-dd7c87f67-zbsz4            1/1     Running                           1 (35m ago)     39m     10.104.31.175   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querycoord-7976f6f7c6-rtf57      1/1     Running                           0               39m     10.104.31.176   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querynode-6d8c845bd8-qnfjn       1/1     Running                           1 (30m ago)     39m     10.104.26.77    4am-node32   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-rootcoord-5b97c8788b-v852j       1/1     Running                           0               39m     10.104.31.173   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-0                                 1/1     Running                           0               39m     10.104.18.143   4am-node25   <none>           <none>
multi-vector-comp-2-2lbmt-minio-1                                 1/1     Running                           0               39m     10.104.31.182   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-2                                 1/1     Running                           0               39m     10.104.27.162   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-minio-3                                 1/1     Running                           0               39m     10.104.32.210   4am-node39   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-0                         1/1     Running                           0               39m     10.104.34.219   4am-node37   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-1                         1/1     Running                           0               39m     10.104.15.173   4am-node20   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-2                         1/1     Running                           0               39m     10.104.23.197   4am-node27   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-init-g48qr                0/1     Completed                         0               39m     10.104.4.124    4am-node11   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-broker-0                         1/1     Running                           0               39m     10.104.5.121    4am-node12   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-proxy-0                          1/1     Running                           0               39m     10.104.6.98     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-pulsar-init-h25t9                0/1     Completed                         0               39m     10.104.6.96     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-recovery-0                       1/1     Running                           0               39m     10.104.9.210    4am-node14   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-0                      1/1     Running                           0               39m     10.104.31.181   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-1                      1/1     Running                           0               39m     10.104.17.21    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-2                      1/1     Running                           0               38m     10.104.27.165   4am-node31   <none>           <none> 

image

client pod name: multi-vector-comp-2-2lbmt-1555640872
client log: client.log
client search error: 2024-04-09 11:30:40,756 ~ 2024-04-09 11:33:57,730

test result:

[2024-04-09 11:59:02,350 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-09 11:59:02,351 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]: grpc     search                                                                           765    70(9.15%) | 208749   21482  327504 189000 |    0.42        0.04 (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]:          Aggregated                                                                       765    70(9.15%) | 208749   21482  327504 189000 |    0.42        0.04 (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]:  (stats.py:790)
[2024-04-09 11:59:02,353 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_2c2m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '16.0',
                                                              'memory': '32Gi'},
                                                   'requests': {'cpu': '9.0',
                                                                'memory': '17Gi'}},
                                     'replicas': 1},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '8Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '5Gi'}},
                                     'replicas': 1},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '2Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '2Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'v2.3.12'}}},
            'host': 'multi-vector-comp-2-2lbmt-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_hnsw_search_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 1000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': [],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 8,
                                                                  'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 100,
                                                       'during_time': 1800,
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 10000,
                                                                  'top_k': 10,
                                                                  'search_param': {'ef': 16},
                                                                  'expr': None,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 3600,
                                                                  'random_data': True}}]},
            'run_id': 2024040916443942,
            'datetime': '2024-04-09 11:20:44.380762',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 20.6699},
                            'insert': {'total_time': 35.1925,
                                       'VPS': 28415.1453,
                                       'batch_time': 1.7596,
                                       'batch': 50000},
                            'flush': {'RT': 2.5403},
                            'load': {'RT': 5.1836},
                            'Locust': {'Aggregated': {'Requests': 765,
                                                      'Fails': 70,
                                                      'RPS': 0.42,
                                                      'fail_s': 0.09,
                                                      'RT_max': 327504.59,
                                                      'RT_avg': 208749.74,
                                                      'TP50': 189000.0,
                                                      'TP99': 322000.0},
                                       'search': {'Requests': 765,
                                                  'Fails': 70,
                                                  'RPS': 0.42,
                                                  'fail_s': 0.09,
                                                  'RT_max': 327504.59,
                                                  'RT_avg': 208749.74,
                                                  'TP50': 189000.0,
                                                  'TP99': 322000.0}}}}}
wangting0128 commented 1 week ago

Build index failed

argo task: multi-vector-based-scene1-f8pw5 test case name: test_hybrid_search_serial_ivf_flat_hnsw_standalone

server: init stats

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-based-scene1-f8pw5-etcd-0                            1/1     Running                           0               2m33s   10.104.15.16    4am-node20   <none>           <none>
multi-vector-based-scene1-f8pw5-milvus-standalone-65cf6f86z2s9q   1/1     Running                           0               2m33s   10.104.26.10    4am-node32   <none>           <none>
multi-vector-based-scene1-f8pw5-minio-6f9756b97c-mfzmr            1/1     Running                           0               2m33s   10.104.15.19    4am-node20   <none>           <none>

after testing

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-based-scene1-f8pw5-etcd-0                            1/1     Running                           0               24h     10.104.15.16    4am-node20   <none>           <none>
multi-vector-based-scene1-f8pw5-milvus-standalone-65cf6f86z2s9q   1/1     Running                           3 (17h ago)     24h     10.104.26.10    4am-node32   <none>           <none>
multi-vector-based-scene1-f8pw5-minio-6f9756b97c-mfzmr            1/1     Running                           0               24h     10.104.15.19    4am-node20   <none>           <none>
Screenshot 2024-04-17 11 29 49

client pod name: multi-vector-based-scene1-f8pw5-19328334 client log:

Screenshot 2024-04-17 11 24 41

test steps:

1. create a collection, 8 fields: "id", "float_vector", "float_vector_1", "int64_1", "int64_2", "float_1", "double_1", "varchar_1"
2. build index
   IVF_FLAT: float_vector
   HNSW: float_vector_1
   INVERTED: "int64_1", "int64_2", "float_1", "double_1", "varchar_1"
3. insert 25m data
4. flush collection
5. build index again with the same params <- failed

server config:

Screenshot 2024-04-17 11 29 28

@longjiquan please help to check, thanks

xiaofan-luan commented 1 week ago

@wangting0128 these all seem to be different issues. Maybe we can assign them to different people?

xiaofan-luan commented 1 week ago

func (s *storageV1Serializer) setTaskMeta(task *SyncTask, pack *SyncPack) {
	task.WithCollectionID(pack.collectionID).
		WithPartitionID(pack.partitionID).
		WithChannelName(pack.channelName).
		WithSegmentID(pack.segmentID).
		WithBatchSize(pack.batchSize).
		WithSchema(s.metacache.Schema()).
		WithStartPosition(pack.startPosition).
		WithCheckpoint(pack.checkpoint).
		WithLevel(pack.level).
		WithTimeRange(pack.tsFrom, pack.tsTo).
		WithMetaCache(s.metacache).
		WithMetaWriter(s.metaWriter).
		WithFailureCallback(func(err error) {
			// TODO could change to unsub channel in the future
			panic(err)
		})
}

@congqixia we need to refine the flush logic. It should retry indefinitely rather than panic so easily.

xiaofan-luan commented 1 week ago

Also, the core of this issue is that CPU usage is too high in that situation. @longjiquan are there any analysis results? Is there anywhere we failed to limit the CPU cores?
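As a quick diagnostic for the CPU-limit question, the Go runtime's view of available parallelism can be compared with the container's CPU limit. The sketch below uses only the standard `runtime` package. `schedulerParallelism` and `oversubscribed` are hypothetical helpers, and the limit is passed in by hand (for example the 16-core limit that appears in the helm values elsewhere in this thread) instead of being read from cgroups. Whether Milvus already caps this value is not established here, and the sketch says nothing about thread pools outside the Go runtime.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerParallelism reports how many OS threads may execute Go code
// simultaneously (GOMAXPROCS) and how many logical CPUs the process can see.
func schedulerParallelism() (gomaxprocs, visibleCPUs int) {
	// GOMAXPROCS(0) queries the current setting without changing it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

// oversubscribed (hypothetical helper) reports whether the scheduler may run
// more threads in parallel than the given CPU limit allows. limitCores is an
// assumption supplied by the caller; this sketch does not inspect cgroups.
func oversubscribed(gomaxprocs int, limitCores float64) bool {
	return float64(gomaxprocs) > limitCores
}

func main() {
	p, n := schedulerParallelism()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d oversubscribed(limit=16)=%v\n", p, n, oversubscribed(p, 16))
}
```

On Go versions that do not account for cgroup CPU quotas, GOMAXPROCS defaults to the number of logical CPUs visible to the process, so on a large node it can exceed the pod's limit and lead to throttling.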

wangting0128 commented 1 week ago

> @wangting0128 These all seem to be different issues. Maybe we can assign them to different people?

Got it! Reopened a new issue: #32400

longjiquan commented 1 week ago

Maybe this issue is not caused by the inverted index. I noticed that there was no inverted index build job before Milvus disconnected from etcd. See the logs.

xiaofan-luan commented 1 week ago

One possibility is that search becomes too slow on such segments and blocks the Go P thread. Having 64K-length varchar values is bad for Milvus because the segment also becomes huge.

xiaofan-luan commented 1 week ago

@longjiquan please check the execution time for each search

wangting0128 commented 1 week ago

> One possibility is that search becomes too slow on such segments and blocks the Go P thread. Having 64K-length varchar values is bad for Milvus because the segment also becomes huge.

For the record, in this test scenario the values of the varchar field are all integers converted to strings, and none has a length of 64K.

wangting0128 commented 1 week ago

Query Node disconnected from etcd

argo task: multi-vector-comp-2-2lbmt

server: init stats

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-comp-2-2lbmt-etcd-0                                  1/1     Running                           0               6m15s   10.104.27.159   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-1                                  1/1     Running                           0               6m15s   10.104.17.19    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-2                                  1/1     Running                           0               6m14s   10.104.33.9     4am-node36   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datacoord-b856bbbbb-xdxcc        1/1     Running                           0               6m15s   10.104.31.174   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datanode-6b7955f57c-fxdgn        1/1     Running                           0               6m15s   10.104.31.177   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexcoord-7659494967-mzxvm      1/1     Running                           0               6m15s   10.104.31.178   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexnode-b9d668fc8-zzbqc        1/1     Running                           0               6m15s   10.104.19.3     4am-node28   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-proxy-dd7c87f67-zbsz4            1/1     Running                           1 (2m9s ago)    6m15s   10.104.31.175   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querycoord-7976f6f7c6-rtf57      1/1     Running                           0               6m15s   10.104.31.176   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querynode-6d8c845bd8-qnfjn       1/1     Running                           0               6m15s   10.104.26.77    4am-node32   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-rootcoord-5b97c8788b-v852j       1/1     Running                           0               6m15s   10.104.31.173   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-0                                 1/1     Running                           0               6m15s   10.104.18.143   4am-node25   <none>           <none>
multi-vector-comp-2-2lbmt-minio-1                                 1/1     Running                           0               6m15s   10.104.31.182   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-2                                 1/1     Running                           0               6m15s   10.104.27.162   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-minio-3                                 1/1     Running                           0               6m14s   10.104.32.210   4am-node39   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-0                         1/1     Running                           0               6m15s   10.104.34.219   4am-node37   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-1                         1/1     Running                           0               6m15s   10.104.15.173   4am-node20   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-2                         1/1     Running                           0               6m14s   10.104.23.197   4am-node27   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-init-g48qr                0/1     Completed                         0               6m15s   10.104.4.124    4am-node11   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-broker-0                         1/1     Running                           0               6m15s   10.104.5.121    4am-node12   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-proxy-0                          1/1     Running                           0               6m15s   10.104.6.98     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-pulsar-init-h25t9                0/1     Completed                         0               6m15s   10.104.6.96     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-recovery-0                       1/1     Running                           0               6m15s   10.104.9.210    4am-node14   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-0                      1/1     Running                           0               6m15s   10.104.31.181   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-1                      1/1     Running                           0               5m27s   10.104.17.21    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-2                      1/1     Running                           0               4m52s   10.104.27.165   4am-node31   <none>           <none>

after testing

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-comp-2-2lbmt-etcd-0                                  1/1     Running                           0               39m     10.104.27.159   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-1                                  1/1     Running                           0               39m     10.104.17.19    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-etcd-2                                  1/1     Running                           0               39m     10.104.33.9     4am-node36   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datacoord-b856bbbbb-xdxcc        1/1     Running                           0               39m     10.104.31.174   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-datanode-6b7955f57c-fxdgn        1/1     Running                           0               39m     10.104.31.177   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexcoord-7659494967-mzxvm      1/1     Running                           0               39m     10.104.31.178   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-indexnode-b9d668fc8-zzbqc        1/1     Running                           0               39m     10.104.19.3     4am-node28   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-proxy-dd7c87f67-zbsz4            1/1     Running                           1 (35m ago)     39m     10.104.31.175   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querycoord-7976f6f7c6-rtf57      1/1     Running                           0               39m     10.104.31.176   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-querynode-6d8c845bd8-qnfjn       1/1     Running                           1 (30m ago)     39m     10.104.26.77    4am-node32   <none>           <none>
multi-vector-comp-2-2lbmt-milvus-rootcoord-5b97c8788b-v852j       1/1     Running                           0               39m     10.104.31.173   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-0                                 1/1     Running                           0               39m     10.104.18.143   4am-node25   <none>           <none>
multi-vector-comp-2-2lbmt-minio-1                                 1/1     Running                           0               39m     10.104.31.182   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-minio-2                                 1/1     Running                           0               39m     10.104.27.162   4am-node31   <none>           <none>
multi-vector-comp-2-2lbmt-minio-3                                 1/1     Running                           0               39m     10.104.32.210   4am-node39   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-0                         1/1     Running                           0               39m     10.104.34.219   4am-node37   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-1                         1/1     Running                           0               39m     10.104.15.173   4am-node20   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-2                         1/1     Running                           0               39m     10.104.23.197   4am-node27   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-bookie-init-g48qr                0/1     Completed                         0               39m     10.104.4.124    4am-node11   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-broker-0                         1/1     Running                           0               39m     10.104.5.121    4am-node12   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-proxy-0                          1/1     Running                           0               39m     10.104.6.98     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-pulsar-init-h25t9                0/1     Completed                         0               39m     10.104.6.96     4am-node13   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-recovery-0                       1/1     Running                           0               39m     10.104.9.210    4am-node14   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-0                      1/1     Running                           0               39m     10.104.31.181   4am-node34   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-1                      1/1     Running                           0               39m     10.104.17.21    4am-node23   <none>           <none>
multi-vector-comp-2-2lbmt-pulsar-zookeeper-2                      1/1     Running                           0               38m     10.104.27.165   4am-node31   <none>           <none> 

client pod name: multi-vector-comp-2-2lbmt-1555640872
client log: client.log
client search error: 2024-04-09 11:30:40,756 ~ 2024-04-09 11:33:57,730

test result:

[2024-04-09 11:59:02,350 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-09 11:59:02,351 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]: grpc     search                                                                           765    70(9.15%) | 208749   21482  327504 189000 |    0.42        0.04 (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]:          Aggregated                                                                       765    70(9.15%) | 208749   21482  327504 189000 |    0.42        0.04 (stats.py:789)
[2024-04-09 11:59:02,351 -  INFO - fouram]:  (stats.py:790)
[2024-04-09 11:59:02,353 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_2c2m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '16.0',
                                                              'memory': '32Gi'},
                                                   'requests': {'cpu': '9.0',
                                                                'memory': '17Gi'}},
                                     'replicas': 1},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '8Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '5Gi'}},
                                     'replicas': 1},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '2Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '2Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'v2.3.12'}}},
            'host': 'multi-vector-comp-2-2lbmt-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_hnsw_search_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 1000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': [],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 8,
                                                                  'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 100,
                                                       'during_time': 1800,
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 10000,
                                                                  'top_k': 10,
                                                                  'search_param': {'ef': 16},
                                                                  'expr': None,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 3600,
                                                                  'random_data': True}}]},
            'run_id': 2024040916443942,
            'datetime': '2024-04-09 11:20:44.380762',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 20.6699},
                            'insert': {'total_time': 35.1925,
                                       'VPS': 28415.1453,
                                       'batch_time': 1.7596,
                                       'batch': 50000},
                            'flush': {'RT': 2.5403},
                            'load': {'RT': 5.1836},
                            'Locust': {'Aggregated': {'Requests': 765,
                                                      'Fails': 70,
                                                      'RPS': 0.42,
                                                      'fail_s': 0.09,
                                                      'RT_max': 327504.59,
                                                      'RT_avg': 208749.74,
                                                      'TP50': 189000.0,
                                                      'TP99': 322000.0},
                                       'search': {'Requests': 765,
                                                  'Fails': 70,
                                                  'RPS': 0.42,
                                                  'fail_s': 0.09,
                                                  'RT_max': 327504.59,
                                                  'RT_avg': 208749.74,
                                                  'TP50': 189000.0,
                                                  'TP99': 322000.0}}}}}

For example, this scenario has only an int64 primary key field and one vector field.
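The rates in the final stats table above can be cross-checked from the raw counts. The sketch below recomputes them for the search task (765 requests, 70 failures), assuming the run lasted roughly the configured `during_time` of 1800 seconds. The actual wall-clock duration is not stated in the report, so the per-second figures are approximate. `locustSummary` and `summarize` are hypothetical names.

```go
package main

import "fmt"

// locustSummary holds rates derived from raw request counts.
type locustSummary struct {
	FailRatio  float64 // failures divided by total requests
	ReqPerSec  float64 // requests divided by run duration
	FailPerSec float64 // failures divided by run duration
}

// summarize derives the rates from raw counts. It returns zero values when
// there are no requests or the duration is not positive.
func summarize(requests, fails int, durationSec float64) locustSummary {
	if requests <= 0 || durationSec <= 0 {
		return locustSummary{}
	}
	return locustSummary{
		FailRatio:  float64(fails) / float64(requests),
		ReqPerSec:  float64(requests) / durationSec,
		FailPerSec: float64(fails) / durationSec,
	}
}

func main() {
	// Counts from the report; 1800 s is the configured during_time (an assumption
	// about the real elapsed time).
	s := summarize(765, 70, 1800)
	fmt.Printf("fail ratio: %.4f, req/s: %.3f, failures/s: %.3f\n", s.FailRatio, s.ReqPerSec, s.FailPerSec)
}
```

Under that assumption the failure ratio is about 0.0915, which matches the 9.15% in the table, and failures per second come to about 0.039, in line with the 0.04 column. The `'fail_s': 0.09` value in the report data therefore looks like the failure ratio, not failures per second, though the thread does not confirm this.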

wangting0128 commented 1 week ago

Proxy disconnected from etcd

argo task: inverted-corn-1713715200
image: 2.4-20240418-238f9a4a-amd64

server:

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-115200-3-4-8054-etcd-0                              1/1     Running                           0               2m36s   10.104.18.189   4am-node25   <none>           <none>
inverted-corn-115200-3-4-8054-milvus-standalone-7c6dd4df89d7mq9   1/1     Running                           0               2m36s   10.104.25.7     4am-node30   <none>           <none>
inverted-corn-115200-3-4-8054-minio-67b486fc44-454zw              1/1     Running                           0               2m36s   10.104.18.191   4am-node25   <none>           <none> (base.py:257)
[2024-04-21 19:12:24,537 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|inverted-corn-115200-3-4-8054-milvus|inverted-corn-115200-3-4-8054-minio|inverted-corn-115200-3-4-8054-etcd|inverted-corn-115200-3-4-8054-pulsar|inverted-corn-115200-3-4-8054-zookeeper|inverted-corn-115200-3-4-8054-kafka|inverted-corn-115200-3-4-8054-log|inverted-corn-115200-3-4-8054-tikv'  (util_cmd.py:14)
[2024-04-21 19:12:34,819 -  INFO - fouram]: [CliClient] pod details of release(inverted-corn-115200-3-4-8054): 
 I0421 19:12:25.784044     547 request.go:665] Waited for 1.174038926s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/api/v1?timeout=32s
NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-115200-3-4-8054-etcd-0                              1/1     Running                           0               3h10m   10.104.18.189   4am-node25   <none>           <none>
inverted-corn-115200-3-4-8054-milvus-standalone-7c6dd4df89d7mq9   1/1     Running                           1 (14m ago)     3h10m   10.104.25.7     4am-node30   <none>           <none>
inverted-corn-115200-3-4-8054-minio-67b486fc44-454zw              1/1     Running                           0               3h10m   10.104.18.191   4am-node25   <none>           <none>

client pod name: inverted-corn-1713715200-2526371721
client log:
error time: 2024-04-21 18:57:18,136 ~ 2024-04-21 19:00:46,117

test result:

{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_16c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '9.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240418-238f9a4a-amd64'}}},
            'host': 'inverted-corn-115200-3-4-8054-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partitions_dml_dql_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'extra_partitions': {'partitions': 10,
                                                                         'data_repeated': False},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 30,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 5000000}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'load',
                                                       'weight': 1,
                                                       'params': {'replica_number': 1,
                                                                  'timeout': 30}},
                                                      {'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 10,
                                                                  'search_param': {'nprobe': 16},
                                                                  'expr': None,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 180,
                                                                  'random_data': True}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 10,
                                                                  'reqs': [{'search_param': {'nprobe': 16},
                                                                            'anns_field': 'float_vector',
                                                                            'top_k': 2000},
                                                                           {'search_param': {'nprobe': 32},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'int64_1 '
                                                                                    '> '
                                                                                    '-1 '
                                                                                    '&& '
                                                                                    'id '
                                                                                    '> '
                                                                                    '-1'},
                                                                           {'search_param': {'nprobe': 64},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'id '
                                                                                    '> '
                                                                                    '10',
                                                                            'top_k': 60}],
                                                                  'rerank': {'WeightedRanker': [0.3,
                                                                                                0.4,
                                                                                                0.3]},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'int64_1 '
                                                                          '> '
                                                                          '-1 '
                                                                          '&&',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 180,
                                                                  'random_data': True,
                                                                  'random_count': 20,
                                                                  'random_range': [0,
                                                                                   1000000.0],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64'}}]},
            'run_id': 2024042153188471,
            'datetime': '2024-04-21 16:01:58.021701',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 197.6094,
                                      'id': {'RT': 1.0106},
                                      'int64_1': {'RT': 1.0106}},
                            'insert': {'total_time': 149.8755,
                                       'VPS': 33492.0794,
                                       'batch_time': 1.4988,
                                       'batch': 50000.0},
                            'flush': {'RT': 3.0494},
                            'load': {'RT': 5.5379},
                            'Locust': {'Aggregated': {'Requests': 60823,
                                                      'Fails': 33,
                                                      'RPS': 5.63,
                                                      'fail_s': 0.0,
                                                      'RT_max': 257636.77,
                                                      'RT_avg': 3544.44,
                                                      'TP50': 3200.0,
                                                      'TP99': 14000.0},
                                       'delete': {'Requests': 8582,
                                                  'Fails': 2,
                                                  'RPS': 0.79,
                                                  'fail_s': 0.0,
                                                  'RT_max': 30694.32,
                                                  'RT_avg': 900.17,
                                                  'TP50': 230.0,
                                                  'TP99': 6300.0},
                                       'flush': {'Requests': 8781,
                                                 'Fails': 7,
                                                 'RPS': 0.81,
                                                 'fail_s': 0.0,
                                                 'RT_max': 257636.77,
                                                 'RT_avg': 4777.98,
                                                 'TP50': 4200.0,
                                                 'TP99': 13000.0},
                                       'hybrid_search': {'Requests': 8623,
                                                         'Fails': 8,
                                                         'RPS': 0.8,
                                                         'fail_s': 0.0,
                                                         'RT_max': 76211.92,
                                                         'RT_avg': 6729.31,
                                                         'TP50': 6000.0,
                                                         'TP99': 17000.0},
                                       'insert': {'Requests': 8702,
                                                  'Fails': 4,
                                                  'RPS': 0.81,
                                                  'fail_s': 0.0,
                                                  'RT_max': 30894.44,
                                                  'RT_avg': 1062.99,
                                                  'TP50': 410.0,
                                                  'TP99': 6900.0},
                                       'load': {'Requests': 8728,
                                                'Fails': 1,
                                                'RPS': 0.81,
                                                'fail_s': 0.0,
                                                'RT_max': 30893.58,
                                                'RT_avg': 1755.24,
                                                'TP50': 1000.0,
                                                'TP99': 9400.0},
                                       'query': {'Requests': 8630,
                                                 'Fails': 8,
                                                 'RPS': 0.8,
                                                 'fail_s': 0.0,
                                                 'RT_max': 182467.66,
                                                 'RT_avg': 5207.77,
                                                 'TP50': 4400.0,
                                                 'TP99': 15000.0},
                                       'search': {'Requests': 8777,
                                                  'Fails': 3,
                                                  'RPS': 0.81,
                                                  'fail_s': 0.0,
                                                  'RT_max': 182535.3,
                                                  'RT_avg': 4370.84,
                                                  'TP50': 3800.0,
                                                  'TP99': 12000.0}}}}}
xiaofan-luan commented 1 week ago

@wangting0128 Let's sync offline. Please set up a short meeting for this issue.