milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] flush 180s timeout in DQL scene with 1024 `reqs` for hybrid_search #30529

Status: Open. wangting0128 opened this issue 7 months ago

wangting0128 commented 7 months ago

Is there an existing issue for this?

Environment

- Milvus version: master-20240204-69596306
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc19
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: multi-vector-scene-mix-84bqz
test case name: test_hybrid_search_locust_dql_max_reqs_cluster

server:

NAME                                                              READY   STATUS        RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-scene-mix-84bqz-7-etcd-0                             1/1     Running       0               6h28m   10.104.20.228   4am-node22   <none>           <none>
multi-vector-scene-mix-84bqz-7-etcd-1                             1/1     Running       0               6h28m   10.104.25.65    4am-node30   <none>           <none>
multi-vector-scene-mix-84bqz-7-etcd-2                             1/1     Running       0               6h28m   10.104.19.208   4am-node28   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-datacoord-6b95c9fc6c846b2   1/1     Running       0               6h28m   10.104.19.197   4am-node28   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-datanode-56d6475f56-t64nb   1/1     Running       1 (6h23m ago)   6h28m   10.104.26.55    4am-node32   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-indexcoord-6fb84db66k7ztc   1/1     Running       0               6h28m   10.104.25.57    4am-node30   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-indexnode-7dd7dcd887xmk84   1/1     Running       0               6h28m   10.104.23.117   4am-node27   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-proxy-698fdd46b6-rswmn      1/1     Running       1 (6h23m ago)   6h28m   10.104.26.56    4am-node32   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-querycoord-547f7b6794qwcc   1/1     Running       1 (6h23m ago)   6h28m   10.104.18.62    4am-node25   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-querynode-95b46f967-9b9g7   1/1     Running       0               6h28m   10.104.1.13     4am-node10   <none>           <none>
multi-vector-scene-mix-84bqz-7-milvus-rootcoord-6bbb958b7-cmz9r   1/1     Running       1 (6h23m ago)   6h28m   10.104.1.9      4am-node10   <none>           <none>
multi-vector-scene-mix-84bqz-7-minio-0                            1/1     Running       0               6h28m   10.104.20.230   4am-node22   <none>           <none>
multi-vector-scene-mix-84bqz-7-minio-1                            1/1     Running       0               6h28m   10.104.19.206   4am-node28   <none>           <none>
multi-vector-scene-mix-84bqz-7-minio-2                            1/1     Running       0               6h28m   10.104.17.165   4am-node23   <none>           <none>
multi-vector-scene-mix-84bqz-7-minio-3                            1/1     Running       0               6h28m   10.104.25.66    4am-node30   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-0                    1/1     Running       0               6h28m   10.104.18.75    4am-node25   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-1                    1/1     Running       0               6h28m   10.104.20.232   4am-node22   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-2                    1/1     Running       0               6h28m   10.104.17.166   4am-node23   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-bookie-init-xfg49           0/1     Completed     0               6h28m   10.104.20.221   4am-node22   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-broker-0                    1/1     Running       0               6h28m   10.104.6.4      4am-node13   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-proxy-0                     1/1     Running       0               6h28m   10.104.21.24    4am-node24   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-pulsar-init-4grwb           0/1     Completed     0               6h28m   10.104.6.3      4am-node13   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-recovery-0                  1/1     Running       0               6h28m   10.104.21.23    4am-node24   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-zookeeper-0                 1/1     Running       0               6h28m   10.104.20.231   4am-node22   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-zookeeper-1                 1/1     Running       0               6h27m   10.104.27.235   4am-node31   <none>           <none>
multi-vector-scene-mix-84bqz-7-pulsar-zookeeper-2                 1/1     Running       0               6h25m   10.104.34.151   4am-node37   <none>           <none> 

client pod name: multi-vector-scene-mix-84bqz-3046920368
client log: client.log

[Screenshot 2024-02-05 15.12.52] [Screenshot 2024-02-05 14.51.57]

Expected Behavior

No response

Steps To Reproduce

concurrent test and calculation of RT and QPS

        :purpose:  `DQL & max reqs=1024`
            verify DQL & max reqs=1024 scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1M entities
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - flush
                - load
                - search
                - hybrid_search: len(reqs) = 1024
                - query
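Each concurrent request in step 7 appears to be counted as failed when it does not return within its client-side timeout (180 s for flush, per the issue title). The following is a minimal stdlib sketch of that accounting. `call_with_deadline` is a hypothetical helper and is not part of pymilvus or the benchmark tool. The real client may enforce the deadline differently, for example through a gRPC deadline.

```python
import concurrent.futures
import time
from typing import Callable, Tuple

# Shared worker pool for the sketch; a timed-out call is NOT cancelled and
# keeps running in its worker thread until it finishes on its own.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(op: Callable[[], object], timeout_s: float) -> Tuple[bool, float]:
    """Run op() and report (succeeded, elapsed_seconds).

    A call that does not finish within timeout_s counts as a failure, which is
    how a flush exceeding its 180 s timeout would show up in the statistics.
    Exceptions raised by op() itself propagate to the caller.
    """
    start = time.monotonic()
    future = _POOL.submit(op)
    try:
        future.result(timeout=timeout_s)
        ok = True
    except concurrent.futures.TimeoutError:
        ok = False
    return ok, time.monotonic() - start
```

With a short operation the call succeeds. With an operation slower than `timeout_s` it reports failure after roughly `timeout_s` seconds. Leaving the abandoned call running is a simplification of this sketch.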

Milvus Log

No response

Anything else?

server config:

{
     "queryNode": {
          "resources": {
               "limits": {
                    "cpu": "16.0",
                    "memory": "64Gi"
               },
               "requests": {
                    "cpu": "9.0",
                    "memory": "33Gi"
               }
          },
          "replicas": 1
     },
     "indexNode": {
          "resources": {
               "limits": {
                    "cpu": "8.0",
                    "memory": "8Gi"
               },
               "requests": {
                    "cpu": "5.0",
                    "memory": "5Gi"
               }
          },
          "replicas": 1
     },
     "dataNode": {
          "resources": {
               "limits": {
                    "cpu": "2.0",
                    "memory": "8Gi"
               },
               "requests": {
                    "cpu": "2.0",
                    "memory": "5Gi"
               }
          }
     },
     "cluster": {
          "enabled": true
     },
     "pulsar": {},
     "kafka": {},
     "minio": {
          "metrics": {
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "etcd": {
          "metrics": {
               "enabled": true,
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "metrics": {
          "serviceMonitor": {
               "enabled": true
          }
     },
     "log": {
          "level": "debug"
     },
     "image": {
          "all": {
               "repository": "harbor.milvus.io/milvus/milvus",
               "tag": "master-20240204-69596306"
          }
     }
}

client config: [image]

test result:

{
     "test_result": {
          "index": {
               "RT": 967.0936,
               "float_vector_1": {
                    "RT": 0.5557
               },
               "float_vector_2": {
                    "RT": 2.042
               },
               "float_vector_3": {
                    "RT": 1.0627
               },
               "id": {
                    "RT": 1.0318
               },
               "int64_1": {
                    "RT": 1.0643
               },
               "varchar_1": {
                    "RT": 1.0247
               }
          },
          "insert": {
               "total_time": 175.1558,
               "VPS": 5709.2029,
               "batch_time": 1.7516,
               "batch": 10000
          },
          "flush": {
               "RT": 30.2399
          },
          "load": {
               "RT": 11.6515
          },
          "Locust": {
               "Aggregated": {
                    "Requests": 5481,
                    "Fails": 409,
                    "RPS": 0.51,
                    "fail_s": 0.07,
                    "RT_max": 548240,
                    "RT_avg": 192900.39,
                    "TP50": 187000,
                    "TP99": 494000
               },
               "flush": {
                    "Requests": 1113,
                    "Fails": 409,
                    "RPS": 0.1,
                    "fail_s": 0.37,
                    "RT_max": 548240,
                    "RT_avg": 339588.99,
                    "TP50": 335000,
                    "TP99": 511000
               },
               "hybrid_search": {
                    "Requests": 1082,
                    "Fails": 0,
                    "RPS": 0.1,
                    "fail_s": 0,
                    "RT_max": 268701.09,
                    "RT_avg": 68170.01,
                    "TP50": 64000,
                    "TP99": 230000
               },
               "load": {
                    "Requests": 1102,
                    "Fails": 0,
                    "RPS": 0.1,
                    "fail_s": 0,
                    "RT_max": 548204.59,
                    "RT_avg": 337781.77,
                    "TP50": 333000,
                    "TP99": 512000
               },
               "query": {
                    "Requests": 1065,
                    "Fails": 0,
                    "RPS": 0.1,
                    "fail_s": 0,
                    "RT_max": 378817.38,
                    "RT_avg": 187034,
                    "TP50": 180000,
                    "TP99": 359000
               },
               "search": {
                    "Requests": 1119,
                    "Fails": 0,
                    "RPS": 0.1,
                    "fail_s": 0,
                    "RT_max": 230678.43,
                    "RT_avg": 30507.44,
                    "TP50": 29000,
                    "TP99": 229000
               }
          }
     }
}
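The reported `fail_s` values line up with `Fails / Requests` (for example 409 / 1113 ≈ 0.37 for flush), so they look like a failure ratio rather than failures per second. `RPS` lines up with `Requests` divided by a 3-hour run. The following is a quick cross-check. It assumes a 10800 s duration, taken from the `'during_time': '3h'` of the later runs, because this run's duration is not stated. The function names are illustrative.

```python
from typing import Dict

# Counters and reported values copied from the first test result above.
REPORTED: Dict[str, Dict[str, float]] = {
    "Aggregated": {"Requests": 5481, "Fails": 409, "RPS": 0.51, "fail_s": 0.07},
    "flush": {"Requests": 1113, "Fails": 409, "RPS": 0.1, "fail_s": 0.37},
    "hybrid_search": {"Requests": 1082, "Fails": 0, "RPS": 0.1, "fail_s": 0},
}

# Assumption: the run lasted 3 hours, as in the later runs' 'during_time': '3h'.
DURATION_S = 3 * 60 * 60


def recompute(stats: Dict[str, float]) -> Dict[str, float]:
    """Derive fail_s as a failure ratio and RPS as requests per second."""
    return {
        "fail_s": stats["Fails"] / stats["Requests"],
        "RPS": stats["Requests"] / DURATION_S,
    }


def matches_report(name: str, tol: float = 0.005) -> bool:
    """True if both derived values agree with the 2-decimal reported values."""
    derived = recompute(REPORTED[name])
    return all(abs(derived[k] - REPORTED[name][k]) <= tol for k in derived)


for name in REPORTED:
    print(name, matches_report(name))
```

Read this way, roughly 37% of the flush calls failed. The hybrid_search requests themselves recorded no failures.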
wangting0128 commented 6 months ago

flush timeout 180s

argo task: inverted-corn-1708358400
test case name: test_inverted_locust_partition_key_dml_standalone
milvus image: master-20240219-43e8cd53-amd64

server:

NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-158400-2-82-4682-etcd-0                             1/1     Running            0                 3h17m   10.104.16.107   4am-node21   <none>           <none>
inverted-corn-158400-2-82-4682-milvus-standalone-65b65977dnhdxg   1/1     Running            0                 3h17m   10.104.27.229   4am-node31   <none>           <none>
inverted-corn-158400-2-82-4682-minio-5bd6797c67-cdv4d             1/1     Running            0                 3h17m   10.104.23.223   4am-node27   <none>           <none>

client pod name: inverted-corn-1708358400-3041780947
client logs:

[Screenshot 2024-02-21 14.24.02] [Screenshot 2024-02-21 14.24.25]

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `partition_key: scalar enable partition_key(num_partitions=128)`
            verify a concurrent DML scenario in which the scalar fields `id` (pk) and `int64_1`
            have INVERTED indexes and partition_key is enabled on the `int64_1` field

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1': is_partition_key
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million entities
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - release

test result:

{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '5.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'master-20240219-43e8cd53-amd64'}}},
            'host': 'inverted-corn-158400-2-82-4682-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2,
                                                       'num_partitions': 128},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 180,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 0}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'release',
                                                       'weight': 1,
                                                       'params': {'timeout': 30}}]},
            'run_id': 2024021985241562,
            'datetime': '2024-02-19 16:02:04.509472',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 0.5101,
                                      'id': {'RT': 0.5091},
                                      'int64_1': {'RT': 0.5091}},
                            'insert': {'total_time': 145.5963,
                                       'VPS': 34341.532,
                                       'batch_time': 1.456,
                                       'batch': 50000},
                            'flush': {'RT': 559.5905},
                            'load': {'RT': 2.5874},
                            'Locust': {'Aggregated': {'Requests': 4821,
                                                      'Fails': 1180,
                                                      'RPS': 0.45,
                                                      'fail_s': 0.24,
                                                      'RT_max': 182973.29,
                                                      'RT_avg': 44150.94,
                                                      'TP50': 27,
                                                      'TP99': 181000.0},
                                       'delete': {'Requests': 1245,
                                                  'Fails': 0,
                                                  'RPS': 0.12,
                                                  'fail_s': 0.0,
                                                  'RT_max': 90.65,
                                                  'RT_avg': 6.5,
                                                  'TP50': 3,
                                                  'TP99': 56},
                                       'flush': {'Requests': 1180,
                                                 'Fails': 1180,
                                                 'RPS': 0.11,
                                                 'fail_s': 1.0,
                                                 'RT_max': 182973.29,
                                                 'RT_avg': 180318.67,
                                                 'TP50': 180000.0,
                                                 'TP99': 181000.0},
                                       'insert': {'Requests': 1202,
                                                  'Fails': 0,
                                                  'RPS': 0.11,
                                                  'fail_s': 0.0,
                                                  'RT_max': 280.76,
                                                  'RT_avg': 46.83,
                                                  'TP50': 41,
                                                  'TP99': 140.0},
                                       'release': {'Requests': 1194,
                                                   'Fails': 0,
                                                   'RPS': 0.11,
                                                   'fail_s': 0.0,
                                                   'RT_max': 2203.0,
                                                   'RT_avg': 9.41,
                                                   'TP50': 2,
                                                   'TP99': 59}}}}}
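All four tasks in `concurrent_tasks` have `'weight': 1`. Assuming the benchmark maps these to Locust task weights (Locust picks weighted tasks at random in proportion to their weight), each task should receive about a quarter of the requests. The following sketch compares that expectation with the observed `Requests` counts of this run. The helper names are illustrative.

```python
from typing import Dict

# 'weight' values from concurrent_tasks and 'Requests' counts from the Locust
# result of this run (inverted-corn-1708358400).
WEIGHTS: Dict[str, int] = {"insert": 1, "delete": 1, "flush": 1, "release": 1}
OBSERVED: Dict[str, int] = {"insert": 1202, "delete": 1245, "flush": 1180, "release": 1194}


def expected_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Share of requests each task should get under weighted random choice."""
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


def observed_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of requests each task actually received."""
    total = sum(counts.values())
    return {task: c / total for task, c in counts.items()}


exp = expected_share(WEIGHTS)
obs = observed_share(OBSERVED)
for task in WEIGHTS:
    print(f"{task:8s} expected {exp[task]:.3f} observed {obs[task]:.3f}")
```

Each task received close to 25% of the 4821 requests, so flush was not over-represented in the mix. Even so, all 1180 flush calls failed, with an average RT of 180318.67 ms, just above the 180 s flush timeout.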
wangting0128 commented 6 months ago

flush timeout 180s

argo task: inverted-corn-1709049600
test case name: test_inverted_locust_partition_key_dml_standalone
image: master-20240227-f87a3a13-amd64

server:

NAME                                                              READY   STATUS             RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-149600-2-93-8599-etcd-0                             1/1     Running            0                3h22m   10.104.32.236   4am-node39   <none>           <none>
inverted-corn-149600-2-93-8599-milvus-standalone-58d6cd97d2cfkz   1/1     Running            1 (3h20m ago)    3h22m   10.104.19.67    4am-node28   <none>           <none>
inverted-corn-149600-2-93-8599-minio-54cf95b55-kqhlh              1/1     Running            0                3h22m   10.104.32.237   4am-node39   <none>           <none> 

client pod name: inverted-corn-1709049600-1162705415
client log: client.log

[Screenshot 2024-02-28 10.44.51]

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `partition_key: scalar enable partition_key(num_partitions=128)`
            verify a concurrent DML scenario in which the scalar fields `id` (pk) and `int64_1`
            have INVERTED indexes and partition_key is enabled on the `int64_1` field

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'int64_1': is_partition_key
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'id', 'int64_1'
            3. insert 5 million entities
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - release

test result:

[2024-02-27 19:28:37,101 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-02-27 19:28:37,101 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-27 19:28:37,102 -  INFO - fouram]: grpc     delete                                                                          5437     0(0.00%) |     18       1     383      5 |    0.50        0.00 (stats.py:789)
[2024-02-27 19:28:37,102 -  INFO - fouram]: grpc     flush                                                                           5405    77(1.42%) |  29881     340  204158  19000 |    0.50        0.01 (stats.py:789)
[2024-02-27 19:28:37,102 -  INFO - fouram]: grpc     insert                                                                          5410    31(0.57%) |   9689      28  180059   5400 |    0.50        0.00 (stats.py:789)
[2024-02-27 19:28:37,103 -  INFO - fouram]: grpc     release                                                                         5564     0(0.00%) |     17       1    1025      4 |    0.52        0.00 (stats.py:789)
[2024-02-27 19:28:37,103 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-02-27 19:28:37,103 -  INFO - fouram]:          Aggregated                                                                     21816   108(0.50%) |   9815       1  204158     85 |    2.02        0.01 (stats.py:789)
[2024-02-27 19:28:37,103 -  INFO - fouram]:  (stats.py:790)
[2024-02-27 19:28:37,110 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '8.0',
                                                               'memory': '16Gi'},
                                                    'requests': {'cpu': '5.0',
                                                                 'memory': '9Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': 'master-20240227-f87a3a13-amd64'}}},
            'host': 'inverted-corn-149600-2-93-8599-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_inverted_locust_partition_key_dml_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {'index_type': 'INVERTED'},
                                                                      'int64_1': {'index_type': 'INVERTED'}},
                                                    'scalars_params': {'int64_1': {'params': {'is_partition_key': True}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 5000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': ['int64_1'],
                                                       'shards_num': 2,
                                                       'num_partitions': 128},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '3h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'insert',
                                                       'weight': 1,
                                                       'params': {'nb': 10,
                                                                  'timeout': 180,
                                                                  'random_id': True,
                                                                  'random_vector': True,
                                                                  'varchar_filled': False,
                                                                  'start_id': 0}},
                                                      {'type': 'delete',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'delete_length': 9,
                                                                  'timeout': 30}},
                                                      {'type': 'flush',
                                                       'weight': 1,
                                                       'params': {'timeout': 180}},
                                                      {'type': 'release',
                                                       'weight': 1,
                                                       'params': {'timeout': 30}}]},
            'run_id': 2024022700172497,
            'datetime': '2024-02-27 16:06:57.151992',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 797.5195,
                                      'id': {'RT': 1.0174},
                                      'int64_1': {'RT': 1.02}},
                            'insert': {'total_time': 174.1965,
                                       'VPS': 28703.2173,
                                       'batch_time': 1.742,
                                       'batch': 50000},
                            'flush': {'RT': 16.4752},
                            'load': {'RT': 9.1697},
                            'Locust': {'Aggregated': {'Requests': 21816,
                                                      'Fails': 108,
                                                      'RPS': 2.02,
                                                      'fail_s': 0.0,
                                                      'RT_max': 204158.91,
                                                      'RT_avg': 9815.15,
                                                      'TP50': 85,
                                                      'TP99': 120000.0},
                                       'delete': {'Requests': 5437,
                                                  'Fails': 0,
                                                  'RPS': 0.5,
                                                  'fail_s': 0.0,
                                                  'RT_max': 383.34,
                                                  'RT_avg': 18.87,
                                                  'TP50': 5,
                                                  'TP99': 110.0},
                                       'flush': {'Requests': 5405,
                                                 'Fails': 77,
                                                 'RPS': 0.5,
                                                 'fail_s': 0.01,
                                                 'RT_max': 204158.91,
                                                 'RT_avg': 29881.0,
                                                 'TP50': 19000.0,
                                                 'TP99': 185000.0},
                                       'insert': {'Requests': 5410,
                                                  'Fails': 31,
                                                  'RPS': 0.5,
                                                  'fail_s': 0.01,
                                                  'RT_max': 180059.4,
                                                  'RT_avg': 9689.75,
                                                  'TP50': 5400.0,
                                                  'TP99': 101000.0},
                                       'release': {'Requests': 5564,
                                                   'Fails': 0,
                                                   'RPS': 0.52,
                                                   'fail_s': 0.0,
                                                   'RT_max': 1025.65,
                                                   'RT_avg': 17.33,
                                                   'TP50': 4,
                                                   'TP99': 100.0}}}}}
yanliang567 commented 5 months ago

I'd set the priority to high, since this is a flush issue.

longjiquan commented 5 months ago

(screenshot)

longjiquan commented 5 months ago

The channel checkpoint lag sometimes exceeds 180 s, which is the root cause of the 180 s flush timeout.
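To make the failure mode concrete, here is a minimal Python sketch. It assumes, as this comment implies, that a flush only completes once the channel checkpoint has advanced past the flush timestamp, and that the caller gives up after 180 s. `wait_for_checkpoint`, `FlushTimeout`, `SimClock`, and `simulate_flush` are hypothetical names used for illustration; this is not how Milvus implements flush internally. A simulated clock keeps the example deterministic and instant.

```python
import time
from typing import Callable, Optional


class FlushTimeout(Exception):
    """Raised when the channel checkpoint does not reach the flush timestamp in time."""


def wait_for_checkpoint(
    get_checkpoint_ts: Callable[[], float],
    flush_ts: float,
    timeout_s: float = 180.0,
    poll_s: float = 1.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until the channel checkpoint reaches flush_ts; return the seconds waited."""
    start = now()
    while get_checkpoint_ts() < flush_ts:
        if now() - start >= timeout_s:
            raise FlushTimeout(f"checkpoint did not reach flush ts within {timeout_s:.0f}s")
        sleep(poll_s)
    return now() - start


class SimClock:
    """Deterministic clock: sleep() just advances the simulated time."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, s: float) -> None:
        self.t += s


def simulate_flush(lag_s: float, timeout_s: float = 180.0) -> Optional[float]:
    """Flush issued at t=0 while the checkpoint trails wall time by lag_s seconds.

    Returns the seconds waited, or None if the flush timed out.
    """
    clock = SimClock()
    try:
        return wait_for_checkpoint(
            lambda: clock.now() - lag_s,
            flush_ts=0.0,
            timeout_s=timeout_s,
            now=clock.now,
            sleep=clock.sleep,
        )
    except FlushTimeout:
        return None


if __name__ == "__main__":
    for lag in (30.0, 179.0, 200.0):
        print(lag, simulate_flush(lag))
```

With a simulated lag of 30 s or 179 s, the wait returns after that many seconds; with a 200 s lag it hits the 180 s limit and `simulate_flush` returns `None`, which matches the symptom reported in this issue.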

XuanYang-cn commented 5 months ago

/assign

wangting0128 commented 5 months ago

Reproduced again

argo task: multi-vector-scene-mix-6r7dt test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240401-d4d0c6be8-amd64

server:

NAME                                                              READY   STATUS             RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-scene-mix-6r7dt-7-etcd-0                             1/1     Running            0                6h29m   10.104.31.50    4am-node34   <none>           <none>
multi-vector-scene-mix-6r7dt-7-etcd-1                             1/1     Running            0                6h29m   10.104.20.101   4am-node22   <none>           <none>
multi-vector-scene-mix-6r7dt-7-etcd-2                             1/1     Running            0                6h29m   10.104.30.88    4am-node38   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-datacoord-6579d9d4774m5wr   1/1     Running            1 (6h24m ago)    6h29m   10.104.5.26     4am-node12   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-datanode-5d79b8654-rhdqv    1/1     Running            1 (6h24m ago)    6h29m   10.104.29.64    4am-node35   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-indexcoord-6fcfb8976qxtj6   1/1     Running            0                6h29m   10.104.14.27    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-indexnode-5b47fcb5c-k8txc   1/1     Running            0                6h29m   10.104.6.105    4am-node13   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-proxy-7cdccbf7d8-g6nqm      1/1     Running            1 (6h24m ago)    6h29m   10.104.14.25    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-querycoord-5c78f747bpxlmf   1/1     Running            1 (6h24m ago)    6h29m   10.104.14.29    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-querynode-644b57f6d94w56w   1/1     Running            0                6h29m   10.104.5.27     4am-node12   <none>           <none>
multi-vector-scene-mix-6r7dt-7-milvus-rootcoord-85d8cf89492pncr   1/1     Running            1 (6h24m ago)    6h29m   10.104.5.25     4am-node12   <none>           <none>
multi-vector-scene-mix-6r7dt-7-minio-0                            1/1     Running            0                6h29m   10.104.30.84    4am-node38   <none>           <none>
multi-vector-scene-mix-6r7dt-7-minio-1                            1/1     Running            0                6h29m   10.104.20.105   4am-node22   <none>           <none>
multi-vector-scene-mix-6r7dt-7-minio-2                            1/1     Running            0                6h29m   10.104.31.67    4am-node34   <none>           <none>
multi-vector-scene-mix-6r7dt-7-minio-3                            1/1     Running            0                6h29m   10.104.29.77    4am-node35   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-0                    1/1     Running            0                6h29m   10.104.30.83    4am-node38   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-1                    1/1     Running            0                6h29m   10.104.20.107   4am-node22   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-2                    1/1     Running            0                6h29m   10.104.31.66    4am-node34   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-bookie-init-x9xvx           0/1     Completed          0                6h29m   10.104.14.28    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-broker-0                    1/1     Running            0                6h29m   10.104.14.31    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-proxy-0                     1/1     Running            0                6h29m   10.104.9.188    4am-node14   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-pulsar-init-52zjq           0/1     Completed          0                6h29m   10.104.14.26    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-recovery-0                  1/1     Running            0                6h29m   10.104.14.30    4am-node18   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-zookeeper-0                 1/1     Running            0                6h29m   10.104.31.49    4am-node34   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-zookeeper-1                 1/1     Running            0                6h26m   10.104.28.132   4am-node33   <none>           <none>
multi-vector-scene-mix-6r7dt-7-pulsar-zookeeper-2                 1/1     Running            0                6h24m   10.104.29.98    4am-node35   <none>           <none> 
Screenshots 2024-04-02 10:52:27, 10:52:56, 10:53:24

client pod name: multi-vector-scene-mix-6r7dt-2282897328 client log: client.log

Screenshot 2024-04-02 10:49:55
[2024-04-01 20:51:17,858 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-01 20:51:17,858 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-01 20:51:17,858 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]: grpc     flush                                                                           1041  451(43.32%) | 357341  192409  608509 351000 |    0.10        0.04 (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]: grpc     hybrid_search                                                                   1079     0(0.00%) |  69104   20244  272834  63000 |    0.10        0.00 (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]: grpc     load                                                                            1050     0(0.00%) | 355687     398  608586 356000 |    0.10        0.00 (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]: grpc     query                                                                           1079     0(0.00%) | 194759   69671  504588 190000 |    0.10        0.00 (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]: grpc     search                                                                          1085     0(0.00%) |  33079      26  232173  30000 |    0.10        0.00 (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]:          Aggregated                                                                      5334   451(8.46%) | 199862      26  608586 190000 |    0.49        0.04 (stats.py:789)
[2024-04-01 20:51:17,859 -  INFO - fouram]:  (stats.py:790)
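The Locust summary above can be checked mechanically. The sketch below assumes the row layout shown in this log (`[type] name reqs fails(pct%) | avg min max med | req/s failures/s`, preceded by the fouram logger prefix); `LocustRow`, `parse_locust_row`, and `parse_locust_table` are hypothetical helpers, not part of Locust or fouram.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class LocustRow(NamedTuple):
    name: str
    reqs: int
    fails: int
    avg_ms: int
    max_ms: int

    @property
    def fail_pct(self) -> float:
        return 100.0 * self.fails / self.reqs if self.reqs else 0.0


def parse_locust_row(line: str) -> Optional[LocustRow]:
    """Parse one data row of Locust's final stats table; return None for other lines."""
    body = line.split("]: ", 1)[-1]  # drop the "[ts - LEVEL - logger]: " prefix
    parts = body.split("|")
    if len(parts) != 3:  # separator rows have many '|', plain log lines have none
        return None
    left, mid = parts[0].split(), parts[1].split()
    # Left column is "[type] name reqs fails(pct%)"; the Aggregated row has no type.
    # The header row fails the isdigit() check because its reqs column is "#".
    if len(left) < 3 or not left[-2].isdigit() or len(mid) < 4:
        return None
    fails = int(left[-1].split("(", 1)[0])
    avg_ms, _min_ms, max_ms, _med_ms = (int(x) for x in mid[:4])
    return LocustRow(left[-3], int(left[-2]), fails, avg_ms, max_ms)


def parse_locust_table(lines: Iterable[str]) -> Dict[str, LocustRow]:
    """Map each request name (including "Aggregated") to its parsed row."""
    rows = (parse_locust_row(line) for line in lines)
    return {row.name: row for row in rows if row is not None}
```

Applied to the rows above, it recovers 451 failures out of 1041 flush requests (about 43.32%), and the per-type request counts sum to the 5334 shown on the `Aggregated` row.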

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DQL & max reqs=1024`
            verify DQL & max reqs=1024 scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - flush
                - load
                - search
                - hybrid_search: len(reqs) = 1024
                - query
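For readers unfamiliar with hybrid search, the sketch below shows roughly what a `len(reqs) = 1024` payload looks like. It assumes the 1024 sub-requests are spread round-robin across the four vector fields and that each carries a single 128-dim query vector; the metric type and per-index search params are illustrative placeholders, not the benchmark's actual values. The dict keys are intended to mirror the `data`, `anns_field`, `param`, and `limit` arguments of pymilvus's `AnnSearchRequest`, but plain dicts are used so the example runs without a Milvus server. `build_hybrid_reqs` and `SEARCH_PARAMS` are hypothetical names.

```python
import random
from itertools import cycle, islice
from typing import Dict, List

DIM = 128  # all four vector fields are 128-dim in this test

# Hypothetical per-index search params; values are illustrative, not the benchmark's.
SEARCH_PARAMS: Dict[str, dict] = {
    "float_vector": {"metric_type": "L2", "params": {"nprobe": 16}},  # IVF_FLAT
    "float_vector_1": {"metric_type": "L2", "params": {"ef": 64}},  # HNSW
    "float_vector_2": {"metric_type": "L2", "params": {"search_list": 30}},  # DISKANN
    "float_vector_3": {"metric_type": "L2", "params": {"nprobe": 16}},  # IVF_SQ8
}


def build_hybrid_reqs(n_reqs: int = 1024, nq: int = 1, limit: int = 10, seed: int = 0) -> List[dict]:
    """Build n_reqs ANN sub-requests, assigning vector fields round-robin."""
    rng = random.Random(seed)
    reqs = []
    for field in islice(cycle(SEARCH_PARAMS), n_reqs):  # cycles over the dict's keys
        reqs.append(
            {
                "data": [[rng.random() for _ in range(DIM)] for _ in range(nq)],
                "anns_field": field,
                "param": SEARCH_PARAMS[field],
                "limit": limit,
            }
        )
    return reqs
```

Under these assumptions, a single `hybrid_search` call carries 1024 × 128 = 131,072 float values and asks the server to run 1024 separate ANN searches before reranking, which may help explain why this scenario is so much heavier than a plain `search`.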

test result:


     "result": {
          "test_result": {
               "index": {
                    "RT": 981.6727,
                    "float_vector_1": {
                         "RT": 1.0218
                    },
                    "float_vector_2": {
                         "RT": 2.0281
                    },
                    "float_vector_3": {
                         "RT": 1.02
                    },
                    "id": {
                         "RT": 1.0226
                    },
                    "int64_1": {
                         "RT": 0.5214
                    },
                    "varchar_1": {
                         "RT": 1.028
                    }
               },
               "insert": {
                    "total_time": 148.5601,
                    "VPS": 6731.2825,
                    "batch_time": 1.4856,
                    "batch": 10000
               },
               "flush": {
                    "RT": 22.2226
               },
               "load": {
                    "RT": 8.6411
               },
               "Locust": {
                    "Aggregated": {
                         "Requests": 5313,
                         "Fails": 0,
                         "RPS": 0.49,
                         "fail_s": 0,
                         "RT_max": 139961.75,
                         "RT_avg": 40434.15,
                         "TP50": 41000,
                         "TP99": 109000
                    },
                    "flush": {
                         "Requests": 1109,
                         "Fails": 0,
                         "RPS": 0.1,
                         "fail_s": 0,
                         "RT_max": 139961.75,
                         "RT_avg": 38920.65,
                         "TP50": 40000,
                         "TP99": 111000
                    },
                    "hybrid_search": {
                         "Requests": 1062,
                         "Fails": 0,
                         "RPS": 0.1,
                         "fail_s": 0,
                         "RT_max": 129233.3,
                         "RT_avg": 64638.19,
                         "TP50": 61000,
                         "TP99": 113000
                    },
                    "load": {
                         "Requests": 1063,
                         "Fails": 0,
                         "RPS": 0.1,
                         "fail_s": 0,
                         "RT_max": 125997,
                         "RT_avg": 29857.82,
                         "TP50": 24000,
                         "TP99": 103000
                    },
                    "query": {
                         "Requests": 1079,
                         "Fails": 0,
                         "RPS": 0.1,
                         "fail_s": 0,
                         "RT_max": 130191.6,
                         "RT_avg": 37255.29,
                         "TP50": 40000,
                         "TP99": 93000
                    },
                    "search": {
                         "Requests": 1000,
                         "Fails": 0,
                         "RPS": 0.09,
                         "fail_s": 0,
                         "RT_max": 91257.24,
                         "RT_avg": 31080.53,
                         "TP50": 32000,
                         "TP99": 80000
                    }
               }
          }
     }
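Locust reports RT values in milliseconds, so a quick way to relate these numbers to the 180 s flush timeout is to compare each request type's `RT_max` against 180,000 ms. The helper below is a hypothetical sketch (`over_timeout` and `FLUSH_TIMEOUT_MS` are illustrative names). For the result block above, nothing exceeds the limit (flush peaks at about 140 s), consistent with its 0 fails; for the earlier results in this thread, flush (204158.91 ms) and insert (180059.4 ms) exceed it, and those were the only two types with failures there.

```python
from typing import Dict, List

FLUSH_TIMEOUT_MS = 180_000  # the 180 s flush timeout discussed in this issue


def over_timeout(locust_stats: Dict[str, dict], timeout_ms: float = FLUSH_TIMEOUT_MS) -> List[str]:
    """Return, sorted, the request types (excluding "Aggregated") whose RT_max exceeded timeout_ms."""
    return sorted(
        name
        for name, stats in locust_stats.items()
        if name != "Aggregated" and stats.get("RT_max", 0) > timeout_ms
    )


if __name__ == "__main__":
    # RT_max values copied from the result block above.
    recent = {
        "Aggregated": {"RT_max": 139961.75},
        "flush": {"RT_max": 139961.75},
        "hybrid_search": {"RT_max": 129233.3},
        "load": {"RT_max": 125997},
        "query": {"RT_max": 130191.6},
        "search": {"RT_max": 91257.24},
    }
    print(over_timeout(recent))
```

This only flags worst-case latency; it says nothing about why a request was slow, so it is a triage aid rather than a diagnosis.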

server config:

Screenshot 2024-04-02 10:48:11

client config:

Screenshot 2024-04-02 10:49:16
XuanYang-cn commented 5 months ago

Related to https://github.com/milvus-io/milvus/issues/30552#issuecomment-2031769315

XuanYang-cn commented 5 months ago

This should be fixed now. /assign @wangting0128 /unassign @XuanYang-cn Please help verify.

wangting0128 commented 4 months ago

Reproduced again

argo task: multi-vector-scene-mix-ld9h8 test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240415-e50599ba-amd64

server:

NAME                                                              READY   STATUS                            RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-scene-mix-ld9h8-7-etcd-0                             1/1     Running                           0                6h30m   10.104.31.251   4am-node34   <none>           <none>
multi-vector-scene-mix-ld9h8-7-etcd-1                             1/1     Running                           0                6h30m   10.104.19.27    4am-node28   <none>           <none>
multi-vector-scene-mix-ld9h8-7-etcd-2                             1/1     Running                           0                6h30m   10.104.25.110   4am-node30   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-datacoord-c9f6855cf-xlktl   1/1     Running                           0                6h30m   10.104.31.244   4am-node34   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-datanode-7c898966f8-c6v79   1/1     Running                           1 (6h25m ago)    6h30m   10.104.34.128   4am-node37   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-indexcoord-67b64b4d8gmsvs   1/1     Running                           0                6h30m   10.104.6.177    4am-node13   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-indexnode-6d699b47c7xzr69   1/1     Running                           0                6h30m   10.104.15.60    4am-node20   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-proxy-8474cf5474-jz6qq      1/1     Running                           1 (6h25m ago)    6h30m   10.104.31.243   4am-node34   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-querycoord-d776d56f5lcgl9   1/1     Running                           1 (6h25m ago)    6h30m   10.104.19.21    4am-node28   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-querynode-db5755485-mbsvr   1/1     Running                           0                6h30m   10.104.6.178    4am-node13   <none>           <none>
multi-vector-scene-mix-ld9h8-7-milvus-rootcoord-69c857b5cfplzxb   1/1     Running                           1 (6h25m ago)    6h30m   10.104.6.174    4am-node13   <none>           <none>
multi-vector-scene-mix-ld9h8-7-minio-0                            1/1     Running                           0                6h30m   10.104.31.250   4am-node34   <none>           <none>
multi-vector-scene-mix-ld9h8-7-minio-1                            1/1     Running                           0                6h30m   10.104.25.105   4am-node30   <none>           <none>
multi-vector-scene-mix-ld9h8-7-minio-2                            1/1     Running                           0                6h30m   10.104.17.64    4am-node23   <none>           <none>
multi-vector-scene-mix-ld9h8-7-minio-3                            1/1     Running                           0                6h30m   10.104.19.26    4am-node28   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-0                    1/1     Running                           0                6h30m   10.104.29.9     4am-node35   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-1                    1/1     Running                           0                6h30m   10.104.17.65    4am-node23   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-2                    1/1     Running                           0                6h30m   10.104.25.111   4am-node30   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-bookie-init-zr2mm           0/1     Completed                         0                6h30m   10.104.20.199   4am-node22   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-broker-0                    1/1     Running                           0                6h30m   10.104.30.194   4am-node38   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-proxy-0                     1/1     Running                           0                6h30m   10.104.9.98     4am-node14   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-pulsar-init-vlksb           0/1     Completed                         0                6h30m   10.104.20.200   4am-node22   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-recovery-0                  1/1     Running                           0                6h30m   10.104.20.201   4am-node22   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-zookeeper-0                 1/1     Running                           0                6h30m   10.104.31.252   4am-node34   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-zookeeper-1                 1/1     Running                           0                6h29m   10.104.15.86    4am-node20   <none>           <none>
multi-vector-scene-mix-ld9h8-7-pulsar-zookeeper-2                 1/1     Running                           0                6h27m   10.104.28.179   4am-node33   <none>           <none>

client pod name: multi-vector-scene-mix-ld9h8-876450026 client log: client.log

Screenshot 2024-04-15 19:53:02

/unassign @wangting0128 /assign @XuanYang-cn

wangting0128 commented 4 months ago

Reproduced again

argo task: multi-vector-scene-mix-gkx8n test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240418-238f9a4a-amd64

server:

NAME                                                              READY   STATUS                            RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-scene-mix-gkx8n-7-etcd-0                             1/1     Running                           0                6h30m   10.104.24.140   4am-node29   <none>           <none>
multi-vector-scene-mix-gkx8n-7-etcd-1                             1/1     Running                           0                6h30m   10.104.23.218   4am-node27   <none>           <none>
multi-vector-scene-mix-gkx8n-7-etcd-2                             1/1     Running                           0                6h30m   10.104.30.196   4am-node38   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-datacoord-587c4f4d57fnw66   1/1     Running                           1 (6h25m ago)    6h30m   10.104.14.83    4am-node18   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-datanode-6758fbdc4d-td9m7   1/1     Running                           1 (6h25m ago)    6h30m   10.104.33.198   4am-node36   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-indexcoord-78c596cd8lbvns   1/1     Running                           0                6h30m   10.104.18.14    4am-node25   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-indexnode-cd964ff95-9l6pv   1/1     Running                           1 (6h25m ago)    6h30m   10.104.21.135   4am-node24   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-proxy-f9f4ddc76-ctg68       1/1     Running                           1 (6h25m ago)    6h30m   10.104.17.245   4am-node23   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-querycoord-859c8bcbf4rcq6   1/1     Running                           1 (6h25m ago)    6h30m   10.104.32.142   4am-node39   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-querynode-7fd56c9755jmzj2   1/1     Running                           1 (6h25m ago)    6h30m   10.104.1.226    4am-node10   <none>           <none>
multi-vector-scene-mix-gkx8n-7-milvus-rootcoord-86d678f65cvtv46   1/1     Running                           1 (6h25m ago)    6h30m   10.104.32.143   4am-node39   <none>           <none>
multi-vector-scene-mix-gkx8n-7-minio-0                            1/1     Running                           0                6h30m   10.104.18.37    4am-node25   <none>           <none>
multi-vector-scene-mix-gkx8n-7-minio-1                            1/1     Running                           0                6h30m   10.104.23.213   4am-node27   <none>           <none>
multi-vector-scene-mix-gkx8n-7-minio-2                            1/1     Running                           0                6h30m   10.104.24.141   4am-node29   <none>           <none>
multi-vector-scene-mix-gkx8n-7-minio-3                            1/1     Running                           0                6h30m   10.104.34.103   4am-node37   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-0                    1/1     Running                           0                6h30m   10.104.19.206   4am-node28   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-1                    1/1     Running                           0                6h30m   10.104.23.219   4am-node27   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-2                    1/1     Running                           0                6h30m   10.104.34.104   4am-node37   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-bookie-init-9n4kh           0/1     Completed                         0                6h30m   10.104.9.180    4am-node14   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-broker-0                    1/1     Running                           0                6h30m   10.104.18.15    4am-node25   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-proxy-0                     1/1     Running                           0                6h30m   10.104.14.84    4am-node18   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-pulsar-init-6z875           0/1     Completed                         0                6h30m   10.104.18.13    4am-node25   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-recovery-0                  1/1     Running                           0                6h30m   10.104.13.31    4am-node16   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-zookeeper-0                 1/1     Running                           0                6h30m   10.104.23.217   4am-node27   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-zookeeper-1                 1/1     Running                           0                6h27m   10.104.30.200   4am-node38   <none>           <none>
multi-vector-scene-mix-gkx8n-7-pulsar-zookeeper-2                 1/1     Running                           0                6h26m   10.104.31.49    4am-node34   <none>           <none> 

client pod name: multi-vector-scene-mix-gkx8n-2043521713 client log:

Screenshot 2024-04-22 21:20:18
wangting0128 commented 3 months ago

Reproduced again

argo task: multi-vector-corn-1-1717077600 test case name: test_hybrid_search_locust_dql_max_reqs_cluster image: 2.4-20240530-68e2d532-amd64

server:

[2024-05-30 20:34:05,057 -  INFO - fouram]: [Base] Deploy initial state: 
I0530 14:12:12.999608     433 request.go:665] Waited for 1.198263357s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/argoproj.io/v1alpha1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1717077600-7-etcd-0                           1/1     Running            0                 8m37s   10.104.26.20    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-etcd-1                           1/1     Running            0                 8m36s   10.104.23.159   4am-node27   <none>           <none>
multi-vector-corn-1-1717077600-7-etcd-2                           1/1     Running            0                 8m36s   10.104.21.105   4am-node24   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-datacoord-9d7d74c6v4wkb   1/1     Running            5 (6m16s ago)     8m37s   10.104.18.126   4am-node25   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-datanode-5f9f8bdfbsq7ms   1/1     Running            5 (6m11s ago)     8m37s   10.104.13.195   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-indexcoord-6fc48fdtrmwp   1/1     Running            0                 8m38s   10.104.20.226   4am-node22   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-indexnode-78fdb9f6fkjdf   1/1     Running            5 (6m14s ago)     8m38s   10.104.20.227   4am-node22   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-proxy-7996b7fdbf-fz9g2    1/1     Running            5 (2m7s ago)      8m38s   10.104.6.24     4am-node13   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-querycoord-565bf5frzt2t   1/1     Running            5 (2m6s ago)      8m38s   10.104.13.196   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-querynode-77ccb866wxw6t   1/1     Running            4 (6m59s ago)     8m38s   10.104.5.221    4am-node12   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-rootcoord-5978757cnt7vf   1/1     Running            5 (6m17s ago)     8m37s   10.104.18.127   4am-node25   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-0                          1/1     Running            0                 8m36s   10.104.23.160   4am-node27   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-1                          1/1     Running            0                 8m35s   10.104.26.25    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-2                          1/1     Running            0                 8m35s   10.104.30.208   4am-node38   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-3                          1/1     Running            0                 8m34s   10.104.19.205   4am-node28   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-0                  1/1     Running            0                 8m37s   10.104.26.21    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-1                  1/1     Running            0                 8m36s   10.104.23.161   4am-node27   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-2                  1/1     Running            0                 8m35s   10.104.21.106   4am-node24   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-init-62fk4         0/1     Completed          0                 8m38s   10.104.13.194   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-broker-0                  1/1     Running            0                 8m36s   10.104.9.34     4am-node14   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-proxy-0                   1/1     Running            0                 8m37s   10.104.6.25     4am-node13   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-pulsar-init-7rmcn         0/1     Completed          0                 8m38s   10.104.13.193   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-recovery-0                1/1     Running            0                 8m37s   10.104.6.26     4am-node13   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-0               1/1     Running            0                 8m37s   10.104.26.18    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-1               1/1     Running            0                 6m42s   10.104.30.216   4am-node38   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-2               1/1     Running            0                 4m46s   10.104.19.227   4am-node28   <none>           <none> (base.py:258)
[2024-05-30 20:34:05,057 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'NAME|multi-vector-corn-1-1717077600-7-milvus|multi-vector-corn-1-1717077600-7-minio|multi-vector-corn-1-1717077600-7-etcd|multi-vector-corn-1-1717077600-7-pulsar|multi-vector-corn-1-1717077600-7-zookeeper|multi-vector-corn-1-1717077600-7-kafka|multi-vector-corn-1-1717077600-7-log|multi-vector-corn-1-1717077600-7-tikv'  (util_cmd.py:14)
[2024-05-30 20:34:15,719 -  INFO - fouram]: [CliClient] pod details of release(multi-vector-corn-1-1717077600-7): 
 I0530 20:34:06.714505     544 request.go:665] Waited for 1.197453932s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/storage.k8s.io/v1?timeout=32s
NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-1-1717077600-7-etcd-0                           1/1     Running            0                 6h30m   10.104.26.20    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-etcd-1                           1/1     Running            0                 6h30m   10.104.23.159   4am-node27   <none>           <none>
multi-vector-corn-1-1717077600-7-etcd-2                           1/1     Running            0                 6h30m   10.104.21.105   4am-node24   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-datacoord-9d7d74c6v4wkb   1/1     Running            5 (6h28m ago)     6h30m   10.104.18.126   4am-node25   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-datanode-5f9f8bdfbsq7ms   1/1     Running            5 (6h28m ago)     6h30m   10.104.13.195   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-indexcoord-6fc48fdtrmwp   1/1     Running            0                 6h30m   10.104.20.226   4am-node22   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-indexnode-78fdb9f6fkjdf   1/1     Running            5 (6h28m ago)     6h30m   10.104.20.227   4am-node22   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-proxy-7996b7fdbf-fz9g2    1/1     Running            5 (6h24m ago)     6h30m   10.104.6.24     4am-node13   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-querycoord-565bf5frzt2t   1/1     Running            5 (6h24m ago)     6h30m   10.104.13.196   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-querynode-77ccb866wxw6t   1/1     Running            4 (6h28m ago)     6h30m   10.104.5.221    4am-node12   <none>           <none>
multi-vector-corn-1-1717077600-7-milvus-rootcoord-5978757cnt7vf   1/1     Running            5 (6h28m ago)     6h30m   10.104.18.127   4am-node25   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-0                          1/1     Running            0                 6h30m   10.104.23.160   4am-node27   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-1                          1/1     Running            0                 6h30m   10.104.26.25    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-2                          1/1     Running            0                 6h30m   10.104.30.208   4am-node38   <none>           <none>
multi-vector-corn-1-1717077600-7-minio-3                          1/1     Running            0                 6h30m   10.104.19.205   4am-node28   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-0                  1/1     Running            0                 6h30m   10.104.26.21    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-1                  1/1     Running            0                 6h30m   10.104.23.161   4am-node27   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-2                  1/1     Running            0                 6h30m   10.104.21.106   4am-node24   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-bookie-init-62fk4         0/1     Completed          0                 6h30m   10.104.13.194   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-broker-0                  1/1     Running            0                 6h30m   10.104.9.34     4am-node14   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-proxy-0                   1/1     Running            0                 6h30m   10.104.6.25     4am-node13   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-pulsar-init-7rmcn         0/1     Completed          0                 6h30m   10.104.13.193   4am-node16   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-recovery-0                1/1     Running            0                 6h30m   10.104.6.26     4am-node13   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-0               1/1     Running            0                 6h30m   10.104.26.18    4am-node32   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-1               1/1     Running            0                 6h28m   10.104.30.216   4am-node38   <none>           <none>
multi-vector-corn-1-1717077600-7-pulsar-zookeeper-2               1/1     Running            0                 6h26m   10.104.19.227   4am-node28   <none>           <none> 

client pod name: multi-vector-corn-1-1717077600-1154628152
client log:

Screenshot 2024-05-31 14:13:28

Flush timeouts occur from 2024-05-30 17:37:07,104 until the end of the test.

Screenshot 2024-05-31 14:14:39

XuanYang-cn commented 2 months ago

This problem comes from the interaction between concurrent flushes and checkpoint (cp) updates.

Concurrent flushes can make the cp lag by up to 10 minutes. Sometimes flushTs changes so quickly that the cp updater cannot update the cp immediately, so the cp is only advanced by the timed updater that runs every 10 minutes. That wait can exceed the 180s flush timeout, so the flush times out.
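A toy Python model may help illustrate the mechanism described above. It is a sketch under stated assumptions, not Milvus code: `simulate`, `FlushOutcome`, and `next_periodic_tick` are hypothetical names. The 600 s periodic interval and 180 s timeout come from this thread. The 1 s update latency, and the rule that an event-driven cp update is dropped whenever a newer flush arrives within that latency window, are assumptions used to stand in for "flushTs changes so quickly that the cp updater cannot update cp immediately".

```python
import math
from dataclasses import dataclass

# Hypothetical constants taken from this discussion, not read from Milvus config.
PERIODIC_INTERVAL_S = 600.0  # the timed cp updater (every 10 minutes)
FLUSH_TIMEOUT_S = 180.0      # client-side flush timeout in the benchmark


@dataclass
class FlushOutcome:
    flush_ts: float       # time the flush was issued (its flushTs)
    cp_reached_at: float  # first time the cp is >= flush_ts

    @property
    def wait_s(self) -> float:
        return self.cp_reached_at - self.flush_ts

    @property
    def timed_out(self) -> bool:
        return self.wait_s > FLUSH_TIMEOUT_S


def next_periodic_tick(t: float, interval_s: float) -> float:
    """First periodic cp update at or after t (ticks at 0, interval_s, ...)."""
    return math.ceil(t / interval_s) * interval_s


def simulate(flush_times, latency_s=1.0, interval_s=PERIODIC_INTERVAL_S):
    """Toy model (assumption): an event-driven cp update for a flush only
    lands if no newer flush arrives within latency_s; a periodic tick every
    interval_s advances the cp to the tick time regardless."""
    times = sorted(flush_times)
    # (flushTs, completion time) of event-driven updates that were not superseded.
    fast_updates = [
        (t, t + latency_s)
        for j, t in enumerate(times)
        if j + 1 == len(times) or times[j + 1] > t + latency_s
    ]
    outcomes = []
    for ts in times:
        candidates = [next_periodic_tick(ts, interval_s)]
        # The cp is monotonic, so a later flush's update also covers ts.
        candidates += [done for cp_ts, done in fast_updates if cp_ts >= ts]
        outcomes.append(FlushOutcome(ts, min(candidates)))
    return outcomes
```

For example, `simulate([1 + 0.5 * k for k in range(600)])` models a flush every 0.5 s for about 5 minutes: only the last flush's event-driven update lands, so the first flush waits 300.5 s and times out, while an isolated `simulate([10.0])` waits only 1 s. If the burst runs longer than 10 minutes, the periodic tick caps the wait at the 600 s interval, which matches the "at most 10mins" lag described above.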

This is not an urgent issue for flush, because concurrent flush is not a common use case.