milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][standalone] Collection load failed after rebuild index #32075

Open wangting0128 opened 5 months ago

wangting0128 commented 5 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4-20240409-4c073047-amd64 
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq   
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: multi-vector-corn-sp4ff
test case name: test_hybrid_search_locust_shard16_float_dql_ivf_flat_standalone

server:

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-sp4ff-32-etcd-0                                 1/1     Running                           0               7h21m   10.104.19.165   4am-node28   <none>           <none>
multi-vector-corn-sp4ff-32-milvus-standalone-65b9bcf6c9-ltsnp     1/1     Running                           0               7h21m   10.104.15.128   4am-node20   <none>           <none>
multi-vector-corn-sp4ff-32-minio-758b7f9749-m9zb6                 1/1     Running                           0               7h21m   10.104.29.62    4am-node35   <none>           <none>

Segment Loaded Num: (image)

memory usage

Screenshot 2024-04-09 18 35 33

OOM log

Screenshot 2024-04-09 18 37 09

load failed segment: 448956254485300637 (image)

Milvus(by-dev) > show segment
SegmentID: 448956254485288430 State: Flushed, Level: L1, Row Count:2428
SegmentID: 448956254485291836 State: Flushed, Level: L1, Row Count:2431
SegmentID: 448956254485291837 State: Flushed, Level: L1, Row Count:2424
SegmentID: 448956254485295566 State: Flushed, Level: L1, Row Count:2433
SegmentID: 448956254485296632 State: Flushed, Level: L1, Row Count:2421
SegmentID: 448956254485300636 State: Flushed, Level: L1, Row Count:2415
SegmentID: 448956254485300637 State: Flushed, Level: L1, Row Count:2436
SegmentID: 448956254485304967 State: Flushed, Level: L1, Row Count:2436
SegmentID: 448956254485308776 State: Flushed, Level: L1, Row Count:2426
SegmentID: 448956254485323342 State: Flushed, Level: L1, Row Count:2435
SegmentID: 448956254485326475 State: Flushed, Level: L1, Row Count:2440
SegmentID: 448956254485326743 State: Flushed, Level: L1, Row Count:2424
SegmentID: 448956254485330938 State: Flushed, Level: L1, Row Count:2418
SegmentID: 448956254485352604 State: Flushed, Level: L1, Row Count:2428
SegmentID: 448956254485375613 State: Flushed, Level: L1, Row Count:2436
SegmentID: 448956254485460072 State: Flushed, Level: L1, Row Count:1700
SegmentID: 448956254485471912 State: Flushed, Level: L1, Row Count:1701
SegmentID: 448956254485472508 State: Flushed, Level: L1, Row Count:1952
SegmentID: 448956254485473567 State: Flushed, Level: L1, Row Count:1949
SegmentID: 448956254485473833 State: Flushed, Level: L1, Row Count:1695
SegmentID: 448956254485474959 State: Flushed, Level: L1, Row Count:1944
SegmentID: 448956254485476151 State: Flushed, Level: L1, Row Count:1935
SegmentID: 448956254485477275 State: Flushed, Level: L1, Row Count:2190
SegmentID: 448956254485478136 State: Flushed, Level: L1, Row Count:1703
SegmentID: 448956254485478137 State: Flushed, Level: L1, Row Count:1958
SegmentID: 448956254485479394 State: Flushed, Level: L1, Row Count:1705
SegmentID: 448956254485479395 State: Flushed, Level: L1, Row Count:1711
SegmentID: 448956254485480586 State: Flushed, Level: L1, Row Count:1936
SegmentID: 448956254485480587 State: Flushed, Level: L1, Row Count:1934
SegmentID: 448956254485481910 State: Flushed, Level: L1, Row Count:2196
SegmentID: 448956254485481911 State: Flushed, Level: L1, Row Count:2194
SegmentID: 448956254485483499 State: Flushed, Level: L1, Row Count:2432
SegmentID: 448956254485483897 State: Flushed, Level: L1, Row Count:1928
SegmentID: 448956254485485219 State: Flushed, Level: L1, Row Count:1935
SegmentID: 448956254485486081 State: Flushed, Level: L1, Row Count:1901
SegmentID: 448956254485486149 State: Flushed, Level: L1, Row Count:1944
SegmentID: 448956254485487406 State: Flushed, Level: L1, Row Count:1919
SegmentID: 448956254485488068 State: Flushed, Level: L1, Row Count:2114
SegmentID: 448956254485488069 State: Flushed, Level: L1, Row Count:1853
SegmentID: 448956254485489458 State: Flushed, Level: L1, Row Count:1844
SegmentID: 448956254485489459 State: Flushed, Level: L1, Row Count:1612
SegmentID: 448956254485490716 State: Flushed, Level: L1, Row Count:2153
SegmentID: 448956254485490717 State: Flushed, Level: L1, Row Count:2146
SegmentID: 448956254485492172 State: Flushed, Level: L1, Row Count:1935
SegmentID: 448956254485492834 State: Flushed, Level: L1, Row Count:1698
SegmentID: 448956254485493430 State: Flushed, Level: L1, Row Count:1684
SegmentID: 448956254485494026 State: Flushed, Level: L1, Row Count:2019
SegmentID: 448956254485494027 State: Flushed, Level: L1, Row Count:2049
--- Growing: 0, Sealed: 0, Flushed: 48, Dropped: 0
--- Small Segments: 0, row count: 0  Other Segments: 48, row count: 100000
--- Total Segments: 48, row count: 100000
Milvus(by-dev) > show collections
================================================================================
DBID: 1
Collection ID: 448956254484955499   Collection Name: fouram_BsdSztNY
Collection State: CollectionCreated Create Time: 2024-04-09 11:11:07
Fields:
 - Field ID: 0   Field Name: RowID   Field Type: Int64
 - Field ID: 1   Field Name: Timestamp   Field Type: Int64
 - Field ID: 100     Field Name: id      Field Type: Int64
     - Primary Key: true, AutoID: false
 - Field ID: 101     Field Name: float_vector    Field Type: FloatVector
     - Type Param dim: 32768
 - Field ID: 102     Field Name: float_vector_1      Field Type: FloatVector
     - Type Param dim: 32768
 - Field ID: 103     Field Name: float_vector_2      Field Type: FloatVector
     - Type Param dim: 32768
 - Field ID: 104     Field Name: float_vector_3      Field Type: FloatVector
     - Type Param dim: 32768
 - Field ID: 105     Field Name: int8_1      Field Type: Int8
 - Field ID: 106     Field Name: int16_1     Field Type: Int16
 - Field ID: 107     Field Name: int32_1     Field Type: Int32
 - Field ID: 108     Field Name: int64_1     Field Type: Int64
 - Field ID: 109     Field Name: double_1    Field Type: Double
 - Field ID: 110     Field Name: float_1     Field Type: Float
 - Field ID: 111     Field Name: varchar_1   Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 112     Field Name: bool_1      Field Type: Bool
 - Field ID: 113     Field Name: json_1      Field Type: JSON
 - Field ID: 114     Field Name: array_int8_1    Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 115     Field Name: array_int16_1   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 116     Field Name: array_int32_1   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 117     Field Name: array_int64_1   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 118     Field Name: array_double_1      Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 119     Field Name: array_float_1   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 120     Field Name: array_varchar_1     Field Type: Array
     - Type Param max_capacity: 7
     - Type Param max_length: 10
 - Field ID: 121     Field Name: array_bool_1    Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 122     Field Name: int8_2      Field Type: Int8
 - Field ID: 123     Field Name: int16_2     Field Type: Int16
 - Field ID: 124     Field Name: int32_2     Field Type: Int32
 - Field ID: 125     Field Name: int64_2     Field Type: Int64
 - Field ID: 126     Field Name: double_2    Field Type: Double
 - Field ID: 127     Field Name: float_2     Field Type: Float
 - Field ID: 128     Field Name: varchar_2   Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 129     Field Name: bool_2      Field Type: Bool
 - Field ID: 130     Field Name: json_2      Field Type: JSON
 - Field ID: 131     Field Name: array_int8_2    Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 132     Field Name: array_int16_2   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 133     Field Name: array_int32_2   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 134     Field Name: array_int64_2   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 135     Field Name: array_double_2      Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 136     Field Name: array_float_2   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 137     Field Name: array_varchar_2     Field Type: Array
     - Type Param max_length: 10
     - Type Param max_capacity: 7
 - Field ID: 138     Field Name: array_bool_2    Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 139     Field Name: int8_3      Field Type: Int8
 - Field ID: 140     Field Name: int16_3     Field Type: Int16
 - Field ID: 141     Field Name: int32_3     Field Type: Int32
 - Field ID: 142     Field Name: int64_3     Field Type: Int64
 - Field ID: 143     Field Name: double_3    Field Type: Double
 - Field ID: 144     Field Name: float_3     Field Type: Float
 - Field ID: 145     Field Name: varchar_3   Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 146     Field Name: bool_3      Field Type: Bool
 - Field ID: 147     Field Name: json_3      Field Type: JSON
 - Field ID: 148     Field Name: array_int8_3    Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 149     Field Name: array_int16_3   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 150     Field Name: array_int32_3   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 151     Field Name: array_int64_3   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 152     Field Name: array_double_3      Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 153     Field Name: array_float_3   Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 154     Field Name: array_varchar_3     Field Type: Array
     - Type Param max_length: 10
     - Type Param max_capacity: 7
 - Field ID: 155     Field Name: array_bool_3    Field Type: Array
     - Type Param max_capacity: 7
 - Field ID: 156     Field Name: varchar_tail_1      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 157     Field Name: varchar_tail_2      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 158     Field Name: varchar_tail_3      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 159     Field Name: varchar_tail_4      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 160     Field Name: varchar_tail_5      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 161     Field Name: varchar_tail_6      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 162     Field Name: varchar_tail_7      Field Type: VarChar
     - Type Param max_length: 10
 - Field ID: 163     Field Name: varchar_tail_8      Field Type: VarChar
     - Type Param max_length: 10
Enable Dynamic Schema: false
Consistency Level: Bounded
Start position for channel by-dev-rootcoord-dml_0(by-dev-rootcoord-dml_0_448956254484955499v0): [1 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_1(by-dev-rootcoord-dml_1_448956254484955499v1): [13 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_2(by-dev-rootcoord-dml_2_448956254484955499v2): [6 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_3(by-dev-rootcoord-dml_3_448956254484955499v3): [7 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_4(by-dev-rootcoord-dml_4_448956254484955499v4): [16 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_5(by-dev-rootcoord-dml_5_448956254484955499v5): [2 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_6(by-dev-rootcoord-dml_6_448956254484955499v6): [4 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_7(by-dev-rootcoord-dml_7_448956254484955499v7): [9 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_8(by-dev-rootcoord-dml_8_448956254484955499v8): [10 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_9(by-dev-rootcoord-dml_9_448956254484955499v9): [11 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_10(by-dev-rootcoord-dml_10_448956254484955499v10): [14 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_11(by-dev-rootcoord-dml_11_448956254484955499v11): [3 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_12(by-dev-rootcoord-dml_12_448956254484955499v12): [8 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_13(by-dev-rootcoord-dml_13_448956254484955499v13): [15 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_14(by-dev-rootcoord-dml_14_448956254484955499v14): [5 0 144 26 83 3 59 6]
Start position for channel by-dev-rootcoord-dml_15(by-dev-rootcoord-dml_15_448956254484955499v15): [12 0 144 26 83 3 59 6]
Collection properties(0):--- Total collections:  1   Matched collections:  1
--- Total channel: 16    Healthy collections: 1
================================================================================
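
For scale, the raw vector payload of this collection can be estimated from the schema and row count shown above. This is only a back-of-the-envelope sketch: it assumes each FloatVector element is a 4-byte float32 and ignores index overhead, scalar fields, and any temporary copies made during load. The function name `raw_vector_mib` is made up for illustration.

```python
# Rough estimate of the raw float-vector payload for this collection.
# Assumption: each FloatVector element is a 4-byte float32; index overhead,
# scalar fields and temporary copies made while loading are ignored.

BYTES_PER_FLOAT32 = 4
MIB = 1024 * 1024


def raw_vector_mib(rows: int, dim: int, vector_fields: int) -> float:
    """Size in MiB of `rows` rows holding `vector_fields` float32 vectors each."""
    return rows * dim * vector_fields * BYTES_PER_FLOAT32 / MIB


# Values taken from the `show segment` / `show collections` output above.
total_mib = raw_vector_mib(rows=100_000, dim=32768, vector_fields=4)
per_field_mib = raw_vector_mib(rows=100_000, dim=32768, vector_fields=1)
largest_segment_mib = raw_vector_mib(rows=2440, dim=32768, vector_fields=4)

print(f"all 4 vector fields : {total_mib:.1f} MiB")
print(f"one vector field    : {per_field_mib:.1f} MiB")
print(f"largest segment     : {largest_segment_mib:.1f} MiB")
```

Under these assumptions the raw vectors alone come to 50000 MiB, so a 64Gi standalone pod has little room left once an index that keeps the full vectors, such as IVF_FLAT, is loaded for all four fields.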

client log:

[2024-04-09 07:01:43,000 -  INFO - fouram]: [Base] Connection params: {'alias': 'default', 'host': 'multi-vector-corn-sp4ff-32-milvus.qa-milvus.svc.cluster.local', 'port': '19530', 'uri': '', 'secure': False, 'user': '', 'password': '', 'token': '', 'db_name': ''} (base.py:237)
[2024-04-09 07:01:43,018 -  INFO - fouram]: [Base] Connect collection fouram_BsdSztNY (base.py:277)
[2024-04-09 07:01:43,025 -  INFO - fouram]: [Base] Collection schema: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 32768}}, {'name': 'float_vector_1', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 32768}}, {'name': 'float_vector_2', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 32768}}, {'name': 'float_vector_3', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 32768}}, {'name': 'int8_1', 'description': '', 'type': <DataType.INT8: 2>}, {'name': 'int16_1', 'description': '', 'type': <DataType.INT16: 3>}, {'name': 'int32_1', 'description': '', 'type': <DataType.INT32: 4>}, {'name': 'int64_1', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'double_1', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'float_1', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'bool_1', 'description': '', 'type': <DataType.BOOL: 1>}, {'name': 'json_1', 'description': '', 'type': <DataType.JSON: 23>}, {'name': 'array_int8_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT8: 2>}, {'name': 'array_int16_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT16: 3>}, {'name': 'array_int32_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT32: 4>}, {'name': 'array_int64_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT64: 5>}, {'name': 'array_double_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 
'element_type': <DataType.DOUBLE: 11>}, {'name': 'array_float_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.FLOAT: 10>}, {'name': 'array_varchar_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_length': 10, 'max_capacity': 7}, 'element_type': <DataType.VARCHAR: 21>}, {'name': 'array_bool_1', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.BOOL: 1>}, {'name': 'int8_2', 'description': '', 'type': <DataType.INT8: 2>}, {'name': 'int16_2', 'description': '', 'type': <DataType.INT16: 3>}, {'name': 'int32_2', 'description': '', 'type': <DataType.INT32: 4>}, {'name': 'int64_2', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'double_2', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'float_2', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar_2', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'bool_2', 'description': '', 'type': <DataType.BOOL: 1>}, {'name': 'json_2', 'description': '', 'type': <DataType.JSON: 23>}, {'name': 'array_int8_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT8: 2>}, {'name': 'array_int16_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT16: 3>}, {'name': 'array_int32_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT32: 4>}, {'name': 'array_int64_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT64: 5>}, {'name': 'array_double_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.DOUBLE: 11>}, {'name': 'array_float_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': 
<DataType.FLOAT: 10>}, {'name': 'array_varchar_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_length': 10, 'max_capacity': 7}, 'element_type': <DataType.VARCHAR: 21>}, {'name': 'array_bool_2', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.BOOL: 1>}, {'name': 'int8_3', 'description': '', 'type': <DataType.INT8: 2>}, {'name': 'int16_3', 'description': '', 'type': <DataType.INT16: 3>}, {'name': 'int32_3', 'description': '', 'type': <DataType.INT32: 4>}, {'name': 'int64_3', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'double_3', 'description': '', 'type': <DataType.DOUBLE: 11>}, {'name': 'float_3', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'varchar_3', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'bool_3', 'description': '', 'type': <DataType.BOOL: 1>}, {'name': 'json_3', 'description': '', 'type': <DataType.JSON: 23>}, {'name': 'array_int8_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT8: 2>}, {'name': 'array_int16_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT16: 3>}, {'name': 'array_int32_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT32: 4>}, {'name': 'array_int64_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.INT64: 5>}, {'name': 'array_double_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.DOUBLE: 11>}, {'name': 'array_float_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.FLOAT: 10>}, {'name': 'array_varchar_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_length': 10, 'max_capacity': 7}, 'element_type': 
<DataType.VARCHAR: 21>}, {'name': 'array_bool_3', 'description': '', 'type': <DataType.ARRAY: 22>, 'params': {'max_capacity': 7}, 'element_type': <DataType.BOOL: 1>}, {'name': 'varchar_tail_1', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_2', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_3', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_4', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_5', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_6', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_7', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}, {'name': 'varchar_tail_8', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 10}}], 'enable_dynamic_field': False} (base.py:326)
[2024-04-09 07:01:43,025 -  INFO - fouram]: [CommonCases] Prepare collection fouram_BsdSztNY done. (common_cases.py:76)
[2024-04-09 07:01:43,027 -  INFO - fouram]: [Base] Index params of fouram_BsdSztNY:[{'float_vector_3': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 8, 'efConstruction': 200}}}, {'int8_1': {}}, {'id': {'index_type': 'INVERTED'}}, {'bool_3': {'index_type': 'INVERTED'}}, {'float_vector': {'params': {'M': 8, 'efConstruction': 200}, 'index_type': 'HNSW', 'metric_type': 'L2'}}, {'float_vector_1': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 8, 'efConstruction': 200}}}, {'float_vector_2': {'metric_type': 'L2', 'params': {'M': 8, 'efConstruction': 200}, 'index_type': 'HNSW'}}] (base.py:481)
[2024-04-09 07:01:43,027 -  INFO - fouram]: [Base] Start release collection fouram_BsdSztNY (base.py:321)
[2024-04-09 07:01:45,277 -  INFO - fouram]: [Base] Clean all index done. (base.py:508)
[2024-04-09 07:01:45,277 -  INFO - fouram]: [Base] Start build index of IVF_FLAT for field:float_vector collection:fouram_BsdSztNY, params:{'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}, kwargs:{} (base.py:462)
[2024-04-09 07:05:24,170 -  INFO - fouram]: [Time] Index run in 218.8924s (api_request.py:49)
[2024-04-09 07:05:24,171 -  INFO - fouram]: [CommonCases] RT of build index IVF_FLAT: 218.8924s (common_cases.py:150)
[2024-04-09 07:05:24,171 -  INFO - fouram]: [CommonCases] Prepare index IVF_FLAT done. (common_cases.py:152)
[2024-04-09 07:05:24,171 -  INFO - fouram]: [CommonCases] Start building other fields index. (common_cases.py:173)
[2024-04-09 07:05:24,172 -  INFO - fouram]: [Base] Index params of fouram_BsdSztNY:[{'float_vector': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}}] (base.py:481)
[2024-04-09 07:05:24,173 -  INFO - fouram]: [Base] Start build index of IVF_FLAT for field:float_vector_1 collection:fouram_BsdSztNY, params:{'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}, kwargs:{} (base.py:462)
[2024-04-09 07:08:51,966 -  INFO - fouram]: [Time] Index run in 207.7917s (api_request.py:49)
[2024-04-09 07:08:51,966 -  INFO - fouram]: [CommonCases] RT of build vector field index `float_vector_1`: 207.7917s (common_cases.py:190)
[2024-04-09 07:08:51,967 -  INFO - fouram]: [Base] Start build index of IVF_FLAT for field:float_vector_2 collection:fouram_BsdSztNY, params:{'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}, kwargs:{} (base.py:462)
[2024-04-09 07:12:17,996 -  INFO - fouram]: [Time] Index run in 206.0292s (api_request.py:49)
[2024-04-09 07:12:17,997 -  INFO - fouram]: [CommonCases] RT of build vector field index `float_vector_2`: 206.0292s (common_cases.py:190)
[2024-04-09 07:12:17,997 -  INFO - fouram]: [Base] Start build index of IVF_FLAT for field:float_vector_3 collection:fouram_BsdSztNY, params:{'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}, kwargs:{} (base.py:462)
[2024-04-09 07:15:49,064 -  INFO - fouram]: [Time] Index run in 211.066s (api_request.py:49)
[2024-04-09 07:15:49,064 -  INFO - fouram]: [CommonCases] RT of build vector field index `float_vector_3`: 211.066s (common_cases.py:190)
[2024-04-09 07:15:49,065 -  INFO - fouram]: [Base] Start build scalar index of fouram_BsdSztNY for field:int8_1, index_params:{}, kwargs: {} (base.py:471)
[2024-04-09 07:16:37,369 -  INFO - fouram]: [Time] Index run in 48.3037s (api_request.py:49)
[2024-04-09 07:16:37,369 -  INFO - fouram]: [CommonCases] RT of build scalar field index `int8_1`: 48.3037s (common_cases.py:198)
[2024-04-09 07:16:37,369 -  INFO - fouram]: [Base] Start build scalar index of fouram_BsdSztNY for field:id, index_params:{'index_type': 'INVERTED'}, kwargs: {} (base.py:471)
[2024-04-09 07:17:25,173 -  INFO - fouram]: [Time] Index run in 47.8036s (api_request.py:49)
[2024-04-09 07:17:25,173 -  INFO - fouram]: [CommonCases] RT of build scalar field index `id`: 47.8036s (common_cases.py:198)
[2024-04-09 07:17:25,173 -  INFO - fouram]: [Base] Start build scalar index of fouram_BsdSztNY for field:bool_3, index_params:{'index_type': 'INVERTED'}, kwargs: {} (base.py:471)
[2024-04-09 07:18:12,295 -  INFO - fouram]: [Time] Index run in 47.1213s (api_request.py:49)
[2024-04-09 07:18:12,295 -  INFO - fouram]: [CommonCases] RT of build scalar field index `bool_3`: 47.1213s (common_cases.py:198)
[2024-04-09 07:18:12,295 -  INFO - fouram]: [CommonCases] Prepare scalars:['int8_1', 'id', 'bool_3'] vectors:['float_vector_1', 'float_vector_2', 'float_vector_3'] index done. (common_cases.py:200)
[2024-04-09 07:18:12,298 -  INFO - fouram]: [Base] Index params of fouram_BsdSztNY:[{'float_vector': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}}, {'float_vector_1': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}}, {'float_vector_3': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}}, {'int8_1': {}}, {'id': {'index_type': 'INVERTED'}}, {'bool_3': {'index_type': 'INVERTED'}}, {'float_vector_2': {'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}}}] (base.py:481)
[2024-04-09 07:18:12,300 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_BsdSztNY): 100000 (base.py:514)
[2024-04-09 07:18:12,300 -  INFO - fouram]: [Base] Start load collection fouram_BsdSztNY,replica_number:1,kwargs:{} (base.py:316)
[2024-04-09 07:29:13,019 - ERROR - fouram]: RPC error: [get_loading_progress], <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 2802.8969621658325 MB,  memUsage = 57652.66796875 MB, predictMemUsage = 60455.56493091583 MB, totalMem = 65536 MB thresholdFactor = 0.900000)>, <Time:{'RPC start': '2024-04-09 07:29:13.018131', 'RPC error': '2024-04-09 07:29:13.019785'}> (decorators.py:146)
[2024-04-09 07:29:13,020 - ERROR - fouram]: RPC error: [wait_for_loading_collection], <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 2802.8969621658325 MB,  memUsage = 57652.66796875 MB, predictMemUsage = 60455.56493091583 MB, totalMem = 65536 MB thresholdFactor = 0.900000)>, <Time:{'RPC start': '2024-04-09 07:18:12.310206', 'RPC error': '2024-04-09 07:29:13.020776'}> (decorators.py:146)
[2024-04-09 07:29:13,020 - ERROR - fouram]: RPC error: [load_collection], <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 2802.8969621658325 MB,  memUsage = 57652.66796875 MB, predictMemUsage = 60455.56493091583 MB, totalMem = 65536 MB thresholdFactor = 0.900000)>, <Time:{'RPC start': '2024-04-09 07:18:12.300665', 'RPC error': '2024-04-09 07:29:13.020915'}> (decorators.py:146)
[2024-04-09 07:29:13,022 - ERROR - fouram]: (api_response) : [Collection.load] <MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 2802.8969621658325 MB,  memUsage = 57652.66796875 MB, predictMemUsage = 60455.56493091583 MB, totalMem = 65536 MB thresholdFactor = 0.900000)>, [requestId: 53206e3c-f641-11ee-816c-e6b1e6fae1fe] (api_request.py:57)
[2024-04-09 07:29:13,022 - ERROR - fouram]: [CheckFunc] load request check failed, response:<MilvusException: (code=65535, message=show collection failed: load segment failed, OOM if load, maxSegmentSize = 2802.8969621658325 MB,  memUsage = 57652.66796875 MB, predictMemUsage = 60455.56493091583 MB, totalMem = 65536 MB thresholdFactor = 0.900000)> (func_check.py:54)
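
The numbers in the error message are consistent with a simple pre-load check. The sketch below parses the message and redoes the arithmetic. The rule it applies (reject the load when `memUsage + maxSegmentSize` exceeds `totalMem * thresholdFactor`) is inferred from the values in the message, not taken from the Milvus source, and the helper name `parse_oom_message` is hypothetical.

```python
import re

# Error text copied from the client log above.
MESSAGE = (
    "load segment failed, OOM if load, maxSegmentSize = 2802.8969621658325 MB,  "
    "memUsage = 57652.66796875 MB, predictMemUsage = 60455.56493091583 MB, "
    "totalMem = 65536 MB thresholdFactor = 0.900000"
)


def parse_oom_message(message: str) -> dict[str, float]:
    """Extract every `name = number` pair from the OOM message."""
    pairs = re.findall(r"(\w+) = ([0-9.]+)", message)
    return {name: float(value) for name, value in pairs}


fields = parse_oom_message(MESSAGE)

# Assumed rule, inferred from the message: predicted usage vs. a fraction of total memory.
predicted = fields["memUsage"] + fields["maxSegmentSize"]
limit = fields["totalMem"] * fields["thresholdFactor"]
headroom = limit - fields["memUsage"]
would_oom = predicted > limit

print(f"predicted usage: {predicted:.2f} MB (reported {fields['predictMemUsage']:.2f} MB)")
print(f"limit          : {limit:.2f} MB")
print(f"headroom       : {headroom:.2f} MB, segment needs {fields['maxSegmentSize']:.2f} MB")
print(f"load rejected  : {would_oom}")
```

The recomputed sum matches the reported `predictMemUsage`, and the limit works out to about 58982.4 MB. With about 57652.7 MB already in use, roughly 1329.7 MB of headroom remains, which is less than the 2802.9 MB estimated for the next segment.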

Expected Behavior

No response

Steps To Reproduce

1. test_hybrid_search_locust_shard16_float_dql_hnsw_standalone
2. Based on step 1, rebuild the index as IVF_FLAT: test_hybrid_search_locust_shard16_float_dql_ivf_flat_standalone (collection load fails with OOM)

        :purpose:  `shard_num=16, float_vector DQL`
            verify concurrent DQL scenario which has 4 float_vector fields(IVF_FLAT) and 60 scalar fields

        :test steps:
            1. create collection with fields:
                'float_vector': 32768dim,
                'float_vector_1': 32768dim,
                'float_vector_2': 32768dim,
                'float_vector_3': 32768dim,
                all scalar fields: varchar max_length=10, array max_capacity=7
            2. build indexes:
                IVF_FLAT: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
                default_scalar_index: 'int64_1'
                INVERTED: 'id', 'bool_3'
            3. insert 100k data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - hybrid_search

Milvus Log

No response

Anything else?

step 1 test result: test_hybrid_search_locust_shard16_float_dql_hnsw_standalone

{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_16c64m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '16.0',
                                                               'memory': '64Gi'},
                                                    'requests': {'cpu': '9.0',
                                                                 'memory': '33Gi'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240409-4c073047-amd64'}}},
            'host': 'multi-vector-corn-sp4ff-32-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_hybrid_search_locust_shard16_float_dql_hnsw_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 32768,
                                                    'max_length': 10,
                                                    'scalars_index': {'int8_1': {},
                                                                      'id': {'index_type': 'INVERTED'},
                                                                      'bool_3': {'index_type': 'INVERTED'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_2': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_3': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'array_int8_1': {'params': {'max_capacity': 7}},
                                                                       'array_int16_1': {'params': {'max_capacity': 7}},
                                                                       'array_int32_1': {'params': {'max_capacity': 7}},
                                                                       'array_int64_1': {'params': {'max_capacity': 7}},
                                                                       'array_double_1': {'params': {'max_capacity': 7}},
                                                                       'array_float_1': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_1': {'params': {'max_capacity': 7}},
                                                                       'array_bool_1': {'params': {'max_capacity': 7}},
                                                                       'array_int8_2': {'params': {'max_capacity': 7}},
                                                                       'array_int16_2': {'params': {'max_capacity': 7}},
                                                                       'array_int32_2': {'params': {'max_capacity': 7}},
                                                                       'array_int64_2': {'params': {'max_capacity': 7}},
                                                                       'array_double_2': {'params': {'max_capacity': 7}},
                                                                       'array_float_2': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_2': {'params': {'max_capacity': 7}},
                                                                       'array_bool_2': {'params': {'max_capacity': 7}},
                                                                       'array_int8_3': {'params': {'max_capacity': 7}},
                                                                       'array_int16_3': {'params': {'max_capacity': 7}},
                                                                       'array_int32_3': {'params': {'max_capacity': 7}},
                                                                       'array_int64_3': {'params': {'max_capacity': 7}},
                                                                       'array_double_3': {'params': {'max_capacity': 7}},
                                                                       'array_float_3': {'params': {'max_capacity': 7}},
                                                                       'array_varchar_3': {'params': {'max_capacity': 7}},
                                                                       'array_bool_3': {'params': {'max_capacity': 7}}},
                                                    'dataset_name': 'local',
                                                    'dataset_size': 100000,
                                                    'ni_per': 100},
                                 'collection_params': {'other_fields': ['float_vector_1',
                                                                        'float_vector_2',
                                                                        'float_vector_3',
                                                                        'int8_1',
                                                                        'int16_1',
                                                                        'int32_1',
                                                                        'int64_1',
                                                                        'double_1',
                                                                        'float_1',
                                                                        'varchar_1',
                                                                        'bool_1',
                                                                        'json_1',
                                                                        'array_int8_1',
                                                                        'array_int16_1',
                                                                        'array_int32_1',
                                                                        'array_int64_1',
                                                                        'array_double_1',
                                                                        'array_float_1',
                                                                        'array_varchar_1',
                                                                        'array_bool_1',
                                                                        'int8_2',
                                                                        'int16_2',
                                                                        'int32_2',
                                                                        'int64_2',
                                                                        'double_2',
                                                                        'float_2',
                                                                        'varchar_2',
                                                                        'bool_2',
                                                                        'json_2',
                                                                        'array_int8_2',
                                                                        'array_int16_2',
                                                                        'array_int32_2',
                                                                        'array_int64_2',
                                                                        'array_double_2',
                                                                        'array_float_2',
                                                                        'array_varchar_2',
                                                                        'array_bool_2',
                                                                        'int8_3',
                                                                        'int16_3',
                                                                        'int32_3',
                                                                        'int64_3',
                                                                        'double_3',
                                                                        'float_3',
                                                                        'varchar_3',
                                                                        'bool_3',
                                                                        'json_3',
                                                                        'array_int8_3',
                                                                        'array_int16_3',
                                                                        'array_int32_3',
                                                                        'array_int64_3',
                                                                        'array_double_3',
                                                                        'array_float_3',
                                                                        'array_varchar_3',
                                                                        'array_bool_3',
                                                                        'varchar_tail_1',
                                                                        'varchar_tail_2',
                                                                        'varchar_tail_3',
                                                                        'varchar_tail_4',
                                                                        'varchar_tail_5',
                                                                        'varchar_tail_6',
                                                                        'varchar_tail_7',
                                                                        'varchar_tail_8'],
                                                       'shards_num': 16},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'HNSW',
                                                  'index_param': {'M': 8,
                                                                  'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '1h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 1,
                                                                  'reqs': [{'search_param': {'ef': 128},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'id '
                                                                                    '> '
                                                                                    '10000',
                                                                            'top_k': 10},
                                                                           {'search_param': {'ef': 64},
                                                                            'anns_field': 'float_vector_1',
                                                                            'expr': 'int8_1 '
                                                                                    '> '
                                                                                    '64',
                                                                            'top_k': 50},
                                                                           {'search_param': {'ef': 1024},
                                                                            'anns_field': 'float_vector_2',
                                                                            'expr': 'array_length(array_int8_2) '
                                                                                    '== '
                                                                                    '7',
                                                                            'top_k': 1000},
                                                                           {'search_param': {'ef': 4000},
                                                                            'anns_field': 'float_vector_3',
                                                                            'expr': 'bool_3 '
                                                                                    '== '
                                                                                    'True',
                                                                            'top_k': 3000}],
                                                                  'rerank': {'RRFRanker': []},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 60,
                                                                  'random_data': True}}]},
            'run_id': 2024040921155458,
            'datetime': '2024-04-09 03:08:35.544475',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 850.0167,
                                      'float_vector_1': {'RT': 870.0332},
                                      'float_vector_2': {'RT': 360.9028},
                                      'float_vector_3': {'RT': 0.5168},
                                      'int8_1': {'RT': 0.5143},
                                      'id': {'RT': 0.512},
                                      'bool_3': {'RT': 0.515}},
                            'insert': {'total_time': 2971.6812,
                                       'VPS': 33.651,
                                       'batch_time': 2.9717,
                                       'batch': 100},
                            'flush': {'RT': 3.2132},
                            'load': {'RT': 60.3367},
                            'Locust': {'Aggregated': {'Requests': 4488,
                                                      'Fails': 0,
                                                      'RPS': 1.25,
                                                      'fail_s': 0.0,
                                                      'RT_max': 23453.51,
                                                      'RT_avg': 15807.57,
                                                      'TP50': 16000.0,
                                                      'TP99': 21000.0},
                                       'hybrid_search': {'Requests': 4488,
                                                         'Fails': 0,
                                                         'RPS': 1.25,
                                                         'fail_s': 0.0,
                                                         'RT_max': 23453.51,
                                                         'RT_avg': 15807.57,
                                                         'TP50': 16000.0,
                                                         'TP99': 21000.0}}}}}
wangting0128 commented 5 months ago

The same scenario runs successfully

argo task: multi-vector-corn-576nt image: 2.4-20240407-c18193cb-amd64

server:

NAME                                                              READY   STATUS                            RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-corn-576nt-32-etcd-0                                 1/1     Running                           0                7h51m   10.104.27.15    4am-node31   <none>           <none>
multi-vector-corn-576nt-32-milvus-standalone-698c66d746-l8kbx     1/1     Running                           0                7h51m   10.104.29.72    4am-node35   <none>           <none>
multi-vector-corn-576nt-32-minio-68957bbcfc-nlrnh                 1/1     Running                           0                7h51m   10.104.27.16    4am-node31   <none>           <none>

image

{pod=~"multi-vector-corn-576nt-32-milvus-standalone-698c66d746-l8kbx"} |~ "load" milvus_load.log

client: image

yanliang567 commented 5 months ago

/unassign

longjiquan commented 5 months ago

The behavior may be related to the loading order of segments.

PredictedMemUsageAfterLoad = MemUsage + Predict(segment), and Predict(segment) is positively correlated with the segment size, which can cause this issue.

Take a very simple example: suppose we load all the segments sequentially, the 5 segments have sizes 1g, 2g, 3g, 4g and 5g, Predict(segment) = 2.5 × segment size, but ActualMemUsage(segment) = 1 × segment size. This assumption is reasonable because we have 48 segments, and if every segment really needed 2.5 times its size in memory, 64G would be far from enough.
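As a rough sanity check of the "64G would not be enough" argument, the raw vector payload can be estimated from the test configuration above. This is only a back-of-the-envelope sketch: it assumes all three float vector fields use the configured `dim` of 32768, that each component is a 4-byte float32, and it ignores scalar fields and index overhead. The constant names are illustrative, not Milvus settings.

```python
# Back-of-the-envelope estimate of raw float-vector data, using values
# from the test config ('dim': 32768, 'dataset_size': 100000, three
# float_vector_* fields). Assumption: float32 components, no index overhead.
DIM = 32768
BYTES_PER_FLOAT32 = 4
NUM_VECTOR_FIELDS = 3
NUM_ROWS = 100_000
MEMORY_LIMIT_GIB = 64        # 'limits': {'memory': '64Gi'}
PREDICT_FACTOR = 2.5         # factor from the simplified example, an assumption

raw_bytes = DIM * BYTES_PER_FLOAT32 * NUM_VECTOR_FIELDS * NUM_ROWS
raw_gib = raw_bytes / 2**30
predicted_gib = PREDICT_FACTOR * raw_gib

print(f"raw vector data: {raw_gib:.2f} GiB")
print(f"2.5x prediction: {predicted_gib:.2f} GiB (limit {MEMORY_LIMIT_GIB} GiB)")
```

Under these assumptions the raw data fits within the limit, while a uniform 2.5× estimate would not, which is consistent with treating the prediction as an overestimate of actual usage.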

So if we load them in descending order by segment size, the usage sequence will be [5, 9, 12, 14, 15] and the predicted sequence will be [12.5, 15, 16.5, 17, 16.5]; the predicted memory usage peaks at 17g and ends at 16.5g. On the contrary, if we load them in ascending order by segment size, the usage sequence will be [1, 3, 6, 10, 15], but the predicted sequence will be [2.5, 6, 10.5, 16, 22.5]; the final (and peak) predicted memory usage is 22.5g.

I think the above example can explain this issue.
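The two sequences in the example can be reproduced with a small simulation. This is a minimal sketch of the simplified model described in this comment, not the actual Milvus loader logic; the function name `simulate_load` and the factors 2.5 and 1.0 are assumptions taken from the example.

```python
def simulate_load(sizes_gb, predict_factor=2.5, actual_factor=1.0):
    """Load segments one by one in the given order.

    Returns (usage, predicted), where predicted[i] is
    MemUsage + Predict(segment) evaluated just before loading segment i,
    and usage[i] is the actual memory usage right after loading it.
    """
    usage, predicted = [], []
    mem_usage = 0.0
    for size in sizes_gb:
        predicted.append(mem_usage + predict_factor * size)
        mem_usage += actual_factor * size
        usage.append(mem_usage)
    return usage, predicted


sizes = [1, 2, 3, 4, 5]  # segment sizes in GB, from the example above

desc_usage, desc_pred = simulate_load(sorted(sizes, reverse=True))
asc_usage, asc_pred = simulate_load(sorted(sizes))

print("descending:", desc_usage, desc_pred, "peak", max(desc_pred))
print("ascending: ", asc_usage, asc_pred, "peak", max(asc_pred))
```

In this model the peak prediction is 17g for descending order and 22.5g for ascending order, so with a memory limit anywhere between those two values only the ascending order would be rejected.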

czs007 commented 5 months ago

@longjiquan how about modifying the loading order to be descending and verifying this case?

yanliang567 commented 4 months ago

@longjiquan any updates?