milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster][multipleChunkedEnable] hybrid_search failed with `failed to query: segment lacks, channel not available` in concurrent DDL & DQL scenario #37555

Open wangting0128 opened 1 week ago

wangting0128 commented 1 week ago

Is there an existing issue for this?

Environment

- Milvus version: master-20241108-a0315783-amd64
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): 2.5.0rc106
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

argo task: memory-opt-scenes-7vrcm
test case name: test_hybrid_search_locust_multi_ddl_dql_hybrid_search_cluster

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
memory-opt-scenes-7vrcm-4-etcd-0                                  1/1     Running     0               7h13m   10.104.32.222   4am-node39   <none>           <none>
memory-opt-scenes-7vrcm-4-etcd-1                                  1/1     Running     0               7h13m   10.104.34.185   4am-node37   <none>           <none>
memory-opt-scenes-7vrcm-4-etcd-2                                  1/1     Running     0               7h13m   10.104.21.244   4am-node24   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-datanode-788f87c774-s8s94        1/1     Running     3 (7h8m ago)    7h13m   10.104.17.77    4am-node23   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-indexnode-6df6766f97-kbrtk       1/1     Running     2 (7h12m ago)   7h13m   10.104.23.186   4am-node27   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-indexnode-6df6766f97-s4cd7       1/1     Running     2 (7h12m ago)   7h13m   10.104.25.151   4am-node30   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-indexnode-6df6766f97-wmzbb       1/1     Running     1 (7h12m ago)   7h13m   10.104.33.233   4am-node36   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-indexnode-6df6766f97-xphjf       1/1     Running     2 (7h12m ago)   7h13m   10.104.26.206   4am-node32   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-mixcoord-67d644dc58-qtrbj        1/1     Running     3 (7h8m ago)    7h13m   10.104.17.74    4am-node23   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-proxy-cd7f9d658-h48qv            1/1     Running     3 (7h8m ago)    7h13m   10.104.18.109   4am-node25   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-querynode-7cf656bf97-775wd       1/1     Running     2 (7h12m ago)   7h13m   10.104.17.75    4am-node23   <none>           <none>
memory-opt-scenes-7vrcm-4-milvus-querynode-7cf656bf97-rwhqw       1/1     Running     2 (7h12m ago)   7h13m   10.104.18.116   4am-node25   <none>           <none>
memory-opt-scenes-7vrcm-4-minio-0                                 1/1     Running     0               7h13m   10.104.32.219   4am-node39   <none>           <none>
memory-opt-scenes-7vrcm-4-minio-1                                 1/1     Running     0               7h13m   10.104.27.102   4am-node31   <none>           <none>
memory-opt-scenes-7vrcm-4-minio-2                                 1/1     Running     0               7h13m   10.104.21.240   4am-node24   <none>           <none>
memory-opt-scenes-7vrcm-4-minio-3                                 1/1     Running     0               7h13m   10.104.34.187   4am-node37   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-bookie-0                         1/1     Running     0               7h13m   10.104.32.223   4am-node39   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-bookie-1                         1/1     Running     0               7h13m   10.104.27.103   4am-node31   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-bookie-2                         1/1     Running     0               7h13m   10.104.21.245   4am-node24   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-bookie-init-2sj2p                0/1     Completed   0               7h13m   10.104.25.150   4am-node30   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-broker-0                         1/1     Running     0               7h13m   10.104.9.58     4am-node14   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-proxy-0                          1/1     Running     0               7h13m   10.104.17.76    4am-node23   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-pulsar-init-vgzkh                0/1     Completed   0               7h13m   10.104.25.149   4am-node30   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-recovery-0                       1/1     Running     0               7h13m   10.104.33.232   4am-node36   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-zookeeper-0                      1/1     Running     0               7h13m   10.104.23.189   4am-node27   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-zookeeper-1                      1/1     Running     0               7h12m   10.104.32.235   4am-node39   <none>           <none>
memory-opt-scenes-7vrcm-4-pulsar-zookeeper-2                      1/1     Running     0               7h11m   10.104.27.111   4am-node31   <none>           <none>

client log:

[2024-11-08 08:20:22,107 - DEBUG - fouram]: (api_request)  : [Collection.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1cfd30>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1cf760>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1cf8e0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1cf970>], <pymilvus.client.abstract.RRFRanker object at 0x7fc75b1cf130>, 1, None, ['*'], 600, -1], kwargs: {}, [requestId: 4c408190-9daa-11ef-9ef3-6e945c89de88] (api_request.py:77)
[2024-11-08 08:20:45,569 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=failed to query: segment lacks[segment=453780940082818440]; segment lacks[segment=453780940082818440]: channel not available[channel=by-dev-rootcoord-dml_13_453780940082815326v0])>, [requestId: 4c408190-9daa-11ef-9ef3-6e945c89de88] (api_request.py:57)
[2024-11-08 08:20:45,569 - DEBUG - fouram]: (api_request)  : [Collection.hybrid_search] args: [[<pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1c80d0>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1c8310>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1c8580>, <pymilvus.client.abstract.AnnSearchRequest object at 0x7fc75b1c83d0>], <pymilvus.client.abstract.RRFRanker object at 0x7fc75b1c8850>, 1, None, ['*'], 600, -1], kwargs: {}, [requestId: 5a3ca1f2-9daa-11ef-9ef3-6e945c89de88] (api_request.py:77)
[2024-11-08 08:21:06,588 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=failed to search: segment lacks[segment=453780940082818440]; segment lacks[segment=453780940082818440]: channel not available[channel=by-dev-rootcoord-dml_13_453780940082815326v0])>, [requestId: 5a3ca1f2-9daa-11ef-9ef3-6e945c89de88] (api_request.py:57)

{pod=~"memory-opt-scenes-7vrcm-4-milvus-proxy-cd7f9d658-h48qv"} |~ "5d22cede769b75e1b0ea480a317e30f9|scene_hybrid_search_test_kkxiK9Kp" server.log
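When triaging the client errors above, it can help to pull the segment ID and channel name out of the exception message. The sketch below is a minimal, standard-library-only parser. The function name `parse_segment_lacks` is hypothetical, and splitting the channel name into a collection ID and shard index assumes the virtual-channel layout `<physical channel>_<collectionID>v<shard>`, which this issue does not itself confirm.

```python
import re

# Message copied from the client log above.
MESSAGE = (
    "failed to query: segment lacks[segment=453780940082818440]; "
    "segment lacks[segment=453780940082818440]: "
    "channel not available[channel=by-dev-rootcoord-dml_13_453780940082815326v0]"
)

SEGMENT_RE = re.compile(r"segment lacks\[segment=(\d+)\]")
CHANNEL_RE = re.compile(r"channel not available\[channel=([^\]]+)\]")
# Assumed layout of a virtual channel: <pchannel>_<collectionID>v<shard index>.
VCHANNEL_RE = re.compile(r"^(?P<pchannel>.+)_(?P<collection_id>\d+)v(?P<shard>\d+)$")


def parse_segment_lacks(message: str) -> dict:
    """Extract segment IDs and channel details from a 'segment lacks' error."""
    # The same segment can be reported more than once, so de-duplicate.
    segments = sorted({int(s) for s in SEGMENT_RE.findall(message)})
    channel_match = CHANNEL_RE.search(message)
    channel = channel_match.group(1) if channel_match else None
    info = {"segments": segments, "channel": channel}
    vchannel = VCHANNEL_RE.match(channel) if channel else None
    if vchannel:
        info["pchannel"] = vchannel.group("pchannel")
        info["collection_id"] = int(vchannel.group("collection_id"))
        info["shard"] = int(vchannel.group("shard"))
    return info


parsed = parse_segment_lacks(MESSAGE)
print(parsed)
```

If the assumed layout holds, the extracted collection ID could be compared against the collections in the test to see whether the unavailable channel belongs to a short-lived collection from `scene_hybrid_search_test` or to the long-lived one.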

Expected Behavior

No response

Steps To Reproduce

concurrent test and calculation of RT and QPS

        :purpose:  `DDL & DQL`
            verify DDL & DQL scenario,
            which has 4 vector fields(IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields: `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent request:
                - scene_hybrid_search_test: 4 vector fields, 3 scalar fields
                    (collection: create->insert->flush->index->load->hybrid_search->drop)
                - search
                - hybrid_search
                - query

Milvus Log

No response

Anything else?

client config:

{
     "dataset_params": {
          "metric_type": "L2",
          "dim": 128,
          "scalars_index": {
               "id": {},
               "int64_1": {
                    "index_type": "INVERTED"
               },
               "varchar_1": {
                    "index_type": "INVERTED"
               }
          },
          "vectors_index": {
               "float_vector_1": {
                    "index_type": "HNSW",
                    "index_param": {
                         "M": 8,
                         "efConstruction": 200
                    },
                    "metric_type": "L2"
               },
               "float_vector_2": {
                    "index_type": "DISKANN",
                    "index_param": {},
                    "metric_type": "IP"
               },
               "float_vector_3": {
                    "index_type": "IVF_SQ8",
                    "index_param": {
                         "nlist": 2048
                    },
                    "metric_type": "L2"
               }
          },
          "scalars_params": {
               "float_vector_1": {
                    "params": {
                         "dim": 128
                    },
                    "other_params": {
                         "dataset": "sift"
                    }
               },
               "float_vector_2": {
                    "params": {
                         "dim": 128
                    },
                    "other_params": {
                         "dataset": "sift"
                    }
               },
               "float_vector_3": {
                    "params": {
                         "dim": 128
                    },
                    "other_params": {
                         "dataset": "sift"
                    }
               }
          },
          "dataset_name": "sift",
          "dataset_size": 1000000,
          "ni_per": 10000
     },
     "collection_params": {
          "other_fields": [
               "float_vector_1",
               "float_vector_2",
               "float_vector_3",
               "int64_1",
               "varchar_1"
          ],
          "shards_num": 2
     },
     "resource_groups_params": {
          "reset": false
     },
     "database_user_params": {
          "reset_rbac": false,
          "reset_db": false
     },
     "index_params": {
          "index_type": "IVF_FLAT",
          "index_param": {
               "nlist": 1024
          }
     },
     "concurrent_params": {
          "concurrent_number": 20,
          "during_time": "12h",
          "interval": 20
     },
     "concurrent_tasks": [
          {
               "type": "scene_hybrid_search_test",
               "weight": 1,
               "params": {
                    "nq": 1,
                    "top_k": 1,
                    "reqs": [
                         {
                              "search_param": {
                                   "nprobe": 128
                              },
                              "anns_field": "float_vector",
                              "top_k": 100
                         },
                         {
                              "search_param": {
                                   "nprobe": 32
                              },
                              "anns_field": "float_vector_scene_hybrid_search_test_1",
                              "top_k": 10
                         },
                         {
                              "search_param": {
                                   "ef": 32
                              },
                              "anns_field": "float_vector_scene_hybrid_search_test_2",
                              "top_k": 5
                         },
                         {
                              "search_param": {
                                   "search_list": 20
                              },
                              "anns_field": "float_vector_scene_hybrid_search_test_3",
                              "top_k": 10
                         }
                    ],
                    "rerank": {
                         "RRFRanker": []
                    },
                    "output_fields": [
                         "*"
                    ],
                    "ignore_growing": false,
                    "guarantee_timestamp": null,
                    "partition_names": null,
                    "timeout": 600,
                    "random_data": true,
                    "dataset": "local",
                    "dim": 128,
                    "shards_num": 2,
                    "data_size": 3000,
                    "nb": 3000,
                    "index_type": "IVF_SQ8",
                    "index_param": {
                         "nlist": 2048
                    },
                    "metric_type": "L2",
                    "other_fields": [
                         "float_vector_scene_hybrid_search_test_1",
                         "float_vector_scene_hybrid_search_test_2",
                         "float_vector_scene_hybrid_search_test_3",
                         "int64_1",
                         "bool_1",
                         "varchar_1"
                    ],
                    "replica_number": 1,
                    "scalars_params": {
                         "float_vector_scene_hybrid_search_test_1": {
                              "params": {
                                   "dim": 128
                              },
                              "other_params": {
                                   "dataset": "sift"
                              }
                         },
                         "float_vector_scene_hybrid_search_test_2": {
                              "params": {
                                   "dim": 128
                              },
                              "other_params": {
                                   "dataset": "sift"
                              }
                         },
                         "float_vector_scene_hybrid_search_test_3": {
                              "params": {
                                   "dim": 128
                              },
                              "other_params": {
                                   "dataset": "sift"
                              }
                         }
                    },
                    "scalars_index": {
                         "int64_1": {},
                         "bool_1": {
                              "index_type": "INVERTED"
                         },
                         "varchar_1": {
                              "index_type": "INVERTED"
                         }
                    },
                    "vectors_index": {
                         "float_vector_scene_hybrid_search_test_1": {
                              "index_type": "IVF_FLAT",
                              "index_param": {
                                   "nlist": 1024
                              },
                              "metric_type": "L2"
                         },
                         "float_vector_scene_hybrid_search_test_2": {
                              "index_type": "HNSW",
                              "index_param": {
                                   "M": 8,
                                   "efConstruction": 200
                              },
                              "metric_type": "L2"
                         },
                         "float_vector_scene_hybrid_search_test_3": {
                              "index_type": "DISKANN",
                              "index_param": {},
                              "metric_type": "IP"
                         }
                    },
                    "prepare_before_insert": false,
                    "hybrid_search_counts": 10,
                    "new_connect": false,
                    "new_user": false
               }
          },
          {
               "type": "search",
               "weight": 1,
               "params": {
                    "nq": 1000,
                    "top_k": 1,
                    "search_param": {
                         "nprobe": 1000
                    },
                    "expr": "int64_1 >= 0",
                    "guarantee_timestamp": null,
                    "partition_names": null,
                    "output_fields": null,
                    "ignore_growing": false,
                    "group_by_field": null,
                    "timeout": 600,
                    "random_data": true,
                    "check_task": "check_response",
                    "check_items": null
               }
          },
          {
               "type": "hybrid_search",
               "weight": 1,
               "params": {
                    "nq": 1,
                    "top_k": 100,
                    "reqs": [
                         {
                              "search_param": {
                                   "nprobe": 128
                              },
                              "anns_field": "float_vector",
                              "expr": "int64_1 > 100000",
                              "top_k": 100
                         },
                         {
                              "search_param": {
                                   "ef": 64
                              },
                              "anns_field": "float_vector_1",
                              "expr": "id < 900000",
                              "top_k": 10
                         },
                         {
                              "search_param": {
                                   "search_list": 32
                              },
                              "anns_field": "float_vector_2",
                              "expr": "varchar_1 > \"1\"",
                              "top_k": 30
                         },
                         {
                              "search_param": {
                                   "nprobe": 16
                              },
                              "anns_field": "float_vector_3",
                              "top_k": 400
                         }
                    ],
                    "rerank": {
                         "WeightedRanker": [
                              0.85,
                              0.95,
                              0.51,
                              0.32
                         ]
                    },
                    "output_fields": [
                         "*"
                    ],
                    "ignore_growing": false,
                    "guarantee_timestamp": null,
                    "partition_names": null,
                    "timeout": 600,
                    "random_data": true,
                    "check_task": "check_response",
                    "check_items": null
               }
          },
          {
               "type": "query",
               "weight": 1,
               "params": {
                    "ids": null,
                    "expr": "int64_1 > -1 && ",
                    "output_fields": [
                         "*"
                    ],
                    "offset": null,
                    "limit": null,
                    "ignore_growing": false,
                    "partition_names": null,
                    "timeout": 600,
                    "consistency_level": null,
                    "random_data": true,
                    "random_count": 20,
                    "random_range": [
                         0,
                         100000
                    ],
                    "field_name": "id",
                    "field_type": "int64",
                    "check_task": "check_response",
                    "check_items": null
               }
          }
     ]
}
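The `scene_hybrid_search_test` task in the config above fuses its four ANN results with `RRFRanker` and keeps `top_k: 1`. As a reminder of what that reranking step does, here is a minimal sketch of reciprocal rank fusion in plain Python. It is not the Milvus implementation: the helper name `rrf_fuse`, the toy result lists, and the smoothing constant `k = 60` (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, limit=None):
    """Fuse ranked ID lists with reciprocal rank fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1 for the best hit in each list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending), then by ID so ties are deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:limit] if limit is not None else fused


# Toy per-request results (IDs only), one list per ANN request.
per_request_hits = [
    [7, 3, 9],  # first ANN request
    [3, 7, 1],  # second ANN request
    [3, 9, 7],  # third ANN request
    [1, 3, 7],  # fourth ANN request
]

top = rrf_fuse(per_request_hits, limit=1)
print(top[0][0])
```

An ID that ranks consistently high across all four requests wins over one that tops a single list, which is why every requested vector field has to be searchable for the fused result to be produced.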

server config:

{
     "queryNode": {
          "resources": {
               "limits": {
                    "cpu": "32.0",
                    "memory": "32Gi"
               },
               "requests": {
                    "cpu": "17.0",
                    "memory": "17Gi"
               }
          },
          "replicas": 2
     },
     "indexNode": {
          "resources": {
               "limits": {
                    "cpu": "8.0",
                    "memory": "8Gi"
               },
               "requests": {
                    "cpu": "5.0",
                    "memory": "5Gi"
               }
          },
          "replicas": 4
     },
     "dataNode": {
          "resources": {
               "limits": {
                    "cpu": "2.0",
                    "memory": "8Gi"
               },
               "requests": {
                    "cpu": "2.0",
                    "memory": "5Gi"
               }
          }
     },
     "cluster": {
          "enabled": true
     },
     "pulsar": {},
     "kafka": {},
     "minio": {
          "metrics": {
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "etcd": {
          "metrics": {
               "enabled": true,
               "podMonitor": {
                    "enabled": true
               }
          }
     },
     "metrics": {
          "serviceMonitor": {
               "enabled": true
          }
     },
     "log": {
          "level": "debug"
     },
     "extraConfigFiles": {
          "user.yaml": "queryNode:\n  segcore:\n    multipleChunkedEnable: true\n"
     },
     "image": {
          "all": {
               "repository": "harbor.milvus.io/milvus/milvus",
               "tag": "master-20241108-a0315783-amd64"
          }
     }
}
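
The setting named in the issue title is carried in the server config above as an escaped string under `extraConfigFiles`. A small sketch that decodes that string makes the nested key easier to read. It uses only the Python standard library; the indentation-based helper `nested_keys` is hypothetical and handles just this simple key/value layout, not general YAML.

```python
import json

# Fragment copied from the server config above.
FRAGMENT = r'''
{"extraConfigFiles": {
    "user.yaml": "queryNode:\n  segcore:\n    multipleChunkedEnable: true\n"
}}
'''


def nested_keys(yaml_text, indent=2):
    """Flatten simple 'key:' / 'key: value' lines into dotted paths."""
    path, flat = [], {}
    for line in yaml_text.splitlines():
        if not line.strip():
            continue
        # Indentation depth decides where this key sits in the hierarchy.
        depth = (len(line) - len(line.lstrip(" "))) // indent
        key, _, value = line.strip().partition(":")
        path = path[:depth] + [key]
        if value.strip():
            flat[".".join(path)] = value.strip()
    return flat


user_yaml = json.loads(FRAGMENT)["extraConfigFiles"]["user.yaml"]
print(user_yaml)
settings = nested_keys(user_yaml)
print(settings)
```

Decoded this way, the override reads as `queryNode.segcore.multipleChunkedEnable` set to `true`.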

sunby commented 6 days ago

@wangting0128 same as https://github.com/milvus-io/milvus/issues/37553, please verify it.

wangting0128 commented 5 days ago

verification passed

argo task: memory-opt-scenes-7w2vb
image: master-20241111-fca946de-amd64

wangting0128 commented 3 days ago

Reproduced.

argo task: memory-opt-scenes-2x7j4
test case name: test_inverted_locust_hnsw_diskann_dml_dql_cluster
image: master-20241114-cd181e4c-amd64

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
memory-opt-scenes-2x7j4-3-etcd-0                                  1/1     Running     0               173m    10.104.34.97    4am-node37   <none>           <none>
memory-opt-scenes-2x7j4-3-etcd-1                                  1/1     Running     0               173m    10.104.19.85    4am-node28   <none>           <none>
memory-opt-scenes-2x7j4-3-etcd-2                                  1/1     Running     0               173m    10.104.23.153   4am-node27   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-datanode-5fbf6cf547-qkrst        1/1     Running     1 (173m ago)    173m    10.104.6.199    4am-node13   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-indexnode-79bb49f75d-tf6xd       1/1     Running     1 (173m ago)    173m    10.104.32.65    4am-node39   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-indexnode-79bb49f75d-z7hc4       1/1     Running     2 (173m ago)    173m    10.104.30.110   4am-node38   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-indexnode-79bb49f75d-z89d8       1/1     Running     2 (173m ago)    173m    10.104.15.141   4am-node20   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-indexnode-79bb49f75d-zq4tc       1/1     Running     2 (173m ago)    173m    10.104.20.123   4am-node22   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-mixcoord-57ccd4b99c-f77lt        1/1     Running     2 (173m ago)    173m    10.104.30.111   4am-node38   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-proxy-688997675f-vm9hl           1/1     Running     2 (173m ago)    173m    10.104.30.112   4am-node38   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-querynode-749f88656d-l2dnl       1/1     Running     2 (173m ago)    173m    10.104.14.95    4am-node18   <none>           <none>
memory-opt-scenes-2x7j4-3-milvus-querynode-749f88656d-rmnd7       1/1     Running     2 (173m ago)    173m    10.104.9.247    4am-node14   <none>           <none>
memory-opt-scenes-2x7j4-3-minio-0                                 1/1     Running     0               173m    10.104.24.14    4am-node29   <none>           <none>
memory-opt-scenes-2x7j4-3-minio-1                                 1/1     Running     0               173m    10.104.34.95    4am-node37   <none>           <none>
memory-opt-scenes-2x7j4-3-minio-2                                 1/1     Running     0               173m    10.104.19.86    4am-node28   <none>           <none>
memory-opt-scenes-2x7j4-3-minio-3                                 1/1     Running     0               173m    10.104.18.179   4am-node25   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-bookie-0                         1/1     Running     0               173m    10.104.24.16    4am-node29   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-bookie-1                         1/1     Running     0               173m    10.104.21.132   4am-node24   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-bookie-2                         1/1     Running     0               173m    10.104.34.100   4am-node37   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-bookie-init-49bw9                0/1     Completed   0               173m    10.104.18.177   4am-node25   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-broker-0                         1/1     Running     0               173m    10.104.18.176   4am-node25   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-proxy-0                          1/1     Running     0               173m    10.104.21.129   4am-node24   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-pulsar-init-pqsmj                0/1     Completed   0               173m    10.104.18.175   4am-node25   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-recovery-0                       1/1     Running     0               173m    10.104.5.203    4am-node12   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-zookeeper-0                      1/1     Running     0               173m    10.104.34.94    4am-node37   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-zookeeper-1                      1/1     Running     0               172m    10.104.23.155   4am-node27   <none>           <none>
memory-opt-scenes-2x7j4-3-pulsar-zookeeper-2                      1/1     Running     0               172m    10.104.19.91    4am-node28   <none>           <none>

client logs: search, hybrid_search, and query all raise errors

[2024-11-14 05:14:16,286 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=503, message=failed to search: segment lacks[segment=453917219271353555]: channel not available[channel=by-dev-rootcoord-dml_1_453917219263349003v1])>, <Time:{'RPC start': '2024-11-14 05:13:55.264303', 'RPC error': '2024-11-14 05:14:16.286368'}> (decorators.py:140)
[2024-11-14 05:14:16,286 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=failed to search: segment lacks[segment=453917219271353555]: channel not available[channel=by-dev-rootcoord-dml_1_453917219263349003v1])>, [requestId: 3eca957c-a247-11ef-8ab0-d63d32d0e24a] (api_request.py:57)
[Screenshot 2024-11-14 14 54 40]
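
The RPC timestamps in the search log line above show how long the request ran before the error came back. A minimal sketch of that arithmetic, using the two timestamps exactly as logged (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied from the 'RPC start' / 'RPC error' fields of the log line above.
rpc_start = datetime.strptime("2024-11-14 05:13:55.264303", FMT)
rpc_error = datetime.strptime("2024-11-14 05:14:16.286368", FMT)

elapsed = (rpc_error - rpc_start).total_seconds()
print(f"search failed after {elapsed:.3f} s")
```

The request fails after about 21 seconds. The two hybrid_search failures in the original report took a similar time (roughly 23.5 s and 21.0 s between the request and error log lines), well under the 600-second timeout set in that client config.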

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `vector: memory and disk index`
            verify concurrent DML & DQL scenario which has 4 float_vector fields & 16 scalar fields

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 200dim,
                'float_vector_3': 200dim,
                'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1', 'bool_1',
                'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
            2. build indexes:
                HNSW: 'float_vector'
                DISKANN_IP: 'float_vector_1'
                HNSW: 'float_vector_2'
                DISKANN_L2: 'float_vector_3'
                scalar_default_index: 'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1'
                scalar_INVERTED_index: 'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
            3. insert 5 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - load
                - search
                - hybrid_search
                - query
         (base.py:44)

@sunby please help to check

/assign @sunby

wangting0128 commented 2 days ago

Reproduced.

argo task: inverted-corn-1731618000
test case name: test_inverted_locust_hnsw_ivf_sq8_dml_dql_cluster
image: master-20241114-1d06d432-amd64

server:

NAME                                                              READY   STATUS             RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-118000-7-59-1335-etcd-0                             1/1     Running            0                3h48m   10.104.25.232   4am-node30   <none>           <none>
inverted-corn-118000-7-59-1335-etcd-1                             1/1     Running            0                3h48m   10.104.19.130   4am-node28   <none>           <none>
inverted-corn-118000-7-59-1335-etcd-2                             1/1     Running            0                3h48m   10.104.24.120   4am-node29   <none>           <none>
inverted-corn-118000-7-59-1335-milvus-datanode-5d5f54b44f-t2pxx   1/1     Running            1 (3h44m ago)    3h48m   10.104.14.230   4am-node18   <none>           <none>
inverted-corn-118000-7-59-1335-milvus-indexnode-5978dc656b6gclb   1/1     Running            1 (3h48m ago)    3h48m   10.104.13.26    4am-node16   <none>           <none>
inverted-corn-118000-7-59-1335-milvus-indexnode-5978dc656bccfvp   1/1     Running            1 (3h48m ago)    3h48m   10.104.6.180    4am-node13   <none>           <none>
inverted-corn-118000-7-59-1335-milvus-mixcoord-598668594f-ltf7w   1/1     Running            1 (3h48m ago)    3h48m   10.104.6.178    4am-node13   <none>           <none>
inverted-corn-118000-7-59-1335-milvus-proxy-58684d8f59-ffc2h      1/1     Running            1 (3h48m ago)    3h48m   10.104.6.181    4am-node13   <none>           <none>
inverted-corn-118000-7-59-1335-milvus-querynode-75bb9f596dwlh4l   1/1     Running            1 (3h48m ago)    3h48m   10.104.6.179    4am-node13   <none>           <none>
inverted-corn-118000-7-59-1335-minio-0                            1/1     Running            0                3h48m   10.104.25.231   4am-node30   <none>           <none>
inverted-corn-118000-7-59-1335-minio-1                            1/1     Running            0                3h48m   10.104.19.129   4am-node28   <none>           <none>
inverted-corn-118000-7-59-1335-minio-2                            1/1     Running            0                3h48m   10.104.24.125   4am-node29   <none>           <none>
inverted-corn-118000-7-59-1335-minio-3                            1/1     Running            0                3h48m   10.104.16.212   4am-node21   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-bookie-0                    1/1     Running            0                3h48m   10.104.32.63    4am-node39   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-bookie-1                    1/1     Running            0                3h48m   10.104.23.48    4am-node27   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-bookie-2                    1/1     Running            0                3h48m   10.104.24.126   4am-node29   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-bookie-init-6x8hw           0/1     Completed          0                3h48m   10.104.25.225   4am-node30   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-broker-0                    1/1     Running            0                3h48m   10.104.25.226   4am-node30   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-proxy-0                     1/1     Running            0                3h48m   10.104.14.229   4am-node18   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-pulsar-init-mqb7p           0/1     Completed          0                3h48m   10.104.14.231   4am-node18   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-recovery-0                  1/1     Running            0                3h48m   10.104.13.31    4am-node16   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-zookeeper-0                 1/1     Running            0                3h48m   10.104.25.233   4am-node30   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-zookeeper-1                 1/1     Running            0                3h48m   10.104.17.80    4am-node23   <none>           <none>
inverted-corn-118000-7-59-1335-pulsar-zookeeper-2                 1/1     Running            0                3h47m   10.104.21.232   4am-node24   <none>           <none> 

client log:

[2024-11-15 00:42:09,127 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=failed to query: segment lacks[segment=453933363675746891]: channel not available[channel=by-dev-rootcoord-dml_0_453933363650626287v0])>, [requestId: 6544f6ea-a2ea-11ef-96d3-06c4b494423f] (api_request.py:57)
[2024-11-15 00:42:18,805 - ERROR - fouram]: (api_response) : [Collection.query] <MilvusException: (code=503, message=failed to query: segment lacks[segment=453933363675746891]: channel not available[channel=by-dev-rootcoord-dml_0_453933363650626287v0])>, [requestId: 6b4b1808-a2ea-11ef-96d3-06c4b494423f] (api_request.py:57)
[2024-11-15 00:43:58,003 - ERROR - fouram]: (api_response) : [Collection.hybrid_search] <MilvusException: (code=503, message=failed to search: segment lacks[segment=453933363675998103]: channel not available[channel=by-dev-rootcoord-dml_1_453933363650626287v1])>, [requestId: a66b4cd2-a2ea-11ef-96d3-06c4b494423f] (api_request.py:57)
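
To triage these failures by segment and channel, the client log lines above can be parsed with a small script. This is a minimal sketch that assumes every error line follows the exact format shown; `ERROR_RE` and `parse_error` are hypothetical helper names, not part of fouram or pymilvus.

```python
import re

# Regex for the client-side error format seen in the log above.
# Group names are illustrative and not defined by any Milvus tooling.
ERROR_RE = re.compile(
    r"\[(?P<api>Collection\.\w+)\] <MilvusException: \(code=(?P<code>\d+), "
    r"message=failed to (?P<op>\w+): segment lacks\[segment=(?P<segment>\d+)\]: "
    r"channel not available\[channel=(?P<channel>[^\]]+)\]\)>"
)


def parse_error(line):
    """Return a dict with api, code, op, segment and channel, or None if no match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


# Sample line copied from the client log in this issue.
sample = (
    "[2024-11-15 00:43:58,003 - ERROR - fouram]: (api_response) : "
    "[Collection.hybrid_search] <MilvusException: (code=503, message=failed to search: "
    "segment lacks[segment=453933363675998103]: channel not available"
    "[channel=by-dev-rootcoord-dml_1_453933363650626287v1])>, "
    "[requestId: a66b4cd2-a2ea-11ef-96d3-06c4b494423f] (api_request.py:57)"
)

if __name__ == "__main__":
    print(parse_error(sample))
```

Grouping the parsed results by `segment` or `channel` shows whether the failures cluster on a few segments, as the three lines above suggest.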

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `vector: memory index`
            verify concurrent DML & DQL scenario which has 2 float_vector fields & 16 scalar fields

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 200dim,
                'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1', 'bool_1',
                'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
            2. build indexes:
                HNSW: 'float_vector'
                IVF_SQ8: 'float_vector_1'
                scalar_default_index: 'int8_1', 'int16_1', 'int32_1', 'int64_1', 'double_1', 'float_1', 'varchar_1'
                scalar_INVERTED_index: 'int8_2', 'int16_2', 'int32_2', 'int64_2', 'double_2', 'float_2', 'varchar_2', 'bool_2'
            3. insert 5 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - insert
                - delete
                - flush
                - load
                - search
                - hybrid_search
                - query
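
The field and index layout from steps 1 and 2 can be written out as plain Python data to make the coverage explicit. This is only a sketch of the plan described above: the index labels are copied from the test description and are not necessarily the index type names the SDK expects, and the variable names are hypothetical.

```python
# Scalar type prefixes used in the test description; each appears with suffix _1 and _2.
SCALAR_TYPES = ["int8", "int16", "int32", "int64", "double", "float", "varchar", "bool"]

# Vector fields and their dimensions, as listed in step 1.
vector_fields = {"float_vector": 128, "float_vector_1": 200}

# 16 scalar fields: int8_1 ... bool_1, int8_2 ... bool_2.
scalar_fields = [f"{t}_{i}" for i in (1, 2) for t in SCALAR_TYPES]

# Index plan from step 2. Labels are taken verbatim from the issue text.
index_plan = {"float_vector": "HNSW", "float_vector_1": "IVF_SQ8"}
for t in SCALAR_TYPES:
    if t != "bool":  # the default scalar index list in step 2 does not include bool_1
        index_plan[f"{t}_1"] = "scalar_default_index"
    index_plan[f"{t}_2"] = "scalar_INVERTED_index"

# Scalar fields that step 2 leaves without an index.
unindexed = [name for name in scalar_fields if name not in index_plan]

if __name__ == "__main__":
    print(len(vector_fields) + len(scalar_fields), "fields,", len(index_plan), "indexed")
    print("unindexed:", unindexed)
```

Per the listing in step 2, `bool_1` is the only field without an index.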

sunby commented 2 days ago

@wangting0128 https://github.com/milvus-io/milvus/pull/37694 fixes it. /assign @wangting0128

wangting0128 commented 2 days ago

working on it