milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.51k stars 2.83k forks

[Bug]: [benchmark][standalone][memory]Concurrent create and drop collection, the memory of Milvus Standalone continues to rise #19492

Closed (wangting0128 closed this issue 1 year ago)

wangting0128 commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version:master-20220927-67b95597
- Deployment mode(standalone or cluster):standalone
- SDK version(e.g. pymilvus v2.0.0rc2):2.2.0.dev32
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouram-2bn8t

client yaml: client-configmap: client-random-locust-search-filter-100m-ddl-6d server-configmap: server-single-32c128m

client pod: fouram-2bn8t-3147776694

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP             NODE         NOMINATED NODE   READINESS GATES
fouram-2bn8t-1-etcd-0                                             1/1     Running     0               63m     10.104.4.167   4am-node11   <none>           <none>
fouram-2bn8t-1-milvus-standalone-7f68d6999c-9zfgq                 1/1     Running     0               63m     10.104.5.75    4am-node12   <none>           <none>
fouram-2bn8t-1-minio-5b9f74d47b-6b7ws                             1/1     Running     0               63m     10.104.5.74    4am-node12   <none>           <none>
[Screenshots: memory monitoring, 2022-09-27 21:40 and 21:39]

Memory increased by 3 GB in an hour

Expected Behavior

Memory usage remains stable without significant fluctuations

Steps To Reproduce

1. create collection
2. build index of ivf_sq8
3. insert 100,000 vectors
4. flush
5. build index with the same params
6. load
7. locust concurrent: search, load, query, scene_test (run for 1 h)
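Step 3 inserts 100,000 random 128-dimensional vectors. A minimal stdlib sketch of the payload generation (the field names float_vector/float1 are taken from the client config quoted later in this issue, and generate_entities here is a hypothetical stand-in for the fouram utils helper, not the actual implementation):

```python
import random

def generate_entities(num: int, dim: int = 128):
    """Build an insert payload of `num` random `dim`-dimensional vectors
    plus a float scalar field, mimicking utils.generate_entities."""
    vectors = [[random.random() for _ in range(dim)] for _ in range(num)]
    scalars = [random.random() for _ in range(num)]
    ids = list(range(num))
    return {"id": ids, "float_vector": vectors, "float1": scalars}

# 1000 here for illustration; the actual test inserts 100,000
entities = generate_entities(1000)
assert len(entities["float_vector"]) == 1000
assert len(entities["float_vector"][0]) == 128
```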

Milvus Log

No response

Anything else?

client-random-locust-search-filter-100m-ddl-6d:

    locust_random_performance:
      collections:
        -
          collection_name: sift_10w_128_l2
          other_fields: float1
          ni_per: 50000
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 2048
          task:
            types:
              -
                type: query
                weight: 20
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    nprobe: 16
                  filters:
                    -
                      range: "{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}"
              -
                type: load
                weight: 1
              -
                type: get
                weight: 10
                params:
                  ids_length: 10
              -
                type: scene_test
                weight: 2
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            during_time: 1h
    def scene_test(self, collection_name=None, vectors=None, ids=None, index_type="ivf_sq8",
                   index_param={'nlist': 2048}, metric_type="l2"):
        logger.debug("[scene_test] Start scene test : %s" % collection_name)
        self.create_collection(dimension=128, collection_name=collection_name)
        time.sleep(1)

        collection_info = self.get_info(collection_name)

        entities = utils.generate_entities(collection_info, vectors, ids)
        logger.debug("[scene_test] Start insert : %s" % collection_name)
        self.insert(entities, collection_name=collection_name)
        logger.debug("[scene_test] Start flush : %s" % collection_name)
        self.flush(collection_name=collection_name)

        self.count(collection_name=collection_name)

        logger.debug("[scene_test] {0} start create index:{1}, index_param:{2}, metric_type:{3}".format(collection_name,
                                                                                                        index_type,
                                                                                                        index_param,
                                                                                                        metric_type))
        self.create_index(field_name='float_vector', index_type=index_type, metric_type=metric_type,
                          collection_name=collection_name, index_param=index_param)
        time.sleep(59)

        logger.debug("[scene_test] Start drop : %s" % collection_name)
        self.drop(collection_name=collection_name)
        logger.debug("[scene_test]Scene test close : %s" % collection_name)
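The task weights in the config above (query 20, load 1, get 10, scene_test 2) mean each simulated client picks its next operation with probability proportional to the weight, which is how locust schedules weighted tasks. A minimal stdlib sketch of that selection (the dict below copies the weights from the config; the scheduling code itself is an illustration, not locust's implementation):

```python
import random
from collections import Counter

# Task weights copied from the client configmap above
tasks = {"query": 20, "load": 1, "get": 10, "scene_test": 2}

def next_task(rng: random.Random) -> str:
    """Pick the next task with probability proportional to its weight."""
    names = list(tasks)
    weights = [tasks[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
counts = Counter(next_task(rng) for _ in range(33000))
# Total weight is 33, so expect roughly 20/33 query, 10/33 get, 2/33
# scene_test, 1/33 load over many draws.
assert counts["query"] > counts["get"] > counts["scene_test"] > counts["load"]
```

With weight 2 out of 33 and 20 clients running for hours, scene_test fires often enough that the create/drop cycle dominates the memory profile being investigated here.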
xiaofan-luan commented 1 year ago

/assign @longjiquan

longjiquan commented 1 year ago

standalone: [screenshot]

cluster: [screenshot]

wangting0128 commented 1 year ago

argo task: fouram-9txhk

test yaml: client-configmap: client-random-locust-hnsw-search-filter-100m-ddl server-configmap: server-single-32c128m

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP             NODE         NOMINATED NODE   READINESS GATES
fouram-9txhk-1-etcd-0                                             1/1     Running     0               102s    10.104.9.100   4am-node14   <none>           <none>
fouram-9txhk-1-milvus-standalone-59456db974-ztczl                 1/1     Running     0               102s    10.104.5.88    4am-node12   <none>           <none>
fouram-9txhk-1-minio-6cd598d9f9-t4qvn                             1/1     Running     0               102s    10.104.6.43    4am-node13   <none>           <none>

monitor:

[Screenshots: memory monitoring, 2022-10-12 11:12]
longjiquan commented 1 year ago

Tests with only load/query: [screenshot]

longjiquan commented 1 year ago

Tests with only scene_test: [screenshot]

longjiquan commented 1 year ago

In both cases, the memory eventually stabilizes.

jingkl commented 1 year ago

server-instance fouram-tag-no-clean-ksrxv-1 server-configmap server-single-4c8m client-configmap client-random-locust-search-filter-10w-onddl

master-20221019-52cd40fb 2.2.0.dev42

fouram-tag-no-clean-ksrxv-1-etcd-0                                1/1     Running     0               116s    10.104.6.187   4am-node13   <none>           <none>
fouram-tag-no-clean-ksrxv-1-milvus-standalone-dd96f57cc-mcknk     1/1     Running     0               116s    10.104.6.186   4am-node13   <none>           <none>
fouram-tag-no-clean-ksrxv-1-minio-69888ddbd-vqz2j                 1/1     Running     0               116s    10.104.6.185   4am-node13   <none>           <none>

memory:

[Screenshot: memory monitoring, 2022-10-20 15:54]

data:
  config.yaml: |
    locust_random_performance:
      collections:
        -
          collection_name: sift_10w_128_l2
          other_fields: float1
          ni_per: 50000
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 2048
          task:
            types:
              -
                type: scene_test
                weight: 10
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            during_time: 24h
longjiquan commented 1 year ago

[screenshot]

longjiquan commented 1 year ago

[screenshot]

longjiquan commented 1 year ago

However, if we disable the block cache for RocksDB via SetNoBlockCache(true), heaptrack reports no memory leak. [screenshot]
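This is consistent with block-cache behavior: a block cache like RocksDB's holds recently read blocks up to a fixed byte budget and evicts the least recently used ones, so resident memory climbs until the budget is reached and then plateaus, which a leak detector like heaptrack does not flag. A toy byte-bounded LRU sketch illustrating the plateau (this is an illustration, not RocksDB's implementation):

```python
from collections import OrderedDict

class ByteLRUCache:
    """Toy byte-bounded LRU cache: memory grows until it hits the
    capacity, then stays flat as old blocks are evicted."""
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.size = 0
        self._blocks = OrderedDict()  # key -> bytes, LRU order

    def put(self, key, block: bytes) -> None:
        if key in self._blocks:
            self.size -= len(self._blocks.pop(key))
        self._blocks[key] = block
        self.size += len(block)
        while self.size > self.capacity:          # evict least recently used
            _, evicted = self._blocks.popitem(last=False)
            self.size -= len(evicted)

    def get(self, key):
        if key not in self._blocks:
            return None
        self._blocks.move_to_end(key)             # mark as recently used
        return self._blocks[key]

cache = ByteLRUCache(capacity_bytes=1024)
for i in range(100):                              # 100 x 64-byte blocks
    cache.put(i, bytes(64))
assert cache.size == 1024                         # plateaus at capacity
assert cache.get(0) is None                       # oldest blocks evicted
```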

xiaofan-luan commented 1 year ago

This might not be an issue, since the RocksDB block cache takes about 1 GB by default?

longjiquan commented 1 year ago

This might not be an issue, since the RocksDB block cache takes about 1 GB by default?

Yeah, we'll also set the cache size according to the total memory size & memory usage.
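One way to set the cache size according to total memory, as suggested above, is a capped fraction of total memory. The exact policy Milvus adopted is not shown in this issue, so the function name, fraction, and cap below are illustrative assumptions only:

```python
def block_cache_budget(total_mem_bytes: int,
                       fraction: float = 0.05,
                       cap_bytes: int = 1 << 30) -> int:
    """Illustrative heuristic (not Milvus's actual policy): give the
    block cache a small fraction of total memory, capped at 1 GiB,
    the default size mentioned in the comment above."""
    return min(int(total_mem_bytes * fraction), cap_bytes)

# On the 128 GiB server used in this test, 5% exceeds the 1 GiB cap:
assert block_cache_budget(128 << 30) == 1 << 30
# On a small 8 GiB standalone node, the fraction applies instead:
assert block_cache_budget(8 << 30) == int((8 << 30) * 0.05)
```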

jingkl commented 1 year ago

server-instance fouram-vn7sw-1 server-configmap server-single-4c8m client-configmap client-random-locust-search-filter-10w-onddl master-20221025-ec83bbf7 2.2.0.dev63

[Screenshot: memory monitoring, 2022-10-28 17:15]
jingkl commented 1 year ago

Verification shows the memory-rise problem still exists; please fix it. @longjiquan

longjiquan commented 1 year ago

Already verified on branch test-no-block-cache (commit e39befc). /assign @jingkl

longjiquan commented 1 year ago

[screenshot]

longjiquan commented 1 year ago

Verification proves that the RocksDB block cache is what holds the memory. This behavior matches expectations and should not be treated as a memory leak; arguably it is not an issue at all.

jingkl commented 1 year ago

This issue has been verified and the behavior judged to be as expected, so the issue is closed.