Closed: ThreadDao closed this issue 1 year ago.
I think database creation only touches metadata, so it should not lead to an OOM. /assign @jaime0815 /unassign
It seems RocksDB is using a lot of memory; the SST files in the data directory have grown to 11G.
/var/lib/milvus/rdb_data:
total 11G
-rw-r--r-- 1 root root 66M May 17 06:34 003435.sst
-rw-r--r-- 1 root root 66M May 17 03:16 002822.sst
-rw-r--r-- 1 root root 66M May 17 00:37 002275.sst
-rw-r--r-- 1 root root 66M May 16 20:10 001496.sst
-rw-r--r-- 1 root root 66M May 16 12:59 000306.sst
-rw-r--r-- 1 root root 66M May 16 12:30 000212.sst
-rw-r--r-- 1 root root 66M May 16 12:48 000267.sst
-rw-r--r-- 1 root root 66M May 16 12:21 000178.sst
-rw-r--r-- 1 root root 66M May 16 13:13 000361.sst
-rw-r--r-- 1 root root 66M May 16 13:09 000341.sst
-rw-r--r-- 1 root root 66M May 16 12:25 000194.sst
-rw-r--r-- 1 root root 66M May 16 13:09 000343.sst
-rw-r--r-- 1 root root 66M May 16 13:04 000325.sst
-rw-r--r-- 1 root root 66M May 16 12:53 000285.sst
-rw-r--r-- 1 root root 66M May 16 12:37 000231.sst
-rw-r--r-- 1 root root 66M May 16 12:20 000177.sst
-rw-r--r-- 1 root root 66M May 16 12:42 000247.sst
-rw-r--r-- 1 root root 66M May 16 13:57 000531.sst
-rw-r--r-- 1 root root 66M May 16 12:17 000158.sst
-rw-r--r-- 1 root root 66M May 16 11:47 000051.sst
-rw-r--r-- 1 root root 66M May 16 13:04 000326.sst
-rw-r--r-- 1 root root 66M May 16 11:44 000038.sst
-rw-r--r-- 1 root root 66M May 16 11:57 000097.sst
-rw-r--r-- 1 root root 66M May 16 12:37 000230.sst
-rw-r--r-- 1 root root 66M May 16 11:50 000064.sst
-rw-r--r-- 1 root root 66M May 16 12:08 000126.sst
-rw-r--r-- 1 root root 66M May 16 11:41 000026.sst
-rw-r--r-- 1 root root 66M May 16 12:12 000142.sst
-rw-r--r-- 1 root root 66M May 16 11:57 000098.sst
-rw-r--r-- 1 root root 66M May 16 11:44 000039.sst
-rw-r--r-- 1 root root 66M May 16 12:08 000125.sst
-rw-r--r-- 1 root root 66M May 17 06:34 003436.sst
-rw-r--r-- 1 root root 65M May 17 07:15 003531.sst
-rw-r--r-- 1 root root 65M May 16 21:54 001814.sst
-rw-r--r-- 1 root root 65M May 16 20:37 001619.sst
....
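To check how much of the RocksDB data directory is taken up by SST files without eyeballing the full listing, a minimal sketch could sum them directly. It assumes only that the table files end in `.sst` and sit directly under the data directory, as in the listing; the function name `sst_usage` is made up for illustration. Note that this measures on-disk size, which is related to, but not the same as, RocksDB's in-memory footprint.

```python
import os


def sst_usage(data_dir):
    """Return (file_count, total_bytes) for the *.sst files directly under data_dir.

    Hypothetical helper for illustration; it does not recurse into subdirectories.
    """
    count = 0
    total = 0
    with os.scandir(data_dir) as entries:
        for entry in entries:
            # Count only regular files with the RocksDB table-file extension.
            if entry.is_file() and entry.name.endswith(".sst"):
                count += 1
                total += entry.stat().st_size
    return count, total
```

For example, `sst_usage("/var/lib/milvus/rdb_data")` returns the number of SST files and their combined size in bytes; dividing by `2**30` gives GiB, comparable to the `total 11G` line above.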
Milvus failed to start because producing to (or consuming from) the message stream was too slow:
[2023/05/17 06:57:12.545 +00:00] [WARN] [server/rocksmq_impl.go:628] ["rocksmq produce too slowly"] [topic=zong-db-rootcoord-delta_0] ["get lock elapse"=8226] ["alloc elapse"=0] ["write elapse"=1] ["updatePage elapse"=0] ["produce total elapse"=8227]
[2023/05/17 06:57:12.545 +00:00] [WARN] [server/rocksmq_impl.go:628] ["rocksmq produce too slowly"] [topic=zong-db-rootcoord-delta_0] ["get lock elapse"=8221] ["alloc elapse"=0] ["write elapse"=0] ["updatePage elapse"=0] ["produce total elapse"=8221]
[2023/05/17 06:57:12.545 +00:00] [WARN] [server/rocksmq_impl.go:628] ["rocksmq produce too slowly"] [topic=zong-db-rootcoord-delta_0] ["get lock elapse"=8212] ["alloc elapse"=0] ["write elapse"=1] ["updatePage elapse"=0] ["produce total elapse"=8213]
[2023/05/17 06:57:12.545 +00:00] [WARN] [server/rocksmq_impl.go:628] ["rocksmq produce too slowly"] [topic=zong-db-rootcoord-delta_0] ["get lock elapse"=8212] ["alloc elapse"=0] ["write elapse"=0] ["updatePage elapse"=0] ["produce total elapse"=8212]
[2023/05/17 06:57:12.545 +00:00] [WARN] [server/rocksmq_impl.go:628] ["rocksmq produce too slowly"] [topic=zong-db-rootcoord-delta_0] ["get lock elapse"=8212] ["alloc elapse"=0] ["write elapse"=0] ["updatePage elapse"=0] ["produce total elapse"=8212]
[2023/05/17 06:57:12.546 +00:00] [WARN] [server/rocksmq_impl.go:628] ["rocksmq produce too slowly"] [topic=zong-db-rootcoord-delta_0] ["get lock elapse"=8212] ["alloc elapse"=0] ["write elapse"=0] ["updatePage elapse"=0] ["produce total elapse"=8212]
[2023/05/17 06:57:12.550 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=3.116324127s]
[2023/05/17 06:57:12.550 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=3.037529922s]
[2023/05/17 06:57:12.550 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=3.027311795s]
[2023/05/17 06:57:12.550 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=2.783017938s]
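In the warnings above, almost all of `produce total elapse` is `get lock elapse`, while `write elapse` is 0 or 1. This suggests producers are mostly waiting on a lock rather than on the RocksDB write itself. A minimal sketch for pulling those fields out of a longer log follows. It assumes the field layout shown in these exact lines, and the helper names `slow_produce_timings` and `lock_fraction` are hypothetical.

```python
import re

# Matches the timing fields of a "rocksmq produce too slowly" warning,
# assuming the field order seen in the log lines above.
SLOW_PRODUCE = re.compile(
    r'\["rocksmq produce too slowly"\].*?'
    r'\[topic=(?P<topic>[^\]]+)\].*?'
    r'\["get lock elapse"=(?P<lock>\d+)\].*?'
    r'\["write elapse"=(?P<write>\d+)\].*?'
    r'\["produce total elapse"=(?P<total>\d+)\]'
)


def slow_produce_timings(lines):
    """Return (topic, lock, write, total) for every slow-produce warning in lines."""
    rows = []
    for line in lines:
        m = SLOW_PRODUCE.search(line)
        if m:  # other log lines (e.g. DEBUG "send msg to stream") are skipped
            rows.append((m["topic"], int(m["lock"]), int(m["write"]), int(m["total"])))
    return rows


def lock_fraction(rows):
    """Share of the total produce time spent acquiring the lock (0.0 if no time recorded)."""
    total = sum(r[3] for r in rows)
    return sum(r[1] for r in rows) / total if total else 0.0
```

Feeding the WARN lines above through `slow_produce_timings` and then `lock_fraction` gives a value very close to 1, which matches reading the numbers by hand.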
Related to https://github.com/milvus-io/milvus/issues/24106: the retention mechanism doesn't work.
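To make "retention" concrete: a message queue with time- and size-based retention is generally expected to drop already-consumed pages once they are older than a time limit, or once the retained total exceeds a size budget. The sketch below is a generic, hypothetical illustration of such a pass. The `Page` type, `expired_pages`, and its parameters are invented here; it is not rocksmq's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    size_bytes: int
    ack_time: float  # hypothetical: seconds since epoch when the page was fully consumed


def expired_pages(pages, now, max_age_s, max_total_bytes):
    """Return ids of pages a time/size retention pass would delete, oldest first."""
    ordered = sorted(pages, key=lambda p: p.ack_time)
    total = sum(p.size_bytes for p in ordered)
    doomed = []
    for p in ordered:
        too_old = now - p.ack_time > max_age_s
        too_big = total > max_total_bytes
        if not (too_old or too_big):
            break  # retention trims only from the head of the queue
        doomed.append(p.page_id)
        total -= p.size_bytes
    return doomed
```

Stopping at the first page that is neither too old nor needed to meet the size budget keeps deletion a contiguous prefix of the oldest pages. That fits a queue consumed in order.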
Maybe the cause is the test case test_create_collection_exceeds_per_db, which creates max_collections_per_db=65536 collections. Just my guess.
It looks like a RocksDB memory leak.
Any idea about this error? My Milvus version: v2.0.2 standalone. Only 100 rows of test data, 2 collections...
I don't think you hit the same issue: v2.0.2 does not include the database feature involved in this issue. Please retry with the latest v2.2.8.
@jaime0815
image: 2.2.0-20230524-e8545777
I guess the reason is that so many collections (up to 65536) are created; the inserted data is negligible.
zong-db-1-etcd-0 1/1 Running 0 4d2h
zong-db-1-milvus-standalone-84f7c587bc-hbbwt 1/1 Running 1 (9m27s ago) 48m
zong-db-1-minio-7457dc9fdb-dfpns 1/1 Running 0 4d2h
The prof option of jemalloc is enabled, which causes significant performance degradation: https://github.com/milvus-io/milvus/blob/15368f5e752cc5c152b9954dd4a55d0f79926e27/internal/core/thirdparty/jemalloc/CMakeLists.txt#L50
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Is there an existing issue for this?
Environment
Current Behavior
Deploy standalone with the config:
Debug and run all the db test cases.
Run a utility case:
The standalone pod was OOMKilled. The weird thing is that there are only 7 collections and every collection is empty (no insertions).
db_D3198izZ {'row_count': 0}
db_Vb2tsSQR {'row_count': 0}
db_xuHut7cZ {'row_count': 0}
db_QOMN4Rjq {'row_count': 0}
db_XTMwuDnD {'row_count': 0}
db_4uO2Icsv {'row_count': 0}
db_3S8QG4lr {'row_count': 0}
zong-db-etcd-0 1/1 Running 0 4h52m
zong-db-milvus-standalone-c775f6d94-pshxl 1/1 Running 7 (3m15s ago) 4h49m
zong-db-minio-95fb5b866-zss9c 1/1 Running 0 4h52m