milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Standalone oomkilled after upgrading master branch image #34055

Closed ThreadDao closed 3 weeks ago

ThreadDao commented 4 months ago

Is there an existing issue for this?

Environment

- Milvus version: e83ecd50 -> master-20240621-4e414fb7-amd64
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. standalone pod zong-go-master-milvus-standalone-5cc75d965b-t46jr with resource config
    Limits:
      cpu:     8
      memory:  16Gi
    Requests:
      cpu:      5
      memory:   9Gi
  2. upgrade the image to master-20240621-4e414fb7-amd64 and the pod is oomkilled
  3. raise the standalone pod memory limit to 32Gi and the pod is running
    • pod memory usage: [image]
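
For comparing the memory settings quoted above, a minimal Python sketch that converts the quantities to bytes. The helper name `parse_memory` is hypothetical, and it assumes only the binary suffixes (Ki, Mi, Gi, Ti) are in use, as in this report.

```python
# Binary suffixes used in the resource config above.
# Assumption: decimal suffixes (k, M, G) are not needed for this report.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '16Gi' to a byte count."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain byte count


request = parse_memory("9Gi")     # original memory request
old_limit = parse_memory("16Gi")  # limit under which the pod was oomkilled
new_limit = parse_memory("32Gi")  # limit under which the pod runs

print(f"request={request} old_limit={old_limit} new_limit={new_limit}")
print(f"limit increase factor: {new_limit / old_limit:.1f}")
```

The values come straight from the report; the only thing the sketch adds is the unit conversion.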

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

pods:

zong-go-master-etcd-0                                             1/1     Running                           0              3d      10.104.25.21    4am-node30   <none>           <none>
zong-go-master-milvus-standalone-6cdbf578d-phtls                  1/1     Running                           0              10m     10.104.19.7     4am-node28   <none>           <none>
zong-go-master-minio-55956b966c-s48rt                             1/1     Running                           0              3d      10.104.34.72    4am-node37   <none>           <none>

Anything else?

No response

yanliang567 commented 4 months ago

There were only 20 collections, with 15k vectors in total, loaded before upgrading, btw.

ThreadDao commented 4 months ago

pyroscope of pod zong-go-master-milvus-standalone-6cdbf578d-phtls: http://10.100.36.158:4040/?query=zong-go-master-milvus-standalone.alloc_objects%7B%7D&from=1718955620&until=1718959425
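
To see which time window that link covers, a small Python sketch can decode its query string. It assumes the `from` and `until` parameters are Unix timestamps in seconds, which the link itself does not state.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

url = (
    "http://10.100.36.158:4040/"
    "?query=zong-go-master-milvus-standalone.alloc_objects%7B%7D"
    "&from=1718955620&until=1718959425"
)

# parse_qs percent-decodes values, so %7B%7D becomes {}.
params = parse_qs(urlsplit(url).query)

# Assumption: both bounds are Unix seconds (UTC).
start = datetime.fromtimestamp(int(params["from"][0]), tz=timezone.utc)
end = datetime.fromtimestamp(int(params["until"][0]), tz=timezone.utc)

print("profile query:", params["query"][0])
print("window:", start.isoformat(), "to", end.isoformat())
print("duration:", end - start)
```

Under that assumption the window falls on 2024-06-21 UTC, the same date as the upgraded image tag.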

xiaofan-luan commented 4 months ago

So we have 20 collections with only 15K vectors per collection, and a 16GB standalone gets OOM?
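
A back-of-envelope sketch of why that data volume looks too small to fill 16GB on its own. The vector dimension (128) and element type (float32) are assumptions for illustration only, since the issue does not state the schema, and the estimate covers raw vector data, not indexes or runtime overhead.

```python
ASSUMED_DIM = 128        # assumption: dimension is not given in the issue
BYTES_PER_FLOAT32 = 4    # assumption: float32 vectors


def raw_vector_bytes(num_vectors: int, dim: int = ASSUMED_DIM) -> int:
    """Raw size of num_vectors float32 vectors, excluding index and overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32


# The thread mentions both readings, so estimate each one.
total_reading = raw_vector_bytes(15_000)                # 15k vectors in total
per_collection_reading = 20 * raw_vector_bytes(15_000)  # 15k per collection

limit_bytes = 16 * 2**30  # the 16Gi memory limit from the report

for label, size in [
    ("15k vectors in total", total_reading),
    ("15k vectors per collection", per_collection_reading),
]:
    print(f"{label}: {size / 2**20:.1f} MiB, "
          f"{100 * size / limit_bytes:.2f}% of the 16Gi limit")
```

Under these assumptions either reading stays below 1% of the limit, which is consistent with the question above.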

sunby commented 4 months ago

same issue: [Bug]: Upgrading from v2.3.15 to master-20240625-506a9152-amd64 due to DN and QN crash and one index