milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: filter indexed segments failed #33589

Closed: TonyAnn closed this issue 1 week ago

TonyAnn commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.14
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

Milvus gets stuck when building an index.

indexcoord logs: [WARN] [indexcoord/node_manager.go:144] ["get IndexNode slots failed"] [nodeID=223] [error="context canceled"]

datacoord logs: failed to get index of collection, ["filter indexed segments failed"] [error="context deadline exceeded"]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

bw_etcd_ALL.240604-133724.bak.gz, milvus-log.tar.gz

Anything else?

No response

SimFG commented 1 month ago

@TonyAnn How did you deploy Milvus? According to the index node's log, some permissions seem to be missing.

TonyAnn commented 1 month ago

@SimFG Hi Sim, the Milvus cluster is deployed via Helm on Kubernetes.

SimFG commented 1 month ago

You also need to confirm that the etcd service is healthy, because some etcd operations also failed while the index node was starting up.

TonyAnn commented 1 month ago

@SimFG I checked the logs and ran a health check from the command line; etcd is fine:

  $ etcdctl --endpoints=http://10.96.1.142:2379 endpoint health
  http://10.96.1.142:2379 is healthy: successfully committed proposal: took = 2.873091ms

SimFG commented 1 month ago

@TonyAnn You could try redeploying Milvus. According to the log, a problem during index node startup appears to have prevented the whole Milvus cluster from starting.

SimFG commented 1 month ago

@TonyAnn What version of Milvus did you use?

TonyAnn commented 1 month ago


@SimFG The version is 2.2.14.

SimFG commented 1 month ago

You can try the latest Milvus version. I checked the logs, and the root cause is not the missing permissions. If you want to get rid of the permission warning, you can add the following configuration to the deployment:

 securityContext:
   capabilities:
     add: ["SYS_NICE"]

The main problem seems to be the connection to etcd. I recommend trying the latest version, since 2.2.x is rarely maintained now. The latest Milvus release is 2.4.3.

yanliang567 commented 1 month ago

Please use v2.4.4.
/assign @TonyAnn

stale[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.