milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Cluster compaction is too slow and times out after 3 hours #37219

Open ThreadDao opened 1 day ago

ThreadDao commented 1 day ago

Is there an existing issue for this?

Environment

- Milvus version: master-20241025-139f4e5a-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server config

  config:
    dataCoord:
      compaction:
        clustering:
          autoEnable: true
      enableActiveStandby: true
    indexCoord:
      enableActiveStandby: true
    log:
      level: debug
    queryCoord:
      enableActiveStandby: true
    queryNode:
      levelZeroForwardPolicy: RemoteLoad
    rootCoord:
      enableActiveStandby: true
    trace:
      exporter: jaeger
      jaeger:
        url: http://tempo-distributor.tempo:14268/api/traces
      sampleFraction: 1
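
For anyone comparing these overrides against a full configuration file, here is a minimal sketch that flattens the nested block into dotted key paths. The `overrides` dict and the `flatten` helper are illustrative names, not part of Milvus; the values are copied from the config above.

```python
# Nested overrides copied from the "server config" block above.
overrides = {
    "dataCoord": {
        "compaction": {"clustering": {"autoEnable": True}},
        "enableActiveStandby": True,
    },
    "indexCoord": {"enableActiveStandby": True},
    "log": {"level": "debug"},
    "queryCoord": {"enableActiveStandby": True},
    "queryNode": {"levelZeroForwardPolicy": "RemoteLoad"},
    "rootCoord": {"enableActiveStandby": True},
    "trace": {
        "exporter": "jaeger",
        "jaeger": {"url": "http://tempo-distributor.tempo:14268/api/traces"},
        "sampleFraction": 1,
    },
}


def flatten(tree, prefix=""):
    """Return a flat {dotted.path: value} dict (hypothetical helper)."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the path so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


flat_overrides = flatten(overrides)
for path in sorted(flat_overrides):
    print(f"{path} = {flat_overrides[path]}")
```

The setting most relevant to this report then appears as the single key `dataCoord.compaction.clustering.autoEnable`.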

test steps

[image: screenshot of the test steps, not preserved]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

pods

NAME                                                            READY   STATUS      RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
stats-int64-op-52-1694-etcd-0                                   1/1     Running     0                3d2h    10.104.26.173   4am-node32   <none>           <none>
stats-int64-op-52-1694-etcd-1                                   1/1     Running     0                3d2h    10.104.15.46    4am-node20   <none>           <none>
stats-int64-op-52-1694-etcd-2                                   1/1     Running     0                3d2h    10.104.24.183   4am-node29   <none>           <none>
stats-int64-op-52-1694-milvus-datanode-6794d6d9c6-d5pm2         1/1     Running     0                5h12m   10.104.33.169   4am-node36   <none>           <none>
stats-int64-op-52-1694-milvus-indexnode-8b8878bdc-l9tg8         1/1     Running     0                3d2h    10.104.9.206    4am-node14   <none>           <none>
stats-int64-op-52-1694-milvus-indexnode-8b8878bdc-lzhgt         1/1     Running     0                3d2h    10.104.13.3     4am-node16   <none>           <none>
stats-int64-op-52-1694-milvus-mixcoord-9f94b959-c9s7s           1/1     Running     0                3d2h    10.104.33.206   4am-node36   <none>           <none>
stats-int64-op-52-1694-milvus-proxy-7c55d5b65b-9k9d4            1/1     Running     0                3d2h    10.104.4.253    4am-node11   <none>           <none>
stats-int64-op-52-1694-milvus-querynode-0-5ddcd8784b-7vrxf      1/1     Running     0                5h45m   10.104.1.121    4am-node10   <none>           <none>
stats-int64-op-52-1694-milvus-querynode-0-5ddcd8784b-fz9sw      1/1     Running     0                5h45m   10.104.13.199   4am-node16   <none>           <none>
stats-int64-op-52-1694-milvus-querynode-0-5ddcd8784b-q6548      1/1     Running     35 (2d21h ago)   3d2h    10.104.4.254    4am-node11   <none>           <none>
stats-int64-op-52-1694-milvus-querynode-0-5ddcd8784b-zrs6j      1/1     Running     35 (2d21h ago)   3d2h    10.104.5.127    4am-node12   <none>           <none>
stats-int64-op-52-1694-minio-0                                  1/1     Running     0                3d2h    10.104.26.174   4am-node32   <none>           <none>
stats-int64-op-52-1694-minio-1                                  1/1     Running     0                3d2h    10.104.15.47    4am-node20   <none>           <none>
stats-int64-op-52-1694-minio-2                                  1/1     Running     0                3d2h    10.104.24.186   4am-node29   <none>           <none>
stats-int64-op-52-1694-minio-3                                  1/1     Running     0                3d2h    10.104.27.54    4am-node31   <none>           <none>
stats-int64-op-52-1694-pulsar-bookie-0                          1/1     Running     0                3d2h    10.104.24.187   4am-node29   <none>           <none>
stats-int64-op-52-1694-pulsar-bookie-1                          1/1     Running     0                3d2h    10.104.20.91    4am-node22   <none>           <none>
stats-int64-op-52-1694-pulsar-bookie-2                          1/1     Running     0                3d2h    10.104.27.57    4am-node31   <none>           <none>
stats-int64-op-52-1694-pulsar-bookie-init-ncvpc                 0/1     Completed   0                3d2h    10.104.13.254   4am-node16   <none>           <none>
stats-int64-op-52-1694-pulsar-broker-0                          1/1     Running     0                3d2h    10.104.26.169   4am-node32   <none>           <none>
stats-int64-op-52-1694-pulsar-proxy-0                           1/1     Running     0                3d2h    10.104.15.43    4am-node20   <none>           <none>
stats-int64-op-52-1694-pulsar-pulsar-init-gwp95                 0/1     Completed   0                3d2h    10.104.13.253   4am-node16   <none>           <none>
stats-int64-op-52-1694-pulsar-recovery-0                        1/1     Running     0                3d2h    10.104.26.168   4am-node32   <none>           <none>
stats-int64-op-52-1694-pulsar-zookeeper-0                       1/1     Running     0                3d2h    10.104.27.56    4am-node31   <none>           <none>
stats-int64-op-52-1694-pulsar-zookeeper-1                       1/1     Running     0                3d2h    10.104.21.96    4am-node24   <none>           <none>
stats-int64-op-52-1694-pulsar-zookeeper-2                       1/1     Running     0                3d2h    10.104.24.191   4am-node29   <none>           <none>
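
To make the listing above easier to scan, here is a rough sketch that parses rows in this layout and picks out the datanode pods and any pods with restarts. It assumes the columns of `kubectl get pods -o wide`; `PodRow`, `ROW`, and `parse_rows` are illustrative names, and `SAMPLE` holds three rows copied from the listing.

```python
import re
from typing import NamedTuple


class PodRow(NamedTuple):
    name: str
    ready: str
    status: str
    restarts: int
    age: str
    ip: str
    node: str


# The RESTARTS column may carry a suffix such as "(2d21h ago)",
# so a plain whitespace split would misalign the later columns.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\([^)]*\))?\s+"
    r"(?P<age>\S+)\s+(?P<ip>\S+)\s+(?P<node>\S+)"
)


def parse_rows(text):
    """Parse pod rows; lines that do not match (e.g. a header) are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            fields = match.groupdict()
            fields["restarts"] = int(fields["restarts"])
            rows.append(PodRow(**fields))
    return rows


SAMPLE = """\
stats-int64-op-52-1694-milvus-datanode-6794d6d9c6-d5pm2         1/1     Running     0                5h12m   10.104.33.169   4am-node36   <none>           <none>
stats-int64-op-52-1694-milvus-querynode-0-5ddcd8784b-7vrxf      1/1     Running     0                5h45m   10.104.1.121    4am-node10   <none>           <none>
stats-int64-op-52-1694-milvus-querynode-0-5ddcd8784b-q6548      1/1     Running     35 (2d21h ago)   3d2h    10.104.4.254    4am-node11   <none>           <none>
"""

pods = parse_rows(SAMPLE)
datanodes = [p for p in pods if "-datanode-" in p.name]
restarted = [p for p in pods if p.restarts > 0]

print(len(datanodes), "datanode pod(s)")
for pod in restarted:
    print(pod.name, "restarts:", pod.restarts)
```

Run against the full listing, the same filters would show a single datanode pod and the two querynode pods with 35 restarts each.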

Anything else?

No response

xiaofan-luan commented 1 day ago

How many cores does the datanode have?

@wayblink could you help with this?