milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: After enabling the streaming node, all test cases hang after Milvus recovers from the MinIO pod-kill chaos test #36388

Open zhuwenxing opened 2 hours ago

zhuwenxing commented 2 hours ago

Is there an existing issue for this?

Environment

- Milvus version: master-20240919-f6526121-amd64
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

[screenshot]

Flush and describe collection took a long time: [screenshot]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

Failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-straming-node-cron/detail/chaos-test-straming-node-cron/19/pipeline
Log: artifacts-s3-pod-kill-19-server-logs.tar.gz

Anything else?

No response

zhuwenxing commented 2 hours ago

Cluster: 4am, namespace: chaos-testing

Pod info:


[2024-09-20T05:36:50.773Z] + kubectl get pods -o wide
[2024-09-20T05:36:50.774Z] + grep s3-pod-kill-19
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-etcd-0                                             1/1     Running       0               33m     10.104.23.239   4am-node27   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-etcd-1                                             1/1     Running       0               33m     10.104.18.133   4am-node25   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-etcd-2                                             1/1     Running       0               33m     10.104.19.61    4am-node28   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-datanode-5b5b66b7f-cswsx                    1/1     Running       2 (33m ago)     33m     10.104.5.32     4am-node12   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-datanode-5b5b66b7f-w7dg9                    1/1     Running       2 (33m ago)     33m     10.104.21.227   4am-node24   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-indexnode-565dcbb85d-94dlb                  1/1     Running       2 (33m ago)     33m     10.104.30.131   4am-node38   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-indexnode-565dcbb85d-flv48                  1/1     Running       2 (32m ago)     33m     10.104.17.239   4am-node23   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-indexnode-565dcbb85d-wqbpf                  1/1     Running       2 (33m ago)     33m     10.104.18.129   4am-node25   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-mixcoord-77fd744f5b-phnj5                   1/1     Running       2 (33m ago)     33m     10.104.32.220   4am-node39   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-proxy-7997d5fb9d-q4nh6                      1/1     Running       2 (33m ago)     33m     10.104.24.102   4am-node29   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-querynode-75b6fcc584-88bbh                  1/1     Running       1 (32m ago)     33m     10.104.4.115    4am-node11   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-querynode-75b6fcc584-cczt6                  1/1     Running       2 (32m ago)     33m     10.104.23.241   4am-node27   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-querynode-75b6fcc584-l4c8d                  1/1     Running       2 (33m ago)     33m     10.104.23.236   4am-node27   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-streamingnode-567dcb44f4-mzxm7              1/1     Running       2 (33m ago)     33m     10.104.27.171   4am-node31   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-milvus-streamingnode-567dcb44f4-nffng              1/1     Running       2 (33m ago)     33m     10.104.20.88    4am-node22   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-minio-0                                            1/1     Running       0               8m50s   10.104.20.161   4am-node22   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-minio-1                                            1/1     Running       0               8m50s   10.104.30.158   4am-node38   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-minio-2                                            1/1     Running       0               8m50s   10.104.32.4     4am-node39   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-minio-3                                            1/1     Running       0               8m50s   10.104.18.161   4am-node25   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-pulsar-bookie-0                                    1/1     Running       0               33m     10.104.33.141   4am-node36   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-pulsar-bookie-1                                    1/1     Running       0               33m     10.104.32.227   4am-node39   <none>           <none>
[2024-09-20T05:36:51.030Z] s3-pod-kill-19-pulsar-bookie-init-cnb24                           0/1     Completed     0               33m     10.104.33.133   4am-node36   <none>           <none>
[2024-09-20T05:36:51.031Z] s3-pod-kill-19-pulsar-broker-0                                    1/1     Running       0               33m     10.104.14.190   4am-node18   <none>           <none>
[2024-09-20T05:36:51.031Z] s3-pod-kill-19-pulsar-proxy-0                                     1/1     Running       0               33m     10.104.20.90    4am-node22   <none>           <none>
[2024-09-20T05:36:51.031Z] s3-pod-kill-19-pulsar-pulsar-init-pbqtc                           0/1     Completed     0               33m     10.104.20.89    4am-node22   <none>           <none>
[2024-09-20T05:36:51.031Z] s3-pod-kill-19-pulsar-recovery-0                                  1/1     Running       0               33m     10.104.18.128   4am-node25   <none>           <none>
[2024-09-20T05:36:51.031Z] s3-pod-kill-19-pulsar-zookeeper-0                                 1/1     Running       0               33m     10.104.23.240   4am-node27   <none>           <none>

zhuwenxing commented 2 hours ago

/assign @chyezh PTAL