milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.89k stars 2.87k forks

[Bug]: milvus Cluster indexnode error can not #36533

Open fjfzyxy opened 2 weeks ago

fjfzyxy commented 2 weeks ago

Is there an existing issue for this?

  • [x] I have searched the existing issues

Environment

- Milvus version: 2.2.14
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

1

Expected Behavior

Initially, the cluster's datacoord was OOM-killed, and at the same time the indexnode also reported errors. After I increased datacoord's resources and restarted the pod, the indexnode keeps logging errors and the cluster is now unusable. Please help.

Steps To Reproduce

Initially, the cluster's datacoord was OOM-killed, and at the same time the indexnode also reported errors.
I increased datacoord's resources and restarted the pod.
The indexnode keeps logging errors and the cluster is now unusable. Please help.

Milvus Log

[2024/09/26 02:14:41.343 +00:00] [WARN] [indexcgowrapper/helper.go:76] ["failed to create index, C Runtime Exception: [UnexpectedError] Error:GetObjectSize[errcode:404, exception:, errmessage:No response body.]\n"]
[2024/09/26 02:14:41.343 +00:00] [ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] Error:GetObjectSize[errcode:404, exception:, errmessage:No response body.]"] [stack="github.com/milvus-io/milvus/internal/indexnode.(indexBuildTask).BuildIndex\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task.go:340\ngithub.com/milvus-io/milvus/internal/indexnode.(TaskScheduler).processTask.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:207\ngithub.com/milvus-io/milvus/internal/indexnode.(TaskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:220\ngithub.com/milvus-io/milvus/internal/indexnode.(TaskScheduler).indexBuildLoop.func1\n\t/go/src/github.com/milvus-io/milvus/internal/indexnode/task_scheduler.go:253"]
[2024/09/26 02:14:41.343 +00:00] [INFO] [indexnode/taskinfo_ops.go:42] ["IndexNode store task state"] [clusterID=by-dev] [buildID=452312785507291952] [state=Retry] ["fail reason"="[UnexpectedError] Error:GetObjectSize[errcode:404, exception:, errmessage:No response body.]"]
[2024/09/26 02:14:41.358 +00:00] [INFO] [indexnode/indexnode_service.go:164] ["drop index build jobs"] [traceID=23d0249c40c4d2b9] [ClusterID=by-dev] [IndexBuildIDs="[452312785507292141]"]
[2024/09/26 02:14:41.358 +00:00] [INFO] [indexnode/indexnode_service.go:182] ["drop index build jobs success"] [traceID=23d0249c40c4d2b9] [ClusterID=by-dev] [IndexBuildIDs="[452312785507292141]"]
[2024/09/26 02:14:41.395 +00:00] [INFO] [indexnode/indexnode_service.go:164] ["drop index build jobs"] [traceID=7b913cbc2dd83e03] [ClusterID=by-dev] [IndexBuildIDs="[452312785507292208]"]
[2024/09/26 02:14:41.395 +00:00] [INFO] [indexnode/indexnode_service.go:182] ["drop index build jobs success"] [traceID=7b913cbc2dd83e03] [ClusterID=by-dev] [IndexBuildIDs="[452312785507292208]"]
[2024/09/26 02:14:41.411 +00:00] [INFO] [indexnode/indexnode_service.go:164] ["drop index build jobs"] [traceID=49396fa0dae7533] [ClusterID=by-dev] [IndexBuildIDs="[452312785507292255]"]
[2024/09/26 02:14:41.411 +00:00] [INFO] [indexnode/indexnode_service.go:182] ["drop index build jobs success"] [traceID=49396fa0dae7533] [ClusterID=by-dev] [IndexBuildIDs="[452312785507292255]"]
[2024/09/26 02:14:41.416 +00:00] [INFO] [indexnode/indexnode_service.go:209] ["Get Index Job Stats"] [traceID=7cf6cccaf9c656aa] [Unissued=0] [Active=0] [Slot=1]
[2024/09/26 02:14:41.422 +00:00] [INFO] [indexnode/indexnode_service.go:47] ["IndexNode building index ..."] [traceID=7260878a4d08affb] [ClusterID=by-dev] [IndexBuildID=452312785507296356] [IndexID=0] [IndexName=] [IndexFilePrefix=file/index_files] [IndexVersion=63] [DataPaths="[file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785505916532,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506746993,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506755645,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506763943,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506773416,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506782845,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506792154,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506800336,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506809350,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506818222,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785506827909,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507036684,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507045187,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507054175,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507062484,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507085485,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507093314,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507149778,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507160220,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507170821,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507180693,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507190123,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507198814,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507207452,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507215854,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507243918,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507251136,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507261733,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507268835,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507279521,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507287881,file/insert_log/448639064597621869/448639064597622737/452312785505909898/101/452312785507295663]"] [TypeParams="[{\"key\":\"dim\",\"value\":\"512\"}]"] [IndexParams="[{\"key\":\"index_type\",\"value\":\"IVF_SQ8\"},{\"key\":\"metric_type\",\"value\":\"IP\"},{\"key\":\"nlist\",\"value\":\"1024\"}]"] [num_rows=1698]
[2024/09/26 02:14:41.422 +00:00] [INFO] [indexnode/indexnode_service.go:112] ["IndexNode successfully scheduled"] [traceID=7260878a4d08affb] [IndexBuildID=452312785507296356] [ClusterID=by-dev] [indexName=]
[2024/09/26 02:14:41.422 +00:00] [INFO] [indexnode/task.go:153] ["Begin to prepare indexBuildTask"] [buildID=452312785507296356] [Collection=0] [SegmentIf=0] ["queue duration"=46.498µs]
[2024/09/26 02:14:41.422 +00:00] [INFO] [indexnode/task.go:183] ["Successfully prepare indexBuildTask"] [buildID=452312785507296356] [Collection=0] [SegmentIf=0]
[2024/09/26 02:14:41.425 +00:00] [INFO] [indexnode/indexnode_service.go:209] ["Get Index Job Stats"] [traceID=b79344c433f628e] [Unissued=0] [Active=1] [Slot=0]
[2024/09/26 02:14:41.442 +00:00] [INFO] [indexnode/task.go:317] ["index params are ready"] [buildID=452312785507296356] ["index params"="{\"dim\":\"512\",\"index_type\":\"IVF_SQ8\",\"metric_type\":\"IP\",\"nlist\":\"1024\"}"]
2024-09-26 02:14:41,442 | INFO | default | [SEGCORE][InitSDKAPI][milvus] init aws with log level:error
2024-09-26 02:14:41,442 | INFO | default | [SEGCORE][N6milvus7storage17MinioChunkManagerE::MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: 'milvus-cluster-minio:9000', default_bucket_name:'milvus-bucket', use_secure:'false']
[2024/09/26 02:14:41.532 +00:00] [INFO] [indexnode/indexnode_service.go:209] ["Get Index Job Stats"] [traceID=a33b9e04e855476] [Unissued=0] [Active=1] [Slot=0]
2024-09-26 02:14:42,139 | INFO | default | [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-09-26 02:14:42.139 AWSClient [140500363163392] HTTP response code: 404
Resolved remote host IP address: 10.43.209.195
Request ID:
Exception name:
Error message: No response body.
11 response headers:
accept-ranges : bytes
content-length : 0
content-security-policy : block-all-mixed-content
date : Thu, 26 Sep 2024 02:14:42 GMT
server : MinIO
strict-transport-security : max-age=31536000; includeSubDomains
vary : Accept-Encoding
x-amz-id-2 : e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-request-id : 17F8A98D7648463F
x-content-type-options : nosniff
x-xss-protection : 1; mode=block

Anything else?

Initially, the cluster's datacoord was OOM-killed, and at the same time the indexnode also reported errors. After I increased datacoord's resources and restarted the pod, the indexnode keeps logging errors and the cluster is now unusable. Please help.

fjfzyxy commented 2 weeks ago

I guess the data in MinIO and the indexnode are inconsistent, but I don't know what caused it. When the indexnode started, it requested a certain data block from MinIO, but the block was not there, and the request failed with a 404 error.

xiaocai2333 commented 2 weeks ago

Looks like some binlogs no longer exist.

[2024/09/26 02:14:41.343 +00:00] [ERROR] [indexnode/task.go:340] ["failed to build index"] [error="[UnexpectedError] Error:GetObjectSize[errcode:404, exception:, errmessage:No response body.]"] 
fjfzyxy commented 2 weeks ago

So what can I do to restore it?

xiaocai2333 commented 2 weeks ago

Can you provide more logs for build ID 452312785507291952? @fjfzyxy Are there any logs from before the OOM?

fjfzyxy commented 2 weeks ago

The OOM kill was on datacoord. I ran kubectl describe pod on the datacoord pod and its status showed OOMKilled. It recovered after I increased its resources.
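For anyone checking the same symptom, the OOM kill can also be confirmed from the pod's JSON status instead of reading the describe output by eye. This is a minimal sketch, assuming the pod document comes from `kubectl get pod <name> -o json`; the function name and the trimmed sample document are illustrative and not taken from this cluster.

```python
import json
from typing import List


def oom_killed_containers(pod: dict) -> List[str]:
    """Return names of containers whose current or previous termination reason is OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # "state" is the current state, "lastState" the state before the last restart.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(status["name"])
                break
    return names


# Hypothetical, heavily trimmed pod document for illustration only.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "datacoord",
       "state": {"running": {}},
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "state": {"running": {}}, "lastState": {}}
    ]
  }
}
""")

print(oom_killed_containers(sample))
```

A container that was OOM-killed and then restarted usually shows the reason under `lastState`, which is why both fields are checked.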

xiaocai2333 commented 2 weeks ago

I guess the binlogs were garbage-collected (GCed). To recover, can you load the collection? If it loads, please try dropping the index and recreating it.
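The drop-and-recreate step suggested here could be scripted roughly as follows. This is a sketch under assumptions, not a confirmed fix: it expects a pymilvus `Collection` object, the field name `embedding` and the collection name are placeholders, and the index parameters are copied from the IndexParams in the log above. It releases the collection first on the assumption that Milvus 2.2 refuses to drop an index on a loaded collection. If the segment's binlogs are really gone, the rebuild may fail again with the same 404.

```python
# Index settings taken from the IndexParams entry in the indexnode log.
INDEX_PARAMS = {
    "index_type": "IVF_SQ8",
    "metric_type": "IP",
    "params": {"nlist": 1024},
}


def rebuild_index(collection, field_name, index_params):
    """Release the collection, drop its index, recreate it, then load again.

    `collection` is expected to behave like pymilvus.Collection.
    """
    collection.release()      # assumed to be required before drop_index on 2.2
    collection.drop_index()
    collection.create_index(field_name=field_name, index_params=index_params)
    collection.load()


# Usage against a live cluster (not executed here; all names are placeholders):
#   from pymilvus import connections, Collection
#   connections.connect(host="localhost", port="19530")
#   rebuild_index(Collection("my_collection"), "embedding", INDEX_PARAMS)
```

Passing the collection in as an argument keeps the helper easy to dry-run with a stand-in object before pointing it at production data.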

fjfzyxy commented 2 weeks ago

The data in these collections is very important, and there are tens of millions of records. Only 3 collections can be loaded, and they hold very little data, only a few hundred records.

xiaocai2333 commented 2 weeks ago

OK, can you find the logs that mention segmentID 452312785505909898? They may tell us why the binlogs were GCed.

fjfzyxy commented 2 weeks ago

How do I find the logs for segmentID 452312785505909898?

xiaocai2333 commented 2 weeks ago

You can package the datacoord logs and upload them here.
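If the full log package is too large to share, the entries that mention a given ID can be pulled out first. The sketch below is one way to do that, assuming the Go-side entries start with a bracketed timestamp like the ones in the indexnode log above; the function name is made up for illustration, and the sample is abridged from that log.

```python
import re
from typing import List

# Go-side Milvus entries begin with a timestamp such as
# "[2024/09/26 02:14:41.343 +00:00]". Splitting on a zero-width lookahead
# keeps the timestamp attached to its entry (needs Python 3.7+).
ENTRY_START = re.compile(
    r"(?=\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}\])"
)


def entries_mentioning(log_text: str, needle: str) -> List[str]:
    """Split a log blob into entries and keep those containing `needle`."""
    entries = [e.strip() for e in ENTRY_START.split(log_text) if e.strip()]
    # Segcore lines without a bracketed timestamp stay attached to the
    # entry that precedes them.
    return [e for e in entries if needle in e]


# Two entries abridged from the indexnode log in this issue.
sample = (
    '[2024/09/26 02:14:41.343 +00:00] [INFO] [indexnode/taskinfo_ops.go:42] '
    '["IndexNode store task state"] [clusterID=by-dev] '
    '[buildID=452312785507291952] [state=Retry] '
    '[2024/09/26 02:14:41.358 +00:00] [INFO] [indexnode/indexnode_service.go:164] '
    '["drop index build jobs"] [traceID=23d0249c40c4d2b9] [ClusterID=by-dev] '
    '[IndexBuildIDs="[452312785507292141]"]'
)

for hit in entries_mentioning(sample, "452312785507291952"):
    print(hit)
```

The same call with the segment ID 452312785505909898 as `needle` would narrow a datacoord log down to the entries about that segment.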

xiaocai2333 commented 2 weeks ago

Can you check which files exist for this segment in MinIO? @fjfzyxy The prefix is file/insert_log/448639064597621869/448639064597622737/452312785505909898/101.
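One way to run this check is to compare the DataPaths from the indexnode log against what the bucket actually holds. The following is a sketch under assumptions: `missing_objects` and `minio_exists` are hypothetical helper names, the MinIO wiring relies on the `minio` Python package (`stat_object` raising `S3Error` with code `NoSuchKey` for absent keys), and the demo at the bottom uses an in-memory set in place of a real bucket.

```python
from typing import Callable, Iterable, List


def missing_objects(paths: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the paths for which `exists` reports False, in input order."""
    return [p for p in paths if not exists(p)]


def minio_exists(client, bucket: str) -> Callable[[str], bool]:
    """Build an `exists` callback on top of a `minio.Minio` client (assumed installed)."""
    from minio.error import S3Error  # lazy import so the sketch loads without the package

    def exists(key: str) -> bool:
        try:
            client.stat_object(bucket, key)
            return True
        except S3Error as err:
            if err.code == "NoSuchKey":
                return False
            raise  # anything else (auth, network) should not be read as "missing"

    return exists


PREFIX = "file/insert_log/448639064597621869/448639064597622737/452312785505909898/101"
# First two DataPaths entries from the log; the full list has 32.
wanted = [f"{PREFIX}/452312785505916532", f"{PREFIX}/452312785506746993"]

# Illustrative stand-in for the bucket: pretend only the first object exists.
present = {wanted[0]}
print(missing_objects(wanted, present.__contains__))

# Against the real bucket (credentials are placeholders, not executed here):
#   from minio import Minio
#   client = Minio("milvus-cluster-minio:9000", access_key="...", secret_key="...", secure=False)
#   print(missing_objects(wanted, minio_exists(client, "milvus-bucket")))
```

Keeping the existence check as a callback means the comparison logic can be verified offline before it touches the cluster's storage.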

fjfzyxy commented 2 weeks ago

I checked yesterday: some of the files exist, and some don't.

fjfzyxy commented 2 weeks ago

The log package is too big. Can you share your WeChat ID?

xiaocai2333 commented 2 weeks ago

You can send it to my email: cai.zhang@zilliz.com

fjfzyxy commented 2 weeks ago

Sent to you.

yanliang567 commented 2 weeks ago

/assign @xiaocai2333
/unassign

xiaofan-luan commented 2 weeks ago

Try upgrading to the latest 2.3 version. I guess there are some S3 access issues in early versions of Milvus.

xiaofan-luan commented 2 weeks ago

Upgrading to the latest 2.3 or 2.4 should solve the problem automatically.