milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Error: Milvus is not ready yet. after upgrade from 2.4.0 to 2.4.10, standalone #36337

Open chenhanneu opened 2 months ago

chenhanneu commented 2 months ago

Is there an existing issue for this?

Environment

- Milvus version: v2.4.0 -> v2.4.10
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka): default
- SDK version (e.g. pymilvus v2.0.0rc2): default
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory:
- GPU:
- Others: Attu 2.3.10

Current Behavior

standalone:
  container_name: milvus-standalone
  image: milvusdb/milvus:v2.4.10

docker compose down
docker compose up -d

Attu reports an error: Error: Milvus is not ready yet.

Expected Behavior

No response

Steps To Reproduce

cat milvus.yaml
common:
  ...
  security:
    authorizationEnabled: true
    # The superusers will ignore some system check processes,
    # like the old password verification when updating the credential
    superUsers:
    tlsMode: 0
  ...

Milvus Log

[2024/09/18 08:41:50.985 +00:00] [WARN] [rootcoord/list_db_task.go:56] ["get current user from context failed"] [error="fail to get authorization from the md, authorization:[token]"]
[2024/09/18 08:41:50.994 +00:00] [WARN] [proxy/impl.go:5461] ["check health fail"] [traceID=5e7ed504a1e1157484d3dac4b95f68e4] [role=datacoord]
[2024/09/18 08:41:50.997 +00:00] [WARN] [proxy/impl.go:5461] ["check health fail"] [traceID=5e7ed504a1e1157484d3dac4b95f68e4] [role=rootcoord]

Anything else?

No response

yanliang567 commented 2 months ago

@chenhanneu please attach the milvus log file for investigation. For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

/assign @chenhanneu /unassign

chenhanneu commented 2 months ago

milvus-standalone  | I20240918 08:38:22.050482    38 SegmentSealedImpl.cpp:253] [SERVER][LoadFieldData][milvus] segment 450427173232733160 loads field 0 with num_rows 1041
milvus-standalone  | I20240918 08:38:22.050511    38 SegmentSealedImpl.cpp:266] [SERVER][LoadFieldData][milvus] segment 450427173232733160 submits load field 0 task to thread pool
milvus-standalone  | I20240918 08:38:22.052690   100 SegmentSealedImpl.cpp:275] [SERVER][LoadFieldData][milvus] segment 450427173232733160 loads field 102 done
milvus-standalone  | I20240918 08:38:22.053453    37 SegmentSealedImpl.cpp:275] [SERVER][LoadFieldData][milvus] segment 450427173232733160 loads field 1 done
milvus-standalone  | I20240918 08:38:22.053460    39 SegmentSealedImpl.cpp:275] [SERVER][LoadFieldData][milvus] segment 450427173232733160 loads field 100 done
milvus-standalone  | I20240918 08:38:22.053606    38 SegmentSealedImpl.cpp:275] [SERVER][LoadFieldData][milvus] segment 450427173232733160 loads field 0 done
milvus-standalone  | I20240918 08:38:22.053934   164 SegmentSealedImpl.cpp:275] [SERVER][LoadFieldData][milvus] segment 450427173232733160 loads field 101 done
milvus-standalone  | [2024/09/18 08:38:22.193 +00:00] [WARN] [grpcclient/client.go:440] ["failed to verify node session"] [error="session not found->proxy-10: node not found[node=10]"] [errorVerbose="session not found->proxy-10: node not found[node=10]\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/pkg/util/merr.WrapErrNodeNotFound\n  | \t/workspace/source/pkg/util/merr/utils.go:811\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).verifySession\n  | \t/workspace/source/internal/util/grpcclient/client.go:384\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).checkNodeSessionExist\n  | \t/workspace/source/internal/util/grpcclient/client.go:438\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call.func2\n  | \t/workspace/source/internal/util/grpcclient/client.go:472\n  | github.com/milvus-io/milvus/pkg/util/retry.Handle\n  | \t/workspace/source/pkg/util/retry/retry.go:104\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).call\n  | \t/workspace/source/internal/util/grpcclient/client.go:470\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | \t/workspace/source/internal/util/grpcclient/client.go:557\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | \t/workspace/source/internal/util/grpcclient/client.go:573\n  | github.com/milvus-io/milvus/internal/distributed/proxy/client.wrapGrpcCall[...]\n  | \t/workspace/source/internal/distributed/proxy/client/client.go:89\n  | github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).RefreshPolicyInfoCache\n  | \t/workspace/source/internal/distributed/proxy/client/client.go:155\n  | github.com/milvus-io/milvus/internal/util/proxyutil.(*ProxyClientManager).RefreshPolicyInfoCache.func1.1\n  | \t/workspace/source/internal/util/proxyutil/proxy_client_manager.go:271\n  | golang.org/x/sync/errgroup.(*Group).Go.func1\n  | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) session not found->proxy-10\nWraps: (3) node not found[node=10]\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
milvus-standalone  | [2024/09/18 08:38:22.193 +00:00] [WARN] [retry/retry.go:106] ["retry func failed"] [retried=8] [error="node not found"]
milvus-standalone  | [2024/09/18 08:38:22.193 +00:00] [WARN] [grpcclient/client.go:560] ["ClientBase Call grpc call get error"] [role=proxy-10] [address=] [error="stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/workspace/source/internal/util/grpcclient/client.go:559 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n/workspace/source/internal/util/grpcclient/client.go:573 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/workspace/source/internal/distributed/proxy/client/client.go:89 github.com/milvus-io/milvus/internal/distributed/proxy/client.wrapGrpcCall[...]\n/workspace/source/internal/distributed/proxy/client/client.go:155 github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).RefreshPolicyInfoCache\n/workspace/source/internal/util/proxyutil/proxy_client_manager.go:271 github.com/milvus-io/milvus/internal/util/proxyutil.(*ProxyClientManager).RefreshPolicyInfoCache.func1.1\n/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1650 runtime.goexit: node not found"] [errorVerbose="stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace: node not found\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | \t/workspace/source/internal/util/grpcclient/client.go:559\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | \t/workspace/source/internal/util/grpcclient/client.go:573\n  | github.com/milvus-io/milvus/internal/distributed/proxy/client.wrapGrpcCall[...]\n  | \t/workspace/source/internal/distributed/proxy/client/client.go:89\n  | github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).RefreshPolicyInfoCache\n  | \t/workspace/source/internal/distributed/proxy/client/client.go:155\n  | github.com/milvus-io/milvus/internal/util/proxyutil.(*ProxyClientManager).RefreshPolicyInfoCache.func1.1\n  | \t/workspace/source/internal/util/proxyutil/proxy_client_manager.go:271\n  | golang.org/x/sync/errgroup.(*Group).Go.func1\n  | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n  | /workspace/source/internal/util/grpcclient/client.go:559 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | /workspace/source/internal/util/grpcclient/client.go:573 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | /workspace/source/internal/distributed/proxy/client/client.go:89 github.com/milvus-io/milvus/internal/distributed/proxy/client.wrapGrpcCall[...]\n  | /workspace/source/internal/distributed/proxy/client/client.go:155 github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).RefreshPolicyInfoCache\n  | /workspace/source/internal/util/proxyutil/proxy_client_manager.go:271 github.com/milvus-io/milvus/internal/util/proxyutil.(*ProxyClientManager).RefreshPolicyInfoCache.func1.1\n  | /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 golang.org/x/sync/errgroup.(*Group).Go.func1\n  | /usr/local/go/src/runtime/asm_amd64.s:1650 runtime.goexit\nWraps: (3) node not found\nError types: (1) *withstack.withStack (2) 
*errutil.withPrefix (3) merr.milvusError"]
milvus-standalone  | [2024/09/18 08:38:22.193 +00:00] [WARN] [retry/retry.go:46] ["retry func failed"] [retried=0] [error="RefreshPolicyInfoCache failed, proxyID = 10, err = stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/workspace/source/internal/util/grpcclient/client.go:559 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n/workspace/source/internal/util/grpcclient/client.go:573 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/workspace/source/internal/distributed/proxy/client/client.go:89 github.com/milvus-io/milvus/internal/distributed/proxy/client.wrapGrpcCall[...]\n/workspace/source/internal/distributed/proxy/client/client.go:155 github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).RefreshPolicyInfoCache\n/workspace/source/internal/util/proxyutil/proxy_client_manager.go:271 
github.com/milvus-io/milvus/internal/util/proxyutil.(*ProxyClientManager).RefreshPolicyInfoCache.func1.1\n/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1650 runtime.goexit: node not found"]
milvus-standalone  | [2024/09/18 08:40:19.759 +00:00] [WARN] [flowgraph/node.go:82] ["some node(s) haven't received input"] [list="[nodeCtxTtChecker-ddNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-writeNode-by-dev-_7_450427173245144560v1]"] ["duration "=2m0s]
milvus-standalone  | [2024/09/18 08:41:09.812 +00:00] [WARN] [rootcoord/list_db_task.go:56] ["get current user from context failed"] [error="fail to get authorization from the md, authorization:[token]"]
milvus-standalone  | [2024/09/18 08:41:09.816 +00:00] [WARN] [proxy/impl.go:5461] ["check health fail"] [traceID=8d071c64943ccd105cce36417e4fcd3f] [role=datacoord]
milvus-standalone  | [2024/09/18 08:41:09.820 +00:00] [WARN] [proxy/impl.go:5461] ["check health fail"] [traceID=8d071c64943ccd105cce36417e4fcd3f] [role=rootcoord]
milvus-standalone  | [2024/09/18 08:41:50.985 +00:00] [WARN] [rootcoord/list_db_task.go:56] ["get current user from context failed"] [error="fail to get authorization from the md, authorization:[token]"]
milvus-standalone  | [2024/09/18 08:41:50.994 +00:00] [WARN] [proxy/impl.go:5461] ["check health fail"] [traceID=5e7ed504a1e1157484d3dac4b95f68e4] [role=datacoord]
milvus-standalone  | [2024/09/18 08:41:50.997 +00:00] [WARN] [proxy/impl.go:5461] ["check health fail"] [traceID=5e7ed504a1e1157484d3dac4b95f68e4] [role=rootcoord]
milvus-standalone  | [2024/09/18 08:42:19.759 +00:00] [WARN] [flowgraph/node.go:82] ["some node(s) haven't received input"] [list="[nodeCtxTtChecker-InsertNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-FilterNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ttNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ttNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-ddNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-writeNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-DeleteNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-FilterNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-writeNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ddNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ttNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ddNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ddNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-FilterNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-ddNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-ddNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-InsertNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ddNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-FilterNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-FilterNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-FilterNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-writeNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-InsertNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-InsertNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-writeNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-writeNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ttNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ttNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ddNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-ddNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-writeNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-ttNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-FilterNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_2_450427173254265396v0]"] ["duration "=2m0s]
...........
[2024/09/18 09:10:19.759 +00:00] [WARN] [flowgraph/node.go:82] ["some node(s) haven't received input"] [list="[nodeCtxTtChecker-writeNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-InsertNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-InsertNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-writeNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-writeNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ttNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-DeleteNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ttNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ddNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-ddNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-writeNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-ttNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-FilterNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-InsertNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-FilterNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ttNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ttNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-ddNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-writeNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-InsertNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-DeleteNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-DeleteNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-FilterNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-writeNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ddNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ttNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-DeleteNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ddNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ddNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-FilterNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-ddNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-ddNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-InsertNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ddNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-FilterNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-FilterNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-FilterNode-by-dev-_7_450427173245144560v1]"] ["duration "=2m0s]
[2024/09/18 09:12:19.759 +00:00] [WARN] [flowgraph/node.go:82] ["some node(s) haven't received input"] [list="[nodeCtxTtChecker-ddNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-FilterNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-FilterNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-FilterNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-writeNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-InsertNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-InsertNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-writeNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-writeNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ttNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ttNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ddNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-ddNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-writeNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-ttNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-FilterNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-InsertNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-FilterNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-ttNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-ttNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-ddNode-by-dev-_15_450427173248748199v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-writeNode-by-dev-_3_450427173254349246v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-writeNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-DeleteNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-FilterNode-by-dev-_4_450427173254349246v1,nodeCtxTtChecker-writeNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ddNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ttNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-DeleteNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ddNode-by-dev-_7_450427173245144560v1,nodeCtxTtChecker-ttNode-by-dev-_2_450427173254265396v0,nodeCtxTtChecker-ddNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-FilterNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-DeleteNode-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-dmInputNode-data-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-ddNode-by-dev-_6_450427173245144560v0,nodeCtxTtChecker-InsertNode-by-dev-_8_450427173247792906v0,nodeCtxTtChecker-ddNode-by-dev-_0_450427173248748199v1,nodeCtxTtChecker-dmInputNode-data-by-dev-_1_450427173252008263v0,nodeCtxTtChecker-InsertNode-by-dev-_3_450427173254349246v0]"] ["duration "=2m0s]
yanliang567 commented 2 months ago

@chenhanneu the attached logs are all warnings, which do not affect Milvus health. Please attach the complete Milvus log files so that we can see what was happening.

chenhanneu commented 2 months ago

milvus.log

yanliang567 commented 2 months ago

The Milvus server looks healthy. Could you please try to connect to it via an SDK, such as pymilvus?

milvus-standalone  | ---Milvus Proxy successfully initialized and ready to serve!---
milvus-standalone  | [2024/09/18 09:54:12.310 +00:00] [INFO] [proxy/service.go:461] ["init Proxy server done"]
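For reference, a minimal pymilvus connectivity check could look like the sketch below; the host, port, and credentials are placeholders for the standalone setup described in this issue (with authorizationEnabled: true, a valid user/password has to be supplied).

# Minimal pymilvus connectivity check (a sketch; localhost/19530 and the
# root/Milvus credentials below are placeholders, not taken from this issue).
from pymilvus import connections, utility

connections.connect(
    alias="default",
    host="localhost",
    port="19530",
    user="root",        # replace with a real user, since authorizationEnabled is true
    password="Milvus",  # replace with that user's password
)

# If the proxy is reachable and healthy, this returns without raising.
print(utility.list_collections())
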
yanliang567 commented 2 months ago

If you can connect to it via the SDK, it could be an Attu issue...

hieunc229 commented 2 months ago

I'm having the same issue with a fresh install on EC2 Ubuntu and on my MacBook Pro; I tried Docker and Docker Compose with v2.4.11 and v2.4.10. https://discord.com/channels/1160323594396635310/1285628350529798305

It looks like I cannot connect to the gRPC port 19530 from outside the milvus-standalone container.

yanliang567 commented 2 months ago

Did you forward port 19530 to your local machine? Please also try to telnet the Milvus address to make sure the network is healthy.
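For anyone without telnet handy, an equivalent TCP reachability check can be sketched in Python (localhost and port 19530 are assumptions for a locally published standalone deployment):

# TCP reachability check for the Milvus gRPC port (sketch; adjust host/port).
import socket

with socket.create_connection(("localhost", 19530), timeout=5) as sock:
    print("TCP connection to port 19530 succeeded:", sock.getpeername())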

hieunc229 commented 2 months ago

Yes, when trying curl localhost:19530, it shows a 404 page (so I guess the port was forwarded correctly).

shanghaikid commented 2 months ago

Attu calls CheckHealth when connecting.

// Check the health of the Milvus server
const res = yield milvusClient.checkHealth();
// If the server is not healthy, throw an error
if (!res.isHealthy) {
    throw new Error('Milvus is not ready yet.');
}
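As a cross-check outside Attu, the standalone container also exposes a plain HTTP health endpoint on the metrics port; the sketch below assumes the default port 9091 is published from the container to the host.

# Probe the Milvus healthz endpoint on the metrics port (sketch; assumes the
# default port 9091 is mapped to the host in docker-compose).
import urllib.request

with urllib.request.urlopen("http://localhost:9091/healthz", timeout=5) as resp:
    print(resp.status, resp.read().decode())  # expect 200 and "OK" when healthy
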
chenhanneu commented 2 months ago

I can connect via pymilvus.

chenhanneu commented 2 months ago

But on k8s, I upgraded from 2.4.0 to 2.4.11 and Attu connects fine. With the docker compose upgrade, Attu can't connect to Milvus.

hieunc229 commented 2 months ago

I also tried the REST API; it works with v1, but not v2.

RakeshRaj97 commented 2 months ago

I'm facing a similar issue after upgrading Milvus. I can connect using the API, but Attu fails to connect.

yesyue commented 2 months ago

I have the same problem with Attu 2.4.7 and Milvus 2.4.10.

yanliang567 commented 2 months ago

Please attach the complete Milvus log file so we can check whether the Milvus server is healthy.

SuNDeeP-FW commented 2 months ago

We are facing the same issue as well. We see this error: "milvusdb-attu-76cf6cfdd7-b7zvw POST /api/v1/milvus/connect 500 Error: Milvus is not ready yet." (Attu 2.4.7, Milvus 2.4.11)

But it works with PyMilvus and Milvus-cli. Any suggestions to solve this would be really helpful.

yanliang567 commented 2 months ago

Sounds like the Milvus health check is not working as expected. Could you help attach the Milvus logs for investigation? Please refer to this doc to export the whole Milvus logs if installed with k8s. For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

jubingc commented 1 month ago

I was redirected from https://github.com/milvus-io/milvus/discussions/36669

The issue in attu seems to be intermittent. There are some health checks and query node requests that fail.

milvus-logs.tar.gz

yanliang567 commented 1 month ago

/assign @congqixia please help to take a look at the latest logs/

yhmo commented 1 month ago

  1. "Fail to watch" log in datanode:

    {"level":"INFO","time":"2024/10/08 16:44:23.346 +00:00","caller":"datanode/channel_manager.go:383","message":"Stop timer for ToWatch operation timeout","channel":"by-dev-rootcoord-dml_0_452122179384142244v0","opID":453090797513616090,"timeout":"5m0s"}
    {"level":"INFO","time":"2024/10/08 16:44:23.346 +00:00","caller":"datanode/channel_manager.go:204","message":"Fail to watch","opID":453090797513616090,"channel":"by-dev-rootcoord-dml_0_452122179384142244v0","State":"WatchFailure"}
    {"level":"INFO","time":"2024/10/08 16:44:23.349 +00:00","caller":"datanode/channel_manager.go:383","message":"Stop timer for ToWatch operation timeout","channel":"by-dev-rootcoord-dml_15_452122179384142208v0","opID":453090797513616093,"timeout":"5m0s"}
    {"level":"INFO","time":"2024/10/08 16:44:23.349 +00:00","caller":"datanode/channel_manager.go:204","message":"Fail to watch","opID":453090797513616093,"channel":"by-dev-rootcoord-dml_15_452122179384142208v0","State":"WatchFailure"}
    {"level":"INFO","time":"2024/10/08 16:44:23.360 +00:00","caller":"datanode/channel_manager.go:383","message":"Stop timer for ToWatch operation timeout","channel":"by-dev-rootcoord-dml_11_452122179384543655v0","opID":453090797513616092,"timeout":"5m0s"}
    {"level":"INFO","time":"2024/10/08 16:44:23.360 +00:00","caller":"datanode/channel_manager.go:204","message":"Fail to watch","opID":453090797513616092,"channel":"by-dev-rootcoord-dml_11_452122179384543655v0","State":"WatchFailure"}
    {"level":"INFO","time":"2024/10/08 16:44:23.363 +00:00","caller":"datanode/channel_manager.go:383","message":"Stop timer for ToWatch operation timeout","channel":"by-dev-rootcoord-dml_11_452122179383941939v0","opID":453090797513616091,"timeout":"5m0s"}
    {"level":"INFO","time":"2024/10/08 16:44:23.363 +00:00","caller":"datanode/channel_manager.go:204","message":"Fail to watch","opID":453090797513616091,"channel":"by-dev-rootcoord-dml_11_452122179383941939v0","State":"WatchFailure"}
  2. Warn log in datacoord:

    {"level":"WARN","time":"2024/10/08 16:45:42.511 +00:00","caller":"datacoord/session_manager.go:245","message":"failed to sync segments","nodeID":2360,"planID":0,"error":"channel not found[channel=by-dev-rootcoord-dml_0_452122179384142244v0]"}

  3. Check health error in proxy:

    {"level":"WARN","time":"2024/10/08 16:45:43.673 +00:00","caller":"proxy/impl.go:5340","message":"check health fail","traceID":"e35f0d3b42e612bc02674d9ceb2b0000","role":"datacoord"}

yhmo commented 1 month ago

@jubingc I think your Pulsar service may not be working well. Do you have the logs of the Pulsar service between 2024/10/08 16:45:30 and 2024/10/08 16:45:43?

jubingc commented 3 weeks ago

Hi @yhmo, sorry, I missed your previous comment. We just lost the Pulsar logs due to the 1-month retention policy.

I encountered this issue again. Restarting data node pods fixed the issue.

I do see error logs in the proxy pod

{"level":"WARN","time":"2024/11/07 23:55:26.770 +00:00","caller":"proxy/impl.go:5340","message":"check health fail","traceID":"05a8d7b736c066ab10d69b3df9aba2ed","role":"da
tacoord"}
{"level":"WARN","time":"2024/11/07 23:55:27.995 +00:00","caller":"proxy/impl.go:5340","message":"check health fail","traceID":"097753fe523fd247cff4952832c06183","role":"da
tacoord"}

And error logs in data node pods around the same time

Fail to watch
caller | datanode/channel_manager.go:204
channel | by-dev-rootcoord-dml_8_453509215889864878v0
level | INFO
opID | 453594199961661200
State | WatchFailure
time | 2024/11/07 23:57:02.514 +00:00

No errors in mixcoord or the shared Pulsar pods. Other Milvus clusters using the same Pulsar cluster seem fine.

What would you suggest for the next step?