milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.08k stars 2.88k forks source link

[Bug]: milvus-standalone service exited #23300

Closed fall-nebula closed 1 year ago

fall-nebula commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.5
- Deployment mode(standalone or cluster): standalone 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):java 2.2.4
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others: attu:v2.2.3

Current Behavior

Today service milvus-standalone exited from docker compose.

# docker compose ps
NAME                COMMAND                  SERVICE             STATUS              PORTS
milvus-etcd         "etcd -advertise-cli…"   etcd                running             2379-2380/tcp
milvus-minio        "/usr/bin/docker-ent…"   minio               running (healthy)   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp
milvus-standalone   "/tini -- milvus run…"   standalone          exited (0)

7 hours before the program went down, milvus was searched and queried using the Java SDK, using examples from the official documentation. 6 hours before the program went down, I installed Attu 2.2.3 and tried out its load/release collections and search features. Then milvus and Attu were left untouched for about 6 hours. (The last Attu's last Http log is at Fri, 07 Apr 2023 03:47:01 GMT) Milvus' last log was 2023/04/07 09:08:27.649 +00:00

In Milvus' log there are many ERROR log

[ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="find no available rootcoord, check rootcoord state"]
[ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="find no available rootcoord, check rootcoord state"]
[ERROR] [config/etcd_source.go:151] ["can not pull configs"] [error="context deadline exceeded"]
[2023/04/06 13:56:57.769 +00:00] [ERROR] [config/etcd_source.go:151] ["can not pull configs"] [error="rpc error: code = Unknown desc = context deadline exceeded"]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

The full log file can be downloaded here(37.8MB). The end of Milvus logs:

[2023/04/07 09:08:27.125 +00:00] [INFO] [msgstream/mq_msgstream.go:182] ["start to close mq msg stream"] ["producer num"=0] ["consumer num"=1]
[2023/04/07 09:08:27.125 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-delta_0] [group=by-dev-queryNode-440556259765662279-9] [elapsed=0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [client/client_impl.go:141] ["Consumer MsgMutex closed"]
[2023/04/07 09:08:27.125 +00:00] [WARN] [flowgraph/input_node.go:84] ["MsgStream closed"] ["input node"=dmInputNode-query-440556259765662279-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/input_node.go:86] ["trigger force sync"] [collection=440556259765662279] [position="{\"BeginTs\":440626960298934482,\"EndTs\":440626960298934486,\"Msgs\":[],\"StartPositions\":[{\"channel_name\":\"by-dev-rootcoord-delta_0\",\"msgID\":\"p1QFEA9WHQY=\",\"msgGroup\":\"by-dev-queryNode-440556259765662279-9\",\"timestamp\":440626960298934482}],\"EndPositions\":[{\"channel_name\":\"by-dev-rootcoord-delta_0\",\"msgID\":\"slQFEA9WHQY=\",\"msgGroup\":\"by-dev-queryNode-440556259765662279-9\",\"timestamp\":440626960298934486}]}"]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=dmInputNode-query-440556259765662279-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=fdNode-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=dNode-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/flow_graph_service_time_node.go:60] ["service node hit close msg"] [collectionID=440556259765662279] [tSafe=0] [channel=by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=stNode-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467122] [segmentType=Sealed]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467122] [segmentType=Sealed]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259765662280] ["segment ID"=440602961792467122] ["segment type"=Sealed] ["row count"=1022] ["segment indexed fields"=0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467121] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467121] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259765662280] ["segment ID"=440602961792467121] ["segment type"=Sealed] ["row count"=976] ["segment indexed fields"=0]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467065] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467065] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259766644697] ["segment ID"=440602961792467065] ["segment type"=Sealed] ["row count"=1] ["segment indexed fields"=0]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467067] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467067] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259766644697] ["segment ID"=440602961792467067] ["segment type"=Sealed] ["row count"=2] ["segment indexed fields"=0]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/collection.go:423] ["delete collection"] [collectionID=440556259765662279]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/shard_cluster_service.go:101] ["start to close shard cluster service"]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/shard_cluster.go:184] ["Close shard cluster"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/shard_cluster.go:395] ["Shard Cluster update state"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970] ["old state"=1] ["new state"=2] [caller=github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).Close.func1]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_node_detector.go:131] ["Closed NodeDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:407] ["ShardCluster node channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_segment_detector.go:118] ["Closed SegmentDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [INFO] [querynode/shard_cluster.go:184] ["Close shard cluster"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [INFO] [querynode/shard_cluster.go:395] ["Shard Cluster update state"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970] ["old state"=1] ["new state"=2] [caller=github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).Close.func1]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:430] ["ShardCluster segment channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_node_detector.go:131] ["Closed NodeDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:407] ["ShardCluster node channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_segment_detector.go:118] ["Closed SegmentDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/query_shard_service.go:150] ["Close query shard service"]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:430] ["ShardCluster segment channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970]
{"level":"warn","ts":"2023-04-07T09:08:27.129Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000141500/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
[2023/04/07 09:08:27.129 +00:00] [INFO] [querynode/service.go:243] ["Graceful stop grpc server..."]
[2023/04/07 09:08:27.129 +00:00] [INFO] [querycoord/service.go:293] ["QueryCoord stop"] [Address=172.23.0.4:19531]
[2023/04/07 09:08:27.247 +00:00] [INFO] [querycoordv2/server.go:422] ["stop cluster..."]
[2023/04/07 09:08:27.247 +00:00] [INFO] [querycoordv2/server.go:427] ["stop dist controller..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:432] ["stop checker controller..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:437] ["stop task scheduler..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:442] ["stop job scheduler..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:446] ["stop observers..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:464] ["QueryCoord stop successfully"]
{"level":"warn","ts":"2023-04-07T09:08:27.247Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001e9c000/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoord/service.go:305] ["Graceful stop grpc server..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [proxy/service.go:504] ["Proxy stop"] ["internal address"=172.23.0.4:19529] ["external address"=172.23.0.4:19529]
[2023/04/07 09:08:27.248 +00:00] [INFO] [proxy/service.go:522] ["Graceful stop grpc internal server..."]
[2023/04/07 09:08:27.249 +00:00] [INFO] [proxy/service.go:526] ["Graceful stop grpc external server..."]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:385] ["close id allocator"] [role=proxy]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:390] ["close segment id assigner"] [role=proxy]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:395] ["close scheduler"] [role=proxy]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:403] ["close channels time ticker"] [role=proxy]
{"level":"warn","ts":"2023-04-07T09:08:27.460Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003addc0/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
[2023/04/07 09:08:27.460 +00:00] [INFO] [proxy/channels_mgr.go:303] ["all dml stream removed"]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/service.go:281] ["Rootcoord stop"] [Address=172.23.0.4:53100]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:175] ["update rootcoord state"] [state=Abnormal]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:689] ["stop rootcoord executor"]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:696] ["stop rootcoord scheduler"]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:703] ["cancel rootcoord goroutines"]
[2023/04/07 09:08:27.463 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2023/04/07 09:08:27.463 +00:00] [INFO] [grpcclient/client.go:280] ["ClientBase grpc error, start to reset connection"] [role=querycoord] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2023/04/07 09:08:27.463 +00:00] [WARN] [grpcclient/client.go:318] ["ClientBase ReCall grpc first call get error"] [role=querycoord] [error="err: rpc error: code = Canceled desc = grpc: the client connection is closing\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:317 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"]
[2023/04/07 09:08:27.464 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Channel #193 SubChannel #194] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"172.23.0.4:19529\",\n  \"ServerName\": \"172.23.0.4:19529\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.464 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.464 +00:00] [INFO] [grpcclient/client.go:280] ["ClientBase grpc error, start to reset connection"] [role=proxy] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.464 +00:00] [WARN] [grpcclient/client.go:318] ["ClientBase ReCall grpc first call get error"] [role=proxy] [error="err: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\"\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:317 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/client/client.go:209 github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).GetProxyMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/proxy_client_manager.go:254 github.com/milvus-io/milvus/internal/rootcoord.(*proxyClientManager).GetProxyMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"]
[2023/04/07 09:08:27.464 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Channel #306 SubChannel #307] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"172.23.0.4:19529\",\n  \"ServerName\": \"172.23.0.4:19529\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.565 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Channel #306 SubChannel #307] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"172.23.0.4:19529\",\n  \"ServerName\": \"172.23.0.4:19529\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.642 +00:00] [INFO] [rootcoord/root_coord.go:711] ["revoke rootcoord session"]
[2023/04/07 09:08:27.642 +00:00] [INFO] [rootcoord/service.go:310] ["Rootcoord begin to stop grpc server"]
[2023/04/07 09:08:27.642 +00:00] [INFO] [rootcoord/service.go:313] ["Graceful stop grpc server..."]
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
[2023/04/07 09:08:27.642 +00:00] [WARN] [client/client.go:89] ["QueryCoordClient GetSessions failed"] [error="context canceled"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="context canceled"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:327\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:330] ["ClientBase ReCall grpc second call get error"] [role=querycoord] [error="err: context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:329 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:330\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
[2023/04/07 09:08:27.642 +00:00] [WARN] [client/client.go:93] ["DataCoordClient, getSessions failed"] [key=datacoord] [error="context canceled"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="context canceled"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:312\ngithub.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.642 +00:00] [WARN] [grpcclient/client.go:318] ["ClientBase ReCall grpc first call get error"] [role=datacoord] [error="err: context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:317 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"]
[2023/04/07 09:08:27.642 +00:00] [WARN] [client/client.go:93] ["DataCoordClient, getSessions failed"] [key=datacoord] [error="context canceled"]
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="context canceled"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:327\ngithub.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:330] ["ClientBase ReCall grpc second call get error"] [role=datacoord] [error="err: context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:329 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:330\ngithub.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.643 +00:00] [INFO] [server/global_rmq.go:88] ["Close Rocksmq!"]
[2023/04/07 09:08:27.644 +00:00] [WARN] [server/rocksmq_retention.go:110] ["Rocksmq retention finish!"]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_0] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_0_440556259765662279v0] [elapsed=0]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_10] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_10_440556259765686118v0] [elapsed=0]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_11] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_11_440556259765686118v1] [elapsed=0]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_1] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_1_440556259765662279v1] [elapsed=0]

Anything else?

No response

xiaofan-luan commented 1 year ago

Is there an existing issue for this?

  • [x] I have searched the existing issues

Environment

- Milvus version: 2.2.5
- Deployment mode(standalone or cluster): standalone 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):java 2.2.4
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others: attu:v2.2.3

Current Behavior

Today service milvus-standalone exited from docker compose.

# docker compose ps
NAME                COMMAND                  SERVICE             STATUS              PORTS
milvus-etcd         "etcd -advertise-cli…"   etcd                running             2379-2380/tcp
milvus-minio        "/usr/bin/docker-ent…"   minio               running (healthy)   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp
milvus-standalone   "/tini -- milvus run…"   standalone          exited (0)

7 hours before the program went down, milvus was searched and queried using the Java SDK, using examples from the official documentation. 6 hours before the program went down, I installed Attu 2.2.3 and tried out its load/release collections and search features. Then milvus and Attu were left untouched for about 6 hours. (The last Attu's last Http log is at Fri, 07 Apr 2023 03:47:01 GMT) Milvus' last log was 2023/04/07 09:08:27.649 +00:00

In Milvus' log there are many ERROR log

[ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="find no available rootcoord, check rootcoord state"]
[ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="find no available rootcoord, check rootcoord state"]
[ERROR] [config/etcd_source.go:151] ["can not pull configs"] [error="context deadline exceeded"]
[2023/04/06 13:56:57.769 +00:00] [ERROR] [config/etcd_source.go:151] ["can not pull configs"] [error="rpc error: code = Unknown desc = context deadline exceeded"]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

The full log file can be downloaded here(37.8MB). The end of Milvus logs:

[2023/04/07 09:08:27.125 +00:00] [INFO] [msgstream/mq_msgstream.go:182] ["start to close mq msg stream"] ["producer num"=0] ["consumer num"=1]
[2023/04/07 09:08:27.125 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-delta_0] [group=by-dev-queryNode-440556259765662279-9] [elapsed=0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [client/client_impl.go:141] ["Consumer MsgMutex closed"]
[2023/04/07 09:08:27.125 +00:00] [WARN] [flowgraph/input_node.go:84] ["MsgStream closed"] ["input node"=dmInputNode-query-440556259765662279-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/input_node.go:86] ["trigger force sync"] [collection=440556259765662279] [position="{\"BeginTs\":440626960298934482,\"EndTs\":440626960298934486,\"Msgs\":[],\"StartPositions\":[{\"channel_name\":\"by-dev-rootcoord-delta_0\",\"msgID\":\"p1QFEA9WHQY=\",\"msgGroup\":\"by-dev-queryNode-440556259765662279-9\",\"timestamp\":440626960298934482}],\"EndPositions\":[{\"channel_name\":\"by-dev-rootcoord-delta_0\",\"msgID\":\"slQFEA9WHQY=\",\"msgGroup\":\"by-dev-queryNode-440556259765662279-9\",\"timestamp\":440626960298934486}]}"]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=dmInputNode-query-440556259765662279-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=fdNode-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=dNode-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/flow_graph_service_time_node.go:60] ["service node hit close msg"] [collectionID=440556259765662279] [tSafe=0] [channel=by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [flowgraph/node.go:112] ["flow graph node closed"] [nodeName=stNode-by-dev-rootcoord-delta_0_440556259765662279v0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467122] [segmentType=Sealed]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467122] [segmentType=Sealed]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259765662280] ["segment ID"=440602961792467122] ["segment type"=Sealed] ["row count"=1022] ["segment indexed fields"=0]
[2023/04/07 09:08:27.125 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467121] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259765662280] [segmentID=440602961792467121] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259765662280] ["segment ID"=440602961792467121] ["segment type"=Sealed] ["row count"=976] ["segment indexed fields"=0]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467065] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467065] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259766644697] ["segment ID"=440602961792467065] ["segment type"=Sealed] ["row count"=1] ["segment indexed fields"=0]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/partition.go:100] ["remove a segment from replica"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467067] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/segment.go:234] ["delete segment from memory"] [collectionID=440556259765662279] [partitionID=440556259766644697] [segmentID=440602961792467067] [segmentType=Sealed]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/meta_replica.go:725] ["segment removed from collection replica"] ["QueryNode ID"=9] ["collection ID"=440556259765662279] ["partition ID"=440556259766644697] ["segment ID"=440602961792467067] ["segment type"=Sealed] ["row count"=2] ["segment indexed fields"=0]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/collection.go:423] ["delete collection"] [collectionID=440556259765662279]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/shard_cluster_service.go:101] ["start to close shard cluster service"]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/shard_cluster.go:184] ["Close shard cluster"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970]
[2023/04/07 09:08:27.127 +00:00] [INFO] [querynode/shard_cluster.go:395] ["Shard Cluster update state"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970] ["old state"=1] ["new state"=2] [caller=github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).Close.func1]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_node_detector.go:131] ["Closed NodeDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:407] ["ShardCluster node channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_segment_detector.go:118] ["Closed SegmentDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [INFO] [querynode/shard_cluster.go:184] ["Close shard cluster"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [INFO] [querynode/shard_cluster.go:395] ["Shard Cluster update state"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970] ["old state"=1] ["new state"=2] [caller=github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).Close.func1]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:430] ["ShardCluster segment channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_0_440556259765662279v0] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_node_detector.go:131] ["Closed NodeDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:407] ["ShardCluster node channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_segment_detector.go:118] ["Closed SegmentDetector watch loop quit"] [collectionID=440556259765662279] [replicaID=440602962355027970]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/query_shard_service.go:150] ["Close query shard service"]
[2023/04/07 09:08:27.128 +00:00] [WARN] [querynode/shard_cluster.go:430] ["ShardCluster segment channel closed"] [collectionID=440556259765662279] [channel=by-dev-rootcoord-dml_1_440556259765662279v1] [replicaID=440602962355027970]
{"level":"warn","ts":"2023-04-07T09:08:27.129Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000141500/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
[2023/04/07 09:08:27.129 +00:00] [INFO] [querynode/service.go:243] ["Graceful stop grpc server..."]
[2023/04/07 09:08:27.129 +00:00] [INFO] [querycoord/service.go:293] ["QueryCoord stop"] [Address=172.23.0.4:19531]
[2023/04/07 09:08:27.247 +00:00] [INFO] [querycoordv2/server.go:422] ["stop cluster..."]
[2023/04/07 09:08:27.247 +00:00] [INFO] [querycoordv2/server.go:427] ["stop dist controller..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:432] ["stop checker controller..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:437] ["stop task scheduler..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:442] ["stop job scheduler..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:446] ["stop observers..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoordv2/server.go:464] ["QueryCoord stop successfully"]
{"level":"warn","ts":"2023-04-07T09:08:27.247Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001e9c000/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
[2023/04/07 09:08:27.248 +00:00] [INFO] [querycoord/service.go:305] ["Graceful stop grpc server..."]
[2023/04/07 09:08:27.248 +00:00] [INFO] [proxy/service.go:504] ["Proxy stop"] ["internal address"=172.23.0.4:19529] ["external address"=172.23.0.4:19529]
[2023/04/07 09:08:27.248 +00:00] [INFO] [proxy/service.go:522] ["Graceful stop grpc internal server..."]
[2023/04/07 09:08:27.249 +00:00] [INFO] [proxy/service.go:526] ["Graceful stop grpc external server..."]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:385] ["close id allocator"] [role=proxy]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:390] ["close segment id assigner"] [role=proxy]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:395] ["close scheduler"] [role=proxy]
[2023/04/07 09:08:27.254 +00:00] [INFO] [proxy/proxy.go:403] ["close channels time ticker"] [role=proxy]
{"level":"warn","ts":"2023-04-07T09:08:27.460Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003addc0/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
[2023/04/07 09:08:27.460 +00:00] [INFO] [proxy/channels_mgr.go:303] ["all dml stream removed"]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/service.go:281] ["Rootcoord stop"] [Address=172.23.0.4:53100]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:175] ["update rootcoord state"] [state=Abnormal]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:689] ["stop rootcoord executor"]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:696] ["stop rootcoord scheduler"]
[2023/04/07 09:08:27.460 +00:00] [INFO] [rootcoord/root_coord.go:703] ["cancel rootcoord goroutines"]
[2023/04/07 09:08:27.463 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2023/04/07 09:08:27.463 +00:00] [INFO] [grpcclient/client.go:280] ["ClientBase grpc error, start to reset connection"] [role=querycoord] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2023/04/07 09:08:27.463 +00:00] [WARN] [grpcclient/client.go:318] ["ClientBase ReCall grpc first call get error"] [role=querycoord] [error="err: rpc error: code = Canceled desc = grpc: the client connection is closing\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:317 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"]
[2023/04/07 09:08:27.464 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Channel #193 SubChannel #194] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"172.23.0.4:19529\",\n  \"ServerName\": \"172.23.0.4:19529\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.464 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.464 +00:00] [INFO] [grpcclient/client.go:280] ["ClientBase grpc error, start to reset connection"] [role=proxy] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.464 +00:00] [WARN] [grpcclient/client.go:318] ["ClientBase ReCall grpc first call get error"] [role=proxy] [error="err: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\"\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:317 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/client/client.go:209 github.com/milvus-io/milvus/internal/distributed/proxy/client.(*Client).GetProxyMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/proxy_client_manager.go:254 github.com/milvus-io/milvus/internal/rootcoord.(*proxyClientManager).GetProxyMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"]
[2023/04/07 09:08:27.464 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Channel #306 SubChannel #307] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"172.23.0.4:19529\",\n  \"ServerName\": \"172.23.0.4:19529\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.565 +00:00] [WARN] [grpclog/grpclog.go:46] ["[core][Channel #306 SubChannel #307] grpc: addrConn.createTransport failed to connect to {\n  \"Addr\": \"172.23.0.4:19529\",\n  \"ServerName\": \"172.23.0.4:19529\",\n  \"Attributes\": null,\n  \"BalancerAttributes\": null,\n  \"Type\": 0,\n  \"Metadata\": null\n}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.23.0.4:19529: connect: connection refused\""]
[2023/04/07 09:08:27.642 +00:00] [INFO] [rootcoord/root_coord.go:711] ["revoke rootcoord session"]
[2023/04/07 09:08:27.642 +00:00] [INFO] [rootcoord/service.go:310] ["Rootcoord begin to stop grpc server"]
[2023/04/07 09:08:27.642 +00:00] [INFO] [rootcoord/service.go:313] ["Graceful stop grpc server..."]
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: requested lease not found"}
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
[2023/04/07 09:08:27.642 +00:00] [WARN] [client/client.go:89] ["QueryCoordClient GetSessions failed"] [error="context canceled"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="context canceled"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:327\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:330] ["ClientBase ReCall grpc second call get error"] [role=querycoord] [error="err: context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:329 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:330\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:359\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:174\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
[2023/04/07 09:08:27.642 +00:00] [WARN] [client/client.go:93] ["DataCoordClient, getSessions failed"] [key=datacoord] [error="context canceled"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="context canceled"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:312\ngithub.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.642 +00:00] [WARN] [grpcclient/client.go:318] ["ClientBase ReCall grpc first call get error"] [role=datacoord] [error="err: context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:317 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"]
[2023/04/07 09:08:27.642 +00:00] [WARN] [client/client.go:93] ["DataCoordClient, getSessions failed"] [key=datacoord] [error="context canceled"]
{"level":"warn","ts":"2023-04-07T09:08:27.642Z","logger":"etcd-client","caller":"v3@v3.5.5/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003ad880/etcd:2379","attempt":0,"error":"rpc error: code = Canceled desc = context canceled"}
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="context canceled"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:327\ngithub.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.642 +00:00] [ERROR] [grpcclient/client.go:330] ["ClientBase ReCall grpc second call get error"] [role=datacoord] [error="err: context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:329 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502 github.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195 github.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 golang.org/x/sync/errgroup.(*Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [stack="github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:330\ngithub.com/milvus-io/milvus/internal/distributed/datacoord/client.(*Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/client/client.go:502\ngithub.com/milvus-io/milvus/internal/rootcoord.(*QuotaCenter).syncMetrics.func2\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:195\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2023/04/07 09:08:27.643 +00:00] [INFO] [server/global_rmq.go:88] ["Close Rocksmq!"]
[2023/04/07 09:08:27.644 +00:00] [WARN] [server/rocksmq_retention.go:110] ["Rocksmq retention finish!"]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_0] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_0_440556259765662279v0] [elapsed=0]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_10] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_10_440556259765686118v0] [elapsed=0]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_11] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_11_440556259765686118v1] [elapsed=0]
[2023/04/07 09:08:27.644 +00:00] [INFO] [server/rocksmq_impl.go:552] ["Rocksmq destroy consumer group successfully "] [topic=by-dev-rootcoord-dml_1] [group=by-dev-dataNode-6-by-dev-rootcoord-dml_1_440556259765662279v1] [elapsed=0]

Anything else?

No response

did you check the memory usage? seems that the rootcoord is forced close and that might due to node over the memory budget

yanliang567 commented 1 year ago

/assign @fall-nebula please double check as suggest above

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

abeatbeyondlab commented 11 months ago

getting same issue here

hillolsarkersanofi commented 11 months ago

Same issue here

xiaofan-luan commented 11 months ago

Please open separate issues it you hit similar behaviour and the root cause might be very different. We need detailed serverside logs to investigate on it