milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: During a rolling upgrade, mixcoord panics once while being upgraded, then completes the upgrade successfully. #36411

Open zhuwenxing opened 1 week ago

zhuwenxing commented 1 week ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4.3 --> master-20240919-f6526121-amd64
- Deployment mode(standalone or cluster): mixcoord
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior


[2024/09/20 03:34:09.202 +00:00] [ERROR] [querycoordv2/server.go:162] ["failed to activate standby server"] [error="stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/workspace/source/internal/util/grpcclient/client.go:555 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n/workspace/source/internal/util/grpcclient/client.go:569 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/workspace/source/internal/distributed/rootcoord/client/client.go:107 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]\n/workspace/source/internal/distributed/rootcoord/client/client.go:183 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection\n/workspace/source/internal/querycoordv2/meta/coordinator_broker.go:83 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CoordinatorBroker).DescribeCollection\n/workspace/source/internal/querycoordv2/meta/collection_manager.go:217 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).upgradeLoadFields\n/workspace/source/internal/querycoordv2/meta/collection_manager.go:162 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).Recover\n/workspace/source/internal/querycoordv2/server.go:358 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initMeta\n/workspace/source/internal/querycoordv2/server.go:252 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initQueryCoord: service not ready[mixture=15]: Initializing"] [errorVerbose="stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace: service not ready[mixture=15]: Initializing\n(1) attached stack trace\n  -- stack trace:\n  | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | \t/workspace/source/internal/util/grpcclient/client.go:555\n  | 
github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | \t/workspace/source/internal/util/grpcclient/client.go:569\n  | github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]\n  | \t/workspace/source/internal/distributed/rootcoord/client/client.go:107\n  | github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection\n  | \t/workspace/source/internal/distributed/rootcoord/client/client.go:183\n  | github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CoordinatorBroker).DescribeCollection\n  | \t/workspace/source/internal/querycoordv2/meta/coordinator_broker.go:83\n  | github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).upgradeLoadFields\n  | \t/workspace/source/internal/querycoordv2/meta/collection_manager.go:217\n  | github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).Recover\n  | \t/workspace/source/internal/querycoordv2/meta/collection_manager.go:162\n  | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initMeta\n  | \t/workspace/source/internal/querycoordv2/server.go:358\n  | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initQueryCoord\n  | \t/workspace/source/internal/querycoordv2/server.go:252\n  | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Init.func1\n  | \t/workspace/source/internal/querycoordv2/server.go:197\n  | github.com/milvus-io/milvus/internal/util/sessionutil.(*Session).ProcessActiveStandBy\n  | \t/workspace/source/internal/util/sessionutil/session_util.go:1105\n  | github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func2\n  | \t/workspace/source/internal/querycoordv2/server.go:161\n  | runtime.goexit\n  | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n  | /workspace/source/internal/util/grpcclient/client.go:555 
github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n  | /workspace/source/internal/util/grpcclient/client.go:569 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n  | /workspace/source/internal/distributed/rootcoord/client/client.go:107 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]\n  | /workspace/source/internal/distributed/rootcoord/client/client.go:183 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection\n  | /workspace/source/internal/querycoordv2/meta/coordinator_broker.go:83 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CoordinatorBroker).DescribeCollection\n  | /workspace/source/internal/querycoordv2/meta/collection_manager.go:217 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).upgradeLoadFields\n  | /workspace/source/internal/querycoordv2/meta/collection_manager.go:162 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).Recover\n  | /workspace/source/internal/querycoordv2/server.go:358 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initMeta\n  | /workspace/source/internal/querycoordv2/server.go:252 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initQueryCoord\nWraps: (3) service not ready[mixture=15]: Initializing\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"] [stack="github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func2\n\t/workspace/source/internal/querycoordv2/server.go:162"]
panic: stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:555 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:569 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/rootcoord/client/client.go:107 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/rootcoord/client/client.go:183 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).DescribeCollection
/workspace/source/internal/querycoordv2/meta/coordinator_broker.go:83 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CoordinatorBroker).DescribeCollection
/workspace/source/internal/querycoordv2/meta/collection_manager.go:217 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).upgradeLoadFields
/workspace/source/internal/querycoordv2/meta/collection_manager.go:162 github.com/milvus-io/milvus/internal/querycoordv2/meta.(*CollectionManager).Recover
/workspace/source/internal/querycoordv2/server.go:358 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initMeta
/workspace/source/internal/querycoordv2/server.go:252 github.com/milvus-io/milvus/internal/querycoordv2.(*Server).initQueryCoord: service not ready[mixture=15]: Initializing

goroutine 550 [running]:
panic({0x6644d00?, 0xc0057795f0?})
    /usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc003983f08 sp=0xc003983e58 pc=0x2179d8c
github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register.func2()
    /workspace/source/internal/querycoordv2/server.go:163 +0x157 fp=0xc003983fe0 sp=0xc003983f08 pc=0x5c4a3b7
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc003983fe8 sp=0xc003983fe0 pc=0x21b35e1
created by github.com/milvus-io/milvus/internal/querycoordv2.(*Server).Register in goroutine 367
    /workspace/source/internal/querycoordv2/server.go:160 +0xbb

goroutine 1 [chan receive, 1 minutes]:
runtime.gopark(0x104ba0c368cc4bf?, 0x18?, 0x18?, 0x0?, 0x69700a0?)
    /usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00115d550 sp=0xc00115d530 pc=0x217ddae
runtime.chanrecv(0xc001a0c420, 0x0, 0x1)
    /usr/local/go/src/runtime/chan.go:583 +0x3cd fp=0xc00115d5c8 sp=0xc00115d550 pc=0x2145bad
runtime.chanrecv1(0xa09af88?, 0x69700a0?)
    /usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc00115d5f0 sp=0xc00115d5c8 pc=0x21457b2
github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc000391b30)
    /workspace/source/cmd/roles/roles.go:533 +0x1426 fp=0xc00115db70 sp=0xc00115d5f0 pc=0x5e197e6
github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0xa09ec00?, {0xc0002cc000?, 0x7, 0x7}, 0xc000904cb0)
    /workspace/source/cmd/milvus/run.go:47 +0x2c9 fp=0xc00115dc40 sp=0xc00115db70 pc=0x5e25e69
github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc0002cc000?, 0x7, 0x7})
    /workspace/source/cmd/milvus/milvus.go:60 +0x204 fp=0xc00115dcb8 sp=0xc00115dc40 pc=0x5e25b04
main.main()
    /workspace/source/cmd/main.go:97 +0x44a fp=0xc00115df40 sp=0xc00115dcb8 pc=0x5e2a6ea
runtime.main()
    /usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc00115dfe0 sp=0xc00115df40 pc=0x217d93b
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00115dfe8 sp=0xc00115dfe0 pc=0x21b35e1

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/rolling_update_for_operator_test_simple/detail/rolling_update_for_operator_test_simple/4938/pipeline
log: artifacts-kafka-mixcoord-4938-server-logs.tar.gz

cluster: 4am
ns: chaos-testing
pod info:


[2024-09-20T03:59:16.844Z] + kubectl get pods -o wide
[2024-09-20T03:59:16.845Z] + grep kafka-mixcoord-4938
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-etcd-0                                       1/1     Running       0               40m     10.104.24.5     4am-node29   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-etcd-1                                       1/1     Running       0               40m     10.104.27.87    4am-node31   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-etcd-2                                       1/1     Running       0               40m     10.104.21.237   4am-node24   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-0                                      2/2     Running       1 (39m ago)     40m     10.104.24.9     4am-node29   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-1                                      2/2     Running       1 (39m ago)     40m     10.104.26.192   4am-node32   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-2                                      2/2     Running       1 (39m ago)     40m     10.104.23.219   4am-node27   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-exporter-bb87985bd-tztgr               1/1     Running       4 (39m ago)     40m     10.104.13.9     4am-node16   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-zookeeper-0                            1/1     Running       0               40m     10.104.24.8     4am-node29   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-zookeeper-1                            1/1     Running       0               40m     10.104.16.158   4am-node21   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-kafka-zookeeper-2                            1/1     Running       0               40m     10.104.18.210   4am-node25   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-datanode-68cd58d6df-4d29j             1/1     Running       0               18m     10.104.27.112   4am-node31   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-datanode-68cd58d6df-t645j             1/1     Running       0               17m     10.104.32.153   4am-node39   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-datanode-68cd58d6df-w9nz8             1/1     Running       0               16m     10.104.18.219   4am-node25   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-indexnode-79b9c79b4c-49v44            1/1     Running       0               31m     10.104.20.219   4am-node22   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-indexnode-79b9c79b4c-nzs6w            1/1     Running       0               30m     10.104.26.202   4am-node32   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-indexnode-79b9c79b4c-p7d9m            1/1     Running       0               29m     10.104.27.100   4am-node31   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-mixcoord-7bccf4ccf9-zvmfx             1/1     Running       1 (25m ago)     26m     10.104.32.144   4am-node39   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-proxy-79676b54dc-jx82z                1/1     Running       0               13m     10.104.18.233   4am-node25   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-querynode-1-76c6c9447f-7zbb7          1/1     Running       0               19m     10.104.20.244   4am-node22   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-querynode-1-76c6c9447f-sc44t          1/1     Running       0               18m     10.104.33.43    4am-node36   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-milvus-querynode-1-76c6c9447f-w28qg          1/1     Running       0               20m     10.104.27.109   4am-node31   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-minio-0                                      1/1     Running       0               40m     10.104.23.217   4am-node27   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-minio-1                                      1/1     Running       0               40m     10.104.24.7     4am-node29   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-minio-2                                      1/1     Running       0               40m     10.104.19.231   4am-node28   <none>           <none>
[2024-09-20T03:59:17.418Z] kafka-mixcoord-4938-minio-3                                      1/1     Running       0               40m     10.104.30.243   4am-node38   <none>           <none>

Anything else?

No response

xiaofan-luan commented 1 week ago

/assign @weiliu1031