/unassign @yanliang567
/assign
ok, working on it
@zhuwenxing The patch PR has been merged. Could you please verify?
/assign @zhuwenxing
The error log changed to "fail to Search, QueryNode ID=19, reason=Search 21 failed, reason QueryNode 22 can't serve, recovering".
Taking a deeper look at the logs, I found that one querynode panicked with the following error:
[2023/02/20 22:08:06.100 +00:00] [DEBUG] [pulsar/pulsar_client.go:135] ["tr/create consumer"] [msg="create consumer done"] [duration=455.682623ms]
[2023/02/20 22:08:06.100 +00:00] [INFO] [msgstream/mq_msgstream.go:880] ["MsgStream begin to seek start msg: "] [channel=by-dev-rootcoord-dml_16] [MessageID=CBIQ7hcYACAA]
time="2023-02-20T22:08:06Z" level=info msg="Broker notification of Closed consumer: 121" local_addr="10.102.7.128:55020" remote_addr="pulsar://querynode-pod-kill-2066-pulsar-proxy:6650"
time="2023-02-20T22:08:06Z" level=info msg="[Reconnecting to broker in 117.469866ms]" consumerID=121 name=dgsbd subscription=by-dev-queryNode-439597179875298859-21 topic="persistent://public/default/by-dev-rootcoord-dml_16"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x2bb7110]
goroutine 101742 [running]:
github.com/milvus-io/milvus/internal/querynode.(*distribution).GetCurrent(0x0, {0x0?, 0x7f54fd140190?, 0xc001b25408?})
/go/src/github.com/milvus-io/milvus/internal/querynode/distribution.go:98 +0x50
github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).GetSegmentInfos(0xc000263d40)
/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:866 +0x36
github.com/milvus-io/milvus/internal/querynode.(*QueryNode).GetDataDistribution(0xc00084e1e0, {0x3c4a380?, 0x3d1a1e0?}, 0xc0036c2750)
/go/src/github.com/milvus-io/milvus/internal/querynode/impl.go:1348 +0xd92
github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).GetDataDistribution(0xf?, {0x41ebc58?, 0xc0036c2870?}, 0x10?)
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:337 +0x2b
github.com/milvus-io/milvus/internal/proto/querypb._QueryNode_GetDataDistribution_Handler.func1({0x41ebc58, 0xc0036c2870}, {0x3c39ca0?, 0xc0036c2750})
/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5811 +0x78
github.com/milvus-io/milvus/internal/util/logutil.UnaryTraceLoggerInterceptor({0x41ebc58?, 0xc0036c27e0?}, {0x3c39ca0, 0xc0036c2750}, 0x41d60c0?, 0xc002f5d920)
/go/src/github.com/milvus-io/milvus/internal/util/logutil/grpc_interceptor.go:22 +0x49
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x41ebc58?, 0xc0036c27e0?}, {0x3c39ca0?, 0xc0036c2750?})
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1({0x41ebc58, 0xc0036c2720}, {0x3c39ca0, 0xc0036c2750}, 0xc002f6fcc0?, 0xc002f6fce0)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tracing/opentracing/server_interceptors.go:38 +0x16a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x41ebc58?, 0xc0036c2720?}, {0x3c39ca0?, 0xc0036c2750?})
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x41ebc58, 0xc0036c2720}, {0x3c39ca0, 0xc0036c2750}, 0xc002d95af0?, 0x3a599e0?)
/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/milvus-io/milvus/internal/proto/querypb._QueryNode_GetDataDistribution_Handler({0x3d1a1e0?, 0xc00055e0c0}, {0x41ebc58, 0xc0036c2720}, 0xc0031ac5a0, 0xc0014e40f0)
/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5813 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00014f500, {0x41fc430, 0xc001684820}, 0xc002d76000, 0xc0014e4270, 0x56c4180, 0x0)
/go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc00014f500, {0x41fc430, 0xc001684820}, 0xc002d76000, 0x0)
/go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
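The trace shows (*distribution).GetCurrent being called on a nil receiver (the 0x0 in that frame), reached from (*ShardCluster).GetSegmentInfos while the node was serving GetDataDistribution right after the pod kill. A minimal sketch of that failure mode, using hypothetical simplified types rather than the real Milvus structs, is below; the field names and the nil guard are assumptions for illustration only:

```go
package main

import "fmt"

// distribution is a simplified stand-in for the querynode type in the trace;
// the real struct and its fields are assumptions here.
type distribution struct {
	current []int64 // segment IDs in the current working view
}

// GetCurrent dereferences the receiver, so calling it with a nil
// *distribution produces exactly the SIGSEGV seen in the stack trace.
func (d *distribution) GetCurrent() []int64 {
	return d.current
}

// shardCluster mirrors the shape of ShardCluster: the distribution pointer
// is only populated once the cluster finishes initializing.
type shardCluster struct {
	dist *distribution
}

// GetSegmentInfos without the nil check reproduces the panic: right after a
// pod restart dist may still be nil when GetDataDistribution arrives.
func (sc *shardCluster) GetSegmentInfos() []int64 {
	if sc.dist == nil {
		// Hypothetical defensive handling: report "not ready" instead of panicking.
		fmt.Println("distribution not initialized; shard cluster still recovering")
		return nil
	}
	return sc.dist.GetCurrent()
}

func main() {
	sc := &shardCluster{} // dist is nil immediately after (re)creation
	fmt.Println(sc.GetSegmentInfos())
}
```

Whether the merged patch guards the nil distribution or instead rejects GetDataDistribution until the shard cluster has recovered is not visible from this log; the sketch only illustrates the failure mode.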
If this new error is not caused by the same root cause, I would like to open a new issue to track it.
/assign @congqixia
chaos type: pod-kill
image tag: 2.2.0-20230220-cd2e7fa5
target pod: querynode
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/2066/pipeline
log: artifacts-querynode-pod-kill-2066-server-logs (1).tar.gz
This should be the same root cause as #22317.
Verified and passed with 2.2.0-20230222-ec20fec1.
Is there an existing issue for this?
Environment
Current Behavior
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
chaos type: pod-kill
image tag: 2.2.0-20230215-cad39ebf
target pod: querynode
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/2015/pipeline
log: artifacts-querynode-pod-kill-2015-server-logs.tar.gz, artifacts-querynode-pod-kill-2015-pytest-logs.tar.gz
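For context, the pod-kill chaos in this job deletes the target querynode pod and lets Kubernetes recreate it; the panic above fires on the recovery path after such a kill. A rough sketch of the equivalent manual step, assuming client-go and a hypothetical namespace and label selector (the actual job drives this through the chaos-test pipeline, not this code):

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig; the path is an assumption for illustration.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	// Hypothetical namespace and label selector; the real querynode pod labels may differ.
	pods, err := clientset.CoreV1().Pods("chaos-testing").List(ctx, metav1.ListOptions{
		LabelSelector: "component=querynode",
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		// Deleting the pod simulates the pod-kill chaos; the workload controller
		// recreates it, which is when the recovery path above is exercised.
		if err := clientset.CoreV1().Pods(p.Namespace).Delete(ctx, p.Name, metav1.DeleteOptions{}); err != nil {
			log.Fatal(err)
		}
		fmt.Println("killed pod", p.Name)
	}
}
```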
Anything else?
It is a stably reproducible issue with this image tag, but it works well with 2.2.0-20230214-c333ee8d, so the problem was introduced yesterday. The PR https://github.com/milvus-io/milvus/pull/22154 is suspicious.
@congqixia Can you please take a look?