milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Querynode panic when sending requests after upgrading v2.2.2 to 2.2.0-20230220-c8032f7e #22312

Closed: zhuwenxing closed this issue 1 year ago

zhuwenxing commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: v2.2.2 upgraded to 2.2.0-20230220-c8032f7e
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka and pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

QueryNode panics with a nil pointer dereference while handling a GetDataDistribution request:

[2023/02/20 11:28:12.389 +00:00] [INFO] [querynode/watch_dm_channels_task.go:306] ["watchDMChannel, add check points info for flushed segments done"] [collectionID=439587113880209960] [flushedSegmentIDs="[]"]
[2023/02/20 11:28:12.389 +00:00] [INFO] [querynode/watch_dm_channels_task.go:333] ["watchDMChannel, add check points info for dropped segments done"] [collectionID=439587113880209960] [droppedSegmentIDs="[]"]
[2023/02/20 11:28:12.389 +00:00] [INFO] [querynode/data_sync_service.go:88] ["add DML flow graph"] [collectionID=439587113880209960] [channel=by-dev-rootcoord-dml_91_439587113880209960v1]
[2023/02/20 11:28:12.389 +00:00] [INFO] [querynode/watch_dm_channels_task.go:344] ["Query node add DML flow graphs"] [collectionID=439587113880209960] [channels="[by-dev-rootcoord-dml_91_439587113880209960v1]"]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x2bb6470]

goroutine 133969 [running]:
github.com/milvus-io/milvus/internal/querynode.(*distribution).GetCurrent(0x0, {0x0?, 0x7ff4dde4b2b0?, 0xc005655408?})
    /go/src/github.com/milvus-io/milvus/internal/querynode/distribution.go:98 +0x50
github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).GetSegmentInfos(0xc0013bf380)
    /go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:866 +0x36
github.com/milvus-io/milvus/internal/querynode.(*QueryNode).GetDataDistribution(0xc000572690, {0x3c49220?, 0x3d19080?}, 0xc0058710e0)
    /go/src/github.com/milvus-io/milvus/internal/querynode/impl.go:1348 +0xd92
github.com/milvus-io/milvus/internal/distributed/querynode.(*Server).GetDataDistribution(0xf?, {0x41ea478?, 0xc0098c63c0?}, 0x10?)
    /go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:337 +0x2b
github.com/milvus-io/milvus/internal/proto/querypb._QueryNode_GetDataDistribution_Handler.func1({0x41ea478, 0xc0098c63c0}, {0x3c38b40?, 0xc0058710e0})
    /go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5811 +0x78
github.com/milvus-io/milvus/internal/util/logutil.UnaryTraceLoggerInterceptor({0x41ea478?, 0xc0098c6030?}, {0x3c38b40, 0xc0058710e0}, 0x41d48e0?, 0xc00683e198)
    /go/src/github.com/milvus-io/milvus/internal/util/logutil/grpc_interceptor.go:22 +0x49
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x41ea478?, 0xc0098c6030?}, {0x3c38b40?, 0xc0058710e0?})
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1({0x41ea478, 0xc005870cf0}, {0x3c38b40, 0xc0058710e0}, 0xc009fec140?, 0xc009fec160)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tracing/opentracing/server_interceptors.go:38 +0x16a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x41ea478?, 0xc005870cf0?}, {0x3c38b40?, 0xc0058710e0?})
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x41ea478, 0xc005870cf0}, {0x3c38b40, 0xc0058710e0}, 0xc009fd3af0?, 0x3a58860?)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/milvus-io/milvus/internal/proto/querypb._QueryNode_GetDataDistribution_Handler({0x3d19080?, 0xc000bf0540}, {0x41ea478, 0xc005870cf0}, 0xc0054fb080, 0xc0010008d0)
    /go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:5813 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0004a1dc0, {0x41fac50, 0xc00173e1a0}, 0xc002fb7200, 0xc001000a50, 0x56c2180, 0x0)
    /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc0004a1dc0, {0x41fac50, 0xc00173e1a0}, 0xc002fb7200, 0x0)
    /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
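
Reading the top two frames: `(*distribution).GetCurrent` is called with receiver `0x0`, while `(*ShardCluster).GetSegmentInfos` has a non-nil receiver (`0xc0013bf380`). So the shard cluster exists, but its distribution has not been set yet. The sketch below reproduces that pattern in isolation. The types are hypothetical stand-ins named after the frames; their fields are assumptions and do not mirror the real Milvus code.

```go
package main

import "fmt"

// Hypothetical stand-in for the querynode distribution type.
type distribution struct {
	current []int64 // segment IDs (illustrative field, an assumption)
}

// GetCurrent has a pointer receiver, so calling it on a nil *distribution
// is legal; the panic only happens when the method reads a field.
func (d *distribution) GetCurrent() []int64 {
	return d.current // nil pointer dereference when d == nil
}

// Hypothetical stand-in for ShardCluster: the distribution field stays nil
// until a later setup step runs (assumption, for illustration).
type ShardCluster struct {
	distribution *distribution
}

func (sc *ShardCluster) GetSegmentInfos() []int64 {
	return sc.distribution.GetCurrent()
}

// callSafely runs f and turns a panic into its message ("" if no panic).
func callSafely(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	sc := &ShardCluster{} // non-nil cluster with a nil distribution
	fmt.Println(callSafely(func() { sc.GetSegmentInfos() }))
}
```

Running it prints the same `runtime error: invalid memory address or nil pointer dereference` message seen in the log.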

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

Failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/386/pipeline

Logs:
- artifacts-pulsar-cluster-upgrade-386-server-logs (1).tar.gz
- artifacts-pulsar-cluster-upgrade-386-pytest-logs.tar.gz

Anything else?

No response

zhuwenxing commented 1 year ago

This issue reproduces consistently, so I am marking it as critical.

xiaofan-luan commented 1 year ago

/assign @congqixia

congqixia commented 1 year ago

OK, working on it.

/unassign @yanliang567

congqixia commented 1 year ago

This bug occurs because adding the query shard and setting up the shard cluster happen in different stages of the WatchDmChannel task execution. When fetching the data distribution, each of these conditions (query shard added, shard cluster set up) needs to be checked before calling ShardCluster.GetSegmentInfos.
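
A minimal sketch of the kind of guard described above, assuming a simplified model in which a shard's cluster may be missing (query shard added, cluster not created yet) or present but without a distribution (setup not finished). All names here (`setup`, `segmentInfos`, `getDataDistribution`, the map keyed by channel) are hypothetical; this is not the actual patch.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified distribution holding segment IDs.
type distribution struct {
	segments []int64
}

// Hypothetical ShardCluster: dist stays nil until setup completes.
type ShardCluster struct {
	mu   sync.RWMutex
	dist *distribution
}

// setup models the later WatchDmChannel stage that makes the cluster usable.
func (sc *ShardCluster) setup(segments []int64) {
	sc.mu.Lock()
	defer sc.mu.Unlock()
	sc.dist = &distribution{segments: segments}
}

// segmentInfos reports readiness instead of dereferencing a nil distribution.
func (sc *ShardCluster) segmentInfos() ([]int64, bool) {
	sc.mu.RLock()
	defer sc.mu.RUnlock()
	if sc.dist == nil {
		return nil, false
	}
	return sc.dist.segments, true
}

// getDataDistribution collects segments from ready shards only, skipping
// shards whose cluster is missing or not set up yet.
func getDataDistribution(shards map[string]*ShardCluster) map[string][]int64 {
	out := make(map[string][]int64)
	for channel, sc := range shards {
		if sc == nil {
			continue // query shard added, cluster not created yet
		}
		segs, ok := sc.segmentInfos()
		if !ok {
			continue // cluster created, distribution not set up yet
		}
		out[channel] = segs
	}
	return out
}

func main() {
	shards := map[string]*ShardCluster{
		"ch-ready":   {},
		"ch-pending": {},
		"ch-missing": nil,
	}
	shards["ch-ready"].setup([]int64{1, 2})
	fmt.Println(getDataDistribution(shards))
}
```

Skipping not-yet-ready shards (rather than returning an error) is one possible design choice: the coordinator simply sees those channels on a later distribution poll once setup completes.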

congqixia commented 1 year ago

Could you please check whether the bug persists now that the patch is merged?

/assign @zhuwenxing

zhuwenxing commented 1 year ago

Verified and passed with 2.2.0-20230222-40878a86.