milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Search failed when running `test_e2e.py` #16775

Closed: zhuwenxing closed this issue 2 years ago

zhuwenxing commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: master-20220505-333b439a
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): 2.1.0.dev37
- OS (Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Each time the test reaches the search step, the querynode restarts and the search fails. The querynode log is below. The strange thing is that the test code performs no release operation, yet the log reports a release-collection error.

[2022/05/05 06:34:08.212 +00:00] [DEBUG] [service.go:204] ["QueryNode start done ..."]
[2022/05/05 06:34:08.212 +00:00] [DEBUG] [query_node.go:54] ["QueryNode successfully started"]
[2022/05/05 06:34:08.698 +00:00] [DEBUG] [impl.go:384] ["releaseCollectionTask Enqueue done"] [collectionID=432991732513636353]
[2022/05/05 06:34:08.698 +00:00] [DEBUG] [task.go:700] ["Execute release collection task"] [collectionID=432991732513636353]
[2022/05/05 06:34:08.698 +00:00] [DEBUG] [task.go:701] ["release streaming"] [collectionID=432991732513636353]
[2022/05/05 06:34:08.708 +00:00] [DEBUG] [impl.go:384] ["releaseCollectionTask Enqueue done"] [collectionID=432991732513636353]
[2022/05/05 06:34:09.699 +00:00] [DEBUG] [task.go:705] ["Starting release collection..."] [collectionID=432991732513636353]
[2022/05/05 06:34:09.699 +00:00] [WARN] [task_scheduler.go:63] ["release collection failed, collectionID = 432991732513636353, err = collection hasn't been loaded or has been released, collection id = 432991732513636353"]
[2022/05/05 06:34:09.699 +00:00] [DEBUG] [task.go:700] ["Execute release collection task"] [collectionID=432991732513636353]
[2022/05/05 06:34:09.699 +00:00] [DEBUG] [task.go:701] ["release streaming"] [collectionID=432991732513636353]
[2022/05/05 06:34:09.699 +00:00] [ERROR] [impl.go:389] ["release collection failed, collectionID = 432991732513636353, err = collection hasn't been loaded or has been released, collection id = 432991732513636353"] [stack="github.com/milvus-io/milvus/internal/querynode.(*QueryNode).ReleaseCollection.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/impl.go:389\ngithub.com/milvus-io/milvus/internal/querynode.(*QueryNode).ReleaseCollection\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/impl.go:393\ngithub.com/milvus-io/milvus/internal/distributed/querynode.(*Server).ReleaseCollection\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:288\ngithub.com/milvus-io/milvus/internal/proto/querypb._QueryNode_ReleaseCollection_Handler.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:4174\ngithub.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tracing/opentracing/server_interceptors.go:38\ngithub.com/milvus-io/milvus/internal/proto/querypb._QueryNode_ReleaseCollection_Handler\n\t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:4176\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:934"]
[2022/05/05 06:34:10.699 +00:00] [DEBUG] [task.go:705] ["Starting release collection..."] [collectionID=432991732513636353]
[2022/05/05 06:34:10.700 +00:00] [WARN] [task_scheduler.go:63] ["release collection failed, collectionID = 432991732513636353, err = collection hasn't been loaded or has been released, collection id = 432991732513636353"]
[2022/05/05 06:34:10.700 +00:00] [ERROR] [impl.go:389] ["release collection failed, collectionID = 432991732513636353, err = collection hasn't been loaded or has been released, collection id = 432991732513636353"] [stack="github.com/milvus-io/milvus/internal/querynode.(*QueryNode).ReleaseCollection.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/impl.go:389\ngithub.com/milvus-io/milvus/internal/querynode.(*QueryNode).ReleaseCollection\n\t/go/src/github.com/milvus-io/milvus/internal/querynode/impl.go:393\ngithub.com/milvus-io/milvus/internal/distributed/querynode.(*Server).ReleaseCollection\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/service.go:288\ngithub.com/milvus-io/milvus/internal/proto/querypb._QueryNode_ReleaseCollection_Handler.func1\n\t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:4174\ngithub.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1\n\t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tracing/opentracing/server_interceptors.go:38\ngithub.com/milvus-io/milvus/internal/proto/querypb._QueryNode_ReleaseCollection_Handler\n\t/go/src/github.com/milvus-io/milvus/internal/proto/querypb/query_coord.pb.go:4176\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:1286\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:1609\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.38.0/server.go:934"]
[2022/05/05 06:34:10.717 +00:00] [DEBUG] [query_node.go:454] ["get SealedSegmentsChangeInfo from etcd"] [infoID=432989358748860459]

Expected Behavior

The test case should pass.

Steps To Reproduce

see https://github.com/milvus-io/milvus/blob/master/tests/python_client/testcases/test_e2e.py
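For reference, here is a minimal pymilvus sketch of the insert/index/load/search flow that hits this failure. It is an illustration only, not the actual test: the collection name, field names, dimension, index parameters, and connection address below are placeholders and are not taken from test_e2e.py.

# Minimal reproduction sketch (pymilvus 2.x); names and parameters are placeholders.
import random
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType,
)

connections.connect(host="127.0.0.1", port="19530")

dim = 8
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
collection = Collection(name="e2e_repro", schema=CollectionSchema(fields))

# Insert a handful of vectors, build an index, and load the collection.
vectors = [[random.random() for _ in range(dim)] for _ in range(100)]
collection.insert([vectors])
collection.create_index(
    field_name="vec",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
collection.load()

# The failure is reported at this step: a search with very small nq and topK
# (the querynode log shows nq=1, topk=1) makes the querynode hit the reduce
# assertion and abort.
res = collection.search(
    data=[vectors[0]],
    anns_field="vec",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=1,
)
print(res[0].ids)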

Milvus Log

No response

Anything else?

logs: pod_log.zip

zhuwenxing commented 2 years ago

More log info: test-e2e-05-05-14-51.zip

In the log file pod_log_previous/milvus-chaos-querynode-96fbf4fdf-g2ncd.log:

[2022/05/05 06:34:05.495 +00:00] [DEBUG] [segment.go:313] ["do search on segment"] [segmentID=432991742724145154] [segmentType=3]
[2022/05/05 06:34:05.497 +00:00] [DEBUG] [query_shard.go:238] ["wait serviceable ts done"] [tsType="Delta tSafe"] ["guarantee ts"=432991750143082497] ["serviceable ts"=432991750155665409] [channel=by-dev-rootcoord-dml_3_432991732513636353v1]
[2022/05/05 06:34:05.497 +00:00] [DEBUG] [historical.go:169] ["search validated partitions"] [collectionID=432991732513636353] [partitionIDs="[432991732513636354]"]
[2022/05/05 06:34:05.497 +00:00] [DEBUG] [segment.go:313] ["do search on segment"] [segmentID=432991736511332354] [segmentType=3]
[2022/05/05 06:34:05.497 +00:00] [DEBUG] [segment.go:313] ["do search on segment"] [segmentID=432991736511332353] [segmentType=3]
[2022/05/05 06:34:05.497 +00:00] [DEBUG] [segment.go:313] ["do search on segment"] [segmentID=432991742724145153] [segmentType=3]
[2022/05/05 06:34:05.504 +00:00] [DEBUG] [reduce.go:74] ["start marshal..."] [collectionID=432991732513636353] [msgID=0] [reqSlices="[1]"]
[2022/05/05 06:34:05.504 +00:00] [DEBUG] [reduce.go:74] ["start marshal..."] [collectionID=432991732513636353] [msgID=0] [reqSlices="[1]"]
Assert "result_offset_end <= nq * topK" at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/reduce_c.cpp:185
 => illegal result_offset_end when GetSearchResultDataSlice, result_offset_end = 2, nq = 1, topk = 1Assert "result_offset_end <= nq * topK" at /go/src/github.com/milvus-io/milvus/internal/core/src/segcore/reduce_c.cpp:185
 => illegal result_offset_end when GetSearchResultDataSlice, result_offset_end = 2, nq = 1, topk = 1

free(): invalid pointer
free(): invalid pointer
SIGABRT: abort
PC=0x7fbc12c80e87 m=3 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7fbc12c80e87
yanliang567 commented 2 years ago

@sunby could you please take a look at this issue?
/assign @sunby
/unassign

bigsheeper commented 2 years ago

/assign

bigsheeper commented 2 years ago

May be the same issue as #16769.

bigsheeper commented 2 years ago

This issue is caused by appending the wrong topK results when reducing search results, and it only happens when topK and nq are very small.
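For illustration, here is a tiny Python model of the invariant that the assertion at reduce_c.cpp:185 checks. The real reduction happens in the C++ segcore; this toy function only shows how appending one result too many for a query violates result_offset_end <= nq * topK when nq = topK = 1, which matches the values in the querynode log above.

# Toy model of the slicing invariant checked in GetSearchResultDataSlice;
# this is not the actual segcore code, just an illustration of the failure mode.
def slice_search_results(per_query_counts, nq, topk):
    result_offset_end = 0
    for count in per_query_counts:
        result_offset_end += count
        # Mirrors: Assert "result_offset_end <= nq * topK" at reduce_c.cpp:185
        assert result_offset_end <= nq * topk, (
            f"illegal result_offset_end, result_offset_end = {result_offset_end}, "
            f"nq = {nq}, topk = {topk}"
        )
    return result_offset_end

# Correct reduction: at most topK results are kept per query.
slice_search_results([1], nq=1, topk=1)

# Buggy reduction: one extra result appended for the single query, i.e. the
# "result_offset_end = 2, nq = 1, topk = 1" case from the querynode log.
try:
    slice_search_results([2], nq=1, topk=1)
except AssertionError as err:
    print(err)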

bigsheeper commented 2 years ago

Would be fixed by #16815.

zhuwenxing commented 2 years ago

Verified and passed in version master-20220506-67ce206c.