milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: querynode restarts due to `SIGSEGV: segmentation violation` after etcd follower pod failure chaos test #35483

Open zhuwenxing opened 3 weeks ago

zhuwenxing commented 3 weeks ago

Is there an existing issue for this?

Environment

- Milvus version:master-20240814-c42976ee-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I20240814 09:38:05.100278  6108 SegmentSealedImpl.cpp:108] [SERVER][LoadVecIndex][milvus] Before setting field_bit for field index, fieldID:111. segmentID:451838354631885067, 
I20240814 09:38:05.100486  6108 SegmentSealedImpl.cpp:125] [SERVER][LoadVecIndex][milvus] Has load vec index done, fieldID:111. segmentID:451838354631885067, 
[2024/08/14 09:38:05.100 +00:00] [INFO] [segments/segment.go:1207] ["updateSegmentIndex done"] [traceID=d3b3e901a43f7bf13fa720efa7d76e14] [collectionID=451838354629667593] [partitionID=451838354629667594] [segmentID=451838354631885067] [fieldID=111]
I20240814 09:38:05.100801  6111 load_index_c.cpp:236] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=100][enable_mmap=false] load index 451838354629667625
[2024-08-14T09:38:05Z INFO  tantivy::indexer::segment_updater] save metas
add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<>&&, folly::futures::detail::State)::<lambda(folly::Executor::KeepAlive<>&&)> >
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/Executor.h:186 pc=0x7f6b52b2334c
operator()<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<>&&, folly::futures::detail::State)::<lambda(folly::Executor::KeepAlive<>&&)> >
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:583 pc=0x7f6b52b2334c
_ZN5folly7futures6detail8CoreBase10doCallbackEONS_8Executor9KeepAliveIS3_EENS1_5StateE
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:608 pc=0x7f6b52b2334c
_ZN5folly7futures6detail8CoreBase12setCallback_EONS_8FunctionIFvRS2_ONS_8Executor9KeepAliveIS5_EEPNS_17exception_wrapperEEEEOSt10shared_ptrINS_14RequestContextEENS1_18InlineContinuationE
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:468 pc=0x7f6b52b24053
I20240814 09:38:05.205202  6111 load_index_c.cpp:300] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=100][enable_mmap=false] load index 451838354629667625 done
[2024/08/14 09:38:05.205 +00:00] [INFO] [segments/segment.go:1207] ["updateSegmentIndex done"] [traceID=d3b3e901a43f7bf13fa720efa7d76e14] [collectionID=451838354629667593] [partitionID=451838354629667594] [segmentID=451838354631885067] [fieldID=100]
I20240814 09:38:05.205718  6111 load_index_c.cpp:236] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=101][enable_mmap=false] load index 451838354629667646
[2024-08-14T09:38:05Z INFO  tantivy::indexer::segment_updater] save metas
setCallback<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/detail/Core.h:632 pc=0x7f6b59d86277
setCallback_<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:310 pc=0x7f6b59d86277
setCallback_<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:318 pc=0x7f6b59d86277
thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> >
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:379 pc=0x7f6b59d86277
thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:945 pc=0x7f6b59d86277
then<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>
    /root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future.h:1240 pc=0x7f6b59d86277
asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >
    /workspace/source/internal/core/src/futures/Future.h:188 pc=0x7f6b59d86277
async<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >
    /workspace/source/internal/core/src/futures/Future.h:98 pc=0x7f6b59d86277
AsyncSearch
    /workspace/source/internal/core/src/segcore/segment_c.cpp:121 pc=0x7f6b59d86277
_cgo_548efe5569b7_Cfunc_AsyncSearch
    /tmp/go-build/cgo-gcc-prolog:121 pc=0x501a1ec
runtime.asmcgocall
    /usr/local/go/src/runtime/asm_amd64.s:872 pc=0x1ef4087

SIGSEGV: segmentation violation
PC=0x7f6b52987c89 m=3092 sigcode=1
signal arrived during cgo execution

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17071/pipeline

log: artifacts-etcd-followers-pod-failure-17071-server-logs.tar.gz

Anything else?

No response

binbinlv commented 3 weeks ago

/assign @weiliu1031 could you please have a look? Thanks

xiaofan-luan commented 3 weeks ago

@zhuwenxing is this only an issue on master? Is this on ARM or x86?

xiaofan-luan commented 3 weeks ago

@zhuwenxing

Please make sure you are using the version with no clusterIP for the etcd kill test. I see some errors showing that etcd is not connected. Check with @LoveEachDay and make sure you use the correct setup.

Ideally we shouldn't see a panic when the etcd connection fails:

[2024/08/14 09:08:21.394 +00:00] [DEBUG] [querynode/service.go:118] ["QueryNode connect to etcd failed"] [error="context deadline exceeded"]
[2024/08/14 09:08:21.394 +00:00] [ERROR] [components/query_node.go:56] ["QueryNode starts error"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n\t/workspace/source/cmd/components/query_node.go:56\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
panic: context deadline exceeded

Still checking the SIGSEGV issue.

chyezh commented 3 weeks ago

@zhuwenxing Has this issue been reproduced?

chyezh commented 3 weeks ago

/assign chyezh

zhuwenxing commented 3 weeks ago

@xiaofan-luan It has only been reproduced on master. It's x86 (amd64), because the test cluster consists of amd64 machines.

@LoveEachDay the instance was created with Helm chart milvus-4.2.5.tgz; can you help check the setup?

@chyezh It does not reproduce reliably; so far it has happened only once.

LoveEachDay commented 3 weeks ago

Using three headless-service addresses for the three etcd members, with etcd 3.5.14.

cqy123456 commented 3 weeks ago

The crash occurred in an async search on segment 451838354632090846:

[2024/08/14 09:37:56.144 +00:00] [DEBUG] [segments/segment.go:499] ["search segment..."] [traceID=0afd39eaef3452ce9e8c8832ac9a6c58] [collectionID=451838354629467151] [segmentID=451838354632090846] [segmentType=Sealed] [withIndex=false]

SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x7f6864980050

but this segment was still loading:

[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:541] ["start loading remote..."] [traceID=0b7133f7d6374b23acbde92672342745] [collectionID=451838354629467151] [segmentIDs="[451838354632090846]"] [segmentNum=1]
[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:551] ["loading bloom filter for remote..."] [traceID=0b7133f7d6374b23acbde92672342745] [collectionID=451838354629467151] [segmentIDs="[451838354632090846]"]
[2024/08/14 09:37:58.015 +00:00] [INFO] [segments/segment_loader.go:945] ["Successfully load pk stats"] [traceID=0b7133f7d6374b23acbde92672342745] [segmentID=451838354632090846] [time=9.151753ms] [size=34304]

chyezh commented 3 weeks ago

Loading of this segment had already finished:

[2024/08/14 09:37:54.495 +00:00] [INFO] [querynodev2/services.go:492] ["load segments done..."] [traceID=5ed136892591447ab531c9fa37abd7d9] [collectionID=451838354629467151] [partitionID=451838354629467152] [shard=by-dev-rootcoord-dml_1_451838354629467151v0] [segmentID=451838354632090846] [level=L1] [currentNodeID=3] [segments="[451838354632090846]"]

The 09:37:58 entries are from loading the delete data.

xiaofan-luan commented 2 weeks ago

Any progress?

chyezh commented 2 weeks ago

Made ASan available for the milvus binary and image (#35627), and am trying to reproduce the issue.

chyezh commented 2 weeks ago

Also, some ODR violations (#35549, #35633) were found and fixed in #35610, but it is not yet clear whether they are related to this issue.

chyezh commented 2 weeks ago

Found an assertion failure while reproducing:

milvus: /go/src/github.com/milvus-io/milvus/internal/core/src/exec/expression/EvalCtx.h:36: milvus::exec::EvalCtx::EvalCtx(milvus::exec::ExecContext*, milvus::exec::ExprSet*, milvus::RowVector*): Assertion `expr_set_ != nullptr' failed.

chyezh commented 2 weeks ago

The assertion failure above is another, unrelated issue; see #35771. Will try to reproduce again after the fix.

chyezh commented 2 weeks ago

Regarding the `QueryNode connect to etcd failed` panic reported above: it happens during test initialization, when etcd is not ready yet and no etcd chaos has been injected. Therefore, it is expected.

[2024-08-14T09:07:23.151Z] + helm install --wait --debug --timeout 600s etcd-followers-pod-failure-17071 milvus/milvus --set image.all.repository=harbor.milvus.io/milvus/milvus --set image.all.tag=master-20240814-c42976ee-amd64 --set metrics.serviceMonitor.enabled=true --set etcd.metrics.enabled=true --set etcd.metrics.podMonitor.enabled=true --set etcd.metrics.podMonitor.namespace=chaos-testing --set quotaAndLimits.enabled=false -f ../cluster-values.yaml -n=chaos-testing
[2024-08-14T09:07:23.154Z] WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
[2024-08-14T09:07:23.154Z] WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
[2024-08-14T09:07:23.154Z] install.go:178: [debug] Original chart version: ""
[2024-08-14T09:07:24.083Z] install.go:195: [debug] CHART PATH: /root/.cache/helm/repository/milvus-4.2.4.tgz
[2024-08-14T09:07:24.083Z] 
[2024-08-14T09:07:25.011Z] client.go:128: [debug] creating 42 resource(s)
[2024-08-14T09:07:25.267Z] wait.go:48: [debug] beginning wait for 42 resources with timeout of 10m0s
[2024-08-14T09:07:26.191Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:29.453Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:31.970Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:34.491Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:37.757Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:40.271Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:43.550Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:46.072Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:48.643Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:51.904Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:54.417Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:56.929Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:00.194Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:02.721Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:05.544Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:08.061Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:11.335Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:13.851Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:17.119Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:19.633Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:22.152Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:25.424Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:27.939Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:30.455Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:33.720Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:36.233Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:39.498Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:42.013Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:44.525Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:47.794Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready

chyezh commented 2 weeks ago

Can't reproduce after fixing the ODR violations and enabling ASan:

https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17504/pipeline/
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17505/pipeline/
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17506/pipeline/