yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[CDCSDK] Regression: Long running CDC with nemesis tserver crashes seen in recent builds from 2.17.3.0-b65 #16405

Closed shamanthchandra-yb closed 1 year ago

shamanthchandra-yb commented 1 year ago

Jira Link: DB-5814

Description

Affected stress test runs:

test_cdc_lru_nemesis_postgres_debezium (a series of core dumps): http://stress.dev.yugabyte.com/stress_test/50923e44-a52c-4c00-9e6e-6724a5b8207e

test_cdc_lru_nemesis_postgres_cdcsdk: http://stress.dev.yugabyte.com/stress_test/2422a7cd-6362-4e31-9062-7b06626056b2

Common backtrace seen:

thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x000055cc3ef28f09 yb-tserver`yb::cdc::PopulateCDCSDKIntentRecord(yb::OpId const&, yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::__1::vector<yb::docdb::IntentKeyValueForCDC, std::__1::allocator<yb::docdb::IntentKeyValueForCDC>> const&, yb::cdc::StreamMetadata const&, std::__1::shared_ptr<yb::tablet::TabletPeer> const&, std::__1::unordered_map<unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>> const&, std::__1::unordered_map<unsigned int, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB>>, std::__1::hash<unsigned int>, std::__1::equal_to<unsigned int>, std::__1::allocator<std::__1::pair<unsigned int const, std::__1::vector<yb::master::PgAttributePB, std::__1::allocator<yb::master::PgAttributePB>>>>> const&, std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, yb::cdc::SchemaDetails, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, yb::cdc::SchemaDetails>>>*, yb::cdc::GetChangesResponsePB*, yb::ScopedTrackedConsumption*, unsigned int*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, unsigned long const&, yb::client::YBClient*) [inlined] yb::DocHybridTime::hybrid_time(this=0x0000000000000090) const at doc_hybrid_time.h:102:43
    frame #1: 0x000055cc3ef28f09 yb-tserver`yb::cdc::PopulateCDCSDKIntentRecord(op_id=0x00007f328278a2b0, transaction_id=0x00007f328278a750, intents=0x00007f328278a460, metadata=0x00007f328278ad30, tablet_peer=std::__1::shared_ptr<yb::tablet::TabletPeer>::element_type @ 0x000055cc56d3d500, enum_oid_label_map=0x00007f328278ac68, composite_atts_map=0x00007f328278acc8, cached_schema_details=0x00007f328278adc0, resp=0x000055cd1c491a78, consumption=0x00007f328278a4d0, write_id=0x00007f3282789e40, reverse_index_key="", commit_time=0x00007f328278a4b0, client=0x000055cc5697ed88) at cdcsdk_producer.cc:462:34
    frame #2: 0x000055cc3ef30b3b yb-tserver`yb::cdc::ProcessIntents(op_id=0x00007f328278a2b0, transaction_id=0x00007f328278a750, metadata=0x00007f328278ad30, enum_oid_label_map=0x00007f328278ac68, composite_atts_map=0x00007f328278acc8, resp=0x000055cd1c491a78, consumption=0x00007f328278a4d0, checkpoint=0x00007f328278a848, tablet_peer=std::__1::shared_ptr<yb::tablet::TabletPeer>::element_type @ 0x000055cc56d3d500, keyValueIntents=0x00007f328278a460, stream_state=0x00007f328278a2f0, client=0x000055cc5697ed88, cached_schema_details=0x00007f328278adc0, commit_time=0x00007f328278a4b0) at cdcsdk_producer.cc:1203:3
    frame #3: 0x000055cc3ef35a6a yb-tserver`yb::cdc::GetChangesForCDCSDK(stream_id="", tablet_id="73e980be825e40e28456f3ce96622370", from_op_id=0x00007f328278abe0, stream_metadata=0x00007f328278ad30, tablet_peer=std::__1::shared_ptr<yb::tablet::TabletPeer>::element_type @ 0x000055cc56d3d500, mem_tracker=std::__1::shared_ptr<yb::MemTracker>::element_type @ 0x000055cc5818ce20, enum_oid_label_map=0x00007f328278ac68, composite_atts_map=0x00007f328278acc8, client=0x000055cc5697ed88, msgs_holder=0x00007f328278aac0, resp=0x000055cd1c491a78, commit_timestamp=0x00007f328278ab38, cached_schema_details=0x00007f328278adc0, last_streamed_op_id=0x00007f328278aa00, last_readable_opid_index=0x00007f328278abc0, colocated_table_id="", deadline=yb::CoarseTimePoint @ 0x00007f328278a9f8) at cdcsdk_producer.cc:1471:5
    frame #4: 0x000055cc3eee60d4 yb-tserver`yb::cdc::CDCServiceImpl::GetChanges(this=0x000055cc56c66020, req=0x000055cd1c491a20, resp=0x000055cd1c491a78, context=RpcContext @ 0x00007f328278afe0) at cdc_service.cc:1677:14
    frame #5: 0x000055cc3ef91352 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::cdc::CDCServiceIf::InitMethods(this=<unavailable>, req=<unavailable>, resp=<unavailable>, rpc_context=RpcContext @ 0x00007f328278afa0)::$_3::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::cdc::GetChangesRequestPB const*, yb::cdc::GetChangesResponsePB*, yb::rpc::RpcContext)::operator()(yb::cdc::GetChangesRequestPB const*, yb::cdc::GetChangesResponsePB*, yb::rpc::RpcContext) const at cdc_service.service.cc:375:9
    frame #6: 0x000055cc3ef91314 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at local_call.h:116:7
    frame #7: 0x000055cc3ef91002 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] yb::cdc::CDCServiceIf::InitMethods(this=<unavailable>, call=nullptr)::$_3::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>) const at cdc_service.service.cc:373:7
    frame #8: 0x000055cc3ef90f85 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] decltype(__f=<unavailable>, __args=<unavailable>)::$_3&>()(std::declval<std::__1::shared_ptr<yb::rpc::InboundCall>>())) std::__1::__invoke[abi:v15003]<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>>(yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>&&) at invoke.h:394:23
    frame #9: 0x000055cc3ef90f64 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call<yb::cdc::CDCServiceIf::InitMethods(__args=<unavailable>, __args=<unavailable>)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>>(yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3&, std::__1::shared_ptr<yb::rpc::InboundCall>&&) at invoke.h:479:9
    frame #10: 0x000055cc3ef90f64 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) [inlined] std::__1::__function::__alloc_func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __arg=<unavailable>)[abi:v15003](std::__1::shared_ptr<yb::rpc::InboundCall>&&) at function.h:185:16
    frame #11: 0x000055cc3ef90f64 yb-tserver`std::__1::__function::__func<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3, std::__1::allocator<yb::cdc::CDCServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_3>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __arg=<unavailable>)(std::__1::shared_ptr<yb::rpc::InboundCall>&&) at function.h:359:12
    frame #12: 0x000055cc3ef9383f yb-tserver`yb::cdc::CDCServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::__function::__value_func<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __args=nullptr)[abi:v15003](std::__1::shared_ptr<yb::rpc::InboundCall>&&) const at function.h:512:16
    frame #13: 0x000055cc3ef93820 yb-tserver`yb::cdc::CDCServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) [inlined] std::__1::function<void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator(this=<unavailable>, __arg=nullptr)(std::__1::shared_ptr<yb::rpc::InboundCall>) const at function.h:1197:12
    frame #14: 0x000055cc3ef93820 yb-tserver`yb::cdc::CDCServiceIf::Handle(this=<unavailable>, call=<unavailable>) at cdc_service.service.cc:313:3
    frame #15: 0x000055cc3fd039be yb-tserver`yb::rpc::ServicePoolImpl::Handle(this=0x000055cc569d6fc0, incoming=nullptr) at service_pool.cc:263:19
    frame #16: 0x000055cc3fc44f2f yb-tserver`yb::rpc::InboundCall::InboundCallTask::Run(this=<unavailable>) at inbound_call.cc:236:13
    frame #17: 0x000055cc3fd12333 yb-tserver`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x000055cc751cc150) at thread_pool.cc:104:15
    frame #18: 0x000055cc403add5f yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x000055cc595097f0)[abi:v15003]() const at function.h:512:16
    frame #19: 0x000055cc403add4c yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x000055cc595097f0)() const at function.h:1197:12
    frame #20: 0x000055cc403add4c yb-tserver`yb::Thread::SuperviseThread(arg=0x000055cc595097a0) at thread.cc:800:3
    frame #21: 0x00007f32b2ca3694 libpthread.so.0`start_thread(arg=0x00007f3282793700) at pthread_create.c:333
    frame #22: 0x00007f32b31a541d libc.so.6`__clone at clone.S:109
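
Frame #0 reports `this=0x0000000000000090`, an address far below any mapped page. That is consistent with `hybrid_time()` being called on a member reached through a null pointer, although the backtrace alone does not prove it. A minimal sketch for pulling the source location and `this` value out of lldb-style frame lines follows. The helper names and the 4096-byte threshold are assumptions made for illustration, and the sample lines are abbreviated copies of the frames above.

```python
import re

# Matches lldb-style frame lines: "frame #N: 0xPC module`function(...) at file:line[:col]".
FRAME_RE = re.compile(
    r"frame #(\d+): (0x[0-9a-fA-F]+) .* at ([\w.\-]+):(\d+)(?::(\d+))?\s*$"
)
# Matches a concrete `this` pointer; "this=<unavailable>" is deliberately ignored.
THIS_RE = re.compile(r"\bthis=(0x[0-9a-fA-F]+)")


def parse_frame(line):
    """Return frame number, pc, source file, line and `this` value, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    this = THIS_RE.search(line)
    return {
        "frame": int(m.group(1)),
        "pc": int(m.group(2), 16),
        "file": m.group(3),
        "line": int(m.group(4)),
        "this": int(this.group(1), 16) if this else None,
    }


def looks_like_null_deref(frame, page_size=4096):
    """Heuristic (assumption): a `this` below one page suggests null + field offset."""
    return frame["this"] is not None and frame["this"] < page_size


# Abbreviated copies of frames #0, #4 and #21; argument lists are shortened to "...".
SAMPLE = [
    "  * frame #0: 0x000055cc3ef28f09 yb-tserver`yb::cdc::PopulateCDCSDKIntentRecord(...) "
    "[inlined] yb::DocHybridTime::hybrid_time(this=0x0000000000000090) const "
    "at doc_hybrid_time.h:102:43",
    "    frame #4: 0x000055cc3eee60d4 yb-tserver`yb::cdc::CDCServiceImpl::GetChanges("
    "this=0x000055cc56c66020, ...) at cdc_service.cc:1677:14",
    "    frame #21: 0x00007f32b2ca3694 libpthread.so.0`start_thread("
    "arg=0x00007f3282793700) at pthread_create.c:333",
]

for raw in SAMPLE:
    frame = parse_frame(raw)
    print(frame["frame"], frame["file"], frame["line"], looks_like_null_deref(frame))
```

Only frame #0 is flagged by the heuristic; the other frames carry either a plausible heap address or no `this` at all.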

Source connector version

Latest

1.9.5.y.16

Connector configuration

adding yb connector stream_id='cb1da6179bcd47e6addd79886baa43af' db_name='cdc_2c0482' connector_host='172.151.23.228' table_list=['test_cdc_193180', 'test_cdc_f4027b']

2023-03-11 02:18:08,765:DEBUG: add connector connector_name='ybconnector_cdc_2c0482_test_cdc_193180_test_cdc_f4027b' stream_id='cb1da6179bcd47e6addd79886baa43af' db_name='cdc_2c0482' connector_host='172.151.23.228' table_list=['test_cdc_193180', 'test_cdc_f4027b']

{'name': 'ybconnector_cdc_2c0482_test_cdc_193180_test_cdc_f4027b',
 'config': {'connector.class': 'io.debezium.connector.yugabytedb.YugabyteDBConnector',
            'database.hostname': '172.151.30.169',
            'database.master.addresses': '172.151.23.72:7100,172.151.30.169:7100,172.151.17.106:7100',
            'database.port': 5433,
            'database.masterhost': '172.151.30.169',
            'database.masterport': '7100',
            'database.user': 'yugabyte',
            'database.password': 'yugabyte',
            'database.dbname': 'cdc_2c0482',
            'database.server.name': 'db_cdc',
            'database.streamid': 'cb1da6179bcd47e6addd79886baa43af',
            'snapshot.mode': 'never',
            'admin.operation.timeout.ms': 600000,
            'socket.read.timeout.ms': 600000,
            'max.connector.retries': '10',
            'operation.timeout.ms': 600000,
            'topic.creation.default.compression.type': 'lz4',
            'topic.creation.default.cleanup.policy': 'delete',
            'topic.creation.default.partitions': 2,
            'topic.creation.default.replication.factor': '1',
            'tasks.max': '5',
            'table.include.list': 'public.test_cdc_193180,public.test_cdc_f4027b'}}
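
The logged dictionary has the `name`/`config` shape that the Kafka Connect REST API accepts on `POST /connectors`. A minimal sketch of how such a payload could be submitted follows. It assumes Kafka Connect listens on port 8083 of the connector host; the port and both helper names are assumptions, since the test harness code is not shown in this issue. Only a subset of the logged keys is reproduced.

```python
import json
import urllib.request


def build_connector_payload(name, config):
    """Wrap a connector config in the name/config envelope, stringifying values."""
    # Connector properties are string-valued; converting here avoids mixed int/str values.
    return {"name": name, "config": {key: str(value) for key, value in config.items()}}


def build_request(connect_url, payload):
    """Build (but do not send) the POST request that would create the connector."""
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Subset of the configuration logged above.
config = {
    "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
    "database.hostname": "172.151.30.169",
    "database.port": 5433,
    "database.dbname": "cdc_2c0482",
    "database.streamid": "cb1da6179bcd47e6addd79886baa43af",
    "snapshot.mode": "never",
    "table.include.list": "public.test_cdc_193180,public.test_cdc_f4027b",
}
payload = build_connector_payload(
    "ybconnector_cdc_2c0482_test_cdc_193180_test_cdc_f4027b", config
)
# Port 8083 is an assumption, not taken from the issue.
request = build_request("http://172.151.23.228:8083", payload)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the cluster.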

YugabyteDB version

2.17.3.0-b65

Warning: Please confirm that this issue does not contain any sensitive information

shamanthchandra-yb commented 1 year ago

This issue is fixed in the latest master, so I am closing it. @adithya-yb, please let me know if it was being kept open to track something.