zettadb / cluster_mgr

Cluster_mgr is an important component of KunlunBase. It provides an HTTP API that KunlunBase users can call for cluster management, provisioning and monitoring, so that users can install a cluster, a kunlun-server node, a storage shard or a kunlun-storage node through these APIs. This capability lets users integrate KunlunBase management and provisioning into their existing applications or GUIs. Cluster_mgr also performs other important background maintenance work to make sure the KunlunBase clusters it serves run efficiently and reliably.
http://www.kunlunbase.com
Apache License 2.0

Crash of cluster_mgr on std::string::assign #21

Open jd-zhang opened 2 years ago

jd-zhang commented 2 years ago

Issue migrated from trac ticket # 793

component: cluster manager | priority: major

2022-06-02 13:44:14: zhangjindong@zettadb.com created the issue


stack:

(gdb) bt
#0  0x00007f785833ab0d in std::string::assign(std::string const&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x0000000000547c4c in kunlun_rbr::CAsyncMysql::ConnectImpl (this=0x7f77f4003420)
    at /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/async_mysql.cc:347
#2  0x00000000005475ec in kunlun_rbr::CAsyncMysql::Connect (this=0x7f77f4003420, isReconnect=true)
    at /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/async_mysql.cc:265
#3  0x000000000055bbcf in kunlun_rbr::NewConnStat (node=0x39ed420, mysql_mgr=0x7f77f4000b60)
    at /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/refresh_shard.cc:74
#4  0x000000000055c800 in kunlun_rbr::CRefreshShard::HandleImpl (this=0x3b17360, shard=0x3aa11b0)
    at /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/refresh_shard.cc:193
#5  0x0000000000565c9f in kunlun_rbr::CRefreshShard::run (this=0x3b17360)
    at /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/refresh_shard.cc:1223
#6  0x000000000066a747 in kunlun::ZThread::threadEntry (handler=0x3b17360)
    at /home/kunlun/zettalib/src/zthread/zthread.cc:37
#7  0x00007f7858457609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007f785800c133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) frame 1
#1  0x0000000000547c4c in kunlun_rbr::CAsyncMysql::ConnectImpl (this=0x7f77f4003420)
    at /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/async_mysql.cc:347
347     /home/kunlun/releasebuild/cluster_mgr/src/cluster_rbr/async_mysql.cc: No such file or directory.
(gdb) p mysql_raw_
$1 = (kunlun_rbr::StatMysql *) 0x7f77f40044d0
(gdb) p *mysql_raw_
$2 = {status_ = kunlun_rbr::A_UNINITIALIZE, mysql_ = 0x7f77f4004670, result_ = 0x0, ret_ = 0x0, err_ = 0,
  row_ = 0x0, host_ = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  mysql_manager_ = 0x0, mysql_ares_ = 0x7f77f4004580, pending_sql_ = {count = 0, start = 0x0, end = 0x0},
  list_lock_ = {_vptr.AtomicLock = 0x0, flag_ = {<std::__atomic_flag_base> = {
        _M_i = false}, <No data fields>}}, mysql_timeout_ = 3, mysql_rd_timeout_ = 3, mysql_wr_timeout_ = 3,
  connect_type_ = kunlun::TCP_CONNECTION,
  socket_file_ = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  charset_ = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  user_ = 0x7f77f4004630 "clustmgr", passwd_ = 0x7f77f4004650 "clustmgr_pwd", reconnect_ = 0, mysql_fd_ = 0}
jd-zhang commented 2 years ago

2022-06-02 13:52:26: zhangjindong@zettadb.com

jd-zhang commented 2 years ago

2022-06-02 13:52:26: zhangjindong@zettadb.com commented


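The three members host, socket_file and charset_ in mysql_raw are all class-type members. After the memory holding such members has been zeroed by calloc, the result seems to depend on how the particular compiler handles it. On Cygwin and Ubuntu it looks normal to me, and gdb can display them like this: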

$1 = {str1 = {static npos = <optimized out>,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x0}, _M_string_length = 0, {_M_local_buf = '\000' <repeats 15 times>,
      _M_allocated_capacity = 0}}, str2 = {static npos = <optimized out>,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x0}, _M_string_length = 0, {_M_local_buf = '\000' <repeats 15 times>,
      _M_allocated_capacity = 0}} }

When I build it on the CentOS 7 machine we use for compilation, however, gdb reports the memory as inaccessible, like this:

$1 = {str1 = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  str2 = <error reading variable: Cannot access memory at address 0xffffffffffffffe8> }
jd-zhang commented 2 years ago

2022-06-02 13:57:55: zhangjindong@zettadb.com commented


more gdb info:

(gdb) p mysql_raw_
$1 = (kunlun_rbr::StatMysql *) 0x7f77f40044d0
(gdb) p *mysql_raw_
$2 = {status_ = kunlun_rbr::A_UNINITIALIZE, mysql_ = 0x7f77f4004670, result_ = 0x0, ret_ = 0x0, err_ = 0,
  row_ = 0x0, host_ = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  mysql_manager_ = 0x0, mysql_ares_ = 0x7f77f4004580, pending_sql_ = {count = 0, start = 0x0, end = 0x0},
  list_lock_ = {_vptr.AtomicLock = 0x0, flag_ = {<std::__atomic_flag_base> = {
        _M_i = false}, <No data fields>}}, mysql_timeout_ = 3, mysql_rd_timeout_ = 3, mysql_wr_timeout_ = 3,
  connect_type_ = kunlun::TCP_CONNECTION,
  socket_file_ = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  charset_ = <error reading variable: Cannot access memory at address 0xffffffffffffffe8>,
  user_ = 0x7f77f4004630 "clustmgr", passwd_ = 0x7f77f4004650 "clustmgr_pwd", reconnect_ = 0, mysql_fd_ = 0}