zettadb / cluster_mgr

Cluster_mgr is an important component of KunlunBase. It provides an HTTP API for KunlunBase users to do cluster management, provisioning, and monitoring work, so that users can install a cluster, a kunlun-server node, a storage shard, or a kunlun-storage node by calling these APIs. This capability enables users to integrate KunlunBase management and provisioning into their existing applications or GUIs. Cluster_mgr also performs other important background cluster maintenance work to make sure the KunlunBase clusters it serves work efficiently and reliably.
http://www.kunlunbase.com
Apache License 2.0

When running add_nodes on an rbr cluster, cluster_mgr returns "get_cluster_shard_variable error" #26

Open jd-zhang opened 2 years ago

jd-zhang commented 2 years ago

Issue migrated from trac ticket # 725

component: cluster manager | priority: major

2022-05-23 17:06:34: hellen@zettadb.com created the issue


1. info of rbr cluster: {"comps":"1","cpu_cores":"8","dbcfg":"1","ha_mode":"rbr","innodb_size":"1","max_connections":"6","max_storage_size":"20","nodes":"3","shards":"2"}

2. add nodes:
{ "version":"1.0", "job_id":"", "job_type":"add_nodes", "user_name":"kunlun_test", "timestamp":"202205131532", "paras":{ "cluster_name":"cluster_1653292418_000007", "shard_name":"shard1", "nodes":"2", "machinelist": [ {"hostaddr":"192.168.0.129"} ] } }

3. the log of cluster mgr:
Mon May 23 15:55:44 2022 tid:0x20ed0c [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/http_server/http_server.cc:300 GenerateRequest]: Http post: { "version":"1.0", "job_id":"", "job_type":"add_nodes", "user_name":"kunlun_test", "timestamp":"202205131532", "paras":{ "cluster_name":"cluster_1653292418_000007", "shard_name":"shard1", "nodes":"2", "machinelist": [ {"hostaddr":"192.168.0.129"} ] } }

Mon May 23 15:55:44 2022 tid:0x20ed24 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:1557 addNodes]: add nodes start
Mon May 23 15:55:44 2022 tid:0x20ed24 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:4061 get_user_name]: current user=kunlun
Mon May 23 15:55:44 2022 tid:0x20ed24 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/http_server/node_channel.cc:79 initNodeChannelMap]: Invalid Network address: pseudo_server_useless:NULL. Will ignore
Mon May 23 15:55:44 2022 tid:0x20ed24 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:1632 addNodes]: task_num=8
Mon May 23 15:55:44 2022 tid:0x20ed24 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:1644 addNodes]: vec_machine.size()=1
Mon May 23 15:55:44 2022 tid:0x20ed24 [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:1688 addNodes]: update machine path size start
Mon May 23 15:55:44 2022 tid:0x20ed0d [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/request_framework/remoteTask.cc:28 CallBC]: General Remote Task CallBC(): task cluster_task_0 response from 192.168.0.129:59002 is: {"cluster_mgr_request_id":"14","info":{"path0":[{"free":616122,"path":"/home/kunlun/testmgr0.9.2/storage_datadir","used":8374}],"path1":[{"free":616122,"path":"/home/kunlun/testmgr0.9.2/storage_logdir","used":50}],"path2":[{"free":616122,"path":"/home/kunlun/testmgr0.9.2/storage_waldir","used":6528}],"path3":[{"free":616122,"path":"/home/kunlun/testmgr0.9.2/server_datadir","used":88}]},"status":"success","task_spec_info":"cluster_task_0"}
Mon May 23 15:55:44 2022 tid:0x20ed0d [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/kl_mentain/machine_info.cc:155 update_machine_on_meta]: str_sql=UPDATE server_nodes_stats set datadir_used=8374,datadir_avail=616122,log_dir_used=50,log_dir_avail=616122,wal_log_dir_used=6528,wal_log_dir_avail=616122,comp_datadir_used=88,comp_datadir_avail=616122 where id=(select id from server_nodes where hostaddr='192.168.0.129')
Mon May 23 15:55:44 2022 tid:0x20ed0d [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:3316 backup_nodes]: backup shard working
Mon May 23 15:55:44 2022 tid:0x20ed0d [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/request_framework/remoteTask.cc:100 TaskReportImpl]: Task Info Report, NotDefined
Mon May 23 15:55:50 2022 tid:0x20ed0b [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/request_framework/remoteTask.cc:28 CallBC]: General Remote Task CallBC(): task cluster_task_1 response from 192.168.0.129:59002 is: {"cluster_mgr_request_id":"14","info":{"path":"/kunlun/backup/xtrabackup//shard1/_xtrabackup_coldfile_I192#168#0#129_P58608_D2022#05#23T15#55#49.tgz\n","shard_id":"13"},"status":"success","task_spec_info":"cluster_task_1"}
Mon May 23 15:55:50 2022 tid:0x20ed0b [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:492 AddNodesCallBack]: BACKUP_STORAGE task_incomplete = 0
Mon May 23 15:55:50 2022 tid:0x20ed0b [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:3200 update_backup_nodes]: update_backup_nodes
Mon May 23 15:55:50 2022 tid:0x20ed0d [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/kl_mentain/machine_info.cc:335 check_machine_port_idle]: check_machine_port_idle={"cluster_mgr_request_id":"check_port_idle","info":{"port":58626},"status":"success","task_spec_info":"check_port_idle"}
Mon May 23 15:55:50 2022 tid:0x20ed0c [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/kl_mentain/machine_info.cc:335 check_machine_port_idle]: check_machine_port_idle={"cluster_mgr_request_id":"check_port_idle","info":{"port":58629},"status":"success","task_spec_info":"check_port_idle"}
Mon May 23 15:55:52 2022 tid:0x20ed0c [ERROR] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:2755 create_shard_nodes]: get_cluster_shard_variable error
Mon May 23 15:55:52 2022 tid:0x20ed0c [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/cluster_mission/cluster_mission.cc:3243 update_backup_nodes]: install storage start
Mon May 23 15:55:52 2022 tid:0x20ed0c [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/request_framework/remoteTask.cc:100 TaskReportImpl]: Task Info Report, NotDefined
Mon May 23 15:55:52 2022 tid:0x20ed0c [INFO] [/home/kunlun/program_binaries/test_rbr/cluster_mgr_0513/src/request_framework/remoteTask.cc:100 TaskReportImpl]: Task Info Report, NotDefined
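The cluster_mgr log above is pasted as one run of text, which makes the single [ERROR] entry easy to miss. The following is a minimal sketch, not part of cluster_mgr, that splits such a blob back into entries and filters by level. The entry layout is inferred only from the lines in this report, and the names `LogEntry`, `split_entries`, and `find_errors` are hypothetical.

```python
import re
from typing import Iterator, List, NamedTuple

# Entry header inferred from this report, for example:
# "Mon May 23 15:55:52 2022 tid:0x20ed0c [ERROR] [/path/file.cc:2755 func]: message"
# \s+ between fields tolerates the stray line breaks in the pasted log.
ENTRY_START = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"tid:(?P<tid>0x[0-9a-fA-F]+)\s+"
    r"\[(?P<level>[A-Z]+)\]\s+"
    r"\[(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\]:"
)


class LogEntry(NamedTuple):
    timestamp: str
    tid: str
    level: str
    file: str
    line: int
    func: str
    message: str


def split_entries(blob: str) -> Iterator[LogEntry]:
    """Yield one LogEntry per header found in a concatenated log blob."""
    matches = list(ENTRY_START.finditer(blob))
    for i, m in enumerate(matches):
        # The message runs from the end of this header to the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(blob)
        message = " ".join(blob[m.end():end].split())
        yield LogEntry(m["ts"], m["tid"], m["level"], m["file"],
                       int(m["line"]), m["func"], message)


def find_errors(blob: str) -> List[LogEntry]:
    """Return only the entries logged at ERROR level."""
    return [entry for entry in split_entries(blob) if entry.level == "ERROR"]
```

Run against the log in step 3, `find_errors` would be expected to report the entry from `create_shard_nodes`, which is logged after both `check_machine_port_idle` calls return and just before "install storage start".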

4. the log of node mgr:
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/server_http/server_http.cc:113 Emit]: get original request from cluster_mgr: {"cluster_mgr_request_id":"14","job_type":"get_paths_space","paras":{"path0":"/home/kunlun/testmgr0.9.2/storage_datadir","path1":"/home/kunlun/testmgr0.9.2/storage_logdir","path2":"/home/kunlun/testmgr0.9.2/storage_waldir","path3":"/home/kunlun/testmgr0.9.2/server_datadir"},"task_spec_info":"cluster_task_0"}
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:780 get_path_free]: get_path_free str_cmd : df /home/kunlun/testmgr0.9.2/storage_datadir
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:747 get_path_used]: get_path_used str_cmd : du --max-depth=0 /home/kunlun/testmgr0.9.2/storage_datadir/instance_data/data_dir_path
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:780 get_path_free]: get_path_free str_cmd : df /home/kunlun/testmgr0.9.2/storage_logdir
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:747 get_path_used]: get_path_used str_cmd : du --max-depth=0 /home/kunlun/testmgr0.9.2/storage_logdir/instance_data/log_dir_path
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:780 get_path_free]: get_path_free str_cmd : df /home/kunlun/testmgr0.9.2/storage_waldir
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:747 get_path_used]: get_path_used str_cmd : du --max-depth=0 /home/kunlun/testmgr0.9.2/storage_waldir/instance_data/innodb_log_dir_path
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:780 get_path_free]: get_path_free str_cmd : df /home/kunlun/testmgr0.9.2/server_datadir
Mon May 23 15:55:44 2022 tid:0x20ed78 [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:747 get_path_used]: get_path_used str_cmd : du --max-depth=0 /home/kunlun/testmgr0.9.2/server_datadir/instance_data/comp_datadir
Mon May 23 15:55:44 2022 tid:0x20ed7b [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/server_http/server_http.cc:113 Emit]: get original request from cluster_mgr: {"cluster_mgr_request_id":"14","job_type":"backup_shard","paras":{"backup_storage":"hdfs://192.168.0.129:57030","cluster_name":"","ip":"192.168.0.129","port":58608,"shard_id":13,"shard_name":"shard1"},"task_spec_info":"cluster_task_1"}
Mon May 23 15:55:44 2022 tid:0x20ed7b [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/job.cc:1233 job_backup_shard]: backup shard start
Mon May 23 15:55:44 2022 tid:0x20ed7b [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/job.cc:1247 job_backup_shard]: job_backup_shard cmd backup -backuptype=storage -port=58608 -clustername= -shardname=shard1 -HdfsNameNodeService=hdfs://192.168.0.129:57030
Mon May 23 15:55:50 2022 tid:0x20ed7b [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/job.cc:1269 job_backup_shard]: popen: /home/kunlun/testmgr0.9.2/kunlun-node-manager-0.9.2/bin/data/backup-anonymousCluster-1653292544/coldback.tgz

Mon May 23 15:55:50 2022 tid:0x20ed7b [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/job.cc:1269 job_backup_shard]: popen: /kunlun/backup/xtrabackup//shard1/_xtrabackup_coldfile_I192#168#0#129_P58608_D2022#05#23T15#55#49.tgz

Mon May 23 15:55:50 2022 tid:0x20ed7b [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/job.cc:1306 job_backup_shard]: backup successfully
Mon May 23 15:55:50 2022 tid:0x20ed7d [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/server_http/server_http.cc:113 Emit]: get original request from cluster_mgr: {"cluster_mgr_request_id":"check_port_idle","job_type":"check_port_idle","paras":{"port":58626,"step":3},"task_spec_info":"check_port_idle"}
Mon May 23 15:55:50 2022 tid:0x20ed7d [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:968 check_port_idle]: check_port_idle str_cmd : netstat -anp | grep 58626
Mon May 23 15:55:50 2022 tid:0x20ed7d [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:968 check_port_idle]: check_port_idle str_cmd : netstat -anp | grep 58626
Mon May 23 15:55:50 2022 tid:0x20ed7d [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:968 check_port_idle]: check_port_idle str_cmd : netstat -anp | grep 58626
Mon May 23 15:55:50 2022 tid:0x20ed7f [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/server_http/server_http.cc:113 Emit]: get original request from cluster_mgr: {"cluster_mgr_request_id":"check_port_idle","job_type":"check_port_idle","paras":{"port":58629,"step":3},"task_spec_info":"check_port_idle"}
Mon May 23 15:55:50 2022 tid:0x20ed7f [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:968 check_port_idle]: check_port_idle str_cmd : netstat -anp | grep 58629
Mon May 23 15:55:50 2022 tid:0x20ed7f [INFO] [/home/kunlun/program_binaries/test_rbr/node_mgr_0513/src/instance_info.cc:968 check_port_idle]: check_port_idle str_cmd : netstat -anp | grep 58629
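To see which jobs node_mgr actually received, the JSON bodies that follow "get original request from cluster_mgr:" in the log above can be pulled out and listed in order. This is a rough sketch under the assumption that every such marker is followed by one JSON object, as in the lines of this report; `iter_requests` and `job_types` are hypothetical helper names, not node_mgr functions.

```python
import json
from typing import Iterator, List

# Marker text copied from the node_mgr log lines in this report.
MARKER = "get original request from cluster_mgr:"


def iter_requests(blob: str) -> Iterator[dict]:
    """Yield each JSON object that follows MARKER in a pasted log blob."""
    decoder = json.JSONDecoder()
    pos = blob.find(MARKER)
    while pos != -1:
        start = pos + len(MARKER)
        # raw_decode does not skip leading whitespace, so do it here.
        while start < len(blob) and blob[start].isspace():
            start += 1
        try:
            # Parses exactly one JSON value and returns the index where it ended,
            # so trailing log text after the object is ignored.
            obj, end = decoder.raw_decode(blob, start)
        except json.JSONDecodeError:
            end = start  # no valid JSON after this marker; keep scanning
        else:
            if isinstance(obj, dict):
                yield obj
        pos = blob.find(MARKER, end)


def job_types(blob: str) -> List[str]:
    """Return the job_type of every request, in log order."""
    return [req.get("job_type", "") for req in iter_requests(blob)]
```

For the node_mgr log in step 4, this would be expected to list `get_paths_space`, `backup_shard`, and two `check_port_idle` requests. No request to install a storage node appears in the pasted log, which matches the cluster_mgr error being raised in `create_shard_nodes`.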

P.S. Verified that adding nodes to an mgr cluster succeeds.
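To repeat the comparison between an mgr cluster and an rbr cluster, the add_nodes request from step 2 can be rebuilt and posted from a script. The sketch below copies the field names and values from the request in this report; the cluster_mgr URL is not shown in the report, so `url` must be supplied by the caller, and the Content-Type header is an assumption. `build_add_nodes_request` and `post_job` are hypothetical helper names.

```python
import json
import urllib.request
from typing import List


def build_add_nodes_request(cluster_name: str, shard_name: str, nodes: int,
                            hosts: List[str],
                            user_name: str = "kunlun_test",
                            timestamp: str = "202205131532") -> dict:
    """Build an add_nodes body with the same fields as the request in step 2."""
    return {
        "version": "1.0",
        "job_id": "",
        "job_type": "add_nodes",
        "user_name": user_name,
        "timestamp": timestamp,
        "paras": {
            "cluster_name": cluster_name,
            "shard_name": shard_name,
            "nodes": str(nodes),  # the report sends the count as a string
            "machinelist": [{"hostaddr": host} for host in hosts],
        },
    }


def post_job(url: str, body: dict, timeout: float = 10.0) -> str:
    """POST the JSON body to cluster_mgr and return the raw response text.

    `url` is a placeholder for the real cluster_mgr endpoint, and the
    Content-Type header is an assumption rather than a documented requirement.
    """
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `build_add_nodes_request("cluster_1653292418_000007", "shard1", 2, ["192.168.0.129"])` reproduces the body from step 2. Any extra parameters that an rbr cluster may need are not listed in this thread and are deliberately left out.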

jd-zhang commented 2 years ago

2022-05-23 17:10:01: @chaojie1979 commented


Adding a node to an mgr cluster and adding a node to an rbr cluster require different parameters to be set.

jd-zhang commented 2 years ago

2022-05-24 10:09:07: @chaojie1979 changed owner from chaojie to barney