yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[yugabyted] After the master connection is lost, the tserver cannot be reconnected #24455

Open Jennyism opened 6 days ago

Jennyism commented 6 days ago

Jira Link: DB-13361

Description

yugabyted starts the service successfully, but after a few seconds the service goes down. Here is what happened when I started it and what went wrong (screenshot): 00149d8555987bbd4951cbcf76c0789

The following is a log of when the service goes down.

[yugabyted start] 2024-10-15 14:39:49,871 INFO:  | 0.0s | cmd = start using config file: /home/yugabyte/yb_data/conf/yugabyted.conf (args.config=None)
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | Found directory /home/yugabyte/bin for file openssl_proxy.sh
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | Found directory /home/yugabyte/bin for file yb-admin
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | Starting first primary node. Using acd2f333-4214-421c-8ee1-1396c1bc3cba as placement_uuid
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | Starting yugabyted...
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | yugabyted started running with PID 7.
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | Found directory /home/yugabyte/bin for file yb-master
[yugabyted start] 2024-10-15 14:39:49,872 INFO:  | 0.0s | Found directory /home/yugabyte/bin for file yb-tserver
[yugabyted start] 2024-10-15 14:39:49,873 INFO:  | 0.0s | Found directory /home/yugabyte/bin for file post_install.sh
[yugabyted start] 2024-10-15 14:39:49,873 INFO:  | 0.0s | Running the post-installation script /home/yugabyte/bin/post_install.sh (may be a no-op)
[yugabyted start] 2024-10-15 14:39:49,887 INFO:  | 0.0s | Successfully ran the post-installation script.
[yugabyted start] 2024-10-15 14:39:49,887 INFO:  | 0.0s | About to start master with cmd /home/yugabyte/bin/yb-master --stop_on_parent_termination --undefok=stop_on_parent_termination --fs_data_dirs=/home/yugabyte/yb_data/data --webserver_interface=172.17.0.4 --metrics_snapshotter_tserver_metrics_whitelist=handler_latency_yb_tserver_TabletServerService_Read_count,handler_latency_yb_tserver_TabletServerService_Write_count,handler_latency_yb_tserver_TabletServerService_Read_sum,handler_latency_yb_tserver_TabletServerService_Write_sum,disk_usage,cpu_usage,node_up --yb_num_shards_per_tserver=1 --ysql_num_shards_per_tserver=1 --placement_cloud=cloud1 --placement_region=datacenter1 --placement_zone=rack1 --rpc_bind_addresses=172.17.0.4:7100 --server_broadcast_addresses=172.17.0.4:7100 --replication_factor=1 --use_initial_sys_catalog_snapshot --server_dump_info_path=/home/yugabyte/yb_data/data/master-info --master_enable_metrics_snapshotter=true --webserver_port=7000 --default_memory_limit_to_ram_ratio=0.35 --instance_uuid_override=d55c73de8dc44c82819f64147dca94ac --master_addresses=172.17.0.4:7100 --cluster_uuid=92bfcab2-6be9-438d-a253-419c25c2f15d
[yugabyted start] 2024-10-15 14:39:49,891 INFO:  | 0.0s | master started running with PID 16.
[yugabyted start] 2024-10-15 14:39:49,891 INFO:  | 0.0s | Node was a member of some cluster before. Skipping master setup
[yugabyted start] 2024-10-15 14:39:49,892 INFO:  | 0.0s | Got full master addrs list: ['172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:39:49,892 INFO:  | 0.0s | Old master flag is: ['--tserver_master_addrs=172.17.0.4:7100'] and new master flag is: --tserver_master_addrs=172.17.0.4:7100
[yugabyted start] 2024-10-15 14:39:49,892 INFO:  | 0.0s | About to start tserver with cmd /home/yugabyte/bin/yb-tserver --stop_on_parent_termination --undefok=stop_on_parent_termination --fs_data_dirs=/home/yugabyte/yb_data/data --webserver_interface=172.17.0.4 --metrics_snapshotter_tserver_metrics_whitelist=handler_latency_yb_tserver_TabletServerService_Read_count,handler_latency_yb_tserver_TabletServerService_Write_count,handler_latency_yb_tserver_TabletServerService_Read_sum,handler_latency_yb_tserver_TabletServerService_Write_sum,disk_usage,cpu_usage,node_up --yb_num_shards_per_tserver=1 --ysql_num_shards_per_tserver=1 --placement_cloud=cloud1--placement_region=datacenter1 --placement_zone=rack1 --rpc_bind_addresses=172.17.0.4:9100 --server_broadcast_addresses=172.17.0.4:9100 --cql_proxy_bind_address=172.17.0.4:9042 --server_dump_info_path=/home/yugabyte/yb_data/data/tserver-info --start_pgsql_proxy --pgsql_proxy_bind_address=172.17.0.4:5433 --tserver_enable_metrics_snapshotter=true --metrics_snapshotter_interval_ms=11000 --webserver_port=9000 --default_memory_limit_to_ram_ratio=0.6 --instance_uuid_override=784fb38ec06a4317b664124d2b5bc531 --start_redis_proxy=false --placement_uuid=acd2f333-4214-421c-8ee1-1396c1bc3cba --tserver_master_addrs=172.17.0.4:7100
[yugabyted start] 2024-10-15 14:39:49,895 INFO:  | 0.0s | tserver started running with PID 17.
[yugabyted start] 2024-10-15 14:39:49,896 INFO:  | 0.0s | Node was a member of some cluster before. Skipping tserver setup
[yugabyted start] 2024-10-15 14:39:49,974 INFO:  | 0.1s | Found directory /home/yugabyte/bin for file yugabyted-ui
[yugabyted start] 2024-10-15 14:39:49,975 INFO:  | 0.1s | About to start yugabyted-ui with cmd /home/yugabyte/bin/yugabyted-ui -database_host=172.17.0.4 -master_ui_port=7000 -tserver_ui_port=9000 -warnings=transparent_hugepages|ntp/chrony|insecure
[yugabyted start] 2024-10-15 14:39:49,979 INFO:  | 0.1s | yugabyted-ui started running with PID 32.
[yugabyted start] 2024-10-15 14:39:50,082 INFO:  | 0.2s | Master address list updated, new list:
[yugabyted start] 2024-10-15 14:39:50,095 INFO:  | 0.2s | run_process: cmd: ['/home/yugabyte/bin/yb-admin', '--master_addresses', '', 'get_universe_config']
[yugabyted start] 2024-10-15 14:39:50,178 INFO:  | 0.3s | run_process returned 1:
OUT >>

<< ERR >>
Illegal state (yb/client/client-internal.cc:2593): Unable to establish connection to leader master at []. Please verify the addresses and check if server is up, orif you're missing --certs_dir_name.

: Could not locate the leader master: Unable to determine master addresses

<<
[yugabyted start] 2024-10-15 14:39:50,249 INFO:  | 0.4s | thread-uml: current masters ['']
[yugabyted start] 2024-10-15 14:39:50,250 INFO:  | 0.4s | thread-uml: Unable to query for all masters list, keeping masters list: ['']
[yugabyted start] 2024-10-15 14:39:58,742 INFO:  | 8.9s | Callhome failed: HTTP Error 405: Method Not Allowed
[yugabyted start] 2024-10-15 14:39:58,751 INFO:  | 8.9s | Callhome failed: HTTP Error 405: Method Not Allowed
[yugabyted start] 2024-10-15 14:40:08,755 ERROR:  | 18.9s | tserver died unexpectedly. Restarting...
[yugabyted start] 2024-10-15 14:40:08,755 INFO:  | 18.9s | Got full master addrs list: ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:40:08,755 INFO:  | 18.9s | Old master flag is: ['--tserver_master_addrs=172.17.0.4:7100'] and new master flag is: --tserver_master_addrs=,172.17.0.4:7100
[yugabyted start] 2024-10-15 14:40:08,756 INFO:  | 18.9s | About to start tserver with cmd /home/yugabyte/bin/yb-tserver --stop_on_parent_termination --undefok=stop_on_parent_termination --fs_data_dirs=/home/yugabyte/yb_data/data --webserver_interface=172.17.0.4 --metrics_snapshotter_tserver_metrics_whitelist=handler_latency_yb_tserver_TabletServerService_Read_count,handler_latency_yb_tserver_TabletServerService_Write_count,handler_latency_yb_tserver_TabletServerService_Read_sum,handler_latency_yb_tserver_TabletServerService_Write_sum,disk_usage,cpu_usage,node_up --yb_num_shards_per_tserver=1 --ysql_num_shards_per_tserver=1 --placement_cloud=cloud1 --placement_region=datacenter1 --placement_zone=rack1 --rpc_bind_addresses=172.17.0.4:9100 --server_broadcast_addresses=172.17.0.4:9100 --cql_proxy_bind_address=172.17.0.4:9042 --server_dump_info_path=/home/yugabyte/yb_data/data/tserver-info --start_pgsql_proxy --pgsql_proxy_bind_address=172.17.0.4:5433 --tserver_enable_metrics_snapshotter=true --metrics_snapshotter_interval_ms=11000 --webserver_port=9000 --default_memory_limit_to_ram_ratio=0.6 --instance_uuid_override=784fb38ec06a4317b664124d2b5bc531 --start_redis_proxy=false --placement_uuid=acd2f333-4214-421c-8ee1-1396c1bc3cba --tserver_master_addrs=,172.17.0.4:7100
[yugabyted start] 2024-10-15 14:40:08,760 INFO:  | 18.9s | tserver started running with PID 291.
[yugabyted start] 2024-10-15 14:40:12,339 INFO:  | 22.5s | Callhome failed: HTTP Error 405: Method Not Allowed
[yugabyted start] 2024-10-15 14:40:50,310 INFO:  | 60.4s | thread-uml: current masters ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:40:50,311 INFO:  | 60.4s | thread-uml: Unable to query for all masters list, keeping masters list: ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:41:02,288 INFO:  | 72.4s | Callhome failed: HTTP Error 405: Method Not Allowed
[yugabyted start] 2024-10-15 14:41:50,372 INFO:  | 120.5s | thread-uml: current masters ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:41:50,372 INFO:  | 120.5s | thread-uml: Unable to query for all masters list, keeping masters list: ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:42:05,821 INFO:  | 136.0s | Callhome failed: HTTP Error 405: Method Not Allowed
[yugabyted start] 2024-10-15 14:42:50,432 INFO:  | 180.6s | thread-uml: current masters ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:42:50,433 INFO:  | 180.6s | thread-uml: Unable to query for all masters list, keeping masters list: ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:43:09,442 INFO:  | 199.6s | Callhome failed: HTTP Error 405: Method Not Allowed
[yugabyted start] 2024-10-15 14:43:50,493 INFO:  | 240.6s | thread-uml: current masters ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:43:50,494 INFO:  | 240.6s | thread-uml: Unable to query for all masters list, keeping masters list: ['', '172.17.0.4:7100']
[yugabyted start] 2024-10-15 14:44:12,961 INFO:  | 263.1s | Callhome failed: HTTP Error 405: Method Not Allowed
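
In the log above, the master list picks up an empty entry (`['', '172.17.0.4:7100']`), and the restarted tserver is launched with `--tserver_master_addrs=,172.17.0.4:7100`. The following is a minimal sketch of how joining such a list without filtering produces that malformed flag, and how dropping blank entries avoids it. The helper name `build_tserver_master_flag` is hypothetical and this is not yugabyted's actual code; only the address values and the flag name are taken from the log.

```python
def build_tserver_master_flag(master_addrs):
    """Hypothetical helper: join master addresses into a flag, skipping blanks."""
    # Drop empty or whitespace-only entries such as the '' seen in the log.
    cleaned = [addr.strip() for addr in master_addrs if addr and addr.strip()]
    if not cleaned:
        raise ValueError("no usable master addresses")
    return "--tserver_master_addrs=" + ",".join(cleaned)


# Master list as reported by "Got full master addrs list" in the log.
masters = ["", "172.17.0.4:7100"]

# Joining the raw list reproduces the flag with the leading comma.
naive_flag = "--tserver_master_addrs=" + ",".join(masters)
print(naive_flag)

# Filtering first yields the same flag the tserver was originally started with.
print(build_tserver_master_flag(masters))
```

Whether the empty entry is the actual cause of the tserver failing to reconnect is not confirmed in this thread; the sketch only illustrates what the log shows.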

I also found these logs (screenshot): 562d97dd4cbac727e8038ed3b96a4bc

Warning: Please confirm that this issue does not contain any sensitive information

ddorian commented 6 days ago

Hi @Jennyism

Please don't post text as screenshots. What version are you using?

Dengminer commented 6 days ago

@ddorian Version : 2.19.3.0-b140

ddorian commented 6 days ago

That version is an old preview release and is not meant for production. Can you use a newer one? Either a stable release or a newer preview release that includes fixes.

Dengminer commented 6 days ago

Hi @ddorian, will my data be affected by upgrading to the new version? Can you recommend a stable version?

ddorian commented 6 days ago

You should back up your data before upgrading. For upgrades, see: https://docs.yugabyte.com/preview/manage/upgrade-deployment/

For which version to use, see https://docs.yugabyte.com/preview/releases/#recommended-release-series-for-projects

Dengminer commented 6 days ago

OK, thanks.