yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

Unable to run yugabyted #23101

Open cody151 opened 4 months ago

cody151 commented 4 months ago

Jira Link: DB-12034

Description

Hi, there seems to be an issue on Ubuntu Server 22.04 when running the yugabyted start command:

  File "/home/yas/yugabyte/yugabyte-2.21.1.0/./bin/yugabyted", line 4546, in setup_master
    master_uuids = retry_op_with_argument(self.get_master_uuids, master_addrs)
  File "/home/yas/yugabyte/yugabyte-2.21.1.0/./bin/yugabyted", line 8501, in retry_op_with_argument
    raise RuntimeError("Failed after retrying operation for {} secs.".format(
RuntimeError: Failed after retrying operation for 188.91273045539856 secs.

For more information, check the logs in /home/yas/var/logs
[yugabyted start] 2024-07-02 23:59:46,874 INFO:  | 189.1s | Shutting down...
[yugabyted start] 2024-07-03 00:06:55,322 INFO:  | 0.1s | Running yugabyted command: './bin/yugabyted start'
[yugabyted start] 2024-07-03 00:06:55,340 INFO:  | 0.1s | cmd = start using config file: /home/yas/var/conf/yugabyted.conf
[yugabyted start] 2024-07-03 00:06:55,340 INFO:  | 0.1s | Found directory /home/yas/yugabyte/yugabyte-2.21.1.0/bin for file openssl_proxy.sh
[yugabyted start] 2024-07-03 00:06:55,340 INFO:  | 0.1s | Found directory /home/yas/yugabyte/yugabyte-2.21.1.0/bin for file yb-admin
[yugabyted start] 2024-07-03 00:06:55,340 INFO:  | 0.1s | Starting first primary node. Using 9f0996ab-4954-4f5a-8596-3614cd56aa5d as placement_uuid
[yugabyted start] 2024-07-03 00:06:55,343 INFO:  | 0.1s | Starting yugabyted...
[yugabyted start] 2024-07-03 00:06:55,353 INFO:  | 0.1s | Daemon grandchild process begins execution.
[yugabyted start] 2024-07-03 00:06:55,368 INFO:  | 0.1s | yugabyted started running with PID 917.
[yugabyted start] 2024-07-03 00:06:55,369 INFO:  | 0.1s | Found directory /home/yas/yugabyte/yugabyte-2.21.1.0/bin for file yb-master
[yugabyted start] 2024-07-03 00:06:55,370 INFO:  | 0.1s | Found directory /home/yas/yugabyte/yugabyte-2.21.1.0/bin for file yb-tserver
[yugabyted start] 2024-07-03 00:06:55,371 INFO:  | 0.1s | Found files ['master-info', 'yb-data'] in data dir /home/yas/var/data from possibly failed initialization. Removing...
[yugabyted start] 2024-07-03 00:06:55,434 INFO:  | 0.2s | Changed RLIMIT_NOFILE from 1024 to 1048576
[yugabyted start] 2024-07-03 00:06:55,434 ERROR:  | 0.2s | Error changing RLIMIT_NPROC from 7398 to 12000: current limit exceeds maximum limit
[yugabyted start] 2024-07-03 00:06:55,438 INFO:  | 0.2s | Found directory /home/yas/yugabyte/yugabyte-2.21.1.0/bin for file post_install.sh
[yugabyted start] 2024-07-03 00:06:55,438 INFO:  | 0.2s | Running the post-installation script /home/yas/yugabyte/yugabyte-2.21.1.0/bin/post_install.sh (may be a no-op)
[yugabyted start] 2024-07-03 00:06:55,499 INFO:  | 0.2s | Successfully ran the post-installation script.
[yugabyted start] 2024-07-03 00:06:55,500 INFO:  | 0.2s | About to start master with cmd /home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-master --stop_on_parent_termination --undefok=stop_on_parent_termination --fs_data_dirs=/home/ya>
[yugabyted start] 2024-07-03 00:06:55,540 INFO:  | 0.3s | master started running with PID 925.
[yugabyted start] 2024-07-03 00:06:55,541 ERROR:  | 0.3s | Failed to create symlink from /home/yas/var/data/yb-data/master/logs to /home/yas/var/logs/master
[yugabyted start] 2024-07-03 00:06:55,541 INFO:  | 0.3s | Waiting for master
[yugabyted start] 2024-07-03 00:06:55,541 INFO:  | 0.3s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:07:05,576 INFO:  | 10.3s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:07:06,078 INFO:  | 10.8s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:07:16,103 INFO:  | 20.8s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:07:16,604 INFO:  | 21.3s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:07:26,617 INFO:  | 31.4s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:07:27,118 INFO:  | 31.9s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:07:37,129 INFO:  | 41.9s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:07:37,631 INFO:  | 42.4s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:07:47,652 INFO:  | 52.4s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:07:48,154 INFO:  | 52.9s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:07:58,164 INFO:  | 62.9s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:07:58,666 INFO:  | 63.4s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
[yugabyted start] 2024-07-03 00:08:08,677 INFO:  | 73.4s | run_process: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters'] timeout expired for command:
[yugabyted start] 2024-07-03 00:08:09,179 INFO:  | 73.9s | run_process: cmd: ['/home/yas/yugabyte/yugabyte-2.21.1.0/bin/yb-admin', '--master_addresses', '127.0.1.1:7100', 'list_all_masters']
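
The log above shows yb-admin timing out against 127.0.1.1:7100 roughly every ten seconds. One quick way to tell whether anything is listening there at all is a plain TCP connect. The following is a minimal sketch and not part of yugabyted; the helper name `port_is_open` is made up for illustration, and the host and port are simply copied from the log.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake; a refused or
        # timed-out connection raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address taken from the yb-admin command line in the log above.
    host, port = "127.0.1.1", 7100
    state = "accepting connections" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

If the port is reported as not reachable while yugabyted claims the master was started, the master process has probably exited early, which would point at the master's own log files rather than at the network.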

ddorian commented 4 months ago

Hi @cody151

Please use the collect_logs command https://docs.yugabyte.com/preview/reference/configuration/yugabyted/#collect-logs and upload the zipped logs.

cody151 commented 4 months ago

> Hi @cody151
>
> Please use the collect_logs command https://docs.yugabyte.com/preview/reference/configuration/yugabyted/#collect-logs and upload the zipped logs.

Hi, thanks for your reply. I tried this, but it doesn't seem to work:

yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ./bin/yugabyted start --collect_logs
Error: unrecognized arguments: --collect_logs.
+--------------------------------------------------------------------------------------------------+
|                              Yugabyted CLI: YugabyteDB command line                              |
+--------------------------------------------------------------------------------------------------+
YugabyteDB command-line interface for creating and configuring YugabyteDB cluster.

Usage: yugabyted [command] [flags]

To start YugabyteDB cluster, run 'yugabyted start'.

Find more information at: https://docs.yugabyte.com/preview/reference/configuration/yugabyted/

Commands:
  start                   Start YugabyteDB cluster.
  stop                    Stop running YugabyteDB cluster.
  destroy                 Destroy YugabyteDB cluster and remove data.
  backup                  Back up a database.
  restore                 Restore a database.
  status                  Print status of YugabyteDB cluster.
  version                 Release version of YugabyteDB cluster.
  collect_logs            Collect and package logs for troubleshooting.
  connect                 Connect to YugabyteDB cluster through the CLI.
  demo                    Load and interact with preset demo data.
  cert                    Generate SSL certificates
  configure               Configure data placement, toggle encryption at rest or run point-in-time recovery operations on the cluster.
  configure_read_replica  Configure/Modify/Delete a read replica cluster.

Flags:
  -h, --help              show this help message and exit

Run 'yugabyted [command] -h' for help with specific commands.
yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ./bin/yugabyted start -collect_logs
Error: unrecognized arguments: -collect_logs.
+--------------------------------------------------------------------------------------------------+
|                              Yugabyted CLI: YugabyteDB command line                              |
+--------------------------------------------------------------------------------------------------+
YugabyteDB command-line interface for creating and configuring YugabyteDB cluster.

Usage: yugabyted [command] [flags]

To start YugabyteDB cluster, run 'yugabyted start'.

Find more information at: https://docs.yugabyte.com/preview/reference/configuration/yugabyted/

Commands:
  start                   Start YugabyteDB cluster.
  stop                    Stop running YugabyteDB cluster.
  destroy                 Destroy YugabyteDB cluster and remove data.
  backup                  Back up a database.
  restore                 Restore a database.
  status                  Print status of YugabyteDB cluster.
  version                 Release version of YugabyteDB cluster.
  collect_logs            Collect and package logs for troubleshooting.
  connect                 Connect to YugabyteDB cluster through the CLI.
  demo                    Load and interact with preset demo data.
  cert                    Generate SSL certificates
  configure               Configure data placement, toggle encryption at rest or run point-in-time recovery operations on the cluster.
  configure_read_replica  Configure/Modify/Delete a read replica cluster.

Flags:
  -h, --help              show this help message and exit

Run 'yugabyted [command] -h' for help with specific commands.

cody151 commented 4 months ago

Note: it's a completely fresh Ubuntu Server 22.04 installation with nothing else installed, running as a VM in Proxmox with CPU type set to "host", 2 GB RAM, 32 GB disk space, and 2 cores.

ddorian commented 4 months ago

@cody151 it's a command, yugabyted collect_logs, without the - in front of it.

cody151 commented 4 months ago

@ddorian

yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ./bin/yugabyted collect_logs
ERROR: No YugabyteDB node is running in the data_dir /home/yas/var/data
For more information, check the logs in /home/yas/var/logs

ddorian commented 4 months ago

Can you zip everything in /home/yas/var/logs and upload?

cody151 commented 4 months ago

> Can you zip everything in /home/yas/var/logs and upload?

Sure, I'll do that. Is there any sensitive info I need to cleanse?

ddorian commented 4 months ago

Nope.

cody151 commented 4 months ago

yb-var-logs.zip Please see the attached logs from /home/yas/var/logs @ddorian

ddorian commented 4 months ago

Can you fix:

Error changing RLIMIT_NPROC from 7398 to 12000: current limit exceeds maximum limit

See here how to fix https://docs.yugabyte.com/preview/deploy/manual-deployment/system-config/#ulimits
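
For context on the message itself: the wording "current limit exceeds maximum limit" matches the ValueError that Python's resource.setrlimit raises when the requested soft limit is above the hard limit. A minimal sketch of that check follows. The helper name is illustrative and this is not yugabyted's own code; the target of 12000 is taken from the log line.

```python
import resource


def achievable_soft_limit(target: int, hard: int) -> int:
    """Highest soft limit <= target that an unprivileged process may set."""
    if hard == resource.RLIM_INFINITY:
        return target  # no hard ceiling, so the target itself is allowed
    return min(target, hard)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    target = 12000  # value yugabyted asked for in the log
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
    best = achievable_soft_limit(target, hard)
    if best < target:
        print(f"hard limit caps nproc at {best}; the hard limit must be raised first")
```

In other words, the soft limit can only be raised as far as the hard limit, so the hard limit for nproc is the one that needs changing in the system configuration.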

cody151 commented 4 months ago

> Can you fix:
>
> Error changing RLIMIT_NPROC from 7398 to 12000: current limit exceeds maximum limit
>
> See here how to fix https://docs.yugabyte.com/preview/deploy/manual-deployment/system-config/#ulimits

Sure, thanks, I'll try this now @ddorian. I'm currently following the quick-start guide at https://docs.yugabyte.com/preview/quick-start/linux/, which made no mention of it, though I did see it mentioned in other guides.

cody151 commented 4 months ago

@ddorian I've added the following in /etc/security/limits.conf (I also tried manually changing it with ulimit -n 12000)

*                -       core            unlimited
*                -       data            unlimited
*                -       fsize           unlimited
*                -       sigpending      119934
*                -       memlock         64
*                -       rss             unlimited
*                -       nofile          1048576
*                -       msgqueue        819200
*                -       stack           8192
*                -       cpu             unlimited
*                -       nproc           12000
*                -       locks           unlimited

I restarted the system but still see the error message:

yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ./bin/yugabyted start
Starting yugabyted...
Found files ['master-info', 'yb-data'] in data dir /home/yas/var/data from possibly failed initialization. Removing...
/ Starting the YugabyteDB Processes...Failed to setup master. Exception: Traceback (most recent call last):
  File "/home/yas/yugabyte/yugabyte-2.21.1.0/./bin/yugabyted", line 4546, in setup_master
    master_uuids = retry_op_with_argument(self.get_master_uuids, master_addrs)
  File "/home/yas/yugabyte/yugabyte-2.21.1.0/./bin/yugabyted", line 8501, in retry_op_with_argument
    raise RuntimeError("Failed after retrying operation for {} secs.".format(
RuntimeError: Failed after retrying operation for 188.86116194725037 secs.

For more information, check the logs in /home/yas/var/logs

Please see the attached log files from /home/yas/var/logs yb-var-logs.zip
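
One thing worth double-checking here: in limits.conf the item nproc corresponds to `ulimit -u` (max user processes), while `ulimit -n` sets nofile (open files). A small sketch that prints the limits the current session actually has, using Python's standard resource module. The dictionary is only an illustrative subset of the limits.conf item names.

```python
import resource

# limits.conf item name -> (ulimit flag, resource constant)
LIMITS = {
    "nofile": ("-n", resource.RLIMIT_NOFILE),
    "nproc": ("-u", resource.RLIMIT_NPROC),
    "core": ("-c", resource.RLIMIT_CORE),
}


def effective_limits() -> dict:
    """Return {item: (soft, hard)} for the current process."""
    return {name: resource.getrlimit(const) for name, (_, const) in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in effective_limits().items():
        flag = LIMITS[name][0]
        print(f"{name:7} (ulimit {flag}): soft={soft} hard={hard}")
```

A value equal to resource.RLIM_INFINITY means unlimited. Running this from the same login session that starts yugabyted shows whether the limits.conf entries were picked up.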

ddorian commented 4 months ago

Can you do yugabyted destroy and start again?

cody151 commented 4 months ago

> Can you do yugabyted destroy and start again?

Same issue:

yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ./bin/yugabyted destroy
Deleted logs at /home/yas/var/logs.
Deleted data at /home/yas/var/data.
Deleted conf at /home/yas/var/conf.
yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ./bin/yugabyted start
Starting yugabyted...
| Starting the YugabyteDB Processes...Failed to setup master. Exception: Traceback (most recent call last):
  File "/home/yas/yugabyte/yugabyte-2.21.1.0/./bin/yugabyted", line 4546, in setup_master
    master_uuids = retry_op_with_argument(self.get_master_uuids, master_addrs)
  File "/home/yas/yugabyte/yugabyte-2.21.1.0/./bin/yugabyted", line 8501, in retry_op_with_argument
    raise RuntimeError("Failed after retrying operation for {} secs.".format(
RuntimeError: Failed after retrying operation for 188.9328351020813 secs.

For more information, check the logs in /home/yas/var/logs

cody151 commented 4 months ago

I'm confused because it is a fresh Ubuntu Server 22.04 installation with literally nothing else installed @ddorian

ddorian commented 4 months ago

In your master.err you have:

*** Aborted at 1720011181 (unix time) try "date -d @1720011181" if you are using GNU date ***
PC: @                0x0 (unknown)

Do you get a core dump generated?
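
The abort banner gives the time as a raw Unix timestamp. If GNU date isn't handy, the same conversion can be done in Python. This is only a convenience sketch for lining the crash up with the other log timestamps.

```python
from datetime import datetime, timezone

# Timestamp copied from the "*** Aborted at ..." line in master.err.
aborted_at = 1720011181

utc_time = datetime.fromtimestamp(aborted_at, tz=timezone.utc)
print(utc_time.isoformat())  # 2024-07-03T12:53:01+00:00
```

Note that this prints UTC, while the yugabyted log lines appear to use the machine's local time, so an offset may be needed when comparing the two.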

cody151 commented 4 months ago

> In your master.err you have:
>
> *** Aborted at 1720011181 (unix time) try "date -d @1720011181" if you are using GNU date ***
> PC: @                0x0 (unknown)
>
> Do you get a core dump generated?

I'm not sure what that is. This is what I currently see in the directory @ddorian:

yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ls
auto_flags.json  lib               openssl-config  pylib  tools              version_metadata.json  yb-var-logs.zip
bin              master_flags.xml  postgres        share  tserver_flags.xml  www

ddorian commented 4 months ago

Search for how core dumps work on your distro, for example:

https://stackoverflow.com/questions/6152232/how-to-generate-core-dump-file-in-ubuntu

https://blog.meinside.dev/Where-are-my-Core-Dump-Files/
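
As a starting point for the core dump question, the kernel's core_pattern setting says where dumps are sent. A minimal sketch, assuming a Linux host; the function name is illustrative.

```python
from pathlib import Path

CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> str:
    """Explain in words where the kernel sends core dumps for this pattern."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading pipe means the dump is handed to a helper program
        # instead of being written directly to a file.
        parts = pattern[1:].split()
        helper = parts[0] if parts else "(none)"
        return f"piped to helper: {helper}"
    if pattern.startswith("/"):
        return f"written to absolute path template: {pattern}"
    return f"written relative to the crashing process's working directory as: {pattern}"


if __name__ == "__main__":
    if CORE_PATTERN.exists():
        print(describe_core_pattern(CORE_PATTERN.read_text()))
    else:
        print("no core_pattern file found (not Linux?)")
```

On Ubuntu the pattern is often piped to a crash-handling helper such as apport, in which case no core file shows up in the working directory even with `ulimit -c unlimited`.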

ddorian commented 4 months ago

Otherwise, try running it manually to see if you get a better error: https://docs.yugabyte.com/preview/deploy/manual-deployment/

cody151 commented 4 months ago

> Search for how core dumps work on your distro, for example:
>
> https://stackoverflow.com/questions/6152232/how-to-generate-core-dump-file-in-ubuntu
>
> https://blog.meinside.dev/Where-are-my-Core-Dump-Files/

don't think this is the issue

yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ulimit -c
unlimited
yas@zen-db-worker:~/yugabyte/yugabyte-2.21.1.0$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 119934
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1048576
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 12000
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

cody151 commented 4 months ago

> Otherwise, try running it manually to see if you get a better error: https://docs.yugabyte.com/preview/deploy/manual-deployment/

Is there a recommended distro that it's confirmed to work on? I'd rather not spend hours troubleshooting if there's a distro it's known to work on @ddorian

ddorian commented 4 months ago

That's a supported OS https://docs.yugabyte.com/preview/reference/configuration/operating-systems/.

But I'm not getting any error from you to really troubleshoot.

Sometimes users might have weird cpus or virtualization but there has to be an error.

ddorian commented 4 months ago

> don't think this is the issue

Please read the full page. You didn't understand the meaning of "core dumps files generated and stored in the system".

cody151 commented 4 months ago

> That's a supported OS https://docs.yugabyte.com/preview/reference/configuration/operating-systems/.
>
> But I'm not getting any error from you to really troubleshoot.
>
> Sometimes users might have weird cpus or virtualization but there has to be an error.

Yeah, I mean it just doesn't seem to work whether I set the CPU type to x86 or "host" in the Proxmox VM settings. It is an Intel Xeon server CPU, and the BIOS for the VM is "SeaBIOS".

ddorian commented 4 months ago

I wrote 2 ways we can continue to troubleshoot this:

  1. Reading a bit about core dumps and checking if they are created.

  2. Trying the manual deployment so we can see the error(s) better somehow.

Debugalicious commented 3 months ago

@csw2d I had a similar issue with my Proxmox containers, caused by incorrect locale settings and resource configuration. Here's how I fixed it:

First, generate the correct locale:

sudo locale-gen en_US.UTF-8
sudo update-locale

Then, make sure to use CPU Limit in your container's resources and leave Cores as unlimited. This is crucial because Proxmox handles resource allocation differently, and without this setting, it just won't work.

Finally, run yugabyted start with a clean base_dir and cross your fingers.
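
To check that the locale step above took effect before retrying, one option is to ask the C library whether it can activate the locale. A minimal sketch using Python's standard locale module; the helper name is made up for illustration.

```python
import locale


def locale_available(name: str) -> bool:
    """Return True if the C library can activate the named locale."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # Raised when the locale has not been generated on this system.
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the original setting


if __name__ == "__main__":
    name = "en_US.UTF-8"
    status = "available" if locale_available(name) else "missing, run locale-gen"
    print(f"{name}: {status}")
```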