Failed to start Etcd Server

vitabaks / postgresql_cluster

Automated database platform for PostgreSQL®. A modern, open-source alternative to cloud-managed databases.

https://postgresql-cluster.org

MIT License

Failed to start Etcd Server #809

Closed: DevOps-Youtube-Channel closed this issue 1 day ago

DevOps-Youtube-Channel commented 5 days ago

Hello. During the installation of the etcd server I get an error; I am attaching screenshots. Why does this happen?

Specs of VMs 1, 2, and 3: 16 GB RAM, 4 CPUs, 20 GB SSD each

vitabaks commented 5 days ago

Hi @DevOps-Youtube-Channel

Please attach etcd conf and log file

cat /etc/etcd/etcd.conf

sudo journalctl -u etcd | head -n 50

P.S. Do all VMs have unique hostnames?

DevOps-Youtube-Channel commented 4 days ago

yes you are right, all virtual machines have the same hostname: farrukh

DevOps-Youtube-Channel commented 4 days ago

One more question here: should I change the IP addresses and so on? In general, in which files should I change the values or something?

vitabaks commented 4 days ago

yes you are right, all virtual machines have the same hostname: farrukh

This is not the first time. I will add a pre-check so that the deployment does not start if the hostnames are the same.
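A minimal sketch of what such a pre-check could look like when run by hand before a deployment. It is not the project's implementation: the function names `find_duplicate_hostnames` and `check_unique_hostnames` are hypothetical, and it assumes you have already collected the hostname of every VM (for example by running `hostname` on each one).

```shell
# Print every hostname that appears more than once in the arguments.
# Prints nothing when all names are unique.
find_duplicate_hostnames() {
  printf '%s\n' "$@" | sort | uniq -d
}

# Return non-zero (and report on stderr) when any hostname repeats.
check_unique_hostnames() {
  local dups
  dups=$(find_duplicate_hostnames "$@")
  if [ -n "$dups" ]; then
    echo "duplicate hostnames found:" >&2
    echo "$dups" >&2
    return 1
  fi
}

# Example (not executed here): three VMs that all report the same name
# would fail the check.
#   check_unique_hostnames farrukh farrukh farrukh
# On systemd-based hosts a name can be changed with:
#   sudo hostnamectl set-hostname <new-name>
```

If the check fails, give each VM a distinct hostname before starting the deployment again.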

vitabaks commented 4 days ago

One more question here: should I change the IP addresses and so on? In general, in which files should I change the values or something?

In the inventory, you describe the addresses of your servers and which components will be installed on which servers (e.g., etcd on dedicated servers or on the database servers).

Try using the UI console if command-line control seems difficult for you. Doc: https://postgresql-cluster.org/docs/deployment/your-own-machines

DevOps-Youtube-Channel commented 4 days ago

for some reason the cluster is being created but there is nothing in operations, what is this connected with?

DevOps-Youtube-Channel commented 4 days ago

vitabaks commented 4 days ago

for some reason the cluster is being created but there is nothing in operations, what is this connected with?

Could you check for an entry in the operations table?

docker exec pg-console psql -U postgres -c "select count(*) from operations"

DevOps-Youtube-Channel commented 4 days ago

vitabaks commented 4 days ago

Please attach Console API log

docker exec pg-console cat /var/log/supervisor/pg-console-api-stdout.log | gzip > pg-console-api-stdout.log.gz

vitabaks commented 4 days ago

{"level":"error","app":"pg_console","version":"2.0.0","module":"docker_manager","cid":"c67b129f-a894-4fd2-b114-3031782192d3","error":"Post \"http://%2Fvar%2Frun%2Fdocker.sock/v1.45/images/create?fromImage=vitabaks%2Fpostgresql_cluster&tag=latest\": context canceled","docker_image":"vitabaks/postgresql_cluster:latest","time":"2024-11-18T07:56:45Z","message":"failed to pull docker image"}
{"level":"error","app":"pg_console","version":"2.0.0","cid":"c67b129f-a894-4fd2-b114-3031782192d3","error":"context canceled","time":"2024-11-18T07:56:45Z","message":"failed to update cluster"}

failed to pull docker image

@DevOps-Youtube-Channel Perhaps there is a problem with your Internet connection? The automation image was not pulled. Try pulling it manually:

docker pull vitabaks/postgresql_cluster:latest
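If the connection is unstable, a single manual pull may also be interrupted. The sketch below wraps a command in a small retry loop. The `retry` helper is hypothetical (it is not part of the project or of Docker), and the attempt count and delay are arbitrary example values.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not executed here): try the pull three times, 10 seconds apart.
#   retry 3 10 docker pull vitabaks/postgresql_cluster:latest
```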

P.S. I downloaded the log and deleted it from comments.

DevOps-Youtube-Channel commented 4 days ago

I use it too

DevOps-Youtube-Channel commented 3 days ago

Yes, it was a problem with the image: I deleted it, downloaded it again, and everything worked.

DevOps-Youtube-Channel commented 3 days ago

I just got one failed host out of 3.

DevOps-Youtube-Channel commented 3 days ago

I also can’t find the haproxy and patroni services installed on the hosts

vitabaks commented 3 days ago

Judging by the log, the cluster deployment failed because the installation of packages was interrupted. Unfortunately, I don’t have the full log for a detailed error analysis, but I suspect there may be issues with repository access for downloading packages. Try to create the cluster again.

P.S. Please attach logs in text format in the future for easier analysis.

DevOps-Youtube-Channel commented 3 days ago

I did everything again from scratch, and after line 444 I got this error. 192.168.217.135 is my virtual machine that runs the Docker container. Most likely my computer simply does not have enough resources (RAM, I think); I allocated 16 GB RAM and 4 CPUs to each VM.

DevOps-Youtube-Channel commented 3 days ago

Can you tell me which log file I should attach here?

vitabaks commented 3 days ago

The error occurs while creating the 4 GB swap file. Do you have enough disk space on your VMs?

The log you sent as a screenshot does not show the full text of the error. Please copy the log content as text.
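One way to answer that question on each VM is to compare the free space reported by `df` with the size of the swap file. This is a rough sketch: the helper names are hypothetical, and it assumes the swap file is created on the root filesystem (`/`), which the thread does not confirm.

```shell
# Available space, in KiB, on the filesystem that holds the given path.
# Uses the POSIX output format of df, where column 4 is "Available".
available_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if <available KiB> can hold a swap file of <size in GiB>.
# Usage: fits_swap <available_kib> <swap_gib>
fits_swap() {
  local avail_kib=$1 swap_gib=$2
  [ "$avail_kib" -ge $((swap_gib * 1024 * 1024)) ]
}

# Example (not executed here): check room for a 4 GiB swap file on /.
#   if fits_swap "$(available_kib /)" 4; then
#     echo "enough space for the swap file"
#   else
#     echo "not enough space for the swap file"
#   fi
```

With a 20 GB disk per VM, the operating system and packages leave limited headroom, so it is worth running this before retrying the deployment.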

DevOps-Youtube-Channel commented 2 days ago

Everything was installed without failures. Only one thing confuses me: I got an error in one place, and it was ignored. Is it critical? I'm attaching the error:

fatal: [192.168.217.134]: FAILED! => {"msg": "Timeout (62s) waiting for privilege escalation prompt: "} ...ignoring

vitabaks commented 1 day ago

Everything was installed without failures. Only one thing confuses me: I got an error in one place, and it was ignored. Is it critical?

The error occurred because Ansible timed out while waiting for a privilege escalation prompt on the host 192.168.217.134. However, the playbook is configured to ignore non-critical errors like this one (...ignoring), so it continued the deployment.

The automation code is designed in such a way that any error critical to the functioning of the cluster will not be ignored. If the deployment completed successfully, then everything is fine.
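To confirm that a run completed successfully, you can inspect the PLAY RECAP that Ansible prints at the end: ignored errors are counted under `ignored=`, while real problems show up as `failed=` or `unreachable=`. The sketch below checks a saved copy of the output. The `recap_ok` function and the `deploy.log` file name are hypothetical.

```shell
# Read Ansible output on stdin. Succeed only if a PLAY RECAP section is
# present and no host in it reports failed or unreachable tasks.
recap_ok() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; seen = 1; next }
    in_recap {
      # Each recap line holds key=value counters such as failed=0.
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failed" || kv[1] == "unreachable") && kv[2] + 0 > 0)
          bad = 1
      }
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Example (not executed here), assuming the output was saved to deploy.log:
#   recap_ok < deploy.log && echo "deployment finished without failed hosts"
```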

DevOps-Youtube-Channel commented 1 day ago

Thanks a lot