hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability
Other
1.12k stars 115 forks source link

Conflicting key value violates exclusion constraint "same_system_identifier_within_group" - delete a system that was not registered correctly #787

Closed nenominal closed 1 year ago

nenominal commented 3 years ago

I had some issues with adding a new node and it seems that the data is registered in the pg_autofaiolver database. I failed to find where is located so I can delete it and try to add the node again.

Can someone point me to right direction?

2021-08-27 11:22:09.536 EEST [3212269] LOG:  Registering node 11 "volta" (volta.xxx.lan:5432) to formation "default" with replication quorum false and candidate priority 0 [0]
2021-08-27 11:22:09.536 EEST [3212269] STATEMENT:  SELECT * FROM pgautofailover.register_node($1, $2, $3, $4, $5, $6, $7, $8, $9::pgautofailover.replication_state, $10, $11, $12, $13)
2021-08-27 11:22:09.546 EEST [3212269] ERROR:  conflicting key value violates exclusion constraint "same_system_identifier_within_group"
2021-08-27 11:22:09.546 EEST [3212269] DETAIL:  Key (formationid, groupid, sysidentifier)=(default, 0, 7000752489655012967) conflicts with existing key (formationid, groupid, sysidentifier)=(default, 0, 6950947499912605210).
2021-08-27 11:22:09.546 EEST [3212269] STATEMENT:  COMMIT
2021-08-27 11:25:11.027 EEST [3213727] LOG:  Registering node 12 "volta" (volta.xxx.lan:5432) to formation "default" with replication quorum false and candidate priority 0 [0]
2021-08-27 11:25:11.027 EEST [3213727] STATEMENT:  SELECT * FROM pgautofailover.register_node($1, $2, $3, $4, $5, $6, $7, $8, $9::pgautofailover.replication_state, $10, $11, $12, $13)
2021-08-27 11:25:11.047 EEST [3213727] ERROR:  conflicting key value violates exclusion constraint "same_system_identifier_within_group"
2021-08-27 11:25:11.047 EEST [3213727] DETAIL:  Key (formationid, groupid, sysidentifier)=(default, 0, 7000752489655012967) conflicts with existing key (formationid, groupid, sysidentifier)=(default, 0, 6950947499912605210).
2021-08-27 11:25:11.047 EEST [3213727] STATEMENT:  COMMIT
DimCitus commented 3 years ago

Hi @nenominal ; thanks for your interest in pg_auto_failover. What happens here is that the new node "volta" with system identifier "7000752489655012967" is being registered in the formation "default" where another node has been registered already in the same group, and with system identifier "6950947499912605210".

In other words, you're trying to add a standby node to an existing primary node, and the standby node has its own PGDATA that is NOT a copy of the primary node's PGDATA. It was not done from a primary's backup, with the pg_basebackup tool, or with another way to create a physical copy of the primary data directory.

Please make sure that this node should be a standby in the same formation and group as the pre-existing node, and if that's your intention you could permanently remove the data directory on the "volta" node and then try again. I will see how to improve our error message there.

nenominal commented 3 years ago

Hi @DimCitus, thanks for taking the time to explain in detail.

I've found the issue, my mistake was that I forgot that I don't need to initialize the postgres db server before adding the node to the cluster.

I emptied the folders in the PGDATA and now I successfully added the new node.

Rookie mistake but thanks for clarifications, this is how I remembered :)

farooq9884 commented 8 months ago

Hi Dimcitus, Hope you are doing well! Thanks for the clarification.

Quick question:

  1. can we add existing normal replication setup which running on different servers to pg_autoctl cluster without breaking streaming replication
  2. Can we change monitor node name from "monitor" to some other some other

If yes, thn how can we achieve kindly put your comments on it ..

Thanks