t-matsuo / resource-agents

pgsql RA (OCF resource agent) for Pacemaker and PostgreSQL streaming replication. See https://github.com/t-matsuo/resource-agents/wiki
GNU General Public License v2.0

PostgreSQL on the master node cannot start up? #15

Closed (xbzhang closed this issue 12 years ago)

xbzhang commented 12 years ago

Setup: 2 machines with IPs 192.168.4.104 and 192.168.4.105 and host names h104 and h105; the PostgreSQL master runs on h105.

My Pacemaker configuration:

node $id="01211558-f31b-445a-8a97-615bf30fec35" h104 \
attributes pgsql-data-status="DISCONNECT"
node $id="3ed6fbd6-567f-4104-8447-6ef620599369" h105 \
attributes pgsql-data-status="LATEST"
primitive postgresql ocf:heartbeat:pgsql \
params pgctl="/var/lib/pgsql/pginstall/bin/pg_ctl" psql="/var/lib/pgsql/pginstall/bin/psql" pgdata="/var/lib/pgsql/data" logfile="/var/lib/pgsql/data/postgresql.log" start_opt="-p 5432" pgdba="postgres" rep_mode="async" node_list="h105 h104" master_ip="192.168.4.105" \
op start interval="0s" timeout="60s" on-fail="restart" \
op monitor interval="7s" timeout="60s" on-fail="restart" \
op promote interval="0s" timeout="60s" on-fail="restart" \
op demote interval="0s" timeout="60s" on-fail="block" \
op stop interval="0s" timeout="60s" on-fail="block"
ms msPostgres postgresql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location rsc_location msPostgres \
rule $id="rsc_location-rule" $role="Master" 200: #uname eq h105 \
rule $id="rsc_location-rule-0" $role="Master" 100: #uname eq h104
property $id="cib-bootstrap-options" \
dc-version="1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8" \
cluster-infrastructure="Heartbeat" \
crmd-transition-delay="0s" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
last-lrm-refresh="1341305646" \
expected-quorum-votes="2" \
symmetric-cluster="true" \
startup-fencing="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="INFINITY" \
migration-threshold="1"

Start Pacemaker on h105: /etc/init.d/heartbeat start

Then the following error appears:

Last updated: Wed Jul 11 16:15:57 2012
Stack: Heartbeat
Current DC: h105 (3ed6fbd6-567f-4104-8447-6ef620599369) - partition with quorum
Version: 1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ h105 ]
OFFLINE: [ h104 ]

Node Attributes:

Failed actions:
    postgresql:0_start_0 (node=h105, call=3, rc=-2, status=Timed Out): unknown exec error

Looking at the PostgreSQL log, I found that PostgreSQL on h105 had started in standby mode, and there was a recovery.conf file in the data directory. Why did PostgreSQL start in standby mode? How do I configure Pacemaker so that PostgreSQL starts as master?

t-matsuo commented 12 years ago

Because

  1. Pacemaker can't transition a resource from Stopped to Master directly.
  2. The RA can't judge which node should be Master while PostgreSQL is stopped. So it starts PostgreSQL in standby mode and checks data consistency and which node has the newer data.
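The two points above can be modeled in a few lines. This is a hypothetical sketch, not the pgsql RA's code (the RA is a shell script and its real logic may differ): the transition table shows that there is no Stopped to Master edge, so every instance is started as a Slave first, and `pick_master` assumes the "old and new" check compares how far each standby has replayed the WAL, using PostgreSQL's textual WAL location format such as `0/3000148`. The names `TRANSITIONS`, `transition`, `lsn_to_int` and `pick_master` are made up for illustration.

```python
# Hypothetical model of why the RA starts PostgreSQL as a standby first.
# Not the pgsql RA's implementation; names are illustrative only.

# Role changes Pacemaker performs on a master/slave resource.
# Note there is no ("Stopped", "promote") entry: Stopped -> Master is impossible.
TRANSITIONS = {
    ("Stopped", "start"): "Slave",
    ("Slave", "promote"): "Master",
    ("Master", "demote"): "Slave",
    ("Slave", "stop"): "Stopped",
}


def transition(role: str, action: str) -> str:
    """Return the new role, or raise if Pacemaker has no such transition."""
    try:
        return TRANSITIONS[(role, action)]
    except KeyError:
        raise ValueError(f"cannot {action} from {role}") from None


def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location like '0/3000148' (hex high/low halves) to an int."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_master(replayed: dict[str, str]) -> str:
    """Pick the node whose replayed WAL location is furthest ahead."""
    return max(replayed, key=lambda node: lsn_to_int(replayed[node]))


if __name__ == "__main__":
    role = transition("Stopped", "start")  # every instance comes up as a standby
    print(role)
    # Example WAL locations (made up): h105 has replayed further than h104.
    node = pick_master({"h105": "0/3000148", "h104": "0/2FFFF00"})
    print(node)
    print(transition(role, "promote"))     # only now can it become Master
```

Comparing WAL locations numerically rather than as strings matters because the hex halves have no fixed width, so a plain string comparison could order them incorrectly.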
xbzhang commented 12 years ago

Thank you for your reply! In my case, Pacemaker first starts PostgreSQL as a slave and then promotes it to master. But the RA automatically creates the recovery.conf file while PostgreSQL on h105 is a slave, so PostgreSQL starts in recovery mode and tries to connect to the primary, but the primary is itself. As a result PostgreSQL cannot start up, and Pacemaker reports the error "postgresql:0_start_0 (node=h105, call=3, rc=-2, status=Timed Out): unknown exec error". Why does the RA automatically create recovery.conf? How can I solve the problem that PostgreSQL, which should become master, connects to itself when it starts in slave mode?
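For context, recovery.conf is what makes PostgreSQL come up as a streaming standby. The sketch below shows roughly what such a file contains; `standby_mode`, `primary_conninfo` and `recovery_target_timeline` are standard recovery.conf settings in PostgreSQL releases of that era, but the exact content the pgsql RA writes is an assumption here, as are the function name `render_recovery_conf` and the `port`/`repuser` values (5432 from `start_opt`, `postgres` as the replication user). With the configuration above, `master_ip` is 192.168.4.105, one of the cluster machines' own addresses, so a standby on that machine is told to stream from itself.

```python
# Hypothetical sketch of a standby's recovery.conf built from RA-style parameters.
# Not the pgsql RA's code; only master_ip comes from the configuration in this issue.


def render_recovery_conf(master_ip: str, port: int, repuser: str, node: str) -> str:
    """Build recovery.conf text that makes PostgreSQL start as a streaming standby."""
    conninfo = f"host={master_ip} port={port} user={repuser} application_name={node}"
    lines = [
        "standby_mode = 'on'",                   # start in standby (recovery) mode
        f"primary_conninfo = '{conninfo}'",      # where to stream WAL from
        "recovery_target_timeline = 'latest'",   # follow the newest timeline
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # master_ip as configured above; port and user are assumed values.
    print(render_recovery_conf("192.168.4.105", 5432, "postgres", "h105"), end="")
```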

t-matsuo commented 12 years ago

PostgreSQL can run as a standby on its own and accept read-only queries. It will log that it cannot connect to the primary, but that is not a problem.

Have you succeeded in setting up Primary/Standby replication manually? PostgreSQL probably needs to be set up manually once before it can run as a standby on its own.

xbzhang commented 12 years ago

The problem has been solved, thanks very much! I succeeded in setting up Primary/Standby replication manually.

crm_mon

Last updated: Thu Jul 12 17:06:56 2012
Stack: Heartbeat
Current DC: h105 (3ed6fbd6-567f-4104-8447-6ef620599369) - partition with quorum
Version: 1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ h104 h105 ]

Master/Slave Set: msPostgres
    Masters: [ h105 ]
    Slaves: [ h104 ]

t-matsuo commented 12 years ago

It's my pleasure.