t-matsuo / resource-agents

pgsql RA (OCF resource agent) for Pacemaker and PostgreSQL streaming replication. See https://github.com/t-matsuo/resource-agents/wiki
GNU General Public License v2.0
118 stars 11 forks

cause data loss when using 3 nodes or higher #24

Closed t-matsuo closed 11 years ago

t-matsuo commented 11 years ago

With three or more nodes, rep_mode=sync can cause data loss.

Scenario

node1 : Master (PRI)
node2 : Slave (HS:sync)
node3 : Slave (HS:potential)

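  1. The network to node2 breaks -> PostgreSQL on node1 starts sending data synchronously to node3 and acknowledges commits to the client before the node attributes are updated (node2 is still recorded as HS:sync, node3 as HS:potential).
  2. node1 fails -> Pacemaker promotes node2 because its attribute still says HS:sync -> the transactions that reached only node3 are lost.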
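
The two steps above can be sketched as a toy simulation. This is only an illustration of the race, not code from the pgsql RA: the `Node` class, the `commit` and `pick_promotion_candidate` helpers, and the transaction names are all hypothetical, and the promotion rule is simplified to "promote the standby whose recorded attribute is HS:sync".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    status: str  # attribute as the cluster manager last recorded it
    wal: list = field(default_factory=list)  # transactions this node has received


def commit(master, receivers, txn):
    """Record a transaction on the master and on every standby that received it."""
    master.wal.append(txn)
    for standby in receivers:
        standby.wal.append(txn)


def pick_promotion_candidate(standbys):
    """Simplified rule (an assumption): promote the standby recorded as HS:sync."""
    for node in standbys:
        if node.status == "HS:sync":
            return node
    return None


def simulate():
    node1 = Node("node1", "PRI")
    node2 = Node("node2", "HS:sync")
    node3 = Node("node3", "HS:potential")

    # Normal operation: both standbys receive the transaction.
    commit(node1, [node2, node3], "txn-1")

    # Step 1: node2 is unreachable. PostgreSQL switches the synchronous
    # standby to node3 by itself and acknowledges the commit, but the
    # recorded attributes are not updated yet.
    commit(node1, [node3], "txn-2")

    # Step 2: node1 fails. The promotion decision uses the stale attributes.
    promoted = pick_promotion_candidate([node2, node3])
    lost = [txn for txn in node1.wal if txn not in promoted.wal]
    return promoted.name, lost


if __name__ == "__main__":
    name, lost = simulate()
    print(f"promoted: {name}, lost transactions: {lost}")
```

In this model node2 is promoted even though it never received `txn-2`, which the client already saw as committed.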

greenx commented 11 years ago

Why? When node1 dies, quorum should be lost and all resources must be stopped.

t-matsuo commented 11 years ago

PostgreSQL can detect the failure and switch the synchronous standby on its own, independently of Pacemaker. Please see http://www.gossamer-threads.com/lists/linuxha/pacemaker/85077?do=post_view_threaded#85077

t-matsuo commented 11 years ago

Fixed in https://github.com/ClusterLabs/resource-agents/commit/55494b5052f540030938733ec4729cc37ac64a8c