redhat-openstack / puppet-pacemaker

Puppet modules to manage pacemaker with corosync

/etc/corosync/corosync.conf not handled #7

Closed: KlavsKlavsen closed this issue 10 years ago

KlavsKlavsen commented 10 years ago

So on CentOS 6, the corosync service won't start, because /etc/corosync/corosync.conf is missing.

The module does create /etc/cluster/cluster.conf, so perhaps that should have been /etc/corosync/corosync.conf on EL6?

KlavsKlavsen commented 10 years ago

It seems it should use the pcs tool to configure corosync, but it creates /etc/cluster/cluster.conf instead of the required /etc/corosync/corosync.conf.

radez commented 10 years ago

This module was written to be compatible with EL 6.5. There were new features added to pcs in 6.5 that made managing pacemaker much more Puppet-friendly. I think there is some pre-6.5 compatible code you could try at a6f7118d38741d39376d5bfeaabc43a904b30900

KlavsKlavsen commented 10 years ago

I ran it on 6.5 (with pcs) and it didn't generate corosync.conf. As far as I can gather from the manifests, the module doesn't handle that file, but the corosync service on EL 6.5 (which it tries to start) requires it.

radez commented 10 years ago

I don't have a corosync.conf on the cluster I built the other day. I don't think you should start corosync directly. Let pacemaker do it, so service pacemaker start should do everything you need.
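A minimal sketch of that in Puppet terms (an assumption about how one might express it, not something the module requires you to add): manage only the pacemaker service and leave corosync alone.

    # Per the comment above, "service pacemaker start" should do everything
    # needed on EL 6.5, so only the pacemaker service is managed here and
    # corosync is not started directly.
    service { 'pacemaker':
      ensure => running,
      enable => true,
    }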

KlavsKlavsen commented 10 years ago

I don't start corosync directly. The pacemaker module installed it, and then I had only one failure from the puppet run: the corosync service wouldn't start. That led me to discover that corosync was complaining about a missing corosync.conf. Once I created that file, it ran just fine.

radez commented 10 years ago

Hm, I'm not sure what's happening. I'll do a bit of investigation and see if I can figure out whether there's a delta between RHEL and CentOS 6.5 that's causing this. I'll be in touch with you soon.

KlavsKlavsen commented 10 years ago

That's very kind of you. From what I can gather, there's nothing in the module that creates a corosync.conf, but mine tried to start the corosync service (and as I understand it, I need corosync?).

The config I tried was this:

    class { "pacemaker::corosync":
      cluster_name    => "group1",
      cluster_members => "192.168.10.235 192.168.10.236",
    }
    class { "pacemaker::resource::ip":
      ip_address => "192.168.10.232",
      group      => "group11",
    }

KlavsKlavsen commented 10 years ago

And I added this to create corosync.conf (using the puppetlabs corosync module):

    class { 'corosync':
      enable_secauth    => true,
      # authkey         => '/var/lib/puppet/ssl/certs/ca.pem',
      # authkey         => 'puppet:///profile/enableit/corosync-group1-authkey',
      authkey           => '/etc/corosync/authkey',
      bind_address      => $ipaddress,
      multicast_address => '239.1.1.2',
    }

That made things start. Whether or not it actually works is yet to be tested :)

KlavsKlavsen commented 10 years ago

Hmm. I just tested on a blank server, just to be sure there hadn't been anything weird on my part.

With the config above it now does NOT complain. (By the way, pcs cluster start FAILS if the corosync service is already running, so perhaps you could improve the module by disabling and stopping the corosync service.)
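A minimal sketch of that suggestion (an assumption about how the module could handle it, not something it does today):

    # Stop and disable the standalone corosync service so that
    # "pcs cluster start" does not fail because corosync is already running.
    service { 'corosync':
      ensure => stopped,
      enable => false,
    }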

It does, however, NOT list the IP resource that I have defined anywhere (in the output of: ip addr list). Wasn't that supposed to set up the IP as well?

radez commented 10 years ago

Oh, I think I see a possible issue. You don't need the corosync module; corosync::service and cs_primitive are not in my pacemaker module. You can leave those out and use the resource defines to set up the VIPs and services.
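For illustration, a minimal sketch using only this module's defines (the names and addresses are just examples, reusing the pacemaker::resource::ip define that appears later in this thread):

    class { "pacemaker::corosync":
      cluster_name    => "group1",
      cluster_members => "192.168.10.235 192.168.10.236",
    }
    # VIP managed through the module's own resource define; no corosync module needed.
    pacemaker::resource::ip { "ip-192.168.10.232":
      ip_address => "192.168.10.232",
    }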

I have a good example for you that I've been working on the past couple days on the TryStack openstack cluster. Let me commit it for you to see.

radez commented 10 years ago

So just use the pacemaker module and not the corosync module then do something like this: https://github.com/trystack/puppet-trystack/blob/master/manifests/highavailability.pp

You can see I'm managing qpid and haproxy. My mysql config is missing a parameter for the data dir to go along with the shared storage; I'll be adding that in the next day or two.

let me know if this helps.

radez commented 10 years ago

One more... I just used a "basic server" CentOS 6.5 install and did these steps to get the cluster started:

  1. install git
  2. clone the puppet-pacemaker module
  3. clone the puppetlabs-firewall module
  4. install the puppetlabs release repo
  5. install puppet
  6. puppet apply a file that looks like the following (my host is 192.168.122.238):

    class { "pacemaker::corosync":
      cluster_name    => "group1",
      cluster_members => "192.168.122.238 192.168.10.235 192.168.10.236",
    }
    pacemaker::resource::ip { "ip-192.168.10.232":
      ip_address => "192.168.10.232",
    }

When that finished I got this:

    [root@localhost ~]# pcs status
    Cluster name: group1
    WARNING: no stonith devices and stonith-enabled is not false
    Last updated: Tue Jan 14 21:32:20 2014
    Last change: Tue Jan 14 21:32:16 2014 via cibadmin on 192.168.122.238
    Stack: cman
    Current DC: 192.168.122.238 - partition WITHOUT quorum
    Version: 1.1.10-14.el6_5.1-368c726
    3 Nodes configured
    1 Resources configured

    Node 192.168.10.235: UNCLEAN (offline)
    Node 192.168.10.236: UNCLEAN (offline)
    Online: [ 192.168.122.238 ]

    Full list of resources:
     ip-192.168.10.232 (ocf::heartbeat:IPaddr2): Stopped

So stonith would have to be configured or disabled to get resources to start, but that seemed to work ok.
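For reference, disabling stonith through the module is what the next comment uses; it looks like this:

    # Disable stonith so resources can start on a test cluster without fencing devices.
    class { 'pacemaker::stonith':
      disable => true,
    }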

KlavsKlavsen commented 10 years ago

Thank you for your help.

Something isn't quite right.

pcs status:

    Stack: cman
    Current DC: 192.168.10.235 - partition with quorum
    Version: 1.1.10-14.el6_5.1-368c726
    3 Nodes configured
    1 Resources configured

    Online: [ 192.168.10.235 192.168.10.236 ]
    OFFLINE: [ jacen.example.dk ]

    Full list of resources:

     Resource Group: group1
         ip-192.168.10.232 (ocf::heartbeat:IPaddr2): Started 192.168.10.235

Puppet config:

    class { "pacemaker::corosync":
      cluster_name    => "group1",
      cluster_members => "192.168.10.235 192.168.10.236",
    }
    class { "pacemaker::resource::ip":
      ip_address => "192.168.10.232",
      group      => "group1",
    }
    class { 'pacemaker::stonith':
      disable => true,
    }

Could it be some "old config" that hasn't been properly removed somewhere? jacen.example.dk isn't mentioned in /etc/cluster/cluster.conf.

KlavsKlavsen commented 10 years ago

And now to figure out how to make it fail over a DRBD device... :)

radez commented 10 years ago

cluster.conf isn't authoritative anymore; use pcs to remove it as a cluster host.
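A minimal sketch of that cleanup, wrapped in Puppet for consistency with the rest of the thread (it assumes pcs on EL 6.5 has the "cluster node remove" subcommand; normally you would just run the command by hand):

    # One-off removal of the stale node seen in the status output above.
    exec { 'remove-stale-cluster-node':
      command => 'pcs cluster node remove jacen.example.dk',
      onlyif  => 'pcs status nodes | grep -q jacen.example.dk',
      path    => ['/bin', '/usr/bin', '/sbin', '/usr/sbin'],
    }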

radez commented 10 years ago

Let me know if drbd support is missing from the puppet module and we can get it added