Closed KlavsKlavsen closed 10 years ago
It seems that it should use the pcs tool to configure corosync, but it creates /etc/cluster/cluster.conf instead of the required /etc/corosync/corosync.conf.
This module was written to be compatible with EL 6.5. New features were added to pcs in 6.5 that made managing pacemaker much more Puppet-friendly. I think there was some pre-6.5-compatible code you could try at a6f7118d38741d39376d5bfeaabc43a904b30900.
I ran it on 6.5 (with pcs) and it didn't generate corosync.conf. As far as I can tell from the manifests, it doesn't handle that file at all, but the corosync service on EL 6.5 (which it tries to start) requires it.
I don't have a corosync.conf on the cluster I built the other day. I don't think you should start corosync directly; let pacemaker do it, so `service pacemaker start` should do everything you need.
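For context, the manual equivalent of what's being suggested on an EL 6 box would be roughly this (a sketch using the stock SysV service tools, not the module's actual code):

```shell
# Let pacemaker bring up the cluster stack itself rather than
# starting corosync as a standalone service.
service pacemaker start
chkconfig pacemaker on    # start on boot
service pacemaker status  # verify it came up
```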
I don't start corosync directly; the pacemaker module installed it, and then I had only one failure from the puppet run: the corosync service wouldn't start. That led me to discover that the corosync service was complaining about a missing corosync.conf. Once that was created (by me), it ran just fine.
Hm, I'm not sure what's happening. I'll do a bit of investigation and see if I can figure out whether there's a delta between RHEL and CentOS 6.5 that's causing this. I'll be in touch with you soon.
That's very kind of you. From what I can gather, there's nothing in the module that creates a corosync.conf, but mine wanted to start the service (and as I understand it, I need corosync?).
The config I tried was this:

```puppet
class { "pacemaker::corosync":
  cluster_name    => "group1",
  cluster_members => "192.168.10.235 192.168.10.236",
}
class { "pacemaker::resource::ip":
  ip_address => "192.168.10.232",
  group      => "group11",
}
```
and I added this, to create corosync.conf (using the puppetlabs corosync module):

```puppet
class { 'corosync':
  enable_secauth    => true,
  #authkey          => 'puppet:///profile/enableit/corosync-group1-authkey',
  authkey           => '/etc/corosync/authkey',
  bind_address      => $ipaddress,
  multicast_address => '239.1.1.2',
}
```
which made things start. Whether or not it actually works is yet to be tested :)
hmm. I just tested on a blank server - just to be sure there hadn't been anything weird on my part.
With the config above it now does NOT complain. (And by the way: `pcs cluster start` FAILS if the corosync service is already running, so perhaps you could improve your module by disabling and stopping the corosync service.)
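The manual version of that suggested guard would look something like this on EL 6 (a sketch, not what the module currently does):

```shell
# pcs cluster start fails if a standalone corosync is already running,
# so stop and disable it before starting the cluster via pcs.
service corosync stop
chkconfig corosync off
pcs cluster start
```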
It does however NOT list the ip resource that I have defined anywhere (in the output of `ip addr list`). Wasn't that supposed to set up the IP as well?
Oh, I think I see a possible issue. You don't need the corosync module... corosync::service and cs_primitive are not in my pacemaker module. You can leave those out and use the resource defines to set up the VIPs and services...
I have a good example for you that I've been working on the past couple of days on the TryStack openstack cluster. Let me commit it for you to see.
So just use the pacemaker module and not the corosync module, then do something like this: https://github.com/trystack/puppet-trystack/blob/master/manifests/highavailability.pp
You can see I'm managing qpid and haproxy. My mysql config is missing a parameter for the data-dir to go along with the shared storage; I'll be adding that in the next day or two.
let me know if this helps.
One more... I just used a "basic server" CentOS 6.5 install and did these steps to get the cluster started:
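(The exact commands aren't shown above. On a stock CentOS 6.5 install, a typical pcs bootstrap looks roughly like this; the node addresses are taken from the status output that follows, and this sequence is an assumption, not the reporter's verbatim steps:)

```shell
# Assumed bootstrap for an EL 6.5 cluster with the cman stack:
yum install -y pcs pacemaker cman
# write the cluster definition (creates /etc/cluster/cluster.conf on EL 6)
pcs cluster setup --name group1 192.168.122.238 192.168.10.235 192.168.10.236
pcs cluster start
pcs status
```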
when that finished I got this:
```
[root@localhost ~]# pcs status
Cluster name: group1
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Tue Jan 14 21:32:20 2014
Last change: Tue Jan 14 21:32:16 2014 via cibadmin on 192.168.122.238
Stack: cman
Current DC: 192.168.122.238 - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.1-368c726
3 Nodes configured
1 Resources configured

Node 192.168.10.235: UNCLEAN (offline)
Node 192.168.10.236: UNCLEAN (offline)
Online: [ 192.168.122.238 ]

Full list of resources:

 ip-192.168.10.232	(ocf::heartbeat:IPaddr2):	Stopped
```
So stonith would have to be configured or disabled to get resources to start, but that seemed to work ok.
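For a test cluster without fencing hardware, the pcs way to disable stonith is a cluster property (the same thing the module exposes as a class parameter):

```shell
# Disable fencing so resources can start on a test cluster.
# Do NOT do this in production; configure real stonith devices instead.
pcs property set stonith-enabled=false
pcs status
```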
Thank you for your help.
Something isn't quite right.
```
Stack: cman
Current DC: 192.168.10.235 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
3 Nodes configured
1 Resources configured

Online: [ 192.168.10.235 192.168.10.236 ]
OFFLINE: [ jacen.example.dk ]

Full list of resources:

 Resource Group: group1
     ip-192.168.10.232	(ocf::heartbeat:IPaddr2):	Started 192.168.10.235
```
```puppet
class { "pacemaker::corosync":
  cluster_name    => "group1",
  cluster_members => "192.168.10.235 192.168.10.236",
}
class { "pacemaker::resource::ip":
  ip_address => "192.168.10.232",
  group      => "group1",
}
class { "pacemaker::stonith":
  disable => true,
}
```
Could it be some old config that hasn't been properly removed somewhere? jacen.example.dk isn't mentioned in /etc/cluster/cluster.conf.
And now to figure out how to make it fail over a drbd device.. :)
cluster.conf isn't authoritative anymore; use pcs to remove it as a cluster host.
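Concretely, that removal would look something like this (assuming jacen.example.dk is the leftover entry, and that your pcs version ships the node subcommand):

```shell
# Drop the stale host from the cluster definition via pcs,
# rather than editing cluster.conf by hand.
pcs cluster node remove jacen.example.dk
# verify it no longer appears
pcs status nodes
```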
Let me know if drbd support is missing from the puppet module and we can get it added
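Until the module grows drbd support, the manual pcs equivalent is a master/slave drbd resource. A sketch, where the drbd resource name `r0` and the `drbd-r0` primitive name are assumptions (the linbit OCF agent itself is real):

```shell
# Assumes a drbd resource "r0" is already defined in /etc/drbd.d/.
pcs resource create drbd-r0 ocf:linbit:drbd drbd_resource=r0 \
  op monitor interval=30s
# Wrap it in a master/slave set (pcs 0.9 syntax on EL 6).
pcs resource master drbd-r0-master drbd-r0 \
  master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
```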
So on CentOS 6, the corosync service won't start because corosync.conf is missing.
It does create /etc/cluster/cluster.conf, so perhaps that should have been /etc/corosync/corosync.conf on EL 6?