rasto / lcmc

Pacemaker/DRBD/KVM/LVM Cluster GUI

Stuck on "Waiting for Pacemaker..." on Ubuntu 20.04. #76

Closed: ak2766 closed this issue 1 year ago.

ak2766 commented 2 years ago

First and foremost - thanks @rasto - this is an awesome project - ❤!

I'm experimenting with Pacemaker/Corosync, DRBD, NFS, and IPaddr2.

I created a DRBD cluster manually and everything is working. However, I figured there must be a more intuitive way to deploy it, as I may have to repeat this process many times. So I went perusing the net for a GUI; this one has the most eye candy, and as we all know, eye candy always wins (imho). I cloned the repo, built it from source, and can launch the interface.

Looks magnifique...

However, as you can see in the image above, it is stuck on "Waiting for Pacemaker...". Where do I start troubleshooting? I launched it from the command line, and so far I see this in the console:

Console log:

```shell
$ java -jar ./target/packages/LCMC.jar
DEBUG : [30s] lcmc.cluster.infrastructure.ssh.Authentication: authenticate: rsa key auth successful
INFO : setConnected: 127.0.0.1: connection established
DEBUG : [31s] lcmc.cluster.infrastructure.ssh.Authentication: authenticate: rsa key auth successful
INFO : setConnected: 127.0.0.1: connection established
DEBUG : [32s] lcmc.cluster.ui.ClusterBrowser: updateHeartbeatDrbdThread: load cluster
DEBUG : [35s] lcmc.crm.domain.CrmXml: CRMXML: cluster loaded
WARN : unknown unit: null
WARN : unknown unit:
WARN : unknown unit: Bytes
WARN : unknown unit: Sectors
DEBUG : [35s] lcmc.cluster.ui.ClusterBrowser: updateAvailableServices: start
DEBUG : [36s] lcmc.cluster.ui.ClusterBrowser: updateServerStatus: k8s-storage-2.forticode.com loading done
DEBUG : [36s] lcmc.cluster.ui.ClusterBrowser: updateServerStatus: k8s-storage-1.forticode.com loading done
WARN : failure: awsk8s: Pacemaker status not available
DEBUG : [40s] lcmc.crm.domain.CrmXml: CRMXML: RAs loaded
```

Also, while poking around, I noticed that it appears to look for logs in the wrong place: `/var/log/messages` instead of `/var/log/syslog`, which is where Ubuntu writes them.
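For context: with rsyslog's default configuration, Debian and Ubuntu write general system messages to `/var/log/syslog`, while RHEL-family distributions use `/var/log/messages`. The snippet below is a hypothetical helper (not LCMC code) sketching how a tool could probe for whichever file exists instead of hard-coding one path; the function name `find_syslog` is made up for illustration.

```shell
# Hypothetical helper (not part of LCMC): print the first existing log file
# among the paths given. With no arguments, it checks the Debian/Ubuntu
# location first, then the RHEL-family one. Returns 1 if none exist.
find_syslog() {
  local f
  if [ "$#" -eq 0 ]; then
    set -- /var/log/syslog /var/log/messages
  fi
  for f in "$@"; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}
```

Running `find_syslog` with no arguments on an Ubuntu 20.04 node would normally print `/var/log/syslog`.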

Let me know how I can help troubleshoot. I'm willing to do whatever it takes - even video conference if need be.

Cheers, ak.

One other thing: I doubt it has anything to do with the issue, but in case it does, I'm using SSH port forwarding, since the nodes are running in AWS and are completely isolated from the internet, barring the one instance that I'm using as a jump box.
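For reference, one common way to set up this kind of tunnel is with OpenSSH `LocalForward` entries on the workstation running LCMC. This is a sketch only: the jump host name, user, local ports, and node IPs are hypothetical, and it assumes each LCMC host entry is pointed at `127.0.0.1` with a custom SSH port per node (the console log above does show both connections going to `127.0.0.1`).

```
# ~/.ssh/config on the machine running LCMC (all names, ports, and IPs hypothetical)
# Local port 2201 -> node1's SSH port, 2202 -> node2's, both via the jump box
Host aws-jump
    HostName jump.example.com
    User ubuntu
    LocalForward 2201 10.0.0.1:22
    LocalForward 2202 10.0.0.2:22
```

Running `ssh -N aws-jump` then keeps both tunnels open until interrupted, without starting a remote shell.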

ak2766 commented 2 years ago

Color me confused.

I've just finished creating a new cluster configuration for HAProxy and an associated floating IP, also on Ubuntu 20.04.4 LTS (with SSH tunneling as well), and this time everything came up just fine. Pacemaker has no complaints this time around!

I'll have to go back and look at the DRBD one to see whether I've misconfigured something, even though that cluster operates as it should. Odd.

ak2766 commented 1 year ago

I thought I'd add a comment here for anyone else having this issue and wondering why one of the nodes never comes up.

When LCMC creates the cluster, it uses the short hostname (the output of `hostname -s`) for `ring0_addr:` in the `nodelist` section of `/etc/corosync/corosync.conf`. This appears to leave Corosync unable to determine which host is which, because there is also a requirement to map the short hostname to both `127.0.1.1` and the node's routable address in `/etc/hosts`, so on each node the name may resolve to the loopback address rather than the routable one.

`/etc/hosts` on node1:

```
...
127.0.1.1 node1
10.0.0.1 node1
10.0.0.2 node2
```

`/etc/hosts` on node2:

```
...
127.0.1.1 node2
10.0.0.1 node1
10.0.0.2 node2
```
Wrong config in `/etc/corosync/corosync.conf`:

```
...
nodelist {
    node {
        nodeid: 1
        ring0_addr: node1
    }
    node {
        nodeid: 2
        ring0_addr: node2
    }
}
```
Correct config in `/etc/corosync/corosync.conf`:

```
...
nodelist {
    node {
        nodeid: 1
        ring0_addr: 10.0.0.1
    }
    node {
        nodeid: 2
        ring0_addr: 10.0.0.2
    }
}
```
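For two nodes the fix is easy to make by hand; for a longer nodelist, a small script can do the substitution. The sketch below is a hypothetical helper (`pin_ring0_addrs` is not part of LCMC or Corosync). It assumes a hosts-format file that lists each node's routable address next to its short name, like the `/etc/hosts` layout in this comment, and that `ring0_addr:` and its value sit on one line separated by whitespace.

```shell
# Hypothetical helper (not part of LCMC or Corosync): print a copy of a
# corosync.conf in which each hostname-based "ring0_addr:" value is replaced
# by that host's routable IP from a hosts-format file. Loopback entries
# (127.x.x.x and ::1), such as Ubuntu's 127.0.1.1 line, are ignored.
# Usage: pin_ring0_addrs HOSTS_FILE CONF_FILE > NEW_CONF_FILE
pin_ring0_addrs() {
  local hosts_file=$1 conf_file=$2
  awk '
    # Pass 1 (hosts file): map each name to its non-loopback address.
    FILENAME == ARGV[1] {
      if (NF < 2 || $1 ~ /^#/ || $1 ~ /^127\./ || $1 == "::1") next
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break   # stop at a trailing comment
        ip[$i] = $1
      }
      next
    }
    # Pass 2 (corosync.conf): swap known names for their addresses,
    # keeping the original indentation; other lines pass through unchanged.
    $1 == "ring0_addr:" && ($2 in ip) {
      sub(/ring0_addr:[ \t]*[^ \t]+/, "ring0_addr: " ip[$2])
    }
    { print }
  ' "$hosts_file" "$conf_file"
}
```

For example, `pin_ring0_addrs /etc/hosts /etc/corosync/corosync.conf > /tmp/corosync.conf.new` writes a candidate file; compare it with the original using `diff` before copying it to every node. If a name is listed under several routable addresses, the last one in the hosts file wins, so such entries deserve a manual check.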

I hope this helps someone else in the future.