sosreport / sos

A unified tool for collecting system logs and other debug information
http://sos.rtfd.org
GNU General Public License v2.0

[qdevice] New plugin to collect information on quorum devices used by Pacemaker clusters #3774

Closed ghost closed 2 months ago

ghost commented 2 months ago

New plugin to collect information on quorum devices used by Pacemaker clusters.

Most of the information needed to troubleshoot issues in that component is already collected by other plugins such as logs, networking, or firewall, so only one pcs qdevice... command is needed.

packit-as-a-service[bot] commented 2 months ago

Congratulations! One of the builds has completed. :champagne:

You can install the built RPMs by following these steps:

Please note that the RPMs should be used only in a testing environment.

arif-ali commented 2 months ago

There was no need to close the other PR; you could have squashed and force-pushed.

Related: #3773

ghost commented 2 months ago

Yeah, I know... my bad... too many things on my plate right now.

TurboTurtle commented 2 months ago

This should just go into the existing pacemaker plugin, where other pcs commands are collected. I don't see why this would need to be a new plugin.

ghost commented 2 months ago

> This should just go into the existing pacemaker plugin, where other pcs commands are collected. I don't see why this would need to be a new plugin.

This must be a new plugin because the system that hosts the quorum device is NOT part of the cluster. It is a separate system. The info collected in a cluster node by pacemaker plugin does not apply to a quorum device and vice versa.

jcastill commented 2 months ago

@jblancoes but the quorum device is indeed part of the pacemaker cluster. When you run pcs commands you expect to see information about the quorum device, e.g. if you run pcs quorum status. In fact, we already capture this command in the pacemaker plugin, so it would be natural to capture pcs qdevice status net --full there as well. What's the actual advantage of having this single command in a separate plugin, apart from the quorum device being a different system?

> The info collected in a cluster node by pacemaker plugin does not apply to a quorum device and vice versa.

And from the pacemaker plugin point of view, that's ok. In a quorum system, we'll capture whatever is present (files, commands, services, etc), and the same will happen in a "regular" node. And commands and files (like logs) that are in both types of cluster servers will be captured in both cases.

ghost commented 2 months ago

@jcastill pcs qdevice... commands don't work in cluster nodes because the corosync-qnetd component is not meant to be installed in the cluster nodes.

This is a quick example...

[root@fastvm-rhel-8-10-138 ~]# pcs qdevice status net --full
Error: unable to run command /usr/bin/corosync-qnetd-tool -s -v: No such file or directory: '/usr/bin/corosync-qnetd-tool'
[root@fastvm-rhel-8-10-138 ~]#

The same thing happens if you run regular cluster-related pcs whatever commands in a quorum device.

I am part of a team that supports the entire Pacemaker suite so I know what I am talking about. Do whatever you want... approve or reject the PR but I am done with this useless discussion.

arif-ali commented 2 months ago

Hi @jblancoes

We appreciate your comments and contribution to the project, so please be patient with our comments and reviews. We are not trying to be difficult here.

Let me give you an example of a similar plugin that can be triggered on two different types of systems while everything is still collected in the same place: openstack_nova. nova-conductor has logs and commands that nova-compute may or may not have, but we still keep all the collection in the same plugin. I am sure I can find others too.

Sos does not fail or cause problems if the command or files do not exist; it safely ignores them.

On that basis, as this command is from the pacemaker project itself, I am also +1 on adding the package to the package tuple in the pacemaker plugin, as well as adding the command there.
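The idea of one plugin serving two kinds of hosts can be sketched roughly as below. This is a self-contained illustration, not the real sos plugin API: the `PluginSketch` and `PacemakerSketch` classes, the `is_enabled` and `collect` methods, and the `corosync-qnetd` package name are all assumptions made for the example.

```python
import shlex
import shutil
import subprocess


class PluginSketch:
    """Hypothetical stand-in for a sos plugin (not the real sos base class)."""

    # The plugin is considered enabled if any of these packages is installed.
    packages = ()
    # Commands whose output the plugin tries to collect.
    commands = ()

    def __init__(self, installed_packages):
        self.installed_packages = set(installed_packages)

    def is_enabled(self):
        # One matching package is enough to trigger the plugin.
        return any(pkg in self.installed_packages for pkg in self.packages)

    def collect(self):
        collected = {}
        for command in self.commands:
            argv = shlex.split(command)
            if shutil.which(argv[0]) is None:
                # Executable not present on this host: skip it quietly.
                continue
            # A command that exists but fails is not fatal either;
            # whatever it printed is kept as the collected output.
            result = subprocess.run(
                argv, capture_output=True, text=True, check=False
            )
            collected[command] = result.stdout + result.stderr
        return collected


class PacemakerSketch(PluginSketch):
    # Assumption: "corosync-qnetd" is the package found on the quorum device host.
    packages = ("pacemaker", "corosync-qnetd")
    commands = ("pcs quorum status", "pcs qdevice status net --full")
```

With this shape, `PacemakerSketch({"pacemaker"})` and `PacemakerSketch({"corosync-qnetd"})` are both enabled, and each host only yields output for the commands that can actually run there.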

I hope that will make sense.

ghost commented 2 months ago

Hello @arif-ali, I get your point but following that argument, how do you explain the corosync plugin? If you want to keep all the cluster-related components under the same umbrella, all info collected in corosync should be moved to pacemaker too.

I could buy your argument if you tell me to add it to the corosync plugin (the quorum device is not a Pacemaker component but a Corosync one) but your colleagues are stuck with pacemaker this and that... which in my opinion is wrong.

I have no energy to continue this useless discussion. You can reject the PR and we're done.

TurboTurtle commented 2 months ago

You can check the hostility at the door. Arif was trying to be even-keeled with you, but if you're going to remain petulant we'd rather not deal with it from the start.

SoS is a collaborative project, something I'd expect a Red Hatter to understand. We support over 300 distinct packages and projects, and we can't be expected to be intimately familiar with every niche piece of software. We aren't "stuck" on pacemaker, it was a suggestion based on being a pcs command, and had you made any attempt at explaining the alternative for corosync, this likely would be already merged, as corosync was my next suggestion along with Arif. At the end of the day if it truly fits nowhere else, a new plugin can be acceptable but we simply don't see whole new plugins for singular commands which is what makes this a bit unusual.

ghost commented 2 months ago

I can only thank @arif-ali for his constructive comment... he tried to understand what I said and made a good argument. I only said that corosync is a better fit than pacemaker for that command if you don't want to create a new plugin.

@TurboTurtle once again you are wrong... what a surprise... it is becoming the norm. It seems you like to insult people who try to collaborate on projects you are involved in... what a great example of the kind of person you are. Am I petulant? Better to be petulant than arrogant and ignorant like you. Please ignore my comments from now on... I don't want or need to deal with people like you.

ghost commented 2 months ago

Please reject the PR... I no longer want to contribute to the project.

My apologies to @arif-ali and @jcastill if I have wasted your time. Thanks for everything.