krig / salt-module-crmsh

Salt Execution Module for Pacemaker clusters using crmsh
Apache License 2.0

dev: add configure_show function #1

Closed liangxin1300 closed 6 years ago

liangxin1300 commented 6 years ago

First try for a salt crmsh module.

Regards, xin

liangxin1300 commented 6 years ago

Hi krig, does this demo make sense? If so, I will continue :) I think the process of mapping crmsh's API into salt is a good opportunity to find some hidden logic problems and some room for improvement in crmsh.

Regards, xin

krig commented 6 years ago

Hmm, we should run tests for the module in some way. Maybe we can copy the container-based test runner for crmsh.
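
As a lighter-weight complement to a container-based runner, here is a minimal sketch (not from this PR) of a plain unit test that mocks __salt__['cmd.run']. It assumes the module lives in a file named crmsh.py, is importable from the test, and shells out through cmd.run; configure_show is taken from the PR title, everything else is hypothetical.

```python
import unittest
from unittest import mock

import crmsh  # the execution module under test (assumed importable)


class ConfigureShowTest(unittest.TestCase):
    def test_configure_show_calls_crm(self):
        fake_run = mock.Mock(return_value="primitive dummy ocf:heartbeat:Dummy")
        # Salt injects __salt__ into the module at load time; for a unit
        # test we can simply patch the module's own namespace instead.
        with mock.patch.dict(crmsh.__dict__, {"__salt__": {"cmd.run": fake_run}}):
            out = crmsh.configure_show()
        fake_run.assert_called_once()
        self.assertIn("dummy", out)


if __name__ == "__main__":
    unittest.main()
```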

liangxin1300 commented 6 years ago

In the commit "dev: add configure_property function", I defined a _dc() function; with it, some functions will refuse to run if the minion is not the DC node. Is it suitable to do it this way?
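
For reference, a minimal sketch (assumptions throughout, not the PR's actual code) of what such a _dc() guard could look like: compare the local node name against the cluster's current DC and skip the call when they differ. crmadmin -D and crm_node -n are standard pacemaker tools; the exact output parsing and the configure_property wrapper are illustrative.

```python
def _dc():
    """Return True if this minion is the current DC (sketch)."""
    # crmadmin -D prints something like "Designated Controller is: node1";
    # the parsing below is an assumption about that format.
    dc_line = __salt__["cmd.run"]("crmadmin -D").strip()
    local = __salt__["cmd.run"]("crm_node -n").strip()
    return bool(local) and dc_line.endswith(local)


def configure_property(name, value):
    """Set a cluster property, but only from the DC node (illustrative)."""
    if not _dc():
        return "Skipped: this minion is not the DC"
    return __salt__["cmd.run"]("crm configure property {}={}".format(name, value))
```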

liangxin1300 commented 6 years ago

Hi krig, I think you can merge these commits first and then apply patches on top of them directly; that would be more efficient than reviewing :)

krig commented 6 years ago

Yeah, I think the _dc check ought to work. I don't know if there can be a situation where there is no DC...

liangxin1300 commented 6 years ago

Hi krig, after reading some modules in salt like zypper.py, I added the commit "dev: change api defines". I think this commit makes the API rules clearer, and maybe more pythonic? :)
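
For context, a hedged sketch of the skeleton style that modules such as zypper.py follow: a __virtual__ gate that only loads the module when the crm binary is available, plus plain keyword-argument functions. The function body and the xml flag are illustrative, not necessarily what this PR defines (salt.utils.path.which is the newer location; older salt releases have salt.utils.which).

```python
import salt.utils.path

__virtualname__ = "crmsh"


def __virtual__():
    """Only load this module if crmsh is installed."""
    if salt.utils.path.which("crm"):
        return __virtualname__
    return (False, "The crmsh execution module requires the crm binary")


def configure_show(xml=False):
    """Show the cluster configuration, optionally as XML (illustrative)."""
    cmd = "crm configure show"
    if xml:
        cmd += " xml"
    return __salt__["cmd.run"](cmd)
```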

krig commented 6 years ago

I don't think checking for DC and failing like this is sufficient.

We need more complicated logic: we need the expanded list of minions the command is being executed on, so we can match it against the list of nodes in the cluster, deterministically pick one node from that list, and only actually execute the command on that one. The other nodes should only pretend to execute it: they wait for the chosen node to finish and then report the same result.

The wait-and-finish part should be possible using the event bus in salt. Getting the list of minions, I don't know...
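
A rough sketch of the deterministic-pick half only (the wait-for-result half via the event bus is left out). It assumes the minion id matches the pacemaker node name and that crm_node -l prints lines like "1 node1 member"; both are assumptions, as is the configure_primitive wrapper.

```python
def _elected():
    """Return True if this minion is the deterministically chosen node (sketch)."""
    out = __salt__["cmd.run"]("crm_node -l")  # assumed format: "<id> <name> <state>"
    names = sorted(
        parts[1]
        for parts in (line.split() for line in out.splitlines())
        if len(parts) >= 2
    )
    # Assumes the salt minion id equals the pacemaker node name.
    return bool(names) and __grains__["id"] == names[0]


def configure_primitive(definition):
    """Apply 'crm configure primitive ...' on exactly one cluster node (illustrative)."""
    if not _elected():
        return "Skipped: another cluster node will apply this change"
    return __salt__["cmd.run"]("crm configure primitive {}".format(definition))
```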

liangxin1300 commented 6 years ago

Hi krig: I have read some of the pcs code and run pcs in a salt environment, and I found that the pcs salt module never considers this problem at the salt level. For example,

salt '*' pcs.resource_create resource_id='vip' resource_type='ocf:heartbeat:IPaddr2' resource_options="['ip=10.10.10.123']"

This command runs on all nodes of the cluster (I have two nodes, f26-1 and f26-2; the DC is f26-1). Sometimes the salt master complains about nothing; other times it says: Error: 'vip' already exists (from one of the nodes, seemingly at random).

I added some print logging to the pcs code. Sometimes both nodes even reach "etree.SubElement", which corresponds to the salt master complaining about nothing; sometimes one node raises an error after the "does_id_exist" check, which corresponds to the second case above.

For crmsh, I used salt to run:

salt '*' cmd.run 'crm configure primitive id=dummy ocf:heartbeat:Dummy meta target-role=Stopped'

This command also runs on both nodes. Sometimes one node complains about nothing and the other says: ERROR: Cannot create primitive with ID 'dummy': Found existing primitive with same ID. Other times one node complains about nothing and the other says: Call cib_apply_diff failed (-203): Update does not conform to the configured schema, and dumps the whole CIB file (which is not elegant).

I traced the code and found that in the second case, both nodes pass the existing-ID check (existing = self.find_object(obj_id) in new_object, cibconfig.py).

I'm not sure whether there are logic problems in crmsh itself, but at the very least we could suppress the cib diff error from dumping the whole CIB file; it's not elegant.

Regards, xin

krig commented 6 years ago

I think the pcs module in salt is quite bad :) We can do better!

liangxin1300 commented 6 years ago

How about 'let the node with the minimum id run the command'? We can get the minimum id from the CIB file, but this policy might conflict with the corosync quorum policy setting.

In my view, checking for the DC before running the command is enough for most cases. We can use this approach temporarily.

krig, let's focus on the API definition. If the current definitions make sense, please merge them first, and then I can continue creating more PRs based on these commits.

krig commented 6 years ago

I don't know if the DC check as it works now makes sense. It puts the responsibility on the user to check which node is the DC and only run the command on that node.

I don't know what you mean by conflicting with corosync quorum policy?

liangxin1300 commented 6 years ago

It puts the responsibility on the user to check

I don't think so. In a valid HA cluster, running commands on any node should be OK; users don't need to remember whether to use "salt 'node1' crmsh.command XXX" or "salt 'node2' ...". They can just use "salt '*'" to manage the cluster, because we already help them select which node runs the commands.

conflicting with corosync quorum policy

If we 'let the minimum id node run the command', what about corosync letting the maximum nodeid node have quorum? :)

krig commented 6 years ago

Except that if the other nodes report errors, that is very confusing and not acceptable in my opinion. It would only be OK if all nodes reported success or failure deterministically.

I still don't see how the corosync quorum setting is a conflict; it has no direct relationship to the CIB configuration?

liangxin1300 commented 6 years ago

I still don't see how the corosync quorum setting is a conflict; it has no direct relationship to the CIB configuration?

In a split-brain situation, if corosync lets the node with the maximum id keep quorum while salt keeps managing only the node with the minimum id, that will cause a conflict.

krig commented 6 years ago

Hrm. Yes, that is no good either.

liangxin1300 commented 6 years ago

I think this would be nice enough (the cluster has two nodes, node1 and node2):

salt 'node1' crmsh.subcommand option1 xxx -> node1 runs the command
salt 'node2' crmsh.subcommand option1 xxx -> node2 runs the command
salt '*' crmsh.subcommand option1 xxx -> node1 or node2, just one of them, runs the command

But how do we achieve this? :)

liangxin1300 commented 6 years ago

We can bring roger, bin, nick or gao-yan into this discussion and talk about how to do this :)

It's time for me to go home, see you tomorrow!

krig commented 6 years ago

Yes, that is the question :) I have been reading the salt documentation to find a way to get the list of targeted nodes. If we had that, we could figure out which node in the cluster should run the commands.

But I haven't found a way to get it yet. Maybe the solution is to ask the salt developers.

krig commented 6 years ago

Hi Xin, so I did ask some salt experts. And I think I have been over-complicating the execution module. We should remove the DC check and just map the command directly to crmsh.

The best idea so far for making cluster configuration natural with salt is to develop a salt proxy minion which represents the whole cluster. So instead of targeting individual nodes as minions, one would target the whole cluster as a minion. It is then up to the proxy to apply the commands to the nodes in the cluster as appropriate.
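
For orientation, a very rough sketch of what such a proxy module could look like, following the init/ping/shutdown interface salt documents for proxy minions. Everything cluster-specific here, including the crm_cluster name and the pillar keys, is hypothetical.

```python
# Skeleton of a proxy-minion module representing a whole cluster (sketch).
__proxyenabled__ = ["crm_cluster"]
__virtualname__ = "crm_cluster"

DETAILS = {}


def __virtual__():
    return __virtualname__


def init(opts):
    """Remember the cluster entry point (e.g. one reachable node) from pillar."""
    DETAILS["host"] = opts["proxy"].get("host")
    DETAILS["initialized"] = True


def initialized():
    return DETAILS.get("initialized", False)


def ping():
    """Report whether the proxy considers the cluster reachable (stubbed here)."""
    return DETAILS.get("initialized", False)


def shutdown(opts):
    """Nothing persistent to tear down in this sketch."""
    return True
```

The pillar for such a proxy would then carry something like proxy: {proxytype: crm_cluster, host: node1}, and the crmsh execution module would declare __proxyenabled__ = ['crm_cluster'] so it is loaded for that proxy; both names are placeholders.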

liangxin1300 commented 6 years ago

Yeah, I will look at the salt proxy documentation and learn how to implement it for an HA cluster :)

krig commented 6 years ago

I'll merge this PR and remove the DC checks :)