Closed marcelmamula closed 8 months ago
Are there any ongoing efforts for SUSE support, or is this free range to pickup?
Not that I know of - @sean-freeman or @berndfinger might know.
ha_cluster and idempotency?
The role is not currently idempotent for all operations - not sure which are idempotent and which are not - I don't know if you can make the crmsh implementation idempotent without making the pcs implemetation idempotent
@richm I had started porting some code in line with the mentioned Ansible Role draft (keeping to the scope of ha_cluster
to accept config and execute, i.e. not adding any of the commands for a specific platform). However there were other priorities in the SAP LinuxLab Open-Source initiative that needed attention first, so I would probably classify this as free range.
I would probably suggest that ha_cluster
is psuedo-idempotent in the sense that, if you execute with the same variables you will arrive at the same outcome. However if on multiple runs some variable is changed, it will alter the config file, and the cibadmin will (without mercy or checks) replace any existing config with that new definition.
Thank you both for your comments, I will proceed with with work on my end to see how this can be implemented.
if you execute with the same variables you will arrive at the same outcome. However if on multiple runs some variable is changed, it will alter the config file, and the cibadmin will (without mercy or checks) replace any existing config with that new definition.
That's correct. When you run the role repeatedly with the same variables, you'll get the same outcome and the role should not even restart cluster daemons and thus it should not disrupt cluster operations. However, the role removes any cluster configuration not specified in the role variables. This applies to CIB, corosync and other cluster components. Changing some components' (corosync, sbd) configuration results in restarting those daemons.
@sean-freeman @tomjelinek @berndfinger Lot of idempotency is coming from fact that cluster config will be rebuilt using all those CIB tasks and then remain untouched if they are same.
How would this work combined with sap_install if you re-run them both @sean-freeman ? sap_ha_pacemaker_cluster is where all resources and pacemaker config is created and I dont see anywhere ha_cluster_resource_primitives or __sap_ha_pacemaker_cluster_resource_primitives are defined in examples, so I would assume it would result in CIB being replaced regardless.
If this destructive behavior in combination with sap_ha_pacemaker_cluster is expected, then there is not much value for us in implementing CIB steps under crmsh.
@tomjelinek Thinking about my comment / on re-visit to this GH Issue, it is a good idea to give the end-user a choice on how ha_cluster
handles config changes?
The following would warrant a separate GH Issue to be raised.
Perhaps "Feature Request: ha_cluster_config_change_then_halt
" would protect an end-user in case of config drift / the variables changed by accident (for whatever reason), and the end-user performs re-run but does not want to replace the working config. By default I imagine such a variable would be false
, to preserve the "psuedo-idempotence" and not create a breaking change.
In case such a feature exists already and I cannot see it, please excuse me and please correct me.
As @marcelmamula perceives, and other end-users may also, the behaviour of ha_cluster
could be considered "destructive" as there is no safety net in case any variable was changed.
@marcelmamula Best to move any continued discussion specific to SAP, into the sap_install
GH Issues or Discussions. Will answer here for posterity.
On it's own, the Ansible Role sap_ha_pacemaker_cluster
is relatively benign but does the heavy lifting on inputs/config that should be executed, and all the execution to configure Linux Pacemaker is done by Ansible Role ha_cluster
(at the moment some is done from pcs
PCMK Shell and other commands at the cib
PCMK homogenous binaries level).
The Ansible Role sap_ha_pacemaker_cluster
really does 2 things only:
ha_cluster
to be executed successfullyThe 'private' var (prefix __
) referenced __sap_ha_pacemaker_cluster_resource_primitives
, is generated on-the-fly based upon an end-user request to configure different SAP Software scenarios (SAP HANA, SAP NetWeaver ABAP ASCS/ERS etc), and a given target Infrastructure Platform (e.g. AWS, GCP, MS AZ, IBM Cloud, IBM PowerVC etc etc).
So the same applies.... if an end-user:
sap_ha_pacemaker_cluster
, then ha_cluster
is executed and nothing changes (as @tomjelinek confirms)sap_ha_pacemaker_cluster
, then ha_cluster
is executed and the new config pushed.... which could be destructive (that would depend on exactly what vars were altered).Thank you @sean-freeman for lengthy explanation even though I managed to get to explanations in meantime.
@tomjelinek @sean-freeman I was asking about destructiveness and idempotence mainly because whole cib creation sequence is using pcs working on temporary configuration xml, leaving whole cluster intact. This is not possible with SUSE crm unless you start doing some extensive jinja2 changes without running crm configure
But end is most likely same, because even crm_diff at end of cib build considers all changes as changes and pushes them as patch.
@marcelmamula Can you expand your explanation? From what I investigated (a long time ago), any pcs
command has an equal crmsh
command?
Looking in the repository, there are a limited number of pcs
shell commands in use. See equivilant below for crmsh
:
# PCS Shell # CRMSH Shell
pcs cluster crm node
pcs constraint <type> crm configure <type>
pcs property crm configure property
pcs qdevice crm cluster init qdevice
pcs quorum corosync-quorumtool / corosync-qnetd-tool
pcs resource crm ra
pcs status crm status
pcs stonith crm ra
Or do you mean it is not possible for crmsh
to create/export each command to XML configurations and build the whole configuration to push to cib
?
Yes @sean-freeman , My comment was mainly about -f to do changes in xml separate from live cluster cib.
@marcelmamula Isn't 'Shadow CRB usage' the equivalent ?
@sean-freeman You are correct, shadow cib is possible, but it is limited to interactive crm configure mode, because you have to do all changes before commit. I tried using it on command line as ansible would, but shadow was not kept after commands.
Update: I am testing with -c and it seems to be working, I will update if it is usable. crm configure cib new shd crm -c shd configure show
@tomjelinek I have started going through Contribution documentation and I have hit snag with tox availability on SUSE. Would it be fine with doing linting check and leave rest of tests for git actions when PR is submitted?
Also ha_cluster is not using FQDN module names so all new tasks I added contain them to avoid linting errors. Are you planning to switch to FQDN roles in future?
@marcelmamula As long as your code passes CI eventually, it doesn't really matter to me whether you test locally or use git actions in a PR.
I think that not using FQDN names in a role for its own modules is a system roles convention.
@richm Could you comment on these two questions?
@tomjelinek Thank you. My FQDN question was mainly about usual ansible.builtin modules that are triggering warning.
@sean-freeman
Perhaps "Feature Request: ha_cluster_config_change_then_halt" would protect an end-user in case of config drift / the variables changed by accident (for whatever reason), and the end-user performs re-run but does not want to replace the working config. By default I imagine such a variable would be false, to preserve the "psuedo-idempotence" and not create a breaking change.
In case such a feature exists already and I cannot see it, please excuse me and please correct me.
This looks like Ansible check mode to me. The ha_cluster role supports check mode, even though some tasks are impossible to be done in check mode. You also need to know a bit about how the role works, that it builds configuration in temp files and then pushes it to cluster / nodes.
@marcelmamula Sorry for being late to the pcs -f
/ crm -c
discussion. Does the crm -c
approach work for you?
I don't think you need to strictly follow pcs at all costs. Some parts of the role are very pcs specific, e.g. pcsd configuration. There was no demand for crmsh support when the role got started, so it wasn't designed with crmsh in mind.
@marcelmamula Sorry for being late to the
pcs -f
/crm -c
discussion. Does thecrm -c
approach work for you?I don't think you need to strictly follow pcs at all costs. Some parts of the role are very pcs specific, e.g. pcsd configuration. There was no demand for crmsh support when the role got started, so it wasn't designed with crmsh in mind.
@tomjelinek I have tested out multiple approaches and settled down with crm -c and crm_diff on cib vs shadow. It seems to be working good including rc == 1 check.
Example for cib group:
ansible.builtin.command:
cmd: |
crm -c {{ __ha_cluster_crm_shadow }} configure group
{{ resource_group.id | quote }} {% for resource in resource_group.resource_ids %}
{{ resource | quote }} {% endfor %} \
{% if resource_group.meta_attrs[0].attrs | default(False) %}
meta {% for attr in resource_group.meta_attrs[0].attrs -%}
{{ attr.name | quote }}={{ attr.value | quote }} {% endfor %}
{% endif %}
There are some pcs specific like tickets, sets and others so I will leave them as empty placeholders for now or remove completely wherever possible.
@tomjelinek I have started going through Contribution documentation and I have hit snag with tox availability on SUSE.
you can't install python pip
, then do pip install --user tox
, on SUSE?
Would it be fine with doing linting check and leave rest of tests for git actions when PR is submitted?
Yes, I suppose so, but it will be painful to debug, if you find some odd issue, and have to iterate repeatedly by pushing new code to a PR, and cannot reproduce the error locally.
Also ha_cluster is not using FQDN module names so all new tasks I added contain them to avoid linting errors. Are you planning to switch to FQDN roles in future?
In general, you can use the short name for Ansible built-in modules/plugins, and use the FQCN for non-built-in modules/plugins. The Ansible team have told me that short names for built-ins will be supported for the foreseeable future. If/when Ansible changes to require the use of FQCN for everything, we will have tooling to automatically convert everything, because we will have to do that for the other 29 system roles at the same time.
@tomjelinek Thank you. My FQDN question was mainly about usual ansible.builtin modules that are triggering warning.
That should not be happening with ansible-lint because we suppress that warning - https://github.com/linux-system-roles/ha_cluster/blob/main/.ansible-lint#L16 What is the message you are seeing? Is it from ansible-lint?
@richm @tomjelinek FQDN: It seems my lint config was inherited from top folder, which does not have fqcn-builtins. Do you have any issue with me keeping all new files with fqdn?
Tox/Testing:
pip3 tox can be installed fine, issue starts with Use yum or dnf to install standard-test-roles-inventory-qemu
since this is in fedora repo only. That package seems to be requirement to actually test using tox.
@richm @tomjelinek FQDN: It seems my lint config was inherited from top folder,
Not sure what you mean by that.
which does not have fqcn-builtins. Do you have any issue with me keeping all new files with fqdn?
Yes, you can use ansible.builtin.xxx
instead of xxx
. That should not cause any issues.
Tox/Testing: pip3 tox can be installed fine, issue starts with
Use yum or dnf to install
since this is in fedora repo only. That package seems to be requirement to actually test using tox.
It is a requirement to use qemu. But it is possible to install the qemu and kvm and seabios packages separately. So if you want to be able to run
tox -e qemu-ansible-core-2.16 -- --image-file /path/to/suse.qcow2 tests/tests_crmsh.yml
then you will need to install the qemu, kvm, seabios, whatever other packages you need to run qcow2 based VMs on your development platform.
You do not need standard-test-roles-inventory-qemu
in order to run tox -e collection,ansible-lint-collection
Thank you @sean-freeman @richm @tomjelinek for https://github.com/linux-system-roles/ha_cluster/pull/186. I will close this issue and further changes will go separately from this original issue.
Hello Team,
I am working on sap-linuxlab (community.sap_install) and our plan is to make sure that role sap_ha_pacemaker_cluster, which consumes fedora.linux_system_roles, is correctly working on SUSE systems.
I noticed that groundwork for adoption of non-pcs steps was already done thanks to Sean in #122, so adoption would consist of:
There are few things that I wanted to ask for clarification, before proceeding with any changes in fork: