linux-system-roles / ha_cluster

Provide automation for Cluster - High Availability management
https://linux-system-roles.github.io/ha_cluster/
MIT License
ha_cluster

An Ansible role for managing High Availability Clustering.

Limitations

Requirements

See below

Collection requirements

The role requires the firewall role and the selinux role from the fedora.linux_system_roles collection if ha_cluster_manage_firewall and ha_cluster_manage_selinux are set to true, respectively. See also ha_cluster_manage_firewall and ha_cluster_manage_selinux.

If ha_cluster is used as a role from the fedora.linux_system_roles collection or from the Fedora RPM package, the requirement is already satisfied.

If you need to manage rpm-ostree systems, you will need to install additional collections. Run the following command to install them.

ansible-galaxy collection install -r meta/collection-requirements.yml

Role Variables

Defined in defaults/main.yml

ha_cluster_enable_repos

boolean, default: true

RHEL and CentOS only, enable repositories containing needed packages

ha_cluster_enable_repos_resilient_storage

boolean, default: false

RHEL and CentOS only, enable repositories containing resilient storage packages, such as dlm or gfs2. For this option to take effect, ha_cluster_enable_repos must be set to true.
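As a minimal sketch, a RHEL or CentOS cluster that needs resilient storage packages such as dlm or gfs2 would set both repository variables. The values are illustrative only:

```yaml
# Both variables are needed: the second has no effect unless the first is true.
ha_cluster_enable_repos: true
ha_cluster_enable_repos_resilient_storage: true
```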

ha_cluster_manage_firewall

boolean, default: false

Manage the firewall high-availability service as well as the fence-virt port. When ha_cluster_manage_firewall is true, the firewall high-availability service and fence-virt port are enabled. When ha_cluster_manage_firewall is false, the ha_cluster role does not manage the firewall.

NOTE: ha_cluster_manage_firewall is limited to adding ports. It cannot be used for removing ports. If you want to remove ports, you will need to use the firewall system role directly.

NOTE: In ha_cluster role version 1.7.5 and older, the firewall was configured by default if firewalld was available when the role was executed. In newer versions, this does not happen unless ha_cluster_manage_firewall is set to true.

ha_cluster_manage_selinux

boolean, default: false

Manage the ports belonging to the firewall high-availability service using the selinux role. When ha_cluster_manage_selinux is true, the ports belonging to the firewall high-availability service are associated with the SELinux port type cluster_port_t. When ha_cluster_manage_selinux is false, the ha_cluster role does not manage SELinux.

NOTE: The firewall configuration is a prerequisite for managing SELinux. If the firewall is not installed, managing the SELinux policy is skipped.

NOTE: ha_cluster_manage_selinux is limited to adding policy. It cannot be used for removing policy. If you want to remove policy, you will need to use the selinux system role directly.

ha_cluster_cluster_present

boolean, default: true

If set to true, HA cluster will be configured on the hosts according to other variables. If set to false, all HA Cluster configuration will be purged from target hosts.

ha_cluster_start_on_boot

boolean, default: true

If set to true, cluster services will be configured to start on boot. If set to false, cluster services will be configured not to start on boot.

ha_cluster_install_cloud_agents

boolean, default: false

The role automatically installs needed HA Cluster packages. However, resource and fence agents for cloud environments are not installed by default on RHEL. If you need those to be installed, set this variable to true. Alternatively, you can specify those packages in ha_cluster_fence_agent_packages and ha_cluster_extra_packages variables.

ha_cluster_fence_agent_packages

list of fence agent packages to install, default: fence-agents-all, fence-virt

ha_cluster_extra_packages

list of additional packages to be installed, default: no packages

This variable can be used to install additional packages not installed automatically by the role, for example custom resource agents.

It is possible to specify fence agents here as well. However, ha_cluster_fence_agent_packages is preferred for that, so that its default value is overridden.
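The following sketch shows how the two package variables might be combined. It assumes that only the fence-virt agent is wanted, and the extra package name is hypothetical:

```yaml
# Overrides the default list (fence-agents-all, fence-virt).
ha_cluster_fence_agent_packages:
  - fence-virt

# Packages the role does not install on its own.
ha_cluster_extra_packages:
  - my-custom-resource-agents  # hypothetical package name
```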

ha_cluster_use_latest_packages

boolean, default: false

If set to true, all packages will be installed in their latest version. If set to false, existing packages will not be updated.

ha_cluster_hacluster_password

string, no default - must be specified

Password of the hacluster user. This user has full access to a cluster. It is recommended to vault encrypt the value, see https://docs.ansible.com/ansible/latest/user_guide/vault.html for details.
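One common way to keep the password out of plain-text files is to reference a second variable that is defined in a vault-encrypted vars file, for example one created with ansible-vault create. The variable name vault_ha_cluster_hacluster_password is a hypothetical name chosen for this sketch:

```yaml
# Plain-text vars: only a reference to the secret is stored here.
# vault_ha_cluster_hacluster_password (hypothetical name) is defined
# in a separate vault-encrypted vars file.
ha_cluster_hacluster_password: "{{ vault_ha_cluster_hacluster_password }}"
```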

ha_cluster_hacluster_qdevice_password

string, no default - optional

Needed only if ha_cluster_quorum is configured to use a qdevice of type net and the password of the hacluster user on the qdevice is different from ha_cluster_hacluster_password. This user has full access to a cluster. It is recommended to vault encrypt the value, see https://docs.ansible.com/ansible/latest/user_guide/vault.html for details.

ha_cluster_corosync_key_src

path to Corosync authkey file, default: null

Authentication and encryption key for Corosync communication. It is highly recommended to have a unique value for each cluster. The key should be 256 bytes of random data.

If value is provided, it is recommended to vault encrypt it. See https://docs.ansible.com/ansible/latest/user_guide/vault.html for details.

If no key is specified, a key already present on the nodes will be used. If nodes don't have the same key, a key from one node will be distributed to other nodes so that all nodes have the same key. If no node has a key, a new key will be generated and distributed to the nodes.

If this variable is set, ha_cluster_regenerate_keys is ignored for this key.

ha_cluster_pacemaker_key_src

path to Pacemaker authkey file, default: null

Authentication and encryption key for Pacemaker communication. It is highly recommended to have a unique value for each cluster. The key should be 256 bytes of random data.

If value is provided, it is recommended to vault encrypt it. See https://docs.ansible.com/ansible/latest/user_guide/vault.html for details.

If no key is specified, a key already present on the nodes will be used. If nodes don't have the same key, a key from one node will be distributed to other nodes so that all nodes have the same key. If no node has a key, a new key will be generated and distributed to the nodes.

If this variable is set, ha_cluster_regenerate_keys is ignored for this key.

ha_cluster_fence_virt_key_src

path to fence-virt or fence-xvm pre-shared key file, default: null

Authentication key for fence-virt or fence-xvm fence agent.

If value is provided, it is recommended to vault encrypt it. See https://docs.ansible.com/ansible/latest/user_guide/vault.html for details.

If no key is specified, a key already present on the nodes will be used. If nodes don't have the same key, a key from one node will be distributed to other nodes so that all nodes have the same key. If no node has a key, a new key will be generated and distributed to the nodes.

If this variable is set, ha_cluster_regenerate_keys is ignored for this key.

If you let the role generate a new key, you need to copy the key to your nodes' hypervisor to ensure that fencing works.

ha_cluster_pcsd_public_key_src, ha_cluster_pcsd_private_key_src

path to pcsd TLS certificate and key, default: null

TLS certificate and private key for pcsd. If this is not specified, a certificate - key pair already present on the nodes will be used. If certificate - key pair is not present, a random new one will be generated.

If private key value is provided, it is recommended to vault encrypt it. See https://docs.ansible.com/ansible/latest/user_guide/vault.html for details.

If these variables are set, ha_cluster_regenerate_keys is ignored for this certificate - key pair.
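A sketch of supplying all keys and the pcsd certificate from files on the control node follows. Every path is hypothetical, and the files are assumed to exist already and to be vault encrypted where recommended above:

```yaml
# All paths below are hypothetical placeholders.
ha_cluster_corosync_key_src: files/corosync-authkey
ha_cluster_pacemaker_key_src: files/pacemaker-authkey
ha_cluster_fence_virt_key_src: files/fence_xvm.key
ha_cluster_pcsd_public_key_src: files/pcsd.crt
ha_cluster_pcsd_private_key_src: files/pcsd.key
```

With these set, ha_cluster_regenerate_keys is ignored for each of the listed keys and for the pcsd certificate - key pair.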

ha_cluster_pcsd_certificates

If there is no pcsd private key and certificate, there are two ways to create them.

One way is to set the ha_cluster_pcsd_certificates variable. The other way is to set none of ha_cluster_pcsd_public_key_src, ha_cluster_pcsd_private_key_src and ha_cluster_pcsd_certificates.

If ha_cluster_pcsd_certificates is provided, the certificate role is internally used and it creates the private key and certificate for pcsd as defined. If none of the variables are provided, the ha_cluster role will create pcsd certificates via pcsd itself.

The value of ha_cluster_pcsd_certificates is set to the variable certificate_requests in the certificate role. For more information, see the certificate_requests section in the certificate role documentation.

The default value is [].

NOTE: The certificate role, unless using IPA and joining the systems to an IPA domain, creates self-signed certificates, so you will need to explicitly configure trust, which is not currently supported by the system roles.

NOTE: When you set ha_cluster_pcsd_certificates, you must not set ha_cluster_pcsd_public_key_src and ha_cluster_pcsd_private_key_src variables.

NOTE: When you set ha_cluster_pcsd_certificates, ha_cluster_regenerate_keys is ignored for this certificate - key pair.

ha_cluster_regenerate_keys

boolean, default: false

If this is set to true, pre-shared keys and TLS certificates will be regenerated. See also: ha_cluster_corosync_key_src, ha_cluster_pacemaker_key_src, ha_cluster_fence_virt_key_src, ha_cluster_pcsd_public_key_src, ha_cluster_pcsd_private_key_src, ha_cluster_pcsd_certificates.

ha_cluster_pcs_permission_list

structure and default value:

ha_cluster_pcs_permission_list:
  - type: group
    name: haclient
    allow_list:
      - grant
      - read
      - write

This configures permissions to manage a cluster using pcsd. The items are as follows:

ha_cluster_cluster_name

string, default: my-cluster

Name of the cluster.

ha_cluster_transport

structure, default: no settings

ha_cluster_transport:
  type: knet
  options:
    - name: option1_name
      value: option1_value
    - name: option2_name
      value: option2_value
  links:
    -
      - name: option1_name
        value: option1_value
      - name: option2_name
        value: option2_value
    -
      - name: option1_name
        value: option1_value
      - name: option2_name
        value: option2_value
  compression:
    - name: option1_name
      value: option1_value
    - name: option2_name
      value: option2_value
  crypto:
    - name: option1_name
      value: option1_value
    - name: option2_name
      value: option2_value

For a list of allowed options, see pcs -h cluster setup or pcs(8) man page, section 'cluster', command 'setup'. For a detailed description, see corosync.conf(5) man page.

You may take a look at an example.

ha_cluster_totem

structure, default: no totem settings

ha_cluster_totem:
  options:
    - name: option1_name
      value: option1_value
    - name: option2_name
      value: option2_value

Corosync totem configuration. For a list of allowed options, see pcs -h cluster setup or pcs(8) man page, section 'cluster', command 'setup'. For a detailed description, see corosync.conf(5) man page.

You may take a look at an example.

ha_cluster_quorum

structure, default: no quorum settings

ha_cluster_quorum:
  options:
    - name: option1_name
      value: option1_value
    - name: option2_name
      value: option2_value
  device:
    model: string
    model_options:
      - name: option1_name
        value: option1_value
      - name: option2_name
        value: option2_value
    generic_options:
      - name: option1_name
        value: option1_value
      - name: option2_name
        value: option2_value
    heuristics_options:
      - name: option1_name
        value: option1_value
      - name: option2_name
        value: option2_value

Cluster quorum configuration. The items are as follows:

Quorum device options are documented in corosync-qdevice(8) man page; generic options are sync_timeout and timeout, for model net options check the quorum.device.net section, for heuristics options see the quorum.device.heuristics section.

To regenerate quorum device TLS certificate, set the ha_cluster_regenerate_keys variable to true.
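As an illustration, a cluster using a quorum device of model net might be configured roughly as follows. The host name qnetd-host is hypothetical. The host and algorithm options come from the quorum.device.net section of corosync-qdevice(8):

```yaml
ha_cluster_quorum:
  device:
    model: net
    model_options:
      - name: host
        value: qnetd-host  # hypothetical qnetd host name
      - name: algorithm
        value: lms
```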

You may take a look at a quorum example and a quorum device example.

ha_cluster_sbd_enabled

boolean, default: false

Defines whether to use SBD.

You may take a look at an example.

ha_cluster_sbd_options

list, default: []

List of name-value dictionaries specifying SBD options. See sbd(8) man page, section 'Configuration via environment' for their description. Supported options are:

You may take a look at an example.
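A minimal sketch of enabling SBD with two options is shown here. The option names delay-start and watchdog-timeout are assumed to be among the options the role supports, and the values are illustrative only:

```yaml
ha_cluster_sbd_enabled: true
ha_cluster_sbd_options:
  # Option names are assumptions; check the role's supported list.
  - name: delay-start
    value: 'no'
  - name: watchdog-timeout
    value: 30
```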

Watchdog and SBD devices can be configured on a per-node basis in two variables:

ha_cluster_node_options

structure, default: no node options

ha_cluster_node_options:
  - node_name: node1
    pcs_address: node1-address
    corosync_addresses:
      - 192.168.1.11
      - 192.168.2.11
    sbd_watchdog_modules:
      - module1
      - module2
    sbd_watchdog_modules_blocklist:
      - module3
    sbd_watchdog: /dev/watchdog2
    sbd_devices:
      - /dev/disk/by-id/000001
      - /dev/disk/by-id/000002
      - /dev/disk/by-id/000003
    attributes:
      - attrs:
          - name: attribute1
            value: value1_node1
          - name: attribute2
            value: value2_node1
    utilization:
      - attrs:
          - name: utilization1
            value: value1_node1
          - name: utilization2
            value: value2_node1
  - node_name: node2
    pcs_address: node2-address:2224
    corosync_addresses:
      - 192.168.1.12
      - 192.168.2.12
    sbd_watchdog_modules:
      - module1
    sbd_watchdog_modules_blocklist:
      - module3
    sbd_watchdog: /dev/watchdog1
    sbd_devices:
      - /dev/disk/by-id/000001
      - /dev/disk/by-id/000002
      - /dev/disk/by-id/000003
    attributes:
      - attrs:
          - name: attribute1
            value: value1_node2
          - name: attribute2
            value: value2_node2
    utilization:
      - attrs:
          - name: utilization1
            value: value1_node2
          - name: utilization2
            value: value2_node2

This variable defines various settings which vary from cluster node to cluster node.

Note: Use an inventory or playbook hosts to specify which nodes form the cluster. This variable merely sets options for the specified nodes.

The items are as follows:

You may take a look at examples:

ha_cluster_cluster_properties

structure, default: no properties

ha_cluster_cluster_properties:
  - attrs:
      - name: property1_name
        value: property1_value
      - name: property2_name
        value: property2_value

List of sets of cluster properties - Pacemaker cluster-wide configuration. Currently, only one set is supported, so the first set is used and the rest are ignored.

You may take a look at an example.
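For instance, a single set with two common Pacemaker cluster properties could look like this. The values are illustrative, not recommendations:

```yaml
ha_cluster_cluster_properties:
  - attrs:
      - name: stonith-enabled
        value: 'true'
      - name: no-quorum-policy
        value: stop
```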

ha_cluster_resource_primitives

structure, default: no resources

ha_cluster_resource_primitives:
  - id: resource-id
    agent: resource-agent
    instance_attrs:
      - attrs:
          - name: attribute1_name
            value: attribute1_value
          - name: attribute2_name
            value: attribute2_value
    meta_attrs:
      - attrs:
          - name: meta_attribute1_name
            value: meta_attribute1_value
          - name: meta_attribute2_name
            value: meta_attribute2_value
    copy_operations_from_agent: bool
    operations:
      - action: operation1-action
        attrs:
          - name: operation1_attribute1_name
            value: operation1_attribute1_value
          - name: operation1_attribute2_name
            value: operation1_attribute2_value
      - action: operation2-action
        attrs:
          - name: operation2_attribute1_name
            value: operation2_attribute1_value
          - name: operation2_attribute2_name
            value: operation2_attribute2_value
    utilization:
      - attrs:
          - name: utilization1_name
            value: utilization1_value
          - name: utilization2_name
            value: utilization2_value

This variable defines Pacemaker resources (including stonith) configured by the role. The items are as follows:

You may take a look at examples:
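A short sketch with one stonith resource and one ordinary resource follows. The resource IDs are hypothetical, and the fence_xvm and ocf:pacemaker:Dummy agents are assumed to be installed on the nodes:

```yaml
ha_cluster_resource_primitives:
  # Fence device; IDs are hypothetical.
  - id: xvm-fencing
    agent: 'stonith:fence_xvm'
    instance_attrs:
      - attrs:
          - name: pcmk_host_list
            value: node1 node2
  # Ordinary resource with an explicit monitor operation.
  - id: simple-resource
    agent: 'ocf:pacemaker:Dummy'
    operations:
      - action: monitor
        attrs:
          - name: interval
            value: 30s
```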

ha_cluster_resource_groups

structure, default: no resource groups

ha_cluster_resource_groups:
  - id: group-id
    resource_ids:
      - resource1-id
      - resource2-id
    meta_attrs:
      - attrs:
          - name: group_meta_attribute1_name
            value: group_meta_attribute1_value
          - name: group_meta_attribute2_name
            value: group_meta_attribute2_value

This variable defines resource groups. The items are as follows:

You may take a look at an example.
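A sketch of a group follows. It assumes that resources with the hypothetical IDs app-ip and app-server are defined in ha_cluster_resource_primitives:

```yaml
ha_cluster_resource_groups:
  - id: app-group
    resource_ids:
      - app-ip      # hypothetical primitive IDs
      - app-server
    meta_attrs:
      - attrs:
          - name: target-role
            value: Started
```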

ha_cluster_resource_clones

structure, default: no resource clones

ha_cluster_resource_clones:
  - resource_id: resource-to-be-cloned
    promotable: true
    id: custom-clone-id
    meta_attrs:
      - attrs:
          - name: clone_meta_attribute1_name
            value: clone_meta_attribute1_value
          - name: clone_meta_attribute2_name
            value: clone_meta_attribute2_value

This variable defines resource clones. The items are as follows:

You may take a look at an example.
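A sketch of a promotable clone follows. It assumes that a resource with the hypothetical ID stateful-resource is defined in ha_cluster_resource_primitives:

```yaml
ha_cluster_resource_clones:
  - resource_id: stateful-resource  # hypothetical primitive ID
    promotable: true
    id: stateful-resource-clone
    meta_attrs:
      - attrs:
          - name: clone-max
            value: 2
```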

ha_cluster_resource_bundles

structure, default: no bundle resources

ha_cluster_resource_bundles:
  - id: bundle-id
    resource_id: resource-id
    container:
      type: container-type
      options:
        - name: container_option1_name
          value: container_option1_value
        - name: container_option2_name
          value: container_option2_value
    network_options:
      - name: network_option1_name
        value: network_option1_value
      - name: network_option2_name
        value: network_option2_value
    port_map:
      -
        - name: option1_name
          value: option1_value
        - name: option2_name
          value: option2_value
      -
        - name: option1_name
          value: option1_value
        - name: option2_name
          value: option2_value
    storage_map:
      -
        - name: option1_name
          value: option1_value
        - name: option2_name
          value: option2_value
      -
        - name: option1_name
          value: option1_value
        - name: option2_name
          value: option2_value
    meta_attrs:
      - attrs:
          - name: bundle_meta_attribute1_name
            value: bundle_meta_attribute1_value
          - name: bundle_meta_attribute2_name
            value: bundle_meta_attribute2_value

This variable defines resource bundles. The items are as follows:

Note that the role does not install container launch technology automatically. However, you can install it by listing the appropriate packages in the ha_cluster_extra_packages variable.

Note that the role does not build and distribute container images. Please use other means to supply a fully configured container image to every node allowed to run a bundle depending on it.

You may take a look at an example.

ha_cluster_resource_defaults

structure, default: no resource defaults

ha_cluster_resource_defaults:
  meta_attrs:
    - id: defaults-set-1-id
      rule: rule-string
      score: score-value
      attrs:
        - name: meta_attribute1_name
          value: meta_attribute1_value
        - name: meta_attribute2_name
          value: meta_attribute2_value
    - id: defaults-set-2-id
      rule: rule-string
      score: score-value
      attrs:
        - name: meta_attribute3_name
          value: meta_attribute3_value
        - name: meta_attribute4_name
          value: meta_attribute4_value

This variable defines sets of resource defaults. You can define multiple sets of the defaults and apply them to resources of specific agents using rules. Note that defaults do not apply to resources which override them with their own defined values.

Only meta attributes can be specified as defaults.

The items of each defaults set are as follows:

You may take a look at an example.
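A minimal sketch with a single defaults set follows. It leaves out rule and score on the assumption that they are optional, so the set would apply to all resources. The set ID is hypothetical:

```yaml
ha_cluster_resource_defaults:
  meta_attrs:
    - id: default-stickiness  # hypothetical set ID
      attrs:
        - name: resource-stickiness
          value: 100
```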

ha_cluster_resource_operation_defaults

structure, default: no resource operation defaults

This variable defines sets of resource operation defaults. You can define multiple sets of the defaults and apply them to resources of specific agents and/or specific resource operations using rules. Note that defaults do not apply to resource operations which override them with their own defined values. By default, the role configures resources in such a way that they define their own values for resource operations. See copy_operations_from_agent in ha_cluster_resource_primitives for more information.

Only meta attributes can be specified as defaults.

The structure is the same as for ha_cluster_resource_defaults, except that rules are described in section resource op defaults set create of pcs(8) man page.
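Since the structure mirrors ha_cluster_resource_defaults, a set limited to monitor operations might be sketched as follows. The rule string is an assumption based on the rule syntax described in pcs(8), and the set ID is hypothetical:

```yaml
ha_cluster_resource_operation_defaults:
  meta_attrs:
    - id: monitor-defaults  # hypothetical set ID
      rule: "op monitor"    # assumed rule syntax, see pcs(8)
      attrs:
        - name: timeout
          value: 60s
```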

ha_cluster_stonith_levels

structure, default: no stonith levels

ha_cluster_stonith_levels:
  - level: 1..9
    target: node_name
    target_pattern: node_name_regular_expression
    target_attribute: node_attribute_name
    target_value: node_attribute_value
    resource_ids:
      - fence_device_1
      - fence_device_2
  - level: 1..9
    target: node_name
    target_pattern: node_name_regular_expression
    target_attribute: node_attribute_name
    target_value: node_attribute_value
    resource_ids:
      - fence_device_1
      - fence_device_2

This variable defines stonith levels, also known as fencing topology. They configure the cluster to use multiple devices to fence nodes. You may define alternative devices in case one fails, or require multiple devices to all be executed successfully in order to consider a node successfully fenced, or even a combination of the two.
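The combination described above can be sketched as follows. The sketch assumes that each level uses exactly one way of selecting its target, and that the fence devices with these hypothetical IDs are defined in ha_cluster_resource_primitives:

```yaml
ha_cluster_stonith_levels:
  # Level 1: try a single device first.
  - level: 1
    target: node1
    resource_ids:
      - fence-ipmi-node1  # hypothetical fence device IDs
  # Level 2: used if level 1 fails; both devices must succeed.
  - level: 2
    target: node1
    resource_ids:
      - fence-pdu-a
      - fence-pdu-b
```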

The items are as follows:

ha_cluster_constraints_location

structure, default: no constraints

This variable defines resource location constraints. They tell the cluster which nodes a resource can run on. Resources can be specified by their ID or a pattern matching more resources. Nodes can be specified by their name or a rule.

Structure for constraints with resource ID and node name:

ha_cluster_constraints_location:
  - resource:
      id: resource-id
    node: node-name
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: option-name
        value: option-value

You may take a look at an example.
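For instance, a preference for running a resource on node1 could be written roughly like this. The resource ID is hypothetical, and the constraint id is omitted on the assumption that it is optional:

```yaml
ha_cluster_constraints_location:
  - resource:
      id: simple-resource  # hypothetical primitive ID
    node: node1
    options:
      - name: score
        value: 20
```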

Structure for constraints with resource pattern and node name:

ha_cluster_constraints_location:
  - resource:
      pattern: resource-pattern
    node: node-name
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: resource-discovery
        value: resource-discovery-value

You may take a look at an example.

Structure for constraints with resource ID and a rule:

ha_cluster_constraints_location:
  - resource:
      id: resource-id
      role: resource-role
    rule: rule-string
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: resource-discovery
        value: resource-discovery-value

You may take a look at an example.

Structure for constraints with resource pattern and a rule:

ha_cluster_constraints_location:
  - resource:
      pattern: resource-pattern
      role: resource-role
    rule: rule-string
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: resource-discovery
        value: resource-discovery-value

You may take a look at an example.

ha_cluster_constraints_colocation

structure, default: no constraints

This variable defines resource colocation constraints. They tell the cluster that the location of one resource depends on the location of another one. There are two types of colocation constraints: a simple one for two resources, and a set constraint for multiple resources.

Structure for simple constraints:

ha_cluster_constraints_colocation:
  - resource_follower:
      id: resource-id1
      role: resource-role1
    resource_leader:
      id: resource-id2
      role: resource-role2
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: option-name
        value: option-value

You may take a look at an example.
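As a sketch, the following keeps a hypothetical app-server resource on the same node as app-ip. Roles and the constraint id are left out on the assumption that they are optional:

```yaml
ha_cluster_constraints_colocation:
  - resource_follower:
      id: app-server  # hypothetical primitive IDs
    resource_leader:
      id: app-ip
    options:
      - name: score
        value: INFINITY
```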

Structure for set constraints:

ha_cluster_constraints_colocation:
  - resource_sets:
      - resource_ids:
          - resource-id1
          - resource-id2
        options:
          - name: option-name
            value: option-value
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: option-name
        value: option-value

You may take a look at an example.

ha_cluster_constraints_order

structure, default: no constraints

This variable defines resource order constraints. They tell the cluster the order in which certain resource actions should occur. There are two types of order constraints: a simple one for two resources, and a set constraint for multiple resources.

Structure for simple constraints:

ha_cluster_constraints_order:
  - resource_first:
      id: resource-id1
      action: resource-action1
    resource_then:
      id: resource-id2
      action: resource-action2
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: option-name
        value: option-value

You may take a look at an example.
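A sketch that starts a hypothetical app-ip resource before app-server follows. The kind option is a standard Pacemaker order constraint option, and the resource IDs are hypothetical:

```yaml
ha_cluster_constraints_order:
  - resource_first:
      id: app-ip  # hypothetical primitive IDs
      action: start
    resource_then:
      id: app-server
      action: start
    options:
      - name: kind
        value: Mandatory
```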

Structure for set constraints:

ha_cluster_constraints_order:
  - resource_sets:
      - resource_ids:
          - resource-id1
          - resource-id2
        options:
          - name: option-name
            value: option-value
    id: constraint-id
    options:
      - name: score
        value: score-value
      - name: option-name
        value: option-value

You may take a look at an example.

ha_cluster_constraints_ticket

structure, default: no constraints

This variable defines resource ticket constraints. They let you specify the resources depending on a certain ticket. There are two types of ticket constraints: a simple one for a single resource, and a set constraint for multiple resources.

Structure for simple constraints:

ha_cluster_constraints_ticket:
  - resource:
      id: resource-id
      role: resource-role
    ticket: ticket-name
    id: constraint-id
    options:
      - name: loss-policy
        value: loss-policy-value
      - name: option-name
        value: option-value

You may take a look at an example.
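A sketch of a simple ticket constraint follows. The ticket name and resource ID are hypothetical, and stop is one of the loss-policy values Pacemaker accepts:

```yaml
ha_cluster_constraints_ticket:
  - resource:
      id: app-server    # hypothetical primitive ID
    ticket: ticket-site-a  # hypothetical ticket name
    options:
      - name: loss-policy
        value: stop
```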

Structure for set constraints:

ha_cluster_constraints_ticket:
  - resource_sets:
      - resource_ids:
          - resource-id1
          - resource-id2
        options:
          - name: option-name
            value: option-value
    ticket: ticket-name
    id: constraint-id
    options:
      - name: option-name
        value: option-value

You may take a look at an example.

ha_cluster_acls

structure, default: no ACLs

ha_cluster_acls:
  acl_roles:
    - id: role-id-1
      description: role description
      permissions:
        - kind: access-type
          xpath: XPath expression
        - kind: access-type
          reference: cib-element-id
    - id: role-id-2
      permissions:
        - kind: access-type
          xpath: XPath expression
  acl_users:
    - id: user-name
      roles:
        - role-id-1
        - role-id-2
  acl_groups:
    - id: group-name
      roles:
        - role-id-2

This variable defines ACL roles, users and groups.

The items of acl_roles are as follows:

The items of acl_users are as follows:

The items of acl_groups are as follows:

Note: Configure cluster property enable-acl to enable ACLs in the cluster:

ha_cluster_cluster_properties:
  - attrs:
      - name: enable-acl
        value: 'true'

You may take a look at an example.
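A sketch with one read-only role assigned to one user follows. The role and user names are hypothetical. The user is assumed to exist on the cluster nodes, and enable-acl is assumed to be set as described above:

```yaml
ha_cluster_acls:
  acl_roles:
    - id: read-only  # hypothetical role ID
      description: Read access to the whole CIB
      permissions:
        - kind: read
          xpath: /cib
  acl_users:
    - id: alice      # hypothetical user name
      roles:
        - read-only
```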

ha_cluster_alerts

structure, default: no alerts

ha_cluster_alerts:
  - id: alert1
    path: /alert1/path
    description: Alert1 description
    instance_attrs:
      - attrs:
        - name: alert_attr1_name
          value: alert_attr1_value
    meta_attrs:
      - attrs:
        - name: alert_meta_attr1_name
          value: alert_meta_attr1_value
    recipients:
      - value: recipient_value
        id: recipient1
        description: Recipient1 description
        instance_attrs:
          - attrs:
            - name: recipient_attr1_name
              value: recipient_attr1_value
        meta_attrs:
          - attrs:
            - name: recipient_meta_attr1_name
              value: recipient_meta_attr1_value

This variable defines Pacemaker alerts.

The items of alerts are as follows:

The items of recipients are as follows:

Note: The role configures the cluster to call external programs to handle alerts. It is your responsibility to provide the programs and distribute them to cluster nodes.

You may take a look at an example.
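A sketch of one alert with one recipient follows. The script path and recipient value are hypothetical, and the script must already be present on every cluster node:

```yaml
ha_cluster_alerts:
  - id: log-alert
    path: /usr/local/bin/alert_file.sh  # hypothetical script path
    description: Write cluster events to a file
    meta_attrs:
      - attrs:
          - name: timeout
            value: 15s
    recipients:
      - value: /var/log/cluster-alerts.log  # hypothetical recipient
        id: log-alert-recipient
```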

ha_cluster_qnetd

structure and default value:

ha_cluster_qnetd:
  present: boolean
  start_on_boot: boolean
  regenerate_keys: boolean

This configures a qnetd host which can then serve as an external quorum device for clusters. The items are as follows:

Note that you cannot run qnetd on a cluster node as fencing would disrupt qnetd operation.

You may take a look at an example.
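A playbook for a dedicated qnetd host might be sketched as follows. It assumes that ha_cluster_cluster_present is set to false so that the host does not become a cluster node. The host name and password are placeholders:

```yaml
- name: Configure a qnetd host
  hosts: qnetd-host  # hypothetical host, not a cluster node
  vars:
    ha_cluster_cluster_present: false
    ha_cluster_hacluster_password: password
    ha_cluster_qnetd:
      present: true
      start_on_boot: true
  roles:
    - linux-system-roles.ha_cluster
```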

Inventory

Nodes' names and addresses

Nodes' names and addresses can be configured in the ha_cluster variable, for example in inventory. This is optional. Addresses configured in ha_cluster_node_options override those configured in ha_cluster. If no names or addresses are configured, the play's targets will be used.

Example inventory with targets node1 and node2:

all:
  hosts:
    node1:
      ha_cluster:
        node_name: node-A
        pcs_address: node1-address
        corosync_addresses:
          - 192.168.1.11
          - 192.168.2.11
    node2:
      ha_cluster:
        node_name: node-B
        pcs_address: node2-address:2224
        corosync_addresses:
          - 192.168.1.12
          - 192.168.2.12

SBD watchdog and devices

When using SBD, you may optionally configure watchdog and SBD devices for each node in the ha_cluster variable, for example in inventory. Even though all SBD devices must be shared with and accessible from all nodes, each node may use different names for the devices. The loaded watchdog modules and used devices may also be different for each node. SBD settings defined in ha_cluster_node_options override those defined in ha_cluster. See also SBD variables.

Example inventory with targets node1 and node2:

all:
  hosts:
    node1:
      ha_cluster:
        sbd_watchdog_modules:
          - module1
          - module2
        sbd_watchdog: /dev/watchdog2
        sbd_devices:
          - /dev/disk/by-id/000001
          - /dev/disk/by-id/000002
          - /dev/disk/by-id/000003
    node2:
      ha_cluster:
        sbd_watchdog_modules:
          - module1
        sbd_watchdog_modules_blocklist:
          - module2
        sbd_watchdog: /dev/watchdog1
        sbd_devices:
          - /dev/disk/by-id/000001
          - /dev/disk/by-id/000002
          - /dev/disk/by-id/000003

Example Playbooks

The following examples show what the structure of the role variables looks like. They are not guides or best practices for configuring a cluster.

Configuring firewall and selinux using each role

To run ha_cluster properly, the ha_cluster ports need to be configured for firewalld and the SELinux policy, as shown in this example. Although these settings are omitted from the other example playbooks, we highly recommend setting ha_cluster_manage_firewall and ha_cluster_manage_selinux to true in your playbooks that use the ha_cluster role.

- name: Manage HA cluster and firewall and selinux
  hosts: node1 node2
  vars:
    ha_cluster_manage_firewall: true
    ha_cluster_manage_selinux: true

  roles:
    - linux-system-roles.ha_cluster

Creating pcsd TLS cert and key files using the certificate role

This example creates self-signed pcsd certificate and private key files in /var/lib/pcsd with the file names FILENAME.crt and FILENAME.key, respectively.

- name: Manage HA cluster with certificates
  hosts: node1 node2
  vars:
    ha_cluster_pcsd_certificates:
      - name: FILENAME
        common_name: "{{ ansible_hostname }}"
        ca: self-sign
  roles:
    - linux-system-roles.ha_cluster

Creating a cluster running no resources

- name: Manage HA cluster with no resources
  hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password

  roles:
    - linux-system-roles.ha_cluster
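
As recommended in the firewall and SELinux section above, a playbook for real use would normally also enable firewall and SELinux management. The following is a minimal sketch that combines that recommendation with this example. The variable vault_hacluster_password is hypothetical: it stands for a secret you define elsewhere, for example in an Ansible Vault encrypted file, and is not provided by the role.

```yaml
- name: Manage HA cluster with no resources, firewall and SELinux
  hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    # vault_hacluster_password is a hypothetical variable, not defined by the role
    ha_cluster_hacluster_password: "{{ vault_hacluster_password }}"
    ha_cluster_manage_firewall: true
    ha_cluster_manage_selinux: true

  roles:
    - linux-system-roles.ha_cluster
```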

Advanced Corosync configuration

- name: Manage HA cluster with Corosync options
  hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_transport:
      type: knet
      options:
        - name: ip_version
          value: ipv4-6
        - name: link_mode
          value: active
      links:
        -
          - name: linknumber
            value: 1
          - name: link_priority
            value: 5
        -
          - name: linknumber
            value: 0
          - name: link_priority
            value: 10
      compression:
        - name: level
          value: 5
        - name: model
          value: zlib
      crypto:
        - name: cipher
          value: none
        - name: hash
          value: none
    ha_cluster_totem:
      options:
        - name: block_unlisted_ips
          value: 'yes'
        - name: send_join
          value: 0
    ha_cluster_quorum:
      options:
        - name: auto_tie_breaker
          value: 1
        - name: wait_for_all
          value: 1

  roles:
    - linux-system-roles.ha_cluster

Configuring cluster to use SBD

Using ha_cluster_node_options variable

- hosts: node1 node2
  vars:
    my_sbd_devices:
      # This variable is not used by the role directly.
      # Its purpose is to define SBD devices once so they don't need
      # to be repeated several times in the role variables.
      # Instead, variables directly used by the role refer to this variable.
      - /dev/disk/by-id/000001
      - /dev/disk/by-id/000002
      - /dev/disk/by-id/000003
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_sbd_enabled: true
    ha_cluster_sbd_options:
      - name: delay-start
        value: 'no'
      - name: startmode
        value: always
      - name: timeout-action
        value: 'flush,reboot'
      - name: watchdog-timeout
        value: 30
    ha_cluster_node_options:
      - node_name: node1
        sbd_watchdog_modules:
          - iTCO_wdt
        sbd_watchdog_modules_blocklist:
          - ipmi_watchdog
        sbd_watchdog: /dev/watchdog1
        sbd_devices: "{{ my_sbd_devices }}"
      - node_name: node2
        sbd_watchdog_modules:
          - iTCO_wdt
        sbd_watchdog_modules_blocklist:
          - ipmi_watchdog
        sbd_watchdog: /dev/watchdog1
        sbd_devices: "{{ my_sbd_devices }}"
    # Best practice for setting SBD timeouts:
    # watchdog-timeout * 2 = msgwait-timeout (set automatically)
    # msgwait-timeout * 1.2 = stonith-timeout
    ha_cluster_cluster_properties:
      - attrs:
          - name: stonith-timeout
            value: 72
    ha_cluster_resource_primitives:
      - id: fence_sbd
        agent: 'stonith:fence_sbd'
        instance_attrs:
          - attrs:
              - name: devices
                value: "{{ my_sbd_devices | join(',') }}"
              - name: pcmk_delay_base
                value: 30

  roles:
    - linux-system-roles.ha_cluster
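
The timeout rule in the comments above can be applied to other values in the same way. As a worked sketch, assume a watchdog-timeout of 10 (a value chosen only for illustration): msgwait-timeout is then 10 * 2 = 20, and stonith-timeout is 20 * 1.2 = 24. Only the affected entries of the playbook vars are shown.

```yaml
# Fragment of playbook vars; all other variables stay as in the example above.
ha_cluster_sbd_options:
  - name: watchdog-timeout
    value: 10
# watchdog-timeout * 2 = msgwait-timeout: 10 * 2 = 20 (set automatically)
# msgwait-timeout * 1.2 = stonith-timeout: 20 * 1.2 = 24
ha_cluster_cluster_properties:
  - attrs:
      - name: stonith-timeout
        value: 24
```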

Using ha_cluster variable

The same result can be achieved by specifying node-specific options in inventory like this:

all:
  hosts:
    node1:
      ha_cluster:
        sbd_watchdog_modules:
          - iTCO_wdt
        sbd_watchdog_modules_blocklist:
          - ipmi_watchdog
        sbd_watchdog: /dev/watchdog1
        sbd_devices:
          - /dev/disk/by-id/000001
          - /dev/disk/by-id/000002
          - /dev/disk/by-id/000003
    node2:
      ha_cluster:
        sbd_watchdog_modules:
          - iTCO_wdt
        sbd_watchdog_modules_blocklist:
          - ipmi_watchdog
        sbd_watchdog: /dev/watchdog1
        sbd_devices:
          - /dev/disk/by-id/000001
          - /dev/disk/by-id/000002
          - /dev/disk/by-id/000003

Variables specified in inventory can be omitted when writing the playbook:

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_sbd_enabled: true
    ha_cluster_sbd_options:
      - name: delay-start
        value: 'no'
      - name: startmode
        value: always
      - name: timeout-action
        value: 'flush,reboot'
      - name: watchdog-timeout
        value: 30
    # Best practice for setting SBD timeouts:
    # watchdog-timeout * 2 = msgwait-timeout (set automatically)
    # msgwait-timeout * 1.2 = stonith-timeout
    ha_cluster_cluster_properties:
      - attrs:
          - name: stonith-timeout
            value: 72
    ha_cluster_resource_primitives:
      - id: fence_sbd
        agent: 'stonith:fence_sbd'
        instance_attrs:
          - attrs:
              # taken from host_vars
              # this only works if all nodes have the same sbd_devices
              - name: devices
                value: "{{ ha_cluster.sbd_devices | join(',') }}"
              - name: pcmk_delay_base
                value: 30

  roles:
    - linux-system-roles.ha_cluster

If both the ha_cluster_node_options and ha_cluster variables contain SBD options, those in ha_cluster_node_options take precedence.
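
A minimal sketch of that precedence, assuming the inventory shown above sets sbd_watchdog to /dev/watchdog1 for node1. The value /dev/watchdog0 is illustrative only. Following the rule above, node1 would be expected to use /dev/watchdog0, while node2 keeps the value from its inventory entry. This sketch sets a single option; see the SBD variables description for how options not listed in ha_cluster_node_options are resolved.

```yaml
# Fragment of playbook vars; the inventory is assumed to be the one shown above.
ha_cluster_node_options:
  - node_name: node1
    # Overrides sbd_watchdog: /dev/watchdog1 from the ha_cluster inventory variable
    sbd_watchdog: /dev/watchdog0
```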

Configuring cluster properties

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_cluster_properties:
      - attrs:
          - name: stonith-enabled
            value: 'true'
          - name: no-quorum-policy
            value: stop

  roles:
    - linux-system-roles.ha_cluster

Creating a cluster with fencing and several resources

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_resource_primitives:
      - id: xvm-fencing
        agent: 'stonith:fence_xvm'
        instance_attrs:
          - attrs:
              - name: pcmk_host_list
                value: node1 node2
      - id: simple-resource
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: resource-with-options
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
        instance_attrs:
          - attrs:
              - name: fake
                value: fake-value
              - name: passwd
                value: passwd-value
        meta_attrs:
          - attrs:
              - name: target-role
                value: Started
              - name: is-managed
                value: 'true'
        operations:
          - action: start
            attrs:
              - name: timeout
                value: '30s'
          - action: monitor
            attrs:
              - name: timeout
                value: '5'
              - name: interval
                value: '1min'
      - id: example-1
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-2
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-3
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: simple-clone
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: clone-with-options
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: bundled-resource
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
    ha_cluster_resource_groups:
      - id: simple-group
        resource_ids:
          - example-1
          - example-2
        meta_attrs:
          - attrs:
              - name: target-role
                value: Started
              - name: is-managed
                value: 'true'
      - id: cloned-group
        resource_ids:
          - example-3
    ha_cluster_resource_clones:
      - resource_id: simple-clone
      - resource_id: clone-with-options
        promotable: true
        id: custom-clone-id
        meta_attrs:
          - attrs:
              - name: clone-max
                value: '2'
              - name: clone-node-max
                value: '1'
      - resource_id: cloned-group
        promotable: true
    ha_cluster_resource_bundles:
      - id: bundle-with-resource
        resource_id: bundled-resource
        container:
          type: podman
          options:
            - name: image
              value: my:image
        network_options:
          - name: control-port
            value: 3121
        port_map:
          -
            - name: port
              value: 10001
          -
            - name: port
              value: 10002
            - name: internal-port
              value: 10003
        storage_map:
          -
            - name: source-dir
              value: /srv/daemon-data
            - name: target-dir
              value: /var/daemon/data
          -
            - name: source-dir-root
              value: /var/log/pacemaker/bundles
            - name: target-dir
              value: /var/log/daemon
        meta_attrs:
          - attrs:
              - name: target-role
                value: Started
              - name: is-managed
                value: 'true'

  roles:
    - linux-system-roles.ha_cluster

Configuring resource and resource operation defaults

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    # Set a different `resource-stickiness` value during and outside work
    # hours. This allows resources to automatically move back to their most
    # preferred hosts, but at a time that (in theory) does not interfere with
    # business activities.
    ha_cluster_resource_defaults:
      meta_attrs:
        - id: core-hours
          rule: date-spec hours=9-16 weekdays=1-5
          score: 2
          attrs:
            - name: resource-stickiness
              value: INFINITY
        - id: after-hours
          score: 1
          attrs:
            - name: resource-stickiness
              value: 0
    # Default the timeout on all 10-second-interval monitor actions on IPaddr2
    # resources to 8 seconds.
    ha_cluster_resource_operation_defaults:
      meta_attrs:
        - rule: resource ::IPaddr2 and op monitor interval=10s
          score: INFINITY
          attrs:
            - name: timeout
              value: 8s

  roles:
    - linux-system-roles.ha_cluster

Configuring stonith levels

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_resource_primitives:
      - id: apc1
        agent: 'stonith:fence_apc_snmp'
        instance_attrs:
          - attrs:
              - name: ip
                value: apc1.example.com
              - name: username
                value: user
              - name: password
                value: secret
              - name: pcmk_host_map
                value: node1:1;node2:2
      - id: apc2
        agent: 'stonith:fence_apc_snmp'
        instance_attrs:
          - attrs:
              - name: ip
                value: apc2.example.com
              - name: username
                value: user
              - name: password
                value: secret
              - name: pcmk_host_map
                value: node1:1;node2:2
    # Nodes have redundant power supplies, apc1 and apc2. Cluster must ensure
    # that when attempting to reboot a node, both power supplies are turned off
    # before either power supply is turned back on.
    ha_cluster_stonith_levels:
      - level: 1
        target: node1
        resource_ids:
          - apc1
          - apc2
      - level: 1
        target: node2
        resource_ids:
          - apc1
          - apc2

  roles:
    - linux-system-roles.ha_cluster

Creating a cluster with resource constraints

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    # In order to use constraints, we need resources the constraints will apply
    # to.
    ha_cluster_resource_primitives:
      - id: xvm-fencing
        agent: 'stonith:fence_xvm'
        instance_attrs:
          - attrs:
              - name: pcmk_host_list
                value: node1 node2
      - id: example-1
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-2
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-3
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-4
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-5
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
      - id: example-6
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
    # location constraints
    ha_cluster_constraints_location:
      # resource ID and node name
      - resource:
          id: example-1
        node: node1
        options:
          - name: score
            value: 20
      # resource pattern and node name
      - resource:
          pattern: example-\d+
        node: node1
        options:
          - name: score
            value: 10
      # resource ID and rule
      - resource:
          id: example-2
        rule: '#uname eq node2 and date in_range 2022-01-01 to 2022-02-28'
      # resource pattern and rule
      - resource:
          pattern: example-\d+
        rule: node-type eq weekend and date-spec weekdays=6-7
    # colocation constraints
    ha_cluster_constraints_colocation:
      # simple constraint
      - resource_leader:
          id: example-3
        resource_follower:
          id: example-4
        options:
          - name: score
            value: -5
      # set constraint
      - resource_sets:
          - resource_ids:
              - example-1
              - example-2
          - resource_ids:
              - example-5
              - example-6
            options:
              - name: sequential
                value: "false"
        options:
          - name: score
            value: 20
    # order constraints
    ha_cluster_constraints_order:
      # simple constraint
      - resource_first:
          id: example-1
        resource_then:
          id: example-6
        options:
          - name: symmetrical
            value: "false"
      # set constraint
      - resource_sets:
          - resource_ids:
              - example-1
              - example-2
            options:
              - name: require-all
                value: "false"
              - name: sequential
                value: "false"
          - resource_ids:
              - example-3
          - resource_ids:
              - example-4
              - example-5
            options:
              - name: sequential
                value: "false"
    # ticket constraints
    ha_cluster_constraints_ticket:
      # simple constraint
      - resource:
          id: example-1
        ticket: ticket1
        options:
          - name: loss-policy
            value: stop
      # set constraint
      - resource_sets:
          - resource_ids:
              - example-3
              - example-4
              - example-5
        ticket: ticket2
        options:
          - name: loss-policy
            value: fence

  roles:
    - linux-system-roles.ha_cluster

Configuring a cluster using a quorum device

Configuring a quorum device

Before you can add a quorum device to a cluster, you need to set the device up. This only needs to be done once for each quorum device. Once it has been set up, you can use a quorum device in any number of clusters.

Note that you cannot run a quorum device on a cluster node.

- hosts: nodeQ
  vars:
    ha_cluster_cluster_present: false
    ha_cluster_hacluster_password: password
    ha_cluster_qnetd:
      present: true

  roles:
    - linux-system-roles.ha_cluster

Configuring a cluster to use a quorum device

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_quorum:
      device:
        model: net
        model_options:
          - name: host
            value: nodeQ
          - name: algorithm
            value: lms

  roles:
    - linux-system-roles.ha_cluster
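
Because the quorum device has to be set up before a cluster can use it, both steps can be kept in one playbook, with the quorum device play first. Ansible runs the plays in a playbook in order. The following is a minimal sketch that places the two plays above in one file, assuming the same host names (nodeQ, node1, node2); the play names are illustrative.

```yaml
# Play 1: set up the quorum device on a host that is not a cluster node
- name: Set up the quorum device host
  hosts: nodeQ
  vars:
    ha_cluster_cluster_present: false
    ha_cluster_hacluster_password: password
    ha_cluster_qnetd:
      present: true

  roles:
    - linux-system-roles.ha_cluster

# Play 2: configure the cluster to use the quorum device set up in play 1
- name: Configure the cluster to use the quorum device
  hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_quorum:
      device:
        model: net
        model_options:
          - name: host
            value: nodeQ
          - name: algorithm
            value: lms

  roles:
    - linux-system-roles.ha_cluster
```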

Configuring node attributes

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_node_options:
      - node_name: node1
        attributes:
          - attrs:
              - name: attribute1
                value: value1A
              - name: attribute2
                value: value2A
      - node_name: node2
        attributes:
          - attrs:
              - name: attribute1
                value: value1B
              - name: attribute2
                value: value2B

  roles:
    - linux-system-roles.ha_cluster

Configuring ACLs

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    # To use an ACL role permission reference, the referenced element must
    # exist in the CIB.
    ha_cluster_resource_primitives:
      - id: not-for-operator
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
    # ACLs must be enabled (using the enable-acl cluster property) in order to
    # be effective.
    ha_cluster_cluster_properties:
      - attrs:
          - name: enable-acl
            value: 'true'
    ha_cluster_acls:
      acl_roles:
        - id: operator
          description: HA cluster operator
          permissions:
            - kind: write
              xpath: //crm_config//nvpair[@name='maintenance-mode']
            - kind: deny
              reference: not-for-operator
        - id: administrator
          permissions:
            - kind: write
              xpath: /cib
      acl_users:
        - id: alice
          roles:
            - operator
            - administrator
        - id: bob
          roles:
            - administrator
      acl_groups:
        - id: admins
          roles:
            - administrator

  roles:
    - linux-system-roles.ha_cluster

Configuring utilization

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    # For utilization to have an effect, the `placement-strategy` property
    # must be set and its value must be different from the value `default`.
    ha_cluster_cluster_properties:
      - attrs:
          - name: placement-strategy
            value: utilization
    ha_cluster_node_options:
      - node_name: node1
        utilization:
          - attrs:
              - name: utilization1
                value: 1
              - name: utilization2
                value: 2
      - node_name: node2
        utilization:
          - attrs:
              - name: utilization1
                value: 3
              - name: utilization2
                value: 4
    ha_cluster_resource_primitives:
      - id: resource1
        # wokeignore:rule=dummy
        agent: 'ocf:pacemaker:Dummy'
        utilization:
          - attrs:
              - name: utilization1
                value: 2
              - name: utilization2
                value: 3

  roles:
    - linux-system-roles.ha_cluster

Configuring alerts

- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_alerts:
      - id: alert1
        path: /alert1/path
        description: Alert1 description
        instance_attrs:
          - attrs:
              - name: alert_attr1_name
                value: alert_attr1_value
        meta_attrs:
          - attrs:
              - name: alert_meta_attr1_name
                value: alert_meta_attr1_value
        recipients:
          - value: recipient_value
            id: recipient1
            description: Recipient1 description
            instance_attrs:
              - attrs:
                  - name: recipient_attr1_name
                    value: recipient_attr1_value
            meta_attrs:
              - attrs:
                  - name: recipient_meta_attr1_name
                    value: recipient_meta_attr1_value

  roles:
    - linux-system-roles.ha_cluster

Purging all cluster configuration

- hosts: node1 node2
  vars:
    ha_cluster_cluster_present: false

  roles:
    - linux-system-roles.ha_cluster

rpm-ostree

See README-ostree.md

License

MIT

Author Information

Tomas Jelinek