We have noticed failures on certain indexer cluster SVAs (C1 and C3) after introducing a dedicated instance/container for the monitoring console. Some background on the failure:
The failure occurs in the `Fetch distributed peers when cm is defined` play.
Prior to this play, the monitoring console role compiles a list called `cluster_master_peers` that looks like the following: `["ip-address1:8089", "ip-address2:8089", "ip-address3:8089"]`. These values represent `idx1`, `idx2`, and `idx3`, respectively.
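For context, a minimal sketch of how such a list might be assembled; the `indexers` group name, the `ansible_host` values, and the hard-coded management port are assumptions rather than the role's actual variables:

```yaml
# Hypothetical sketch: build cluster_master_peers as "<address>:8089" entries.
# The "indexers" group and ansible_host values are assumptions.
- name: Compile list of indexer peer addresses
  set_fact:
    cluster_master_peers: "{{ cluster_master_peers | default([]) + [hostvars[item].ansible_host ~ ':8089'] }}"
  loop: "{{ groups['indexers'] }}"
```

When `ansible_host` is an IP, the resulting list holds `ip-address:8089` entries like the ones above rather than hostnames.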
The failing play attempts to verify that these indexer peers are up using the `name` and `content.status` fields from the API response. The response contains near-duplicate entries for each peer: for example, one entry with the name `idx1:8089` and the status `Up`, and another with the name `ip-address1:8089` and the status `Down`.
Since we are comparing against the IP addresses rather than the FQDNs (`idx1`, `idx2`, `idx3`), the task concludes that all peers are `Down` and retries until it ultimately fails.
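The failing comparison looks roughly like the sketch below. The endpoint URL, credentials, and retry values are assumptions; the relevant part is that the success condition matches peers by the `name` field, and the `Up` entries are keyed by hostname while `cluster_master_peers` holds IPs:

```yaml
# Hypothetical sketch of the failing check; URL, credentials, and retry
# values are assumptions. The until condition only succeeds once every
# entry in cluster_master_peers appears among the peers reporting "Up".
- name: Fetch distributed peers when cm is defined (sketch)
  uri:
    url: "https://{{ mc_host }}:8089/services/search/distributed/peers?output_mode=json"
    method: GET
    user: "{{ splunk_admin_user }}"
    password: "{{ splunk_admin_password }}"
    validate_certs: false
  register: distributed_peers
  # Fails on C1/C3: the "Up" entries are named idx1:8089, idx2:8089, ...,
  # while cluster_master_peers holds IPs, so the difference is never empty.
  until: >-
    cluster_master_peers | difference(
      distributed_peers.json.entry
      | selectattr('content.status', 'equalto', 'Up')
      | map(attribute='name') | list
    ) | length == 0
  retries: 60
  delay: 5
```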
I have adjusted the two plays mentioned above to use the FQDN (denoted by the `label` and `peerName` fields) instead of the IP address assigned to each peer.
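Conceptually, the change amounts to matching peers by hostname instead of IP. A hedged sketch of the adjusted check, reusing the hypothetical task parameters from the sketch above and assuming `cluster_master_peers` now holds the same peer names reported by `content.peerName`:

```yaml
# Hypothetical sketch of the adjusted check; same assumed parameters as above.
# Peers are matched by hostname via content.peerName, so the IP-keyed "Down"
# duplicates no longer mask the healthy entries.
- name: Fetch distributed peers when cm is defined (adjusted sketch)
  uri:
    url: "https://{{ mc_host }}:8089/services/search/distributed/peers?output_mode=json"
    method: GET
    user: "{{ splunk_admin_user }}"
    password: "{{ splunk_admin_password }}"
    validate_certs: false
  register: distributed_peers
  until: >-
    cluster_master_peers | difference(
      distributed_peers.json.entry
      | selectattr('content.status', 'equalto', 'Up')
      | map(attribute='content.peerName') | list
    ) | length == 0
  retries: 60
  delay: 5
```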