Open sgup432 opened 1 year ago
Essentially, when we have a dedicated cluster manager node, we should skip collecting all metrics along the following dimensions (see
https://opensearch.org/docs/1.0/monitoring-plugins/pa/reference/):
1/ ShardID, IndexName, Operation, ShardRole
2/ ShardID, IndexName
3/ Operation, Exception, Indices, HTTPRespCode, ShardID, IndexName, ShardRole
4/ ThreadPoolType
The dedicated cluster manager is already an overloaded node with respect to Performance Analyzer RCA; disabling collection for this set of metrics will help bring down the overall footprint of the PA-RCA component.
Identify a dedicated cluster manager node: a node becomes a dedicated type when all of its other node roles are marked as false. In this case, the node carries only the cluster manager role, and all other nodes are marked as not cluster-manager-eligible.
See https://opensearch.org/docs/latest/tuning-your-cluster/cluster/
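Per the cluster tuning docs linked above, a dedicated cluster manager is typically configured by giving the node only that single role in `opensearch.yml`. This is a sketch; the exact role names depend on the OpenSearch version (older versions use the `node.master` boolean instead):

```yaml
# Dedicated cluster manager node: the only role it carries.
node.roles: [ cluster_manager ]

# All other nodes omit the cluster_manager role, e.g. a data node:
# node.roles: [ data, ingest ]
```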
This would require changing the code logic in ClusterDetailsEventProcessor to categorize the nodes into three types: data, co-located cluster manager, and dedicated cluster manager. In the dedicated cluster manager case, the nodes should not be collecting any metrics relevant to shards/indices, as they contain no data.
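The three-way categorization above could be sketched as follows. This is purely illustrative: the enum, its helper method, and its inputs are assumptions, not types that exist in ClusterDetailsEventProcessor today.

```java
// Hypothetical sketch of the proposed node categorization; none of these
// names exist in the PA-RCA codebase.
public enum NodeCategory {
    DATA,
    CO_LOCATED_CLUSTER_MANAGER,
    DEDICATED_CLUSTER_MANAGER;

    /**
     * A node that is cluster-manager-eligible and also holds data is
     * co-located; one that is only cluster-manager-eligible is dedicated.
     */
    public static NodeCategory categorize(boolean isClusterManagerEligible, boolean isDataNode) {
        if (isClusterManagerEligible && isDataNode) {
            return CO_LOCATED_CLUSTER_MANAGER;
        }
        if (isClusterManagerEligible) {
            return DEDICATED_CLUSTER_MANAGER;
        }
        return DATA;
    }
}
```

Shard/index metric collection would then be skipped whenever a node falls into the DEDICATED_CLUSTER_MANAGER bucket.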
As mentioned in the comment above and in #308, these errors can be prevented by limiting certain RCA node executions to certain OS node roles. Note that this proposes a framework change which will not prevent "incorrect" usage of tags or of the framework itself, so exceptions like the ones from the issue description would still be possible. Those scenarios should also be handled gracefully, so let this remain a separate issue from #308.
Mostly agree with the proposal mentioned here, i.e. to not collect these metrics on a dedicated cluster manager node.
But we should also consider non-dedicated cluster manager nodes as well, right? It doesn't make sense there either.
Identifying cluster manager nodes should be possible in this package, as we already have mechanisms in place that give RCA domain-level information, i.e. which nodes carry cluster manager or data roles. The RCA framework itself relies on it.
@sgup432 you're right. I talked about that in #308. Currently we have to choose between a no-effect dedicated cluster manager execution if we want correctness, and no execution on non-dedicated cluster manager nodes if we want performance; #308 proposes a framework change in order to achieve both correctness and performance without tradeoffs.
As the node's decision for which tags to apply comes down to RCA .conf files, there appear to be three main approaches for making this change:
1. Changing the structure of the locus tags inside .conf files to include hybrid roles. Setting this tag, and the conf file as a whole, is already the user's/administrator's responsibility, i.e. choosing a persistent storage type, defining RCA graphs, tuning RCA-specific settings and thresholds, and choosing tags for the node directly linked to a certain conf file. With this change, it would also include the user being aware of the cluster setup (whether the CM is dedicated or not) and giving the node a corresponding tag: the tag would remain the same if the CM is dedicated, and would be a different (hybrid) one if it is not. RCA nodes that shouldn't be executed on dedicated CM nodes, such as HotShardRCA, could then safely be marked with LOCUS_DATA_NODE, since the hybrid conf file tag would execute them in the hybrid case and, analogously, the cluster-manager conf file tag would not execute them on a dedicated CM node.
2. Introducing a new conf file for hybrid OpenSearch nodes. This would still require the aforementioned new hybrid tag. It would also make the process a bit more automated (no need for an explicit set by the admin; note: only for the default setup), but at the cost of having to change the internal RCA role logic, plus having two new conf files with mostly duplicated code relative to their dedicated CM counterparts.
3. Abandoning the tag reliance for conf files and creating completely internal logic for this. This gives the most automated solution but requires the greatest framework change by far.
It is important to note that the feasibility and consequences of options 2 and 3 are currently unclear, because this higher level of automation was neither implemented nor planned by the original creators of the framework, who left it to be configured by hand.
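To illustrate option 1, a locus tag in an RCA .conf file might look like the fragment below. This is a hedged sketch: the `tags`/`locus` keys follow the existing tag mechanism as I understand it, but the `hybrid-node` value is an assumed new tag that does not exist today.

```json
{
  "tags": {
    "locus": "hybrid-node"
  }
}
```

A dedicated CM node's conf file would keep its existing cluster-manager locus, while a co-located CM node would carry the hybrid tag so that data-node RCAs such as HotShardRCA still run there.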
What is the bug? We saw the below errors in the PA log on one of the master/clusterManager nodes.
It attempts to calculate indices-cache-related metrics, which are not present on cluster manager nodes; the PA plugin does not write this data into shared memory.
The PA plugin writes the below data on data nodes but not on cluster manager nodes, as expected.
How can one reproduce the bug? Steps to reproduce the behavior:
What is the expected behavior? We should not calculate indices-cache-related metrics on cluster manager nodes. Might require changes around ReaderMetricProcessor.
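The expected behavior could come down to a simple guard in the reader's metric-processing path. The sketch below is purely illustrative: the class, method, and parameter names are assumptions, not existing PA-RCA APIs.

```java
// Hypothetical guard for the reader's processing path: skip shard/index level
// metric computation on nodes that hold no data.
public final class ShardMetricGate {

    private ShardMetricGate() {}

    /**
     * Returns whether shard/index-level metrics (e.g. indices cache metrics)
     * should be computed on this node. Dedicated cluster manager nodes hold
     * no shards, so the reader should skip these computations there.
     */
    public static boolean shouldProcessShardMetrics(boolean isDataNode) {
        return isDataNode;
    }
}
```

The reader would consult this check before attempting to read indices-cache data from shared memory, avoiding the errors described above.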
What is your host/environment?
Do you have any screenshots? If applicable, add screenshots to help explain your problem.
Do you have any additional context? Add any other context about the problem.