wazuh / wazuh-qa

Wazuh - Quality Assurance
GNU General Public License v2.0

Show tasks duration and resource usage metrics in `test_cluster_performance` output #5390

Open Selutario opened 2 weeks ago

Selutario commented 2 weeks ago

Description

We need to modify the `test_cluster_performance` test.

It currently fails when any of the cluster stats (task duration or resource usage) exceeds a predefined threshold. However, it would also help to see the actual values when the test passes, so that we can detect slight increases in any of the metrics.

To make this easier, the test should print (and include in the report) the detailed metrics that it uses internally. For example:

>>> from wazuh_testing.tools.performance.csv_parser import ClusterCSVTasksParser
>>> ClusterCSVTasksParser('/home/selu/Descargas/cluster_performance/517/artifacts_480_rc1').get_stats()
{
    "setup_phase": {
        "integrity_check": {
            "time_spent(s)": {
                "workers": {
                    "mean": ("worker_17", 0.3481111111111111),
                    "max": ("worker_14", 3.176),
                },
                "master": {
                    "mean": ("master", 0.05240245824141191),
                    "max": ("master", 0.709),
                },
            }
        },
        "integrity_sync": {
            "time_spent(s)": {
                "workers": {
                    "mean": ("worker_8", 0.04211764705882353),
                    "max": ("worker_23", 0.163),
                },
                "master": {
                    "mean": ("master", 0.5421203007518796),
                    "max": ("master", 3.217),
                },
            }
        },
        "agent-info_sync": {
            "time_spent(s)": {
                "workers": {
                    "mean": ("worker_18", 0.9509827586206897),
                    "max": ("worker_9", 10.639),
                },
                "master": {
                    "mean": ("master", 0.687005693950178),
                    "max": ("master", 10.257),
                },
            }
        },
    },
    "stable_phase": {
        "integrity_check": {
            "time_spent(s)": {
                "workers": {
                    "mean": ("worker_3", 0.01140740740740741),
                    "max": ("worker_3", 0.04),
                },
                "master": {
                    "mean": ("master", 0.00456888888888889),
                    "max": ("master", 0.017),
                },
            }
        },
        "agent-info_sync": {
            "time_spent(s)": {
                "workers": {"mean": ("worker_18", 0.00964), "max": ("worker_18", 0.025)}
            }
        },
    },
}
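
One possible way to print these metrics is sketched below. It only assumes that the stats dict has the shape shown above (nested dicts whose leaves are `(node_name, value)` tuples, all at the same depth). The helper names `iter_stats_rows` and `format_stats_table` are hypothetical and not part of `wazuh_testing`.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_stats_rows(stats: Dict[str, Any], path: Tuple[str, ...] = ()) -> Iterator[tuple]:
    """Yield one flat row per leaf of the nested stats dict.

    Assumption: every leaf is a (node_name, value) tuple, as in the
    get_stats() output shown in this issue.
    """
    for key, value in stats.items():
        if isinstance(value, dict):
            yield from iter_stats_rows(value, path + (key,))
        else:
            node, number = value
            yield path + (key, node, number)


def format_stats_table(stats: Dict[str, Any]) -> str:
    """Render the stats as a plain-text table suitable for test output."""
    header = ("phase", "task", "metric", "node type", "stat", "node", "value")
    # Floats are rounded to milliseconds to keep the table readable.
    body = [
        tuple(f"{cell:.3f}" if isinstance(cell, float) else str(cell) for cell in row)
        for row in iter_stats_rows(stats)
    ]
    widths = [max(len(row[i]) for row in [header, *body]) for i in range(len(header))]

    def render(row: tuple) -> str:
        return "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()

    separator = "  ".join("-" * width for width in widths)
    return "\n".join([render(header), separator, *(render(row) for row in body)])


if __name__ == "__main__":
    # Subset of the stats shown above, used only to illustrate the output.
    sample = {
        "stable_phase": {
            "integrity_check": {
                "time_spent(s)": {
                    "workers": {"mean": ("worker_3", 0.01140740740740741), "max": ("worker_3", 0.04)},
                    "master": {"mean": ("master", 0.00456888888888889), "max": ("master", 0.017)},
                }
            }
        }
    }
    print(format_stats_table(sample))
```

The test could print this table and also attach the same string to the report, so the values are visible whether or not a threshold is exceeded.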

Selutario commented 2 weeks ago

We should also decrease these task thresholds: https://github.com/wazuh/wazuh-qa/blob/d6616bff7329b99a9b896af9dd7eb63cb6ad97c3/tests/performance/test_cluster/test_cluster_performance/data/25w_50000a_thresholds.yaml#L2-L43

Artifacts like the ones attached here should cause the test to fail, which they currently do not.
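
To check candidate thresholds against existing artifacts, something like the sketch below could help. It assumes a thresholds dict that mirrors the layout of the stats, with a plain number at each leaf. That layout is an assumption made for illustration and has not been checked against the real `25w_50000a_thresholds.yaml` file. The function name `find_exceeded` and the threshold values in the example are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def find_exceeded(
    stats: Dict[str, Any],
    thresholds: Any,
    path: Tuple[str, ...] = (),
) -> List[Tuple[str, str, float, float]]:
    """Return (metric path, node, value, limit) for every stat above its limit.

    Assumptions: stats leaves are (node_name, value) tuples, and thresholds
    is a nested dict with the same keys and a number at each leaf. Stats
    without a matching threshold are skipped.
    """
    exceeded = []
    for key, value in stats.items():
        limit = thresholds.get(key) if isinstance(thresholds, dict) else None
        if limit is None:
            continue
        if isinstance(value, dict):
            exceeded.extend(find_exceeded(value, limit, path + (key,)))
        elif isinstance(limit, (int, float)):
            node, number = value
            if number > limit:
                exceeded.append(("/".join(path + (key,)), node, number, limit))
    return exceeded


if __name__ == "__main__":
    # Values taken from the stats in the first comment.
    stats = {
        "setup_phase": {
            "agent-info_sync": {
                "time_spent(s)": {
                    "workers": {"mean": ("worker_18", 0.9509827586206897), "max": ("worker_9", 10.639)}
                }
            }
        }
    }
    # Hypothetical limits, for illustration only.
    thresholds = {
        "setup_phase": {"agent-info_sync": {"time_spent(s)": {"workers": {"mean": 2.0, "max": 10.0}}}}
    }
    for metric, node, value, limit in find_exceeded(stats, thresholds):
        print(f"{metric}: {node} reported {value}, limit is {limit}")
```

Running this over several recent artifacts with the proposed values would show which limits are tight enough to catch the attached runs without failing on healthy ones.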