scylladb / cql-stress

Add HDR histogram support to c-s frontend #95

Open soyacz opened 3 weeks ago

soyacz commented 3 weeks ago

Running the elasticity test (grow-shrink) failed due to:

2024-06-06 14:26:46.373: (DisruptionEvent Severity.ERROR) period_type=end event_id=5d608d77-aaa3-4259-832f-aa1eace72925 duration=30m43s: nemesis_name=GrowShrinkClusterParallel target_node=Node perf-latency-grow-shrink-ubuntu-db-node-dcfb0b61-3 [34.201.20.143 | 10.12.0.246] errors='NoneType' object has no attribute 'get_percentile_to_value_dict'
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 339, in _build_histograms_summary_with_interval_by_tag
end_interval).build_histogram_summary_by_tag(path, hdr_tag)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 332, in build_histogram_summary_by_tag
return _CSRangeHistogramBuilder._get_summary_for_operation_by_hdr_tag(histogram)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 301, in _get_summary_for_operation_by_hdr_tag
if parsed_summary := _CSRangeHistogramBuilder._convert_raw_histogram(histogram.histogram, histogram.start_time,
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 311, in _convert_raw_histogram
if percentiles := histogram.get_percentile_to_value_dict(PERCENTILES):
AttributeError: 'NoneType' object has no attribute 'get_percentile_to_value_dict'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5202, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4056, in disrupt_grow_shrink_cluster_parallel
self.steady_state_latency()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 215, in wrapped
result["hdr"] = args[0].tester.get_cs_range_histogram_by_interval(stress_operation=workload,
File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 3709, in get_cs_range_histogram_by_interval
return make_cs_range_histogram_summary_by_interval(
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 64, in make_cs_range_histogram_summary_by_interval
return builder.build_histograms_summary_with_interval(path, interval)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 197, in build_histograms_summary_with_interval
if res := future.result():
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
AttributeError: 'NoneType' object has no attribute 'get_percentile_to_value_dict'

and later:

2024-06-06 15:01:30.372: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=5247dc39-d640-4e71-b661-027bda159ef6, source=PerformanceRegressionTest.test_latency_write_with_nemesis (performance_regression_test.PerformanceRegressionTest)() message=Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/performance_regression_test.py", line 626, in test_latency_write_with_nemesis
self.run_workload(stress_cmd=self.params.get('stress_cmd_w'), nemesis=True)
File "/home/ubuntu/scylla-cluster-tests/performance_regression_test.py", line 343, in run_workload
check_latency()
File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 3175, in check_latency_during_ops
latency_results = json.load(file)
File "/usr/local/lib/python3.10/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor dcfb0b61-f2b7-486f-9d17-53ecc028830f`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=dcfb0b61-f2b7-486f-9d17-53ecc028830f)
- Show all stored logs command: `$ hydra investigate show-logs dcfb0b61-f2b7-486f-9d17-53ecc028830f`

## Logs:

*No logs captured during this run.*

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-staging/job/lukasz/job/scylla-master-perf-regression-latency-650gb-grow-shrink/34/)
[Argus](https://argus.scylladb.com/test/18619447-834b-4c2c-9a77-c87e54a099a8/runs?additionalRuns[]=dcfb0b61-f2b7-486f-9d17-53ecc028830f)

fruch commented 3 weeks ago

It seems there are a few assumptions here about data that isn't available, i.e. HDR histograms.

I would seek a way to skip it for cql-stress

soyacz commented 3 weeks ago

Yes, a simple workaround is to disable HDR histogram analysis with a parameter, which I did for my testing. Otherwise, we need to implement a workaround for cql-stress until this feature is available.
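
For illustration only, here is a minimal sketch of what such a workaround could look like on the analysis side: a guard that returns an empty summary when no histogram was produced, instead of calling `get_percentile_to_value_dict` on `None` as in the traceback above. The function name `convert_raw_histogram_safe`, the `PERCENTILES` values, the key format, and the `FakeHistogram` stand-in are all hypothetical and are not taken from scylla-cluster-tests.

```python
from typing import Dict, Optional, Protocol, Sequence

# Hypothetical percentile list; the real PERCENTILES constant may differ.
PERCENTILES = [50, 90, 99, 99.9]


class HistogramLike(Protocol):
    """Anything exposing the method seen in the traceback."""

    def get_percentile_to_value_dict(self, percentiles: Sequence[float]) -> Dict[float, int]:
        ...


def convert_raw_histogram_safe(histogram: Optional[HistogramLike]) -> Dict[str, int]:
    """Return a percentile summary, or an empty dict when no histogram exists.

    A tool that does not emit HDR histogram data leaves `histogram` as None;
    this guard skips the analysis instead of raising AttributeError.
    """
    if histogram is None:
        return {}
    values = histogram.get_percentile_to_value_dict(PERCENTILES)
    # Key format is an assumption made for this sketch.
    return {f"percentile_{str(p).replace('.', '_')}": v for p, v in values.items()}


class FakeHistogram:
    """Stand-in used only to exercise the sketch without a real HDR histogram."""

    def get_percentile_to_value_dict(self, percentiles: Sequence[float]) -> Dict[float, int]:
        return {p: 1000 * (i + 1) for i, p in enumerate(percentiles)}


if __name__ == "__main__":
    print(convert_raw_histogram_safe(None))
    print(convert_raw_histogram_safe(FakeHistogram()))
```

Callers would still need to handle the empty summary, for example by skipping the later latency check rather than reading a results file that was never written.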

piodul commented 3 weeks ago

Just wanted to point out that there is support for outputting hdr histograms in cql-stress, but AFAIK it was only done for the scylla-bench frontend. Perhaps it shouldn't be too difficult to reuse the code for the cassandra-stress frontend if it's the same format.