Open jesusvazquez opened 1 year ago
Hi @jesusvazquez, @bboreham, and @bwplotka,
I hope you’re all doing well. I've been working on this issue for a few days now and believe I have a solution. However, I have some doubts about the logic, so I'd like to clarify them here before making a pull request. Apologies if this seems unprofessional.
1. Changes Made:
I’ve added functionality to handle native histograms and simulate out-of-order queries. Specifically, the following changes have been made:
native_histogram
metric to track histogram data.generate_native_histogram_query
, to modify the query expression based on use_native_histograms
.query
method to include out-of-order load simulation with a random factor.Here’s a summary of the changes:
# Native Histogram Metric
native_histogram = Histogram("loadgen_native_histogram", "Native histogram for testing", ["prometheus", "group"], buckets=(0.05, 0.1, 0.3, 0.7, 1.5, 2.5, 4, 6, 8, 10))
# Method to generate native histogram query
def generate_native_histogram_query(self, expr):
if self.use_native_histograms:
return f"histogram_quantile(0.99, {expr})"
return expr
# Modified query method with out-of-order simulation
def query(self, expr):
...
if self.type == "range":
if random.random() < self.out_of_order_probability:
out_of_order_factor = random.uniform(-self.start * self.out_of_order_range, self.start * self.out_of_order_range)
params["start"] = start_time - self.start + out_of_order_factor
params["end"] = start_time - self.end + out_of_order_factor
Querier.out_of_order_count.labels(self.target, self.name, expr, self.type).inc()
else:
params["start"] = start_time - self.start
params["end"] = start_time - self.end
params["step"] = self.step
...
Could you please review these changes to ensure they meet the requirements for detecting memory regressions?
2. Randomness in Query Method: In the query method, I’ve introduced a random factor to simulate out-of-order loads:
out_of_order_factor = random.uniform(-self.start * self.out_of_order_range, self.start * self.out_of_order_range)
params["start"] = start_time - self.start + out_of_order_factor
params["end"] = start_time - self.end + out_of_order_factor
This factor represents 10% of the start time (self.start). Is this level of randomness appropriate for our testing, or should it be adjusted?
Also just want to make sure that after making changes in tools/load-generator/main.py, do I also need to update tools/fake-webserver/server.go to generate native histograms? Sorry if this question seems basic. 😅
Thank you for your time and help!
@jesusvazquez Sir, can you please check it once?
FYI: With recent changed avalanche
is capable for this: https://github.com/prometheus-community/avalanche (for write)
Relevant: https://github.com/prometheus/test-infra/issues/559
An issue was detected a few days ago https://github.com/prometheus/prometheus/issues/12603 where a memory regression was introduced with a native histograms change.
We do have a process in place where we run prombench in PRs that we think might increase resources and also we do the same in our process to cut new releases.
I'd like to make sure that we generate native histograms and out-of-order load with our Prombench tool to make sure we catch this things prior to releasing changes into the wild