jesusvazquez commented 1 year ago

An issue was detected a few days ago https://github.com/prometheus/prometheus/issues/12603 where a memory regression was introduced with a native histograms change.

We do have a process in place where we run prombench in PRs that we think might increase resources and also we do the same in our process to cut new releases.

I'd like to make sure that we generate native histograms and out-of-order load with our Prombench tool to make sure we catch this things prior to releasing changes into the wild

HarshitNagpal29 commented 2 months ago

Hi @jesusvazquez, @bboreham, and @bwplotka,

I hope you’re all doing well. I've been working on this issue for a few days now and believe I have a solution. However, I have some doubts about the logic, so I'd like to clarify them here before making a pull request. Apologies if this seems unprofessional.

1. Changes Made:

I’ve added functionality to handle native histograms and simulate out-of-order queries. Specifically, the following changes have been made:

Added a native_histogram metric to track histogram data.
Introduced a new method, generate_native_histogram_query, to modify the query expression based on use_native_histograms.
Modified the query method to include out-of-order load simulation with a random factor.

Here’s a summary of the changes:

# Native Histogram Metric
native_histogram = Histogram("loadgen_native_histogram", "Native histogram for testing", ["prometheus", "group"], buckets=(0.05, 0.1, 0.3, 0.7, 1.5, 2.5, 4, 6, 8, 10))

# Method to generate native histogram query
def generate_native_histogram_query(self, expr):
    if self.use_native_histograms:
        return f"histogram_quantile(0.99, {expr})"
    return expr

# Modified query method with out-of-order simulation
def query(self, expr):
    ...
    if self.type == "range":
        if random.random() < self.out_of_order_probability:
            out_of_order_factor = random.uniform(-self.start * self.out_of_order_range, self.start * self.out_of_order_range)
            params["start"] = start_time - self.start + out_of_order_factor
            params["end"] = start_time - self.end + out_of_order_factor
            Querier.out_of_order_count.labels(self.target, self.name, expr, self.type).inc()
        else:
            params["start"] = start_time - self.start
            params["end"] = start_time - self.end
        params["step"] = self.step
    ...

Could you please review these changes to ensure they meet the requirements for detecting memory regressions?

2. Randomness in Query Method: In the query method, I’ve introduced a random factor to simulate out-of-order loads:

out_of_order_factor = random.uniform(-self.start * self.out_of_order_range, self.start * self.out_of_order_range)
params["start"] = start_time - self.start + out_of_order_factor
params["end"] = start_time - self.end + out_of_order_factor

This factor represents 10% of the start time (self.start). Is this level of randomness appropriate for our testing, or should it be adjusted?

Also just want to make sure that after making changes in tools/load-generator/main.py, do I also need to update tools/fake-webserver/server.go to generate native histograms? Sorry if this question seems basic. 😅

Thank you for your time and help!

HarshitNagpal29 commented 2 months ago

@jesusvazquez Sir, can you please check it once?

bwplotka commented 3 weeks ago

FYI: With recent changed avalanche is capable for this: https://github.com/prometheus-community/avalanche (for write)

Relevant: https://github.com/prometheus/test-infra/issues/559

krajorama commented 3 weeks ago

748 was merged with superseded #725

prometheus / test-infra

Create native histograms load in Prombench runs #594

748 was merged with superseded #725