grafana / xk6-output-prometheus-remote

k6 extension to output real-time test metrics using Prometheus Remote Write.
GNU Affero General Public License v3.0

Add option for metrics sampling #16

Open raxod502-plaid opened 2 years ago

raxod502-plaid commented 2 years ago

We've noticed that this plugin sends metrics to Prometheus that are sampled every 50ms. This results in a monumental volume of data being ingested, which overloads our system. All of our other services have metrics sampled at roughly 30s intervals. Even 1s would be fine, but 50ms is complete overkill, since that level of detail will never show up in query results or a dashboard.

We've worked around the problem in our fork of xk6-output-prometheus-remote by applying the following diff:

```diff
--- a/kube/charts/k6/xk6-output-prometheus-remote/pkg/remotewrite/remotewrite.go
+++ b/kube/charts/k6/xk6-output-prometheus-remote/pkg/remotewrite/remotewrite.go
@@ -131,6 +131,8 @@ func (o *Output) flush() {
 func (o *Output) convertToTimeSeries(samplesContainers []stats.SampleContainer) []prompb.TimeSeries {
        promTimeSeries := make([]prompb.TimeSeries, 0)

+       seen := map[string]bool{}
+
        for _, samplesContainer := range samplesContainers {
                samples := samplesContainer.GetSamples()

@@ -148,9 +150,17 @@ func (o *Output) convertToTimeSeries(samplesContainers []stats.SampleContainer)

                        if newts, err := o.metrics.transform(o.mapping, sample, labels); err != nil {
                                o.logger.Error(err)
-                       } else {
+                       } else if !seen[sample.Metric.Name] {
                                promTimeSeries = append(promTimeSeries, newts...)
                        }
+
+                       // We only need 1 sample per metric per remote
+                       // write, not one every 50ms(!!). Can't do
+                       // this earlier in the function because
+                       // counters have to be computed using all
+                       // incoming samples. We just refrain from
+                       // sending the final value multiple times.
+                       seen[sample.Metric.Name] = true
                }

                // Do not blow up if remote endpoint is overloaded and responds too slowly.
```

This is clearly a hack; it would be better to set a specific time interval and do the sampling intelligently, rather than hardcoding it to once per remote write. Also, this implementation sends the oldest datapoint per interval, rather than the most recent, which is probably what is desired. But Prometheus simply can't cope with the hundreds of thousands of time series generated by a 50ms sample interval, so we needed to do something to get the plugin working.
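
For illustration, a minimal sketch of what interval-based downsampling could look like; the sampler type and all its names are hypothetical, not the extension's actual API, and like the diff above it forwards the first sample of each interval rather than the latest:

```go
package remotewrite

import "time"

// sampler is a hypothetical helper that rate-limits samples per metric:
// at most one sample per metric per interval is forwarded, instead of
// one every 50ms. With interval <= 0 it forwards everything. Sketch
// only; this is not the extension's actual code.
type sampler struct {
	interval time.Duration
	lastSent map[string]time.Time // metric name -> time of last forwarded sample
}

func newSampler(interval time.Duration) *sampler {
	return &sampler{interval: interval, lastSent: make(map[string]time.Time)}
}

// shouldEmit reports whether a sample for metricName taken at sampleTime
// should be forwarded to remote write.
func (s *sampler) shouldEmit(metricName string, sampleTime time.Time) bool {
	if s.interval <= 0 {
		return true // downsampling disabled
	}
	if last, ok := s.lastSent[metricName]; ok && sampleTime.Sub(last) < s.interval {
		return false // still within the current interval: drop
	}
	s.lastSent[metricName] = sampleTime
	return true
}
```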

Would a change to make the sampling interval customizable be accepted upstream?

yorugac commented 2 years ago

Hi @raxod502-plaid thanks for opening this issue.

Yes, you're right: 50ms is hard-coded in k6 itself and this extension just re-sends everything. This issue is known and somewhat implied in #2. The main problem is that meaningful work in this extension is currently blocked by plans for metrics refactoring in k6. But given that the timeline for that refactoring isn't fully clear, perhaps adding some workaround here would be good.

Personally, I'd prefer a slightly smarter aggregation, even for a workaround... Would you be willing to add a simple configuration option together with this change? E.g. a K6_PROMETHEUS_AGGREGATION_PERIOD option: if the period is > 0, use the logic of sending one sample per batch; otherwise, send everything. That way the behavior can be switched on and off in configuration. WDYT?
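
A rough sketch of how such an option might be parsed, assuming the output receives its environment as a string map; the Config type and helper below are illustrative, and only the variable name comes from the suggestion above:

```go
package remotewrite

import (
	"fmt"
	"time"
)

// Config holds the proposed option; a zero period means "send everything"
// (the current behavior). Sketch only, not the extension's real Config.
type Config struct {
	AggregationPeriod time.Duration
}

// newConfigFromEnv reads the proposed K6_PROMETHEUS_AGGREGATION_PERIOD
// variable from an environment map; the exact plumbing into the output
// is omitted here.
func newConfigFromEnv(env map[string]string) (Config, error) {
	cfg := Config{}
	raw := env["K6_PROMETHEUS_AGGREGATION_PERIOD"]
	if raw == "" {
		return cfg, nil // unset: aggregation disabled, send every sample
	}
	period, err := time.ParseDuration(raw)
	if err != nil {
		return cfg, fmt.Errorf("invalid K6_PROMETHEUS_AGGREGATION_PERIOD %q: %w", raw, err)
	}
	cfg.AggregationPeriod = period
	return cfg, nil
}
```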

raxod502-plaid commented 2 years ago

I can work on that. One concern I have, though: after implementing the workaround above, we see the following memory utilization pattern for k6:

[image: k6 memory utilization graph]

With this plugin disabled, the memory usage is stable at only a few GB.

Do you know what's up with that? Presumably, metrics are somehow not getting freed, or the amount of data churn is so large the GC cannot keep up.

yorugac commented 2 years ago

This is interesting. Going by the snippet you posted before, the logic could definitely be improved by moving the seen check to the beginning of the loop. My current assumption is that Go optimizes slices that are appended to, so the function ends up using more memory when newts is computed but not appended, as in your approach. In other words, I'd try moving the check to the top of the loop and see how it performs then. But there could be something else at play as well, so I suggest making a PR to be able to discuss it properly, in case there's some additional caveat.
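
For concreteness, a small self-contained sketch of that reordering, using simplified stand-in types rather than the extension's real ones:

```go
// Sketch of the suggested reordering: check `seen` before transforming a
// sample, so samples that will be skipped never allocate a time series.
// All types below are simplified stand-ins for the extension's real ones.
package main

import "fmt"

type sample struct {
	metricName string
	value      float64
}

// transform stands in for o.metrics.transform: it allocates one or more
// time-series entries for a sample.
func transform(s sample) []string {
	return []string{fmt.Sprintf("%s=%g", s.metricName, s.value)}
}

func convert(samples []sample) []string {
	promTimeSeries := make([]string, 0)
	seen := map[string]bool{}

	for _, s := range samples {
		// Reordered check: skip before transforming, so nothing is
		// allocated for metrics already emitted in this flush.
		// (Per the earlier comment, counters that must see every
		// sample would need separate handling.)
		if seen[s.metricName] {
			continue
		}
		promTimeSeries = append(promTimeSeries, transform(s)...)
		seen[s.metricName] = true
	}
	return promTimeSeries
}

func main() {
	samples := []sample{
		{"http_reqs", 1}, {"http_reqs", 1}, {"vus", 10},
	}
	fmt.Println(convert(samples)) // [http_reqs=1 vus=10]
}
```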

raxod502-plaid commented 2 years ago

Okay - I've been working on this integration on and off for many weeks now, so I'm going to take a break, but will make a note of this internally for when the work is picked back up.

daniel-maganto commented 2 years ago

Any news on this issue? Many thanks

raxod502-plaid commented 2 years ago

Nothing on my side. The project has been deprioritized for now, unfortunately. I still think it is a needed feature, though.

raxod502-plaid commented 2 years ago

Not sure why GitHub auto-closed that. Reopened.

raxod502-plaid commented 2 years ago

Wtf.

codebien commented 1 year ago

The major issues with this have been resolved: the output now aggregates all the samples of the same series seen during the same configured push interval. I'm keeping this open mostly to track any feedback on the case for configuring an aggregation interval different from the push interval.
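
For anyone landing here later, the shape of that behavior is roughly the following: samples of the same series are folded together between pushes, and only the aggregate is written out. The types and names below are illustrative, not the extension's actual implementation:

```go
// Illustrative sketch of per-series aggregation within a push interval.
// Each series accumulates every sample it sees between pushes; flush()
// then emits one aggregated value per series instead of one value per
// 50ms sample.
package main

import (
	"fmt"
	"time"
)

type seriesAgg struct {
	count int
	sum   float64
	last  float64
}

type aggregator struct {
	interval time.Duration
	series   map[string]*seriesAgg // key: metric name + label set
}

func newAggregator(interval time.Duration) *aggregator {
	return &aggregator{interval: interval, series: make(map[string]*seriesAgg)}
}

// addSample folds a raw sample into its series bucket.
func (a *aggregator) addSample(key string, value float64) {
	agg, ok := a.series[key]
	if !ok {
		agg = &seriesAgg{}
		a.series[key] = agg
	}
	agg.count++
	agg.sum += value
	agg.last = value
}

// flush emits one entry per series and resets the buckets; a real output
// would build remote-write time series here instead of printing.
func (a *aggregator) flush() {
	for key, agg := range a.series {
		fmt.Printf("%s count=%d sum=%g last=%g\n", key, agg.count, agg.sum, agg.last)
	}
	a.series = make(map[string]*seriesAgg)
}

func main() {
	a := newAggregator(5 * time.Second)
	a.addSample(`http_req_duration{scenario="default"}`, 120)
	a.addSample(`http_req_duration{scenario="default"}`, 95)
	a.flush() // one aggregated entry instead of two raw samples
}
```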