grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Flow: Add component for multi-tenant remote_write support #521

Open tpaschalis opened 1 year ago

tpaschalis commented 1 year ago

The Prometheus remote_write protocol has no notion of multi-tenancy itself. For this reason, different backends use various methods to enable multi-tenancy, most commonly an X-Scope-OrgID header or labels with a special meaning.

When using a prometheus.remote_write component, the Prometheus queue_manager reads WAL segments sequentially and enqueues metrics opportunistically to be batched off as remote_write requests. This behaviour offers no fine-grained control at the request level. Our current suggestion to users is that they add an extra header to their endpoint block, but these per-endpoint headers are static, and we don't support write_relabel_config filtering in Flow yet.
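To illustrate why a static header cannot be correct here, the following is a simplified Go sketch, assuming a hypothetical `Sample` type that carries only its tenant label value. It cuts batches by size in arrival order, without looking at labels. This is not Prometheus' actual queue_manager (which also shards series across queues), only a model of "batching without regard to tenant".

```go
package main

import "fmt"

// Sample is a hypothetical, simplified series entry carrying only its tenant label value.
type Sample struct{ Tenant string }

// batchInOrder cuts batches purely by size, in arrival order, ignoring labels.
func batchInOrder(samples []Sample, maxPerBatch int) [][]Sample {
	var batches [][]Sample
	for start := 0; start < len(samples); start += maxPerBatch {
		end := start + maxPerBatch
		if end > len(samples) {
			end = len(samples)
		}
		batches = append(batches, samples[start:end])
	}
	return batches
}

// tenantsIn reports the distinct tenants present in one batch, in first-seen order.
func tenantsIn(batch []Sample) []string {
	seen := map[string]bool{}
	var out []string
	for _, s := range batch {
		if !seen[s.Tenant] {
			seen[s.Tenant] = true
			out = append(out, s.Tenant)
		}
	}
	return out
}

func main() {
	samples := []Sample{{"a"}, {"b"}, {"a"}, {"c"}, {"b"}}
	for i, b := range batchInOrder(samples, 2) {
		fmt.Printf("batch %d tenants: %v\n", i, tenantsIn(b))
	}
}
```

Batches that mix tenants would need more than one X-Scope-OrgID value, which a single static per-endpoint header cannot express.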

The new component would act as a remote_write middleware, receiving metrics from upstream components, extracting a given label (say tenant), and batching timeseries grouped by this label value.

It would then send discrete remote_write requests for these batches while also adding the correct X-Scope-OrgID header for each one.
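A minimal Go sketch of that middleware behaviour, assuming a simplified, hypothetical `Series` type (not the Prometheus `prompb` types) and a placeholder `encode` function standing in for the real protobuf and snappy encoding. `groupByTenant`, `buildRequests`, the `anonymous` default tenant and the push URL are all illustrative names, not Alloy APIs.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sort"
)

// Series is a hypothetical stand-in for a remote_write time series.
type Series struct {
	Labels map[string]string
	Value  float64
}

// groupByTenant buckets series by the value of tenantLabel.
// Series without the label (or with an empty value) go to defaultTenant.
func groupByTenant(series []Series, tenantLabel, defaultTenant string) map[string][]Series {
	batches := make(map[string][]Series)
	for _, s := range series {
		tenant, ok := s.Labels[tenantLabel]
		if !ok || tenant == "" {
			tenant = defaultTenant
		}
		batches[tenant] = append(batches[tenant], s)
	}
	return batches
}

// buildRequests creates one POST request per tenant batch and sets the
// X-Scope-OrgID header. encode is a placeholder: a real component would
// marshal a WriteRequest, snappy-compress it and set the matching headers.
func buildRequests(url string, batches map[string][]Series, encode func([]Series) []byte) ([]*http.Request, error) {
	tenants := make([]string, 0, len(batches))
	for t := range batches {
		tenants = append(tenants, t)
	}
	sort.Strings(tenants) // deterministic request order
	reqs := make([]*http.Request, 0, len(tenants))
	for _, t := range tenants {
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(encode(batches[t])))
		if err != nil {
			return nil, err
		}
		req.Header.Set("X-Scope-OrgID", t)
		reqs = append(reqs, req)
	}
	return reqs, nil
}

func main() {
	series := []Series{
		{Labels: map[string]string{"__name__": "up", "tenant": "team-a"}, Value: 1},
		{Labels: map[string]string{"__name__": "up", "tenant": "team-b"}, Value: 0},
		{Labels: map[string]string{"__name__": "up"}, Value: 1},
	}
	batches := groupByTenant(series, "tenant", "anonymous")
	// Fake encoder for illustration only.
	encode := func(b []Series) []byte { return []byte(fmt.Sprintf("%d series", len(b))) }
	reqs, err := buildRequests("http://localhost:9009/api/v1/push", batches, encode)
	if err != nil {
		panic(err)
	}
	for _, r := range reqs {
		fmt.Println(r.Header.Get("X-Scope-OrgID"))
	}
}
```

Whether the tenant label should also be stripped from the series before sending, and what to do with unlabeled series, are open design questions; the sketch simply routes them to a default tenant.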

Notes

oscni commented 1 year ago

Just to add to the great summary above: an example where this is needed is when you scrape metrics from a Kubernetes cluster and want the metrics from different namespaces to end up in different tenants (that is, sent with different headers).

akselleirv commented 1 year ago

Thank you @tpaschalis for creating the issue.

I have not looked into Flow too much, but an important configuration option would be to handle this X-Scope-OrgID logic in a centralized way. Let's say I'm collecting the scrape jobs from the Prometheus CRDs; then I want to add this write_relabel_config step to all of these jobs.

My current approach is to use Kyverno to add this relabel step to all the ServiceMonitor CRDs. Would it be an option for the Grafana Agent to take the same approach? For example, I would have one component that collects all the ServiceMonitors, and as part of that component I want to relabel a pod label into a time series label.

Would this be feasible?

tpaschalis commented 1 year ago

Hey @akselleirv, apologies for the belated response. What you're describing would not actually be possible with Flow until we have Operator components that can read the Prometheus CRDs.

I suppose your current approach is to have duplicated remote_write definitions, one per tenant with the correct headers in each, plus a write_relabel_config rule that drops all metrics except those with the correct tenant label? The good news is that this could work in the 'static' Agent mode and the Operator, but it would basically be syntactic sugar around the same approach, as built-in multi-tenancy cannot be implemented without changes to the Prometheus remote_write protocol itself.
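A rough Go model of that workaround, assuming a simplified `Series` type. `Endpoint`, `newEndpoint` and `filter` are hypothetical names, not Agent or Prometheus APIs. Each endpoint carries one static X-Scope-OrgID value and a "keep" rule on the tenant label; the regex is wrapped in `^(?:...)$` to mimic the full anchoring that Prometheus relabel regexes use, and a missing label is matched as the empty string, as relabeling does.

```go
package main

import (
	"fmt"
	"regexp"
)

// Series is a hypothetical, simplified time series: just a label set.
type Series struct{ Labels map[string]string }

// Endpoint models one duplicated remote_write definition.
type Endpoint struct {
	OrgID string         // static X-Scope-OrgID header value
	keep  *regexp.Regexp // keep rule applied to the tenant label
}

// newEndpoint builds an endpoint whose keep regex is fully anchored.
func newEndpoint(orgID, tenantRegex string) Endpoint {
	return Endpoint{OrgID: orgID, keep: regexp.MustCompile("^(?:" + tenantRegex + ")$")}
}

// filter keeps only the series whose tenant label matches the endpoint's rule.
func (e Endpoint) filter(series []Series, tenantLabel string) []Series {
	var kept []Series
	for _, s := range series {
		if e.keep.MatchString(s.Labels[tenantLabel]) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	series := []Series{
		{Labels: map[string]string{"__name__": "up", "tenant": "team-a"}},
		{Labels: map[string]string{"__name__": "up", "tenant": "team-b"}},
	}
	// One duplicated endpoint per tenant, each with its own static header.
	endpoints := []Endpoint{newEndpoint("team-a", "team-a"), newEndpoint("team-b", "team-b")}
	for _, e := range endpoints {
		// Every endpoint walks the full stream and discards other tenants' series.
		fmt.Printf("X-Scope-OrgID=%s keeps %d series\n", e.OrgID, len(e.filter(series, "tenant")))
	}
}
```

In this model every endpoint sees the whole stream and throws most of it away, so the work grows with the number of tenants; grouping by the tenant label in a single pass avoids that duplication.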

Could you open a separate issue describing your desired approach around the Operator, to kickstart this conversation and get more eyes on it? I think it would be beneficial to discuss your proposal from scratch.

akselleirv commented 1 year ago

Hello @tpaschalis, sorry for not responding to you sooner. I saw that this proposal has been added to the upcoming release, which I'm very grateful for. If support for Prometheus CRDs is also added to Flow, it would satisfy my requirements and I could replace the existing setup with Flow.

At the moment I'm using a modified version of the cortex-tenant proxy, which allows for some additional logic. Currently I add a time series to an open tenant if its pod has not been labeled with privateMetrics=true. If it has, I add the time series to the tenant specified by another pod label.

Would it be possible to have this kind of logic in this component? This would be similar to the feature found in the Promtail tenant stage.
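The routing rule described above can be sketched in Go as a single selection function, assuming the pod labels have already been copied onto each series as ordinary labels. `privateMetrics` comes from the comment; the `tenant` label name and the `open` tenant name are hypothetical placeholders for "another pod label" and the open tenant. Returning `ok=false` for a private series with no tenant, so the caller can drop it rather than leak it into the open tenant, is a design choice of this sketch, not something stated in the thread.

```go
package main

import "fmt"

const (
	privateLabel = "privateMetrics" // pod label named in the comment
	tenantLabel  = "tenant"         // hypothetical name for "another pod label"
	openTenant   = "open"           // hypothetical name of the shared open tenant
)

// selectTenant decides which tenant a series is written to.
// ok=false means the series is private but has no tenant and should be dropped.
func selectTenant(labels map[string]string) (tenant string, ok bool) {
	if labels[privateLabel] != "true" {
		return openTenant, true
	}
	if t := labels[tenantLabel]; t != "" {
		return t, true
	}
	return "", false
}

func main() {
	examples := []map[string]string{
		{"__name__": "up"},
		{"__name__": "up", privateLabel: "true", tenantLabel: "team-a"},
		{"__name__": "up", privateLabel: "true"},
	}
	for _, labels := range examples {
		t, ok := selectTenant(labels)
		fmt.Printf("tenant=%q ok=%v\n", t, ok)
	}
}
```

A per-tenant batching component would call a function like this in place of a plain label lookup when grouping series.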

mattdurham commented 1 year ago

This is accepted as a concept, but not the specific implementation.

ptodev commented 1 year ago

The OpenTelemetry Collector's Loki exporter seems to support remote writing with different headers based on label hints.

Also, today on the community Slack someone asked about using different headers based on label hints, but for otelcol.exporter.otlp.

adberger commented 9 months ago

Any updates on this?