thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io

Enable Receiver to extract Tenant from a label present in incoming timeseries #7081

Open · verejoel opened this issue 10 months ago

verejoel commented 10 months ago

Is your proposal related to a problem?

One way to support a multi-tenant ruler is to enable receivers to infer the tenant not from HTTP headers, but from labels present in the incoming time series. See the comment in issue #5133 for more information.

As well as helping move forward with the multi-tenant ruler, this would help enable multi-tenancy in clients that deliver telemetry. For example, if we currently want to ship telemetry using the OpenTelemetry Collector's Prometheus remote write exporter, we need to configure one exporter per tenant (I am aware that the headers setter extension can set headers dynamically, but that only works if the whole request context belongs to the same tenant).

Describe the solution you'd like

Introduce a new CLI flag for the receiver, --incoming-tenant-label-name. Thanos receiver will then search each time-series for occurrences of this label, building a map of unique tenant names discovered from the label values to slices of time series belonging to that tenant. Each element of the map can then be distributed according to the hashring config.

The process is summarised in the flow chart attached to this issue.
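For illustration, a minimal sketch of the grouping step described above, assuming the request has already been decoded into prompb.TimeSeries. The label name and the fallback to a default tenant come from the proposal; the function name and everything else is hypothetical, not actual Thanos code:

```go
package receive

import "github.com/prometheus/prometheus/prompb"

// splitByTenantLabel groups incoming series by the value of tenantLabel.
// Series that do not carry the label fall back to defaultTenant. Whether the
// label is kept or stripped afterwards is left open here.
func splitByTenantLabel(series []prompb.TimeSeries, tenantLabel, defaultTenant string) map[string][]prompb.TimeSeries {
	byTenant := make(map[string][]prompb.TimeSeries)
	for _, ts := range series {
		tenant := defaultTenant
		for _, lbl := range ts.Labels {
			if lbl.Name == tenantLabel {
				tenant = lbl.Value
				break
			}
		}
		byTenant[tenant] = append(byTenant[tenant], ts)
	}
	return byTenant
}
```

Each entry of the resulting map can then be routed through the hashring with the discovered tenant, exactly as if that tenant had arrived in the THANOS-TENANT header.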

Notes:

Describe alternatives you've considered

Manipulating the THANOS-TENANT header dynamically upstream (for example, with an OTel collector).

Additional context

We'd need to modify the behaviour of the receiveHTTP handler, in particular where we extract the tenant from the HTTP request.

fpetkovski commented 10 months ago

This would be a really cool feature indeed! We've tried to build a proxy that extracts the tenant from a label and sends one request per tenant with the appropriate header, but it overwhelmed receivers and was not worth the hassle. Having the feature natively built into Thanos is the way to go.

verejoel commented 10 months ago

Thanks Filip - do you think the proposal for how the feature would work makes sense? I'm a little worried about adding too much overhead to the routing receivers and causing requests to get backed up.


MichaHoffmann commented 10 months ago

> Thanks Filip - do you think the proposal for how the feature would work makes sense? I'm a little worried about adding too much overhead to the routing receivers and causing requests to get backed up.

Wouldn't this be evaluated on the ingesting receiver when deciding which tenant the sample should be written to?

I would imagine that instead of one local write, we would inspect the request, group it by tenant, and issue multiple local writes here: https://github.com/thanos-io/thanos/blob/4a73fc3cdbc94ddce238c695f4723a9c0995f9da/pkg/receive/handler.go#L793, right? That would happen on the ingester, I think!
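For illustration only, continuing the grouping sketch from the proposal above (same hypothetical package; imports for context, errors and fmt elided): one local write per tenant discovered in the request. localWrite is a stand-in for the real write path in pkg/receive/handler.go, not an actual Thanos function.

```go
// writePerTenant splits the request by the tenant label and performs one local
// write per discovered tenant, collecting per-tenant errors instead of failing
// the whole batch on the first error.
func writePerTenant(
	ctx context.Context,
	wreq *prompb.WriteRequest,
	tenantLabel, defaultTenant string,
	localWrite func(ctx context.Context, tenant string, series []prompb.TimeSeries) error,
) error {
	var errs []error
	for tenant, series := range splitByTenantLabel(wreq.Timeseries, tenantLabel, defaultTenant) {
		if err := localWrite(ctx, tenant, series); err != nil {
			errs = append(errs, fmt.Errorf("tenant %s: %w", tenant, err))
		}
	}
	return errors.Join(errs...) // nil when every per-tenant write succeeded
}
```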

fpetkovski commented 10 months ago

The ingesting receiver does not always have access to the hashring (e.g. in router-ingester split mode). So routers need to know which ingester to send samples to.

verejoel commented 10 months ago

Had some discussions with @MichaHoffmann. We came to the conclusion that the proposed implementation would completely break the current rate-limiting concept. For example, if 1 tenant in a batch of 20 is over the limit, what should Thanos do? A 429 will be retried and result in the whole batch being ingested again. If we drop all metrics in the batch because 1 tenant is over the limit, we have created a noisy neighbour problem. But if we accept the metrics from the valid 19/20 tenants, then the retried batch will give us out-of-order issues for those tenants.

So the current design is incompatible with per-tenant rate limiting as it stands.
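A small sketch to make the dilemma concrete; byTenant is the label-split batch from the proposal above, and limitExceeded is a hypothetical stand-in for whatever limiter Receive applies, not a real Thanos API:

```go
// checkLimits returns the tenants in a split batch that are over their limit.
// If the slice is non-empty there is no clean answer for the request as a whole.
func checkLimits(byTenant map[string][]prompb.TimeSeries, limitExceeded func(tenant string, samples int) bool) []string {
	var over []string
	for tenant, series := range byTenant {
		if limitExceeded(tenant, len(series)) {
			over = append(over, tenant)
		}
	}
	// Possible responses when over is non-empty, as discussed above:
	//  - 429 the whole request: the client retries and re-sends the tenants that were fine;
	//  - drop the whole batch: noisy-neighbour problem for the compliant tenants;
	//  - accept only the compliant tenants: the retried batch then produces
	//    out-of-order/duplicate samples for them.
	return over
}
```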

sepich commented 10 months ago

Another way to do it is via the existing --receive.relabel-config at the router stage, by exposing the tenant as an internal label (e.g. __meta_tenant_id) and allowing it to be modified:

```yaml
- |
    --receive.relabel-config=
    - source_labels: [prometheus]
      target_label: __meta_tenant_id
```

This is the same way it is done in Mimir: https://github.com/grafana/mimir/pull/4725

Example of changes needed in Thanos: https://github.com/thanos-io/thanos/commit/44a0728f03d9ed1812c4289caa36650c18abff27#diff-42c21b7b04cc61ab0cda17794cc1efff14802e0e89a85503d28601e721c1dd31R849
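As a rough illustration of this idea (not the linked patch itself): after --receive.relabel-config has run at the router, the tenant could be read from the internal label, falling back to the header-derived tenant. The helper name is made up:

```go
package receive

import "github.com/prometheus/prometheus/model/labels"

// metaTenantLabel is the internal label proposed above.
const metaTenantLabel = "__meta_tenant_id"

// tenantFromLabels prefers the relabel-derived __meta_tenant_id value and falls
// back to the tenant taken from the HTTP header otherwise. Hypothetical helper,
// not part of Thanos.
func tenantFromLabels(lbls labels.Labels, headerTenant string) string {
	if t := lbls.Get(metaTenantLabel); t != "" {
		return t
	}
	return headerTenant
}
```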

verejoel commented 10 months ago

@sepich I like that approach. Do you know how Mimir handles per-tenant limits in that situation?

MichaHoffmann commented 10 months ago

@verejoel I think it has the same issues with the rate limit, since all the samples still come from one remote write request!

GiedriusS commented 10 months ago

Implemented a PoC for this; it works really well. A few caveats:

verejoel commented 8 months ago

@GiedriusS any chance you can open a PR for this issue? Would love to try it out.

GiedriusS commented 8 months ago

@verejoel check out https://github.com/thanos-io/thanos/pull/7256

pvlltvk commented 6 months ago

@GiedriusS Hi! I've tried this functionality in v0.35.0, but with no luck. Does it work in router + ingestor mode?

irizzant commented 6 months ago

I've tried --receive.split-tenant-label-name too but it's not working.

Thanos is deployed with the Bitnami Helm chart; here is the receive configuration:

```yaml
- receive
- --log.level=info
- --log.format=logfmt
- --grpc-address=0.0.0.0:10901
- --http-address=0.0.0.0:10902
- --remote-write.address=0.0.0.0:19291
- --objstore.config=$(OBJSTORE_CONFIG)
- --tsdb.path=/var/thanos/receive
- --label=replica="$(NAME)"
- --label=receive="true"
- --tsdb.retention=15d
- --receive.local-endpoint=127.0.0.1:10901
- --receive.hashrings-file=/var/lib/thanos-receive/hashrings.json
- --receive.replication-factor=1
- --receive.split-tenant-label-name="tenant_id"
```

I created a ServiceMonitor which adds the label tenant_id=test, but metrics come out with tenant_id=default-tenant.

MichaHoffmann commented 6 months ago

Tenant might be a reserved label iirc:

> --receive.tenant-label-name="tenant_id"  Label name through which the tenant will be announced.

irizzant commented 6 months ago

So, if I understand, it's impossible to use the --receive.tenant-label-name="tenant_id" because tenant_id is a reserved label?

irizzant commented 6 months ago

Also, Receive is adding a tenant label along with tenant_id to the TSDB metrics. Is this expected?

benjaminhuo commented 3 months ago

> @verejoel check out #7256

It's an important feature - does this PR complete everything in this proposal? @GiedriusS @verejoel What else is still needed for a tenant to remote write evaluated metrics to a different ingester based on a specific label in the metrics?

Thanks!

verejoel commented 3 months ago

@benjaminhuo I tried using this with the stateless ruler and the router/ingester setup. It does not seem to work as expected, as the ALERTS metrics disappear. I still need to devote some time to working out why that might be the case - it could be that they get written to some default tenant that is not queryable in our setup. I would be interested to hear if anyone else has managed to get this working with the stateless ruler, as this is one of the ways to enable a multi-tenant ruler based on internal metric labels.

yeya24 commented 3 weeks ago

@verejoel Is this still an issue for you when using this feature with the stateless ruler? Stateless ruler labels are kind of hard to configure, but if you can share your configuration, we can help you debug a bit.

It would also be nice to create a doc for users on how to troubleshoot this. I will create an issue.

verejoel commented 3 weeks ago

@yeya24 That sounds good; I will dedicate some time to it this week. Let me know which issue, and I will post my findings there.