grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Multi-Tenancy Support for Mimir Ruler #8165

Open · algo7 opened this issue 1 month ago

algo7 commented 1 month ago

Is your feature request related to a problem? Please describe.

It is frustrating that you can't use the Mimir ruler with multiple tenants. We are using Grafana Mimir with Grafana Alerting, and the Grafana Alerting UI shows:

Failed to load rules state from [Mimir](https://xxx.example.com/datasources/edit/PAE45454D0EDB9216): {"status":"error","data":null,"errorType":"server_error","error":"no valid org id found"}
Failed to load rules config from [Mimir](https://xxx.example.com/datasources/edit/PAE45454D0EDB9216): {"status":"error","data":null,"errorType":"server_error","error":"no org id"}

In the Mimir Ruler logs:

 caller=spanlogger.go:109 method=API.PrometheusRules level=error msg="error extracting org id from context" err="multiple org IDs present"
 caller=logging.go:126 level=warn trace_id=0b42dc380bb6d43a msg="GET /prometheus/api/v1/rules (500) 174.261µs Response: \"{\\\"status\\\":\\\"error\\\",\\\"data\\\":null,\\\"errorType\\\":\\\"server_error\\\",\\\"error\\\":\\\"no valid org id found\\\"}\" ws: false; Accept-Encoding: gzip; Connection: close; User-Agent: Grafana/10.1.5; X-Forwarded-For: x.x.x.x; X-Forwarded-Host: xyz.example.com; X-Forwarded-Port: 443; X-Forwarded-Proto: https; X-Forwarded-Server: traefik-xxxxxxxx; X-Grafana-Referer: ; X-Real-Ip: x.x.x.x; X-Scope-Orgid: tenant1|tenant2|tenant3; "

As a result, we are unable to use the Mimir ruler.
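The log lines above suggest that the ruler's rules-listing endpoint accepts exactly one tenant in `X-Scope-OrgID` and rejects a pipe-separated list. As a hypothetical illustration only (this is not Mimir's code; `parseOrgIDs`, `singleOrgID` and `errMultipleOrgIDs` are names made up here to echo the log wording), a single-tenant check of that kind could look like this in Go:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errMultipleOrgIDs echoes the ruler log message; it is illustrative and not
// taken from Mimir's source.
var errMultipleOrgIDs = errors.New("multiple org IDs present")

// parseOrgIDs splits an X-Scope-OrgID value on "|", the separator Mimir
// documents for tenant federation, skipping empty entries.
func parseOrgIDs(header string) []string {
	var ids []string
	for _, id := range strings.Split(header, "|") {
		if id != "" {
			ids = append(ids, id)
		}
	}
	return ids
}

// singleOrgID is a hypothetical single-tenant check: it accepts exactly one
// tenant and rejects federated values, matching the behaviour in the logs.
func singleOrgID(header string) (string, error) {
	ids := parseOrgIDs(header)
	switch len(ids) {
	case 0:
		return "", errors.New("no org id")
	case 1:
		return ids[0], nil
	default:
		return "", errMultipleOrgIDs
	}
}

func main() {
	for _, h := range []string{"tenant1", "tenant1|tenant2|tenant3"} {
		id, err := singleOrgID(h)
		fmt.Printf("%q -> id=%q err=%v\n", h, id, err)
	}
}
```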

Describe the solution you'd like

When adding Mimir as a datasource in Grafana, you can set the X-Scope-OrgID header to a |-separated list of tenants to query metrics across multiple tenants. The Mimir ruler should support multi-tenant requests in the same way it does for metric queries.

Describe alternatives you've considered

  1. Create one datasource per tenant in Grafana. However, this quickly becomes messy and complex as the number of tenants scales.
  2. Use federated rule groups: Mimir's official documentation has a section on federated rule groups (https://grafana.com/docs/mimir/latest/references/architecture/components/ruler/#federated-rule-groups). However, it does not clearly state where such a rule group should be configured.

Additional context

There was an older issue about this, which was closed without a stated reason: https://github.com/grafana/mimir/issues/6020

There is also a post on the Grafana Community forum with no replies: https://community.grafana.com/t/alerting-plugin-with-multi-tenant-mimir-ruler/112040

pstibrany commented 1 month ago

> Mimir Ruler should be able to support multi-tenant request just like it does for the metrics.

How exactly should the Mimir ruler endpoint for listing rules behave when it finds multiple tenants in the X-Scope-OrgID header? Should it list rule groups from all tenants? Then we run into the possibility of conflicting namespaces. Should the endpoint create a logical hierarchy with the tenant at the top level? Then the view would differ between single-tenant and multi-tenant requests, and all clients would need to understand both.

What about the endpoints for setting a rule group or deleting a namespace? Should the modification be applied to the first tenant? To all tenants?

These questions need to be answered before we can add multi-tenancy support to the Ruler API.

algo7 commented 1 month ago

I see. Then it may make sense to describe the challenges you listed above in the documentation, so that the limitation is clear to everyone.

This is what the official documentation says:

> Grafana Mimir is a multi-tenant system where tenants can query metrics and alerts that include their tenant ID. The query takes the tenant ID from the X-Scope-OrgID parameter that exists in the HTTP header of each request, for example X-Scope-OrgID: . You can federate queries across multiple tenants by using true in -tenant-federation.enabled=true. When you specify tenant IDs, separate them with a pipe (|) character in the X-Scope-OrgID header, as in the example X-Scope-OrgID: tenant-1|tenant-2|tenant-3.

Source: https://grafana.com/docs/mimir/latest/manage/secure/authentication-and-authorization/

It's easy for people to assume that every component of Grafana Mimir supports multi-tenant requests, only to discover during implementation that this is not the case.

Once this is made clear, maybe the community will be able to offer some creative solutions.

I started the issue with "It is frustrating..." because that phrase is part of the provided template. While it is indeed frustrating, I do appreciate the work being done at Grafana Labs and mean no offense.

Hopefully there will be solutions or easier alternatives to this problem soon.

derek-cadzow commented 1 month ago

Assigning this to myself so that we can hand it to the new Mimir tech writer once she is far enough through onboarding.