saswatamcode opened this issue 2 years ago (status: Open)
We discussed this briefly with @saswatamcode, with one more suggested alternative from me: have a separate remote write config for each tenant, set the tenant header, and use relabeling to forward only the metrics that are applicable to that tenant. However, this is not really a systematic solution and requires always setting up the remote write config for each tenant manually. The proposed solution seems reasonable to me :+1:.
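For illustration, the per-tenant remote write approach might look roughly like the following. This is a minimal sketch, not an existing setup: the endpoint URL and tenant names are placeholders, it assumes the evaluated series carry a `tenant_id` label, and it uses Receive's default `THANOS-TENANT` tenant header.

```yaml
remote_write:
  # One entry per tenant: keep only that tenant's series and set the tenant
  # header so Receive stores the samples in that tenant's TSDB.
  - url: http://receive.example.svc:19291/api/v1/receive
    headers:
      THANOS-TENANT: tenant-a
    write_relabel_configs:
      - source_labels: [tenant_id]
        regex: tenant-a
        action: keep
  - url: http://receive.example.svc:19291/api/v1/receive
    headers:
      THANOS-TENANT: tenant-b
    write_relabel_configs:
      - source_labels: [tenant_id]
        regex: tenant-b
        action: keep
```

Every new tenant needs another entry like this, which is exactly the manual, non-systematic part mentioned above.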
Hey, just trying to understand the main problem we are discussing here.
> The only possible way might be using a Ruler for each tenant, which is simpler but wasteful of resources.
Do we have any data on this? For stateless rulers there is not much baseline overhead in this situation. I would even say the more problematic thing is the extreme case where one tenant has too many rules and alerts for a single ruler.
> A potential solution would be using the Receive `multitsdb` in Ruler and having the same flags for tenancy as Receive
Do you mean sending things to a Receive that uses multitsdb, or literally using the multitsdb code?
> This would start an `agent`, i.e., a WAL-only storage for each tenant, which remote-writes only to the locations that were configured for that tenant. In essence, a `multiagent` package would be needed to handle this.
I would really avoid doing that - multi-tsdb is already a tough idea: every new TSDB instance has significant costs to start and reload. I'm not sure we want to replicate this idea for the agent code.
> Also, in the case of using Stateless Rulers, it's harder to achieve multi-tenancy, as different tenants might need different configurations for remote writing (writing to separate locations with separate HTTP headers like `THANOS-TENANT`).
Right. We need essentially something like this:
I feel we should have multi-tenant rulers that can handle any number of tenants' rules (tenant agnostic), and build tenancy with label-aware sharding on the receiver. The Receive router already checks EACH series in a write request and distributes it with the hashring - so why not check the tenant label there?
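As a rough illustration of that direction: the Ruler stays tenant-agnostic and simply evaluates rule groups whose results carry a tenant label, and the tenant-aware routing happens downstream. Below is a minimal sketch of such a rule file, assuming a `tenant_id` label (the label name, and sharding on it in the Receive router, are assumptions of this proposal rather than current behaviour):

```yaml
groups:
  - name: tenant-a-rules
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
        labels:
          tenant_id: tenant-a   # evaluated series carry the tenant label
  - name: tenant-b-rules
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{code=~"5.."}[5m])) > 0.1
        labels:
          tenant_id: tenant-b   # alerts and series are attributed to tenant-b
```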
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the remind command if you wish to be reminded at some point in the future.
Would love to see this moving forward. A general sharder is really something we need in Thanos. Cortex has something similar using the Ring. In Thanos, we have the hashring only on the receiver side; if we want to distribute work like rule evaluation or compaction jobs, we don't have a good way to do it right now.
Yup! I'm writing a proposal + PoC for this currently. Will land soon!
Looking forward to this feature!
How is ruler sharding going? :-) As a Cortex user, I found this feature useful.
> I feel we should have multi-tenant rulers that can handle any number of tenants' rules (tenant agnostic), and build tenancy with label-aware sharding on the receiver. The Receive router already checks EACH series in a write request and distributes it with the hashring - so why not check the tenant label there?
Does https://github.com/thanos-io/thanos/pull/7256 already implement this feature? @bwplotka @GiedriusS
Is your proposal related to a problem?

Currently, the Thanos Ruler has no built-in support for multi-tenancy like Receive. This creates issues when running it in a setup where we want to isolate tenants and store their rule-evaluated metrics in a separate `tsdb` instance for each. The only possible way might be using a Ruler for each tenant, which is simpler but wasteful of resources.

Also, in the case of using Stateless Rulers, it's harder to achieve multi-tenancy, as different tenants might need different configurations for remote writing (writing to separate locations with separate HTTP headers like `THANOS-TENANT`).

For example, consider a Receive with multiple tenants, to which a single Ruler might need to remote-write multi-tenant rule-based metrics so that they are stored in each tenant's Receive `tsdb`. But in this case, the Ruler cannot add HTTP headers per tenant, so it is treated as a completely new default tenant by Receive and a new `tsdb` gets created.

(Note: This is a separate problem from ensuring that the Ruler only selects data from one tenant while evaluating rules.)
Describe the solution you'd like

A potential solution would be using the Receive `multitsdb` in Ruler and having the same flags for tenancy as Receive (`--receive.default-tenant-id`, `--receive.tenant-label-name`). The Ruler would then be tenant-aware and store evaluated metrics in a separate `tsdb` instance for each tenant, using the `tenant_id` label to identify which rule-based series belongs to which tenant (assuming that the rule file configuration specifies the tenant label for each rule).

This can be extended to the Stateless Ruler to allow separate remote write configs for each tenant. This would start an `agent`, i.e., a WAL-only storage for each tenant, which remote-writes only to the locations that were configured for that tenant. In essence, a `multiagent` package would be needed to handle this.

The addition of `multitsdb` to Ruler could also be skipped, as the Scalable Rule proposal does mention the removal of the embedded `tsdb` in its work plan! :)
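To make the Stateless Ruler part of this more concrete, a per-tenant remote write configuration for such a `multiagent` setup could hypothetically be keyed by tenant, with one WAL-backed agent per tenant writing only to that tenant's configured endpoints. This is purely illustrative - no such configuration format exists in Thanos today - and all keys, URLs, and tenant names are made up:

```yaml
# Hypothetical sketch only - not an existing Thanos configuration format.
tenants:
  tenant-a:
    remote_write:
      - url: http://receive-a.example.svc:19291/api/v1/receive
        headers:
          THANOS-TENANT: tenant-a
  tenant-b:
    remote_write:
      - url: http://receive-b.example.svc:19291/api/v1/receive
        headers:
          THANOS-TENANT: tenant-b
```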
Describe alternatives you've considered
Running a Ruler for each tenant.
Open to feedback and suggestions! If there are existing solutions or configuration options that achieve the same result and are easier to implement than the above idea, that would be great too! 🙂