grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Helm chart for monolithic and read-write deployment mode #4832

Open rubenvw-ngdata opened 1 year ago

rubenvw-ngdata commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, the only Helm chart available is for the full microservices deployment mode of Grafana Mimir. That chart is pretty extensive and results in a lot of pods. Ideally there would be an alternative to this.

Describe the solution you'd like

A separate Helm chart, or a deployment mode setting in the existing chart to select the deployment mode (this could work in a similar way to what is available for Loki). Ideally the alternative deployment solution also supports multi-AZ (where we are running one instance in each AZ).

Describe alternatives you've considered

The only alternative now is to run a minimalistic version of the mimir-distributed Helm chart.

Additional context

See also previous ticket on grafana/helm-charts: https://github.com/grafana/helm-charts/issues/1189

rubenvw-ngdata commented 1 year ago

I had some time to work on this, so I tried to implement this functionality myself (but I failed to get it fully working).

See PR https://github.com/grafana/mimir/pull/4858 (I know it is not ready, but sharing it, so you can help me on it)

dimitarvdimitrov commented 1 year ago

Thank you for the proposal and the draft PR. I appreciate the time spent. We've been experimenting with the two deployment modes and would like to explore them further as alternatives to microservices mode (maybe even "at scale"). We're not quite there yet, but these deployment modes are also not being deprecated soon.

However, there are some considerations we have to take into account before adding different deployment modes to the helm chart. A couple that come to mind now:

Most of these aren't trivial to answer and there will probably be divided opinions. At the same time we, at Grafana Labs, don't have much visibility into how much read-write or monolithic deployment modes will be used or how much they can scale.

As much as I hate to say it, keeping this functionality in a fork will be more pragmatic as it stands. You can publish the forked chart under a different name and we can track how much usage it gets. With time we can revisit and incorporate the changes in the mimir-distributed chart and share the maintenance efforts.

rubenvw-ngdata commented 1 year ago

Hi @dimitarvdimitrov ,

Thanks for your answer. I'm a bit disappointed though that you propose to leave it on a fork branch.

The most important reason for us to use Mimir (and I don't think we are alone) is to make Prometheus HA. With the microservices configuration, this comes with a high maintenance burden and a very fine-grained configuration.

I understand that there are various things that you should think about when embedding it into the product; that's also why this is just a draft.

Have you been able to check the error message I was facing with the monolithic setup? I'm willing to continue, rename the chart and maintain the fork for the time being, but I could use a bit of help debugging the issues that I'm facing (I don't know much about the Mimir internals).

dimitarvdimitrov commented 1 year ago

> The most important reason to use mimir for us (and I don't think we are alone) is to make prometheus HA. With the microservices configuration this comes at a high maintenance level with a very fine grained configuration.

With the Helm chart we are aiming to make this configuration less of a hassle. The defaults in the chart should work for most users. In addition to that, monolithic and read-write deployments have the same configuration options as microservices. However, I can see how scaling up/out a microservices deployment is more complicated than scaling a monolithic deployment.

I left a comment on the draft PR wrt the "connection refused" error. I'm happy to help with answers when I can.

WoodyWoodsta commented 1 year ago

To add my two cents, since Grafana Loki already has the "read-write" mode and a Helm chart for it, I was sort of expecting to be able to deploy Mimir in the same way if it contains the same component architecture (which it does). So I'm wondering whether the considerations listed above are not the equivalents of what has already been done in Grafana Loki?

davinkevin commented 11 months ago

Monolithic mode is a very important (strategic?) deployment model IMO, because it makes it possible to start simple and then increase the complexity if the product fits our needs.

ATM, without the monolithic mode, I don't see myself deploying Mimir or Tempo in clusters I manage "just for evaluation purposes"… and so I start to look at other tools, even though I already run Loki & Grafana.

As a user, I don't expect any SLA or validation from this chart flavour, just a parameter to deploy it in "target=all".

rubenvw-ngdata commented 11 months ago

@davinkevin If you want to try out Mimir in monolithic deployment mode, you can use our fork at https://github.com/NGDATA/mimir. Currently we only do internal releases, so if you want to use it, you will have to take care of the release process yourself.

The more the fork is used, the more likely it is that this gets embedded in the product.

mhoyer commented 10 months ago

I like the idea of providing one or more less complex Helm chart solutions for Mimir. Why? Because we also tried to deploy the current mimir-distributed one and it was really tough to walk through the values.yaml. Sure, the chart probably would have run out of the box, but a) we had to apply some modifications and b) my inner nerd wants to know what I am deploying. And here I didn't even look into the templates.

The complex mimir-distributed Helm chart definitely has its use case for larger production deployments. Still, the simpler rollout methods are valuable too, both for beginners and for scenarios with lower performance requirements.

As the almost 4k-line values.yaml is already overwhelming, I suggest really splitting it up into separate Helm charts before adding even more complexity to the existing one (with a deployment method setting). This makes your lives as maintainers easier, and those of the consumers too, because they can decide upfront which level of chart sophistication to start with. In fact, they only have to deal with a less complex values.yaml and may understand how the templates work (in case of an issue).

Regarding the sharing of common template functions, you could follow an approach similar to Bitnami's, with a mimir-common Helm chart? See https://github.com/bitnami/charts/tree/main/bitnami/common

davinkevin commented 7 months ago

@rubenvw-ngdata is the fork still maintained?

rubenvw-ngdata commented 7 months ago

It is, we are using it without issues. We do not follow all changes that happen on main immediately though. If there is something that is not working for you, let me know.

Ca-moes commented 1 month ago

Having a monolithic deployment option in the Helm chart would be awesome for the meta-monitoring chart.

lieberlois commented 1 week ago

Is there any update on this? I really don't understand the decision to have the simplescalable variant for Loki but not for Mimir 😓

rorynickolls-skyral commented 3 days ago

This would be a useful feature where Mimir needs to be deployed for testing. We currently test our observability stack in CI and Mimir, even in a minimal distributed setup, consumes a lot of resources.

Loki can easily just run in SingleBinary mode for tests, and I had assumed the two would be configurable in the same way.