grafana / mimir

Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
https://grafana.com/oss/mimir/
GNU Affero General Public License v3.0

Disallow changing some options unless a `--dangerous` flag is provided #3528

Open colega opened 1 year ago

colega commented 1 year ago

Is your feature request related to a problem? Please describe.

Mimir is full of configuration options, and we don't consider some of them valid for production use cases.

Some examples of those are changing TSDB block ranges or using a replication factor of 2.

The current approach is to hide those options from the docs or mark them as experimental; however, this doesn't prevent customers from changing them. Running Mimir with those settings changed, or set to unsupported values, causes unexpected behavior that ends up in support cases and in unhappy community users who can't find answers to their issues because their configuration is unique.

On the other hand, we don't want to remove those options completely, as we often run internal experiments to find better configuration combinations: once we've checked that they work, we ship them to the community with enough confidence to recommend them, or even to change the default values.

Describe the solution you'd like

During our offsite we discussed adopting behavior similar to Tanka's:

$ tk show environments/xeon.colega.eu/mimir | cat  
Redirection of the output of tk show is discouraged and disabled by default.
If you want to export .yaml files for use with other tools, try 'tk export'.
Otherwise run tk show --dangerous-allow-redirect to bypass this check.
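Conceptually, a guard like Tanka's comes down to checking whether standard output is an interactive terminal and refusing to continue otherwise, unless an explicit override flag is set. The Go sketch below illustrates that idea; it is not Tanka's actual source, and it assumes that testing `os.Stdout` for the character-device mode bit is a good enough terminal check (a crude heuristic, since `/dev/null` is also a character device). The error wording is illustrative.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// errRedirect paraphrases the spirit of Tanka's message; wording is illustrative.
var errRedirect = errors.New("redirection of the output is discouraged and disabled by default; " +
	"run with --dangerous-allow-redirect to bypass this check")

// stdoutIsTerminal reports whether stdout looks like an interactive terminal.
// Pipes and regular files do not carry the character-device mode bit.
func stdoutIsTerminal() bool {
	info, err := os.Stdout.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

// checkRedirect is the pure decision: block non-terminal output unless the
// user explicitly opted in with the "dangerous" flag.
func checkRedirect(isTerminal, allowRedirect bool) error {
	if isTerminal || allowRedirect {
		return nil
	}
	return errRedirect
}

func main() {
	allow := flag.Bool("dangerous-allow-redirect", false, "allow redirecting output")
	flag.Parse()
	if err := checkRedirect(stdoutIsTerminal(), *allow); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("rendered output would be printed here")
}
```

With this sketch, piping the program into `cat` is rejected, while running it in a terminal, or passing `--dangerous-allow-redirect`, lets it proceed.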

In other words, I'm proposing to introduce a flag called --dangerous-unsafe-configuration (or something scarier, if possible) that would have to be set in order to change certain configuration options, such as the two examples mentioned above.

When --dangerous-unsafe-configuration is provided, Mimir would log a big warning during startup, which would simplify debugging issues.

Following the usual deprecation path, we would still allow running without that flag set for two more versions, although we would log a big warning at application startup.

Describe alternatives you've considered

Just remove those options, and build an ad-hoc Mimir image when we need to change them.

pracucci commented 1 year ago

Makes sense to me. I would allow changing any config option marked as "experimental" only when the "unsafe" flag is enabled, in addition to a few others (e.g. replication factor = 2).

> I'm proposing to introduce a flag called --dangerous-unsafe-configuration (or something more scary, if possible)
>
> When the flag --dangerous-unsafe-configuration is provided, it would issue a big warning in the logs during the startup, which would simplify debugging issues.

I think it would be important to communicate that we don't provide support if you change any of those.

colega commented 1 year ago

Maybe we can call the flag --dangerous-unsupported-configuration then?

RichiH commented 1 year ago

Is everything in experimental considered dangerous & unsupported?

A more neutral and descriptive option such as --nonproduction-development-mode or even --nonproduction--i-know-and-accept-the-risks might make sense?

colega commented 1 year ago

I don't think this would necessarily apply to all experimental flags, but I do want to underline that some configuration values (or tweaking them) are dangerous and unsupported.

The exact option name isn't important to me: I'd use a more aggressive version, but I also understand that some people read "aggressive" as "unnecessarily aggressive".

RichiH commented 1 year ago

Are all dangerous flags experimental? If yes, would it make sense to group them all into a "dangerous" category?

I might be overthinking things; I am trying to write docs in my head. Adding another group is easier to explain than introducing an orthogonal measurement.
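The two documentation models being weighed here can be sketched as Go types: a single axis where "dangerous" is just another category, versus a category plus an orthogonal "dangerous" marker. The type names, the categories, and the example options (including the category assigned to each) are hypothetical and only illustrate the trade-off.

```go
package main

import "fmt"

// Category is a hypothetical documentation category for an option.
type Category string

const (
	Basic        Category = "basic"
	Advanced     Category = "advanced"
	Experimental Category = "experimental"
	Dangerous    Category = "dangerous" // only meaningful in the single-axis model
)

// SingleAxisOption: "dangerous" is one more group, so an option is either
// dangerous or experimental (or basic/advanced), never two at once.
type SingleAxisOption struct {
	Name     string
	Category Category
}

func (o SingleAxisOption) RequiresUnsafeFlag() bool { return o.Category == Dangerous }

// TwoAxisOption: dangerousness is orthogonal to maturity, so an option can be
// "experimental but safe" or "advanced but dangerous".
type TwoAxisOption struct {
	Name      string
	Category  Category // basic, advanced or experimental
	Dangerous bool
}

func (o TwoAxisOption) RequiresUnsafeFlag() bool { return o.Dangerous }

func main() {
	single := []SingleAxisOption{
		{Name: "tsdb.block-ranges", Category: Dangerous},
		{Name: "some-experimental-feature", Category: Experimental},
	}
	two := []TwoAxisOption{
		{Name: "tsdb.block-ranges", Category: Advanced, Dangerous: true},
		{Name: "some-experimental-feature", Category: Experimental, Dangerous: false},
	}
	for i := range single {
		fmt.Printf("%s: single-axis=%s gated=%v | two-axis=%s gated=%v\n",
			single[i].Name, single[i].Category, single[i].RequiresUnsafeFlag(),
			two[i].Category, two[i].RequiresUnsafeFlag())
	}
}
```

The single-axis model is simpler to document, as suggested above; the two-axis model keeps the maturity label even for options that are also gated.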