[cmd/mdatagen] Ability to disable/enable all metrics

dmitryax commented 9 months ago

Problem

We have a few issues when it's not desirable to configure the emitted metrics and resources starting from the "default" set. Sometimes, it's needed to start with an empty set and enable just a few metrics or start with enabling all metrics/resources (default+optional) and explicitly disabling a few of them.

This is needed at least for:

Possible solutions

Given that the metrics and resource_attributes interfaces provide a map name: config, we need to introduce special metric names that can be applied to a group of metrics/resources.

Option 1

Have a predefined special metric name that cannot collide with existing names, e.g., _ALL.

metrics:
  _ALL:
    enabled: false
  metric1:
    enabled: true

Option 2

Support glob-style * in metric names. Using * in metric/attribute names isn't valid according to the OTel specification, so it cannot cause collisions. Providing this capability makes the user interface more powerful, as users will be able to disable/enable groups of metrics by their namespaces.

metrics:
  '*':
     enabled: false
  metric1:
    enabled: true

Providing this interface can cause collisions if the several globs match the same metric (e.g. * and system.* match any system.* metric). Once such collisions are detected, the config validation should fail because there is no way to clearly define precedence with a map.

This approach requires an evaluation from the implementation perspective. It might over-complicate the mdatagen logic.

TylerHelmuth commented 9 months ago

I really want this feature.

The glob-style matching is really interesting. It would easily let you turn off a set of metrics you don't want, like *__replicas to turn off all replica metrics from the k8sclusterreceiver. As long as we can detect collisions during startup I really like that option.

I think we should be very strict about not introducing regex as any part of this solution now or in the future. I know it is common to take the step from glob to regex eventually, but I don't think that'd be a good idea.

dmitryax commented 9 months ago

I think we should be very strict about not introducing regex as any part of this solution now or in the future. I know it is common to take the step from glob to regex eventually, but I don't think that'd be a good idea.

Right, I was thinking we can support * characters only, nothing else. But since other glob characters like ? [] are also invalid in OTel naming conventions, maybe we can just adopt glob as is. Definitely never regex.

TylerHelmuth commented 9 months ago

Supporting full glob is probably ok.

TylerHelmuth commented 9 months ago

If we used globs, I think we need more rules for how clashes interact. It feels like wanting to do something like

metrics:
  '*':
     enabled: false
  "k8s.pod.*":
    enabled: true

could arise to turn off all metrics except pod metrics

braydonk commented 9 months ago

I opened a related PR here which I was hoping I could leverage in the processesscraper deprecation work: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/30995

I would be in favour of ditching this and going for the glob style instead. I think that would be very powerful. I'd be happy to take on the implementation.

Unless there exists a Go library for it that I'm not aware of, I'd implement glob matching with *, ?, and [] and then implement detecting globs and matching against metric names when accepting a metric builder config.

For clashes, I would need to look at this again but when the metrics builder config is passed to, is the order as written in the config file maintained? Because in that case I think it would make the most sense to make it so globs are matched in the order they appear in the config.

dmitryax commented 9 months ago

For clashes, I would need to look at this again but when the metrics builder config is passed to, is the order as written in the config file maintained? Because in that case I think it would make the most sense to make it so globs are matched in the order they appear in the config.

@braydonk It's not maintained. It's map

dmitryax commented 9 months ago

If we used globs, I think we need more rules for how clashes interact. It feels like wanting to do something like

Do you mean we define that enabled: true takes precedence over enabled: false? What about other configuration options that we can add in the future? Like applying some aggregation. It's problematic even for enabled: if we allow clashes, I'd see users trying something like

metrics:
  '*':
     enabled: true
  "k8s.pod.*":
    enabled: false

which won't work for them in that case.

Trying to be smart and set precedence based on the size of the matching metrics set seems to be an overkill. I think we should start with not allowing clashes.

TylerHelmuth commented 9 months ago

Not allowing clashes at the start would be really nice, but I think will take away a lot of the use cases. The implementation will help users who want to turn on all their metrics, but not users who want to turn off most of the metrics. Maybe that is ok?

dmitryax commented 9 months ago

The implementation will help users who want to turn on all their metrics, but not users who want to turn off most of the metrics.

Why not? You just disable all of them with * and enable some of them explicitly by names

dmitryax commented 9 months ago

Another option is to allow clashes but specify priority explicitly as an additional field e.g.:

metrics:
  '*':
    matching_order: 1
    enabled: true
  "k8s.pod.*":
    matching_order: 2
    enabled: false

Seems a bit ugly to me

TylerHelmuth commented 9 months ago

You just disable all of them with * and enable some of them explicitly by names

Oh I was under the impression that would be considered a clash. If that is allowed we're good - no clash support needed.

github-actions[bot] commented 9 months ago

Pinging code owners for cmd/mdatagen: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax commented 8 months ago

@braydonk will you be able to take this?

braydonk commented 8 months ago

Yes I will take it.

braydonk commented 7 months ago

I have the wildcard matching working. I implemented two types of matching:

* - 0 to n of any character {a,b,...} - A collection of multiple match options

I can implement any other types of wildcards if it would make sense, but every use case I can think of only really benefits from these two options.

The matching also has a priority system where the earlier wildcards are applied first. This way the scenario where we see something like:

'*':
  enabled: false
'process.memory.*':
  enabled: true

This will work in a deterministic and (I think) sensible order; all metrics will be disabled, then all process.memory metrics will be enabled.
I'm working on writing a proper specification for the format and priority orders to iron out the edge cases.

The next step is to determine how this code is included in a generated metadata package. I see two reasonable possibilities:

The wildcard matching code is included directly as another .go.tmpl file (or in an existing one) and it's used locally within the generated package.
A go module is added somewhere under go.opentelemetry.io and the package is imported by generated code.

Option 1 is definitely easier to implement, but working with the code in template files might be frustrating for future maintenance. But this also probably doesn't deserve to be a new module because it seems pretty specific to mdatagen.

I'm going to start by implementing option 1 in a branch so I can put something up, but want to leave the possibility of option 2 open for now.

braydonk commented 6 months ago

I have opened https://github.com/open-telemetry/opentelemetry-collector/pull/10065

I ended up going with option 1 from my comment above for the PR.

P.S. This issue should probably be moved to opentelemetry-collector since this component has moved there now.

dmitryax commented 6 months ago

@braydonk, I don't think we can rely on the order of the map keys because it's not supposed to be preserved

braydonk commented 6 months ago

The PR I posted takes that into account and works regardless of the order. There's a custom priority system. The priority of the pattern goes down for every . in the name, so * will be viewed as higher priority than x.* regardless of what order the patterns are read.

dmitryax commented 6 months ago

How does it work with the {...}? Which one wins compared to *? What if I have prefix.* and *.suffix?

braydonk commented 5 months ago

I have written a full specification at https://gist.github.com/braydonk/ccb6775331fdd5dca91a650330b9839f.

dmitryax commented 5 months ago

Thank you for putting the doc! Once we have the PR merged, please submit a PR to add this in cmd/mdatagen

open-telemetry / opentelemetry-collector