open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.07k stars 2.37k forks source link

[exporter/awsemf] Add StorageResolution config to support high-resolution metrics #29506

Closed bviolier closed 4 weeks ago

bviolier commented 11 months ago

Component(s)

exporter/awsemf

Is your feature request related to a problem? Please describe.

We want to be able to push high-resolution metrics from the opentelemetry-collector. This means that we should be able to set StorageResolution to 1.

Describe the solution you'd like

Have the option in the AWSEMF config to turn on high-resolution metrics through the StorageResolution config.

Describe alternatives you've considered

There doesn't seem to be another way of doing this, except for creating a new exporter that uses either this setting, or directly uses PutMetricData, which would mean a completely new exporter.

Additional context

I will be creating a PR soon.

github-actions[bot] commented 11 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

bryan-aguilar commented 11 months ago

More information on storage resolution can be found here for anyone interested.

@bviolier what is your idea for the configuration changes necessary to support this? We would need to be able to specify metrics by their name and then give them an associated storage resolution right?

I could see how this could be a useful feature but understanding the configuration required for it first would be a good start.

bviolier commented 11 months ago

Thanks for the quick reply @bryan-aguilar. I created a PR for it. I chose to put it at the same level as things like dimension_rollup_option, detailed_metrics and log_retention as that seemed like things on the same configuration level.

Also, I am assuming (as was done before, I believe) that people can create multiple exporters if they want different settings for different metrics.

If you think it should look different, let me know. I'll try to devise a different way of setting this configuration value.

bryan-aguilar commented 11 months ago

I had thought about how we could adopt storage resolution in the past and I thought that having a global storageresolution option would lead to more user pain. With a global storage resolution flag like what was proposed in the PR, users would have to configure a high and low storage resolution pipeline in their collector config. Then, users would have to drop high resolution metrics from the low resolution pipelines and vise versa. Is my understanding here correct? If you did not drop you could end up with duplicated metrics right?

I think it would behoove us to consider other options for this. Here are two thoughts that come to mind

  1. A configuration option where you can supply a list of metric name which will have high storage resolution applied. A list of regex strings should be accepted.

    awsemf:
    high_resolution: 
    - metric_name_foo
    - metric_name_bar_selector*
  2. We could also look for a high resolution, or storage resolution, attribute. If it is set then we would add the high or low storage resolution to the metric. The benefit of this is that this could be done at the collector level using available processors, and would require no configuration change, or at the point at which the metric is generated. The downside would be that relying on a metric attributing name starts to move into the territory of semantic conventions and we may want to consider ironing the naming first.

bviolier commented 11 months ago

You replied on the PR saying you think it shouldn't be a top-level configuration (like I suggested above).

The other option I can think of is making it part of metric_declarations with as a new attribute called storage_resolution, that gives you the freedom to define it for every metric you want.

The downside is that it does force you to define the metrics in metric_declarations if you want to use the storage_resolution.

An example would then look like this:

exporters:
  awsemf:
    log_group_name: '/containers/{ClusterName}/performance'
    log_stream_name: '{TaskId}'
    dimension_rollup_option: NoDimensionRollup
    metric_declarations:
      - dimensions: [ [ ClusterName, TaskDefinitionFamily ] ]
        storage_resolution: 1,
        metric_name_selectors:
          - MemoryUtilized
          - MemoryReserved
          - CpuUtilized
          - CpuReserved
bviolier commented 11 months ago

I think doing an attribute might get hairy pretty quickly indeed. It does give you the power on the client to force specific metrics into high-resolution, but in our case, we specifically wouldn't want the client to have that power (that being said, that can easily be fixed with filters also).

The high-resolution attribute with a list seems like a good option, but I didn't think that would be acceptable as the value on AWS is 1 or 60 (and specifically not true/false). And I assumed there was a reason behind it (to add more values in the future maybe?).

bryan-aguilar commented 11 months ago

The other option I can think of is making it part of metric_declarations with as a new attribute called storage_resolution, that gives you the freedom to define it for every metric you want.

Putting it as part of metric declarations may be a good idea also. It may be even more appropriate then my suggestion also. Putting it within the metric declarations object would give more flexibility.

The downside is that it does force you to define the metrics in metric_declarations if you want to use the storage_resolution.

Agreed, but I think this is an acceptable tradeoff when comparing the user experience of a top level storageresolution configuration option.

I think doing an attribute might get hairy pretty quickly indeed. It does give you the power on the client to force specific metrics into high-resolution, but in our case, we specifically wouldn't want the client to have that power (that being said, that can easily be fixed with filters also)

I think it's fine to strike that idea, especially if it does not fit your use case. I have not heard a request for this either and was just throwing out ideas. The good thing is that this can always be added in the future if we do get a feature request for this functionality.

I didn't think that would be acceptable as the value on AWS is 1 or 60 (and specifically not true/false). And I assumed there was a reason behind it (to add more values in the future maybe?).

This is a good call out and I don't know the exact reasoning for that. I can see if I can find out though. We shouldn't back ourselves into a corner right away if there's the chance that there could be additional values in the future.

bviolier commented 11 months ago

I am up for updating the PR with either the array option (assuming true/false is ok) or going the metric_declarations way! Let me know what you think will be the best option.

On our side, we are now testing this on sandbox (with the current PR) and our Alarm for auto-scaling ECS containers went from 220+ seconds to between 40 and 50 seconds after load increase.

bryan-aguilar commented 11 months ago

I may need to think on this a bit more but I'm leaning toward making this a configurable part of the metric_declarations object. I think the flexibility of being able to tie storage resolution to a metric directive object is what we would want.

There is also another option we hadn't considered. There is a metrics descriptor configuration option. It seems appropriate that storage resolution should be included in that.

bviolier commented 11 months ago

The Metrics Descriptor is interesting indeed! I was thinking though that the metric-directive also gives you the option to store the same metric in different resolutions when using different dimensions - which is also an interesting case.

Just let me know when you have come to a conclusion and I will pick this up again. No hurry from my side, as we can already use it by building our own collector.

bryan-aguilar commented 11 months ago

I think both capabilities can work together. There may be times when using the metric descriptor field that you need to overwrite a resolution of a metric in all dimension sets. Having that option in metric descriptor provided a nice option.

But also having the ability to configure it per metric/dimension set is useful.

Also, I think if a storage resolution for a metric is configured in both the directive and descriptor then the descriptor would take priority. Would you tend to agree with that order of precedence?

bviolier commented 11 months ago

It sounds like a good plan to implement in both, indeed.

I would, however, have it the other way around. The directive is the most precise one, you can create "new metrics" out of a set of metrics + dimensions. You might want to have different storage-resolutions for different sets of dimensions. I could see myself setting a storage-resolution with the descriptor that says all CPU resources should be high-resolution, but then having a set of dimensions set to low-resolution through the directive.

bryan-aguilar commented 11 months ago

I see your point and I agree that it would make more sense to have the directive field take precedence. Thanks you for the feedback!

bviolier commented 11 months ago

I did a first pass for moving the storage resolution. Please let me know what you think, I had to dive deep a bit for the declaration to work, but I got something working. Open to improve, of course.

I didn't test it locally, but I updated the tests.

bviolier commented 9 months ago

@bryan-aguilar it seems the PR has been closed due to inactivity. Do you know when you have some time to review and get this merged? Thanks!

zehsor commented 8 months ago

@bryan-aguilar any updates on this? I'd love to help if there's anything needed, we're desperate for this functionality!

zehsor commented 6 months ago

@Aneurysm9 @shaochengwang @mxiamxia @bryan-aguilar any updates on this?

zehsor commented 4 months ago

@Aneurysm9 @shaochengwang @mxiamxia @bryan-aguilar it feel like this is not maintained anymore at all after not receiving any answer for almost half a year..

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 4 weeks ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.

jpbarto commented 2 weeks ago

@bryan-aguilar this feature is still needed by users of opentelemetry. Is there any update on its implementation?