open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

New component: Sum Connector #32669

Open greatestusername opened 2 months ago

greatestusername commented 2 months ago

The purpose and use-cases of the new component

Sum Connector takes in logs, metrics, or traces, matches an attribute, sums the numerical values present in that attribute, and sends those sums as a time series metric along with any attributes defined in the connector config.

E.g., a log contains a total_price field for a shopping cart. Matching on total_price will take the numerical values from this field and emit a time series metric of their sum, along with any other attributes defined in the connector config.

This will function similarly to, and be patterned after, the current countconnector, but will emit sums of values rather than counts of occurrences.
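
A minimal Go sketch of the summing step described above. It assumes a simplified log record (plain maps rather than the Collector's pdata types) and that the matched attribute holds an int64 or float64; LogRecord, AttributeDef, seriesKey and sumBySeries are hypothetical names, not the countconnector or sum connector API. Each distinct combination of configured attribute values becomes its own time series.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// AttributeDef is a hypothetical mirror of one `attributes:` entry:
// the key copied onto the metric and its fallback value.
type AttributeDef struct {
	Key          string
	DefaultValue string
}

// LogRecord is a simplified stand-in for a log record (not pdata).
type LogRecord struct {
	Attributes map[string]any
}

// seriesKey identifies one time series by the configured attribute values,
// using DefaultValue when a record lacks the key.
func seriesKey(rec LogRecord, defs []AttributeDef) string {
	parts := make([]string, 0, len(defs))
	for _, d := range defs {
		val := d.DefaultValue
		if v, ok := rec.Attributes[d.Key]; ok {
			val = fmt.Sprint(v)
		}
		parts = append(parts, d.Key+"="+val)
	}
	return strings.Join(parts, ",")
}

// sumBySeries adds the value of sourceKey across records, grouped by series.
// Records where sourceKey is missing or not int64/float64 are skipped.
func sumBySeries(records []LogRecord, sourceKey string, defs []AttributeDef) map[string]float64 {
	sums := make(map[string]float64)
	for _, rec := range records {
		var n float64
		switch v := rec.Attributes[sourceKey].(type) {
		case int64:
			n = float64(v)
		case float64:
			n = v
		default:
			continue // not a number we know how to sum
		}
		sums[seriesKey(rec, defs)] += n
	}
	return sums
}

func main() {
	records := []LogRecord{
		{Attributes: map[string]any{"total_price": 19.5, "env": "prod"}},
		{Attributes: map[string]any{"total_price": int64(5), "env": "prod"}},
		{Attributes: map[string]any{"total_price": 10.0}}, // no env: default applies
		{Attributes: map[string]any{"env": "prod"}},       // no total_price: skipped
	}
	defs := []AttributeDef{{Key: "env", DefaultValue: "no_env"}}
	sums := sumBySeries(records, "total_price", defs)

	// Print series in a stable order.
	keys := make([]string, 0, len(sums))
	for k := range sums {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s -> %.2f\n", k, sums[k])
	}
}
```

Running it prints `env=no_env -> 10.00` and `env=prod -> 24.50`. A real connector would also need to decide how values such as numeric strings are handled; this sketch simply skips them.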

Example configuration for the component

receivers:
  foo:
connectors:
  sum:
    logs:
      field.to.metric.sum:
        conditions:
          - attributes["boop"] != "NULL"
        attributes:
          - key: boop
            default_value: unspecified_level
          - key: env
            default_value: no_env
exporters:
  bar:

service:
  pipelines:
    metrics/sum:
       receivers: [sum]
       exporters: [bar]
    logs:
       receivers: [foo]
       exporters: [sum]

Telemetry data types supported

In: traces, metrics, logs. Out: metrics.

Is this a vendor-specific component?

Code Owner(s)

greatestusername, shalper2

Sponsor (optional)

No response

Additional context

No response

crobert-1 commented 1 month ago

Hello @greatestusername, can you share a bit more details on the proposed configuration? I'm not sure I follow the proposed options and how it would impact the resulting output.

greatestusername commented 1 month ago

Thanks @crobert-1! I'm pasting a slightly modified config below to better illustrate. Imagine I'm sending logs from a checkout service through the OTel Collector. I want to use the total.payment attribute from those logs to output the sum of total.payment values as a time series metric called checkout.total. I also want to include a payment.processor attribute on that metric, which defaults to unspecified_processor if the log has no payment.processor value. Similarly, I want to include an env attribute on the metric, defaulting to no_env.

Let me know if you need further clarification. Thanks!

receivers:
  foo:
connectors:
  sum:
    logs:
      checkout.total:
        conditions:
          - attributes["total.payment"] != "NULL"
        attributes:
          - key: payment.processor
            default_value: unspecified_processor
          - key: env
            default_value: no_env
exporters:
  bar:

service:
  pipelines:
    metrics/sum:
       receivers: [sum]
       exporters: [bar]
    logs:
       receivers: [foo]
       exporters: [sum]
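
To make the default_value behavior concrete, here is a hypothetical Go sketch of how one log record's attributes could map onto a checkout.total data point. It assumes, as with countconnector's attributes setting, that only the listed keys are copied to the metric; AttributeConfig and dataPointAttributes are illustrative names only.

```go
package main

import "fmt"

// AttributeConfig is a hypothetical Go mirror of one `attributes:` entry.
type AttributeConfig struct {
	Key          string
	DefaultValue string
}

// dataPointAttributes copies each configured key from a log record's
// attributes onto the metric data point, falling back to DefaultValue when
// the key is absent. Keys not listed in the config are dropped.
func dataPointAttributes(logAttrs map[string]string, cfg []AttributeConfig) map[string]string {
	out := make(map[string]string, len(cfg))
	for _, a := range cfg {
		if v, ok := logAttrs[a.Key]; ok {
			out[a.Key] = v
		} else {
			out[a.Key] = a.DefaultValue
		}
	}
	return out
}

func main() {
	cfg := []AttributeConfig{
		{Key: "payment.processor", DefaultValue: "unspecified_processor"},
		{Key: "env", DefaultValue: "no_env"},
	}
	// This log has no payment.processor, so the default is used.
	logAttrs := map[string]string{"total.payment": "42.50", "env": "prod", "user.id": "u-1"}
	fmt.Println(dataPointAttributes(logAttrs, cfg))
	// Output: map[env:prod payment.processor:unspecified_processor]
}
```

Keeping the attribute set limited to configured keys keeps series cardinality under the user's control, which matters when the source logs carry high-cardinality fields such as user IDs.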

crobert-1 commented 1 month ago

Thanks for sharing another example, that was helpful for me to better understand. A couple more questions:

  1. For your given example, shouldn't we also specify source_attribute or something like that? Maybe I'm missing something, but the given config doesn't seem to specify where the actual value comes from; it just filters out logs where total.payment is NULL.
  2. I assume the default configuration for this component would be no options specified, and then every attribute would be summed, is that accurate? Then if a single new metric is defined, do we only keep that, and drop the rest?

crobert-1 commented 1 month ago

I'll sponsor this component 👍

greatestusername commented 1 month ago

Thank you for offering to sponsor this!

Also, thank you for pointing this out! Yes, a source_attribute does make sense. I was initially using the conditions and looking for the field there. So I think a single source_attribute setting should define which attribute holds the numerical value. Any conditions are then just conditions, rather than also determining where the value is found.

I think having source_attribute also answers question 2? We'd only ever metricize the defined attribute, then sum its values into a single metric.

Example config with source_attribute:

receivers:
  foo:
connectors:
  sum:
    logs:
      checkout.total:
        source_attribute: 
          - attributes["total.payment"]
        conditions:
          - attributes["total.payment"] != "NULL"
        attributes:
          - key: payment.processor
            default_value: unspecified_processor
          - key: env
            default_value: no_env
exporters:
  bar:

service:
  pipelines:
    metrics/sum:
       receivers: [sum]
       exporters: [bar]
    logs:
       receivers: [foo]
       exporters: [sum]
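
One way the config above could be modeled and validated in Go, as a sketch only: MetricInfo and AttributeConfig are hypothetical names, source_attribute is kept as a slice because the example writes it as a one-item list, and the duplicate-key check is an extra assumption not raised in this thread.

```go
package main

import "fmt"

// AttributeConfig mirrors one entry under `attributes:` (hypothetical).
type AttributeConfig struct {
	Key          string
	DefaultValue string
}

// MetricInfo mirrors one named metric (e.g. checkout.total) in the proposed
// config. Field names are illustrative, not a published API.
type MetricInfo struct {
	SourceAttribute []string
	Conditions      []string
	Attributes      []AttributeConfig
}

// Validate requires exactly one source_attribute entry (the single place the
// numeric value is read from) and non-empty, unique attribute keys.
func (m MetricInfo) Validate(name string) error {
	if len(m.SourceAttribute) != 1 {
		return fmt.Errorf("metric %q: source_attribute must have exactly one entry, got %d", name, len(m.SourceAttribute))
	}
	seen := make(map[string]bool, len(m.Attributes))
	for _, a := range m.Attributes {
		if a.Key == "" {
			return fmt.Errorf("metric %q: attribute key must not be empty", name)
		}
		if seen[a.Key] {
			return fmt.Errorf("metric %q: duplicate attribute key %q", name, a.Key)
		}
		seen[a.Key] = true
	}
	return nil
}

func main() {
	checkout := MetricInfo{
		SourceAttribute: []string{`attributes["total.payment"]`},
		Conditions:      []string{`attributes["total.payment"] != "NULL"`},
		Attributes: []AttributeConfig{
			{Key: "payment.processor", DefaultValue: "unspecified_processor"},
			{Key: "env", DefaultValue: "no_env"},
		},
	}
	fmt.Println(checkout.Validate("checkout.total")) // <nil>
	fmt.Println(MetricInfo{}.Validate("broken"))      // missing source_attribute
}
```

Validating at startup means a config without source_attribute fails fast instead of silently emitting nothing.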

Does that make sense?

crobert-1 commented 1 month ago

Do you think there would be value in having some default behavior when no options are configured? Something like:

connectors:
  sum:
    logs:

It would result in a lot of internal logic and decision making that we'd have to work through, but maybe it would be useful for someone?

greatestusername commented 1 month ago

In that sort of case, where would it get the numerical value to sum? All attributes with numerical values? I'm not sure how that would work exactly, but I'm interested in hearing more about the idea!

crobert-1 commented 1 month ago

It might be some kind of check if a value can be converted to a number, and keep it if successful. It's not a requirement for this component by any means though, and may be more complicated than it's worth.
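
A rough Go sketch of the conversion check described above, assuming attribute values arrive as int64, float64 or string; toNumber and sumNumericAttributes are hypothetical helpers. Strings that parse as NaN or Inf are rejected so a single bad value cannot poison a sum.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// toNumber reports whether an attribute value can be treated as a number.
// Strings are parsed with strconv.ParseFloat; NaN/Inf and other types are rejected.
func toNumber(v any) (float64, bool) {
	var f float64
	switch t := v.(type) {
	case int64:
		f = float64(t)
	case float64:
		f = t
	case string:
		parsed, err := strconv.ParseFloat(t, 64)
		if err != nil {
			return 0, false
		}
		f = parsed
	default:
		return 0, false
	}
	if math.IsNaN(f) || math.IsInf(f, 0) {
		return 0, false
	}
	return f, true
}

// sumNumericAttributes sketches the "no configuration" idea: every attribute
// whose value converts to a number is summed under its own key.
func sumNumericAttributes(records []map[string]any) map[string]float64 {
	sums := make(map[string]float64)
	for _, attrs := range records {
		for k, v := range attrs {
			if n, ok := toNumber(v); ok {
				sums[k] += n
			}
		}
	}
	return sums
}

func main() {
	records := []map[string]any{
		{"total.payment": "42.5", "items": int64(3), "env": "prod"},
		{"total.payment": 7.5, "items": int64(1), "env": "dev"},
	}
	fmt.Println(sumNumericAttributes(records)) // map[items:4 total.payment:50]
}
```

Even this small sketch shows the trade-off raised above: any numeric-looking attribute (IDs, status codes) would also be summed, so an explicit source_attribute stays the safer default.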

You're welcome to start submitting PRs, I think we've got a good outline here!

greatestusername commented 1 month ago

Awesome! I'll get a wireframe PR up in the next couple days so I can get started on the meat. Thank you! And apologies for my tardy reply.

greatestusername commented 1 week ago

Edit: I've got a PR up that should work

I'm having some trouble getting the connector tests to pass in the initial PR. I'm just trying to get a wireframe going, but I'm hitting failures in tests like TestComponentLifecycle/logs_to_metrics and other telemetry generation checks. Should I just stub these for now so the initial PR isn't so large?