open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector

New component: Log-based metrics processor #18269

Closed weyert closed 1 month ago

weyert commented 1 year ago

The purpose and use-cases of the new component

Log-based Metrics (logmetrics) analyses the received log records and generates metrics from them.

Example configuration for the component

processors:
  logmetrics:
    pg_permissions_errors:
      type: counter
      filter:
        - from_attribute: textPayload
          match: permission denied for table (?P<tableName>.*)
          action:
            - type: add_attribute
              name: db.table.name
              value: $tableName

Telemetry data types supported

This processor would accept logs and create metrics

Is this a vendor-specific component?

Sponsor (optional)

No response

Additional context

The idea behind this proposed processor is to allow generating metrics from logs that have been sent to the collector using the new logs functionality. For example, in Google Cloud you have the ability to generate metrics based on the logs. I would love to have a similar solution for the collector.

A potential use case would be the ability to send all logs to a dedicated logs collector, which passes the logs through to the appropriate logs backend (e.g. Google Cloud Logging) while at the same time generating log-based metrics that are then sent to Prometheus via the Prometheus Remote Write exporter.
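
For illustration, a rough sketch of what such a topology could look like, assuming the proposed logmetrics component were implemented as a connector bridging a logs pipeline and a metrics pipeline. The logmetrics name and its (empty) config are hypothetical; googlecloud and prometheusremotewrite are existing exporters, and the endpoint is a placeholder:

receivers:
  otlp:
    protocols:
      grpc:

connectors:
  # hypothetical component proposed in this issue
  logmetrics:

exporters:
  # ships the raw logs to Google Cloud Logging
  googlecloud:
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [googlecloud, logmetrics]
    metrics:
      receivers: [logmetrics]
      exporters: [prometheusremotewrite]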

atoulme commented 1 year ago

You might want to look at connectors, as a way to achieve this effect.

atoulme commented 1 year ago

Closing, as connectors perform this work now; please take a look there. Please comment or reopen for clarification or if I missed something.

cwegener commented 8 months ago

Only the spanmetricsconnector exists at the moment. Has a new logmetricsconnector been discussed at all before?

atoulme commented 8 months ago

depends what you're looking for, countconnector has log use cases.

cwegener commented 8 months ago

> depends what you're looking for, countconnector has log use cases.

Yes, the counting of logs by attributes is covered in that connector and is the most widely applicable use case.

One additional thing that the spanmetricsconnector does is calculate durations.

To compare this to the logs signal, take the example of web servers and application servers (Websphere, Tomcat, HTTPd, IIS, etc.)

Such server logs often include duration measurements in the log body, along with additional attributes associated with those measurements that would otherwise be unavailable from other sources, for example "client IP" and "user name".

The metrics generated from the same web servers and application servers via a respective otelcol receiver would typically not have these attributes included.

So, very similar to spanmetrics, logmetrics would provide the opportunity to create (request) duration histograms with the useful attribute labels given above. Span attributes are unlikely to include "client IP" or "user name", so these examples would be exclusive to log attributes.
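
As a sketch of how such attributes could be lifted out of an access-log-style body with the existing transform processor, using the OTTL ExtractPatterns and merge_maps functions (the regex is only an example and assumes a space-separated body ending in a duration field):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # copy named capture groups (client ip, user name, duration) into log attributes
          - 'merge_maps(attributes, ExtractPatterns(body, "^(?P<client_ip>[^ ]+) (?P<user_name>[^ ]+) .* (?P<duration_ms>[0-9.]+)$"), "upsert")'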

cwegener commented 8 months ago

Hmm .. so, I just found that there is an even older issue for this exact topic here: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/13530

And in the discussion in that other issue, nobody ever created a 'New component' issue.

But it looks like this issue is actually the 'New Component' issue. Looks like maybe some wires got crossed.

djaglowski commented 8 months ago

I'm reopening this to represent the formal "New Component" issue requested on #13530. Please continue conversation about this proposal here.

verejoel commented 8 months ago

Hi folks, we're currently interested in building a logs -> metrics connector. We have a few specific use cases where we want to first parse values from logs and then generate completely new metrics. Some examples include:

* extract the request duration from HTTP request logs and build a histogram

* extract the request size in bytes from an HTTP request log and increment a `bytes_total` counter

* build CPU/Memory gauges from legacy systems that report these data as log lines

I'm currently working on the design for our specific use-cases, but I can see a need for a generic connector that can generate metrics from any telemetry signal.

I think a good approach would be to require that parsing and filtering of telemetry should be handled by dedicated processors (i.e. the transform and filter processors). Therefore, the connector would only build metrics based on attributes and resource attributes present in the telemetry payload. This would reduce the scope of the connector to manipulating attributes and emitting the configured metrics, rather than having it be concerned with parsing or filtering of telemetry.

In our specific use case, we would like aggregated metrics to be flushed periodically through to a prometheusremotewrite endpoint, which then ships metrics into Thanos in our particular setup. However, I think a more useful approach would be to have the connector emit delta metrics like the count connector, so that it is inherently stateless, and then rely on the introduction of the accepted deltatocumulative and metricaggregation processors (#29300 and #29461) to convert the metrics into a Prometheus compatible format. In this way, the combination of this connector and those two processors should meet a wide range of potential use cases.
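
A sketch of the kind of pipeline described above, with a hypothetical logmetrics connector emitting delta metrics that are converted before export. The connector name and its (empty) config are hypothetical; deltatocumulative is the processor referenced above (#29300), prometheusremotewrite is the existing exporter, and the Thanos receive endpoint is a placeholder:

receivers:
  otlp:
    protocols:
      grpc:

connectors:
  # hypothetical log-to-metrics connector emitting delta temporality
  logmetrics:

processors:
  # converts the deltas into cumulative metrics for Prometheus/Thanos
  deltatocumulative:

exporters:
  prometheusremotewrite:
    endpoint: http://thanos-receive:19291/api/v1/receive

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [logmetrics]
    metrics:
      receivers: [logmetrics]
      processors: [deltatocumulative]
      exporters: [prometheusremotewrite]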

weyert commented 8 months ago

I would be interested in a counter for matching strings in a log record, e.g. to count the number of errors in the Postgres log file so it can trigger Alertmanager alerts.

atoulme commented 8 months ago

You could have a log pipeline where you filter the logs down to what you want to alert on, and then use the countconnector.
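
A minimal sketch of that approach with today's components (filelog receiver, filter processor, count connector); the log path, match string, and endpoint are placeholders:

receivers:
  filelog:
    include: [/var/log/postgresql/*.log]

processors:
  # the filter processor drops records matching the condition,
  # so negate it to keep only the lines we want to count
  filter/errors:
    logs:
      log_record:
        - 'not IsMatch(body, "permission denied")'

connectors:
  count:

exporters:
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [filter/errors]
      exporters: [count]
    metrics:
      receivers: [count]
      exporters: [prometheusremotewrite]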

cwegener commented 8 months ago

> * extract the request duration from HTTP request logs and build a histogram
>
> * extract the request size in bytes from an HTTP request log and increment a `bytes_total` counter
>
> * build CPU/Memory gauges from legacy systems that report these data as log lines

Those describe the use cases that I am after as well. And I think the first two will have broad appeal. And variations of the third use case will occur quite often in my world as well.

djaglowski commented 8 months ago

> You could have a log pipeline where you filter the logs down to what you want to alert on, and then use the countconnector.

I don't think the count connector can handle many of the use cases suggested here. It really is only useful for counting the number of instances of telemetry items that match some criteria. What I'm understanding from the mentioned use cases is that we need the ability to aggregate values within the telemetry.

Edit: I see that likely the suggestion was towards this comment:

> I would be interested in a counter for matching strings in a log record, e.g. to count the number of errors in the Postgres log file so it can trigger Alertmanager alerts.

The count connector should be able to support this today, and you don't need to pre-filter the data; just specify matching criteria for the count metric you want to generate.
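
For example, something along these lines using the count connector's own matching criteria (the metric name and condition are just illustrative):

connectors:
  count:
    logs:
      postgres.permission_denied.count:
        description: Number of "permission denied" errors seen in Postgres logs
        conditions:
          - 'IsMatch(body, "permission denied for table")'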

djaglowski commented 8 months ago

> I think a good approach would be to require that parsing and filtering of telemetry should be handled by dedicated processors (i.e. the transform and filter processors).

This probably goes a little further than necessary and may actually complicate the problem. Specifically, filtering of telemetry should not be necessary ahead of time, since any criteria which would be used to filter can also be used to select the appropriate telemetry. This can be done with OTTL in the same way as count connector.

OTTL can also help us with accessing fields, and I don't think we necessarily need to place constraints on where the field is found. Instead, we should only constrain the type of value we expect to find at any field accessible by OTTL. So for example I may have a numeric field in the body of my logs and just want to refer to it using OTTL.
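
To make that concrete, a purely hypothetical sketch of what such a config could look like, with an OTTL condition selecting records and an OTTL expression pointing at a numeric field inside the log body (no such component or schema exists yet):

connectors:
  logmetrics:
    metrics:
      http.response.size:
        type: sum
        unit: By
        # OTTL condition selecting which log records contribute
        condition: 'attributes["http.status_code"] != nil'
        # OTTL expression referencing a numeric field in the body
        value: 'Double(body["bytes_sent"])'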

matt-mercer commented 6 months ago

I'd like to add my support for this as a dedicated connector. In terms of functionality, I'd like to see inclusion of the histogram metric type (inspiration could be taken from https://docs.fluentbit.io/manual/pipeline/filters/log_to_metrics), where high-frequency samples can be aggregated into a bucketed histogram.
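
Continuing the hypothetical config sketched earlier in this thread, a histogram metric with explicit buckets, similar in spirit to Fluent Bit's log_to_metrics filter (again, neither the component nor this schema exists yet):

connectors:
  logmetrics:
    metrics:
      http.request.duration:
        type: histogram
        unit: s
        value: 'Double(body["request_time"])'
        # explicit bucket boundaries for aggregating high-frequency samples
        buckets: [0.005, 0.025, 0.1, 0.5, 1, 5]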

weyert commented 6 months ago

Sorry, what is OTTL?

proffalken commented 5 months ago

Also interested in this, particularly the idea of being able to extract metrics directly from within a log line. For example, given a log line of `<timestamp> <level> <hostname> success=1 items_processed=925 avg_process_time=0.25`, I'd like to convert success, items_processed, and avg_process_time into metrics with their values from the log line, rather than just a count of how many times they have been seen.

FWIW, this is how Vector does it
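
With the existing transform processor, those key=value pairs could first be lifted into log attributes, after which a metrics-generating connector could read them. A sketch, assuming the leading <timestamp> <level> <hostname> fields have already been parsed off by the receiver so the body holds just the key=value pairs (ParseKeyValue splits on "=" and spaces by default):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # parse success=1 items_processed=925 avg_process_time=0.25 into attributes
          - 'merge_maps(attributes, ParseKeyValue(body), "upsert")'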

manojksardana commented 5 months ago

Also interested in this, especially since there is no official support for events in OpenTelemetry. We get events from various sources which report sales, orders, etc. These events are now getting ingested as log entries, and there is a need to create metrics like total sales or orders over a period of time. Such a connector will help achieve that goal.

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 1 month ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.