grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
23.87k stars, 3.45k forks

[Grafana Cloud OTLP Endpoint] Support specifying OTel Resource Attributes promoted as Loki labels #13044

Open cyrille-leclerc opened 5 months ago

cyrille-leclerc commented 5 months ago

Is your feature request related to a problem? Please describe.

Context: As discussed with @sandeepsukhani and many others, we want to simplify Loki's OpenTelemetry ingestion path and move away from the otel2loki converters available through the OpenTelemetry Collector Loki Exporter and the Alloy otelcol.exporter.loki in favor of the newly introduced Loki OTLP Endpoint.

However, we have identified limitations in specifying which OTel resource attributes should be promoted to Loki labels:

  1. While self-managed Loki supports overriding the default list of resource attributes promoted to labels through distributor: otlp_config / default_resource_attributes_as_index_labels (docs here), Grafana Cloud Logs and the Grafana Cloud OTLP Endpoint do not provide such a stack-wide config option.
  2. The Loki OTLP Endpoint doesn't offer a mechanism for the log ingestion pipeline to specify additional resource attributes to promote to labels, similar to the loki.resource.labels attribute that was available with the OpenTelemetry Collector Loki Exporter.
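To make the gap concrete, here is a simplified Python sketch of the kind of split the OTLP endpoint performs on each log record: resource attributes on a configured list become index labels, and everything else, including log record attributes, is kept as structured metadata. This is an illustration under stated assumptions, not Loki's implementation: the attribute list is a hypothetical subset, and the dot-to-underscore normalization (service.name to service_name) and the function names are ours. In self-managed Loki, the promoted set roughly plays the role of default_resource_attributes_as_index_labels; the point of this issue is that Grafana Cloud exposes no equivalent.

```python
# Simplified sketch, NOT Loki's code: split one OTLP log record's attributes
# into index labels and structured metadata.

# Hypothetical subset of resource attributes promoted to index labels.
# In self-managed Loki this is the kind of list that
# default_resource_attributes_as_index_labels overrides.
PROMOTED_RESOURCE_ATTRIBUTES = {"service.name", "service.namespace", "k8s.cluster.name"}


def normalize(name):
    """Assumed normalization: 'service.name' -> 'service_name'."""
    return name.replace(".", "_")


def split_attributes(resource_attrs, log_attrs, promoted=PROMOTED_RESOURCE_ATTRIBUTES):
    """Return (index_labels, structured_metadata) for a single log record."""
    labels, metadata = {}, {}
    for key, value in resource_attrs.items():
        # Only listed resource attributes become index labels.
        target = labels if key in promoted else metadata
        target[normalize(key)] = str(value)
    for key, value in log_attrs.items():
        # Log record attributes (e.g. thread.name) stay as metadata.
        metadata[normalize(key)] = str(value)
    return labels, metadata


if __name__ == "__main__":
    labels, metadata = split_attributes(
        {"service.name": "orders", "host.name": "node-1"},
        {"thread.name": "main"},
    )
    print(labels)    # {'service_name': 'orders'}
    print(metadata)  # {'host_name': 'node-1', 'thread_name': 'main'}
```

In this model, promoting an extra attribute such as host.name just means adding it to the promoted set; what the issue asks for is a way to do that on Grafana Cloud, either stack-wide or from the ingestion pipeline.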

Describe the solution you'd like

I would like

Describe alternatives you've considered

Continue doing the otel2loki conversion through the OpenTelemetry Collector Loki Exporter and Alloy otelcol.exporter.loki, but this puts more burden on Loki users, and neither of these converters leverages Loki v3 metadata.

Additional context

Similar to the problem described in the Grafana Labs Community thread "Add additional index labels in Loki 3.0 via OTLP".

fredrikgh commented 3 months ago

Given the removal of the lokiexporter in September, this feature gap hits us pretty hard as Grafana Cloud users. Any update on the possibility of promoting resource attributes to indexed labels on Loki using the OTLP exporter & endpoint?

cyrille-leclerc commented 3 months ago

@fredrikgh can you please help us understand what kind of attribute you want to promote as Loki labels? Are they additional standard resource attributes? custom resource attributes? What information do these attributes convey?

@stevendungan can you please help here?

martinsson commented 2 months ago

Following up on your question, @cyrille-leclerc, here's my use case, for what it's worth:

I wanted to use this functionality to work around the fact that the level field is no longer present. It has been replaced with detected_level in Loki 3.1, but that isn't supported by Grafana and isn't indexed. There's a bug for this, of course.

I also have a custom field that I use a lot in the drop-down in the Explore view, similar to service_name. If my field can't be indexed, it won't appear in the drop-downs in Grafana Cloud.

adrielp commented 2 months ago

Grafana Cloud documentation says:

Because it is too costly from a cardinality perspective, Grafana Loki indexes a few attributes from log entries instead of indexing all available attributes or the entire log message. As such, you must provide hints to the Loki translator, stating which attributes to promote to Loki labels. You can do this by adding new synthetic attributes, which are read by the Loki translator and removed before the data is sent over the network. The following snippet shows how the processors section looks when you add a resource processor that adds the loki.resource.labels hint. This example tells the Loki translator that the host_name resource attribute should be promoted to a label. You are not required to add labels, and every entry that passes through the Loki exporter will have a static label exporter with the value OTLP by default. For more information about labels and how to choose the right ones for your use case, refer to the Loki documentation.

But this behavior doesn't actually work when sending over OTLP to the Grafana Cloud OTLP endpoint in our experience for any resource attribute we want to promote to a label.
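For readers unfamiliar with the hint mechanism in the quoted documentation, here is a rough Python sketch of what a translator does with loki.resource.labels: read the list of attribute names from the hint, promote those resource attributes to labels next to the static exporter="OTLP" label, and drop the synthetic hint itself. It is a sketch under assumptions, not the exporter's source: the comma-separated hint format, the dot-to-underscore label naming, and the function name are ours. As noted above, the Grafana Cloud OTLP endpoint does not honor this hint.

```python
# Rough sketch, NOT the Loki exporter's source: honor a loki.resource.labels
# hint on a dict of resource attributes.

HINT_KEY = "loki.resource.labels"


def apply_resource_label_hint(resource_attrs):
    """Return (labels, remaining_attrs) after processing the hint.

    Assumes the hint value is a comma-separated list of resource attribute
    names, e.g. "host.name, k8s.pod.name".
    """
    attrs = dict(resource_attrs)           # don't mutate the caller's dict
    hint = attrs.pop(HINT_KEY, "")         # synthetic hint is removed before sending
    labels = {"exporter": "OTLP"}          # static label added by default, per the docs
    for name in (part.strip() for part in hint.split(",")):
        if name and name in attrs:
            # Assumed label naming: dots become underscores.
            labels[name.replace(".", "_")] = str(attrs[name])
    return labels, attrs


if __name__ == "__main__":
    labels, rest = apply_resource_label_hint(
        {"host.name": "node-1", "loki.resource.labels": "host.name"}
    )
    print(labels)  # {'exporter': 'OTLP', 'host_name': 'node-1'}
    print(rest)    # {'host.name': 'node-1'}
```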

fredrikgh commented 2 months ago

@fredrikgh can you please help us understand what kind of attribute you want to promote as Loki labels? Are they additional standard resource attributes? custom resource attributes? What information do these attributes convey?

One example we had was to have loki labels for exception and/or scope of a log entry, i.e. custom attributes.

cyrille-leclerc commented 2 months ago

Grafana Cloud documentation says:

... But this behavior doesn't actually work when sending over OTLP to the Grafana Cloud OTLP endpoint in our experience for any resource attribute we want to promote to a label.

@adrielp This documentation is outdated: it predates the introduction of Loki structured metadata. We are going to refresh this section.

Please use OTel log attributes to capture log metadata (e.g., thread.name, ...). Note that the OTel auto-instrumentation of logging frameworks is usually capable of capturing useful metadata on its own.

We are sorry for the inconvenience. Would this solution meet your expectations?

One example we had was to have loki labels for exception and/or scope of a log entry, i.e. custom attributes.

Thanks @fredrikgh , would you by any chance have example values and a sense of the cardinality?

In particular, I would be interested in understanding:

adrielp commented 1 month ago

Thanks @cyrille-leclerc - glad the updates are going to be made. I'd also keep an eye on the entity OTEP that relates to resource attributes. I think these types of things will be important for labels as things evolve.

fredrikgh commented 1 month ago

Thanks @fredrikgh , would you by any chance have example values and a sense of the cardinality?

In particular, I would be interested in understanding:

  • exception: is it

    • Just a marker like true/false, to apply a different data management policy, for example a different retention policy?
    • The exception type, like NullPointerException?
    • Or does it also include the exception message, like InvalidFormatException: '123azerty' is not a valid integer?
  • scope: is it

    • A reference to the OpenTelemetry instrumentation scope name, which the OTel auto-instrumentation of logging frameworks maps to the logger name, for example com.mycompany.OrderService?

@cyrille-leclerc It would be NullPointerException and com.mycompany.OrderService respectively. I suppose technically, these aren't to be considered resource attributes. But some mechanism of getting these indexed would be very useful.

cyrille-leclerc commented 1 month ago

@adrielp: Thanks @cyrille-leclerc - glad the updates are going to be made. I'd also keep an eye on the https://github.com/open-telemetry/oteps/pull/264 that relates to resource attributes. I think these types of things will be important for labels as things evolve.

We are aligned here: several of our engineers contribute to this OTEP, both to better surface the concept of entities in OTel and to help improve support for high dimensionality in Prometheus.

@fredrikgh: @cyrille-leclerc It would be NullPointerException and com.mycompany.OrderService respectively. I suppose technically, these aren't to be considered resource attributes. But some mechanism of getting these indexed would be very useful.

Thanks @fredrikgh. Please pardon my curiosity, but what is your use case for this level of detail in labels, and thus this cardinality in the log streams? Java applications have hundreds of logger names (e.g., com.mycompany.OrderService) and use dozens of exception classes (e.g., NullPointerException). I suspect we may not be aware of the use case you are solving here.
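To put rough numbers on the cardinality concern: Loki keeps a separate stream for each unique set of label values, so the worst-case stream count grows with the product of the distinct values of each label. The small Python sketch below computes that upper bound; the figures in it are hypothetical stand-ins for "hundreds" of logger names and "dozens" of exception classes, and real counts are lower because not every combination occurs.

```python
from math import prod


def max_stream_count(distinct_values_per_label):
    """Worst-case number of streams: one stream per unique label set,
    assuming every combination of label values actually occurs."""
    return prod(distinct_values_per_label.values())


if __name__ == "__main__":
    # Hypothetical figures for a single Java application.
    print(max_stream_count({"logger": 300, "exception": 30}))             # 9000
    print(max_stream_count({"logger": 300, "exception": 30, "pod": 10}))  # 90000
```

Each additional label multiplies rather than adds, which is why per-logger or per-exception labels are usually a better fit for structured metadata than for the index.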

fredrikgh commented 1 month ago

@cyrille-leclerc We were misusing them initially. We have a limitation on the error metrics exported by our apps, so we built dashboards on log data for metadata analysis instead, e.g., error counts by certain metadata, backed by recording rules. We've now accomplished this with label_format, and all is well.
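The workaround described here, error counts broken down by a piece of log metadata rather than by an index label, boils down to a grouped count. The Python sketch below shows only that grouping logic; in Loki it would be expressed as a LogQL metric query evaluated by a recording rule. The record shape, the field names, and the "unknown" fallback are hypothetical, and the sketch does not reproduce how label_format was used.

```python
from collections import Counter


def error_counts_by(records, field):
    """Count error-level records grouped by one metadata field.

    Records that lack the field are grouped under "unknown".
    """
    return Counter(
        record.get("metadata", {}).get(field, "unknown")
        for record in records
        if record.get("level") == "error"
    )


if __name__ == "__main__":
    records = [
        {"level": "error", "metadata": {"exception": "NullPointerException"}},
        {"level": "error", "metadata": {"exception": "NullPointerException"}},
        {"level": "info", "metadata": {"exception": "IllegalStateException"}},
        {"level": "error"},
    ]
    print(dict(error_counts_by(records, "exception")))
    # {'NullPointerException': 2, 'unknown': 1}
```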

Getting standard resource attributes such as cluster, node, and pod as indexed labels is a more valid use case, and a better fit for resource attributes. I may have missed it, but have you settled on how you intend to make this possible? This is exactly where we used loki_resource_labels before.