grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Support receiving logs in Loki using OpenTelemetry OTLP #5346

Closed. js8080 closed this issue 1 month ago.

js8080 commented 2 years ago

Is your feature request related to a problem? Please describe.
I am running Grafana Loki inside a Kubernetes cluster, but I have some applications running outside the cluster, and I want to get logging data from those applications into Loki without relying on custom APIs or file-based logging.

Describe the solution you'd like
OpenTelemetry describes a number of approaches, including using the OpenTelemetry Collector. The Collector supports various types of exporters, and the OTLP exporter supports logs, metrics, and traces. Tempo already supports receiving trace data via OTLP, and it would be great if Loki also supported receiving log data via OTLP. That way, people could run the OpenTelemetry Collector next to their applications and send logs to Loki in a standard way, following the OpenTelemetry "New First-Party Application Logs" recommendations.
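For illustration, the requested setup would look roughly like the following Collector pipeline: applications export OTLP to a nearby Collector, which forwards logs to Loki over OTLP. This is only a sketch written against what Loki eventually shipped (the `/otlp` ingestion path in Loki 3.x); no such endpoint existed when this issue was filed, and the hostname is a placeholder.

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}   # applications export OTLP/gRPC to the local Collector
      http: {}

exporters:
  otlphttp:
    # Placeholder address; Loki 3.x exposes OTLP ingestion under the /otlp prefix
    endpoint: http://loki.example.internal:3100/otlp

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
```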

Currently, unless I am misunderstanding the Loki documentation, it seems the only API into Loki is custom:

Details on the OTLP specification:

Describe alternatives you've considered
There are a number of Loki clients that one can use to get logs into Loki, but they all seem to involve either the custom Loki push API or reading from log files. Supporting the OpenTelemetry Collector would allow following the OpenTelemetry "New First-Party Application Logs" recommendations.
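By contrast, the alternative available around that time was the `loki` exporter from opentelemetry-collector-contrib, which translates OTLP logs into the custom push API; a minimal sketch, with a placeholder address:

```yaml
exporters:
  loki:
    # Loki's custom push API, targeted by the contrib "loki" exporter
    endpoint: http://loki.example.internal:3100/loki/api/v1/push
```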


liguozhong commented 2 years ago

Done: https://github.com/grafana/loki/pull/5363

1. Grafana OTLP log view (image)
2. Go client `go.mod` dependency:

    go.opentelemetry.io/collector/model v0.44.0

Demo Go client code:

// Demo test file; the package name is arbitrary for this standalone example.
package otlpdemo

import (
    "context"
    "testing"
    "time"

    "github.com/stretchr/testify/require"
    "go.opentelemetry.io/collector/model/otlpgrpc"
    "go.opentelemetry.io/collector/model/pdata"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

// TestGrpcClient sends two demo log records to the OTLP gRPC endpoint.
func TestGrpcClient(t *testing.T) {
    grpcEndpoint := "localhost:4317"

    // Dial the OTLP/gRPC endpoint without TLS (local testing only).
    conn, err := grpc.Dial(grpcEndpoint, grpc.WithTransportCredentials(insecure.NewCredentials()))
    require.NoError(t, err)

    client := otlpgrpc.NewLogsClient(conn)
    request := makeRequest()
    _, err = client.Export(context.Background(), request)
    require.NoError(t, err)
}

// makeRequest builds an OTLP logs export request containing two log records.
func makeRequest() otlpgrpc.LogsRequest {
    request := otlpgrpc.NewLogsRequest()
    pLog := pdata.NewLogs()

    // Resource-level attribute identifying the application.
    rl := pLog.ResourceLogs().AppendEmpty()
    rl.Resource().Attributes().InsertString("app", "testApp")

    ilm := rl.InstrumentationLibraryLogs().AppendEmpty()
    ilm.InstrumentationLibrary().SetName("testName")

    now := time.Now()

    // First record: WARN (OTLP severity number 13).
    logRecord := ilm.LogRecords().AppendEmpty()
    logRecord.SetName("testName")
    logRecord.SetFlags(31)
    logRecord.SetSeverityNumber(13)
    logRecord.SetSeverityText("WARN")
    logRecord.SetSpanID(pdata.NewSpanID([8]byte{1, 2}))
    logRecord.SetTraceID(pdata.NewTraceID([16]byte{1, 2, 3, 4}))
    logRecord.Attributes().InsertString("level", "WARN")
    logRecord.SetTimestamp(pdata.NewTimestampFromTime(now))

    // Second record: INFO (OTLP severity number 9).
    logRecord2 := ilm.LogRecords().AppendEmpty()
    logRecord2.SetName("testName")
    logRecord2.SetFlags(31)
    logRecord2.SetSeverityNumber(9)
    logRecord2.SetSeverityText("INFO")
    logRecord2.SetSpanID(pdata.NewSpanID([8]byte{3, 4}))
    logRecord2.SetTraceID(pdata.NewTraceID([16]byte{1, 2, 3, 4}))
    logRecord2.Attributes().InsertString("level", "INFO")
    logRecord2.SetTimestamp(pdata.NewTimestampFromTime(now))

    request.SetLogs(pLog)
    return request
}

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

We are doing our best to respond to, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.

frzifus commented 1 year ago

May I ask, what's the current state here? :)

periklis commented 1 year ago

@frzifus The status is that we are still missing the API, but the key storage issue is addressed by non-indexed labels (see the upcoming docs PR: https://github.com/grafana/loki/pull/10073). As @slim-bean mentioned in his earlier comment, we need efficient storage for OTLP labels, and AFAIU, as mentioned in the last NASA community call, we are close to non-indexed labels.

madhub commented 10 months ago

Any update on the native OTLP support?

jpkrohling commented 10 months ago

@sandeepsukhani might know a thing or two about this :-)

sandeepsukhani commented 10 months ago

Hey folks, we have added experimental OTLP log ingestion support to Loki. It has yet to be released, so you would have to use the latest main to try it. You can read more about it in the docs. Please give it a try in your dev environments and share any feedback or suggestions.
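For anyone trying this out, note that OTLP ingestion stores most attributes as structured metadata, which has to be enabled on the Loki side; a minimal sketch of the relevant setting (structured metadata also assumes a v13/TSDB schema, as in the configs shared later in this thread):

```yaml
limits_config:
  # Required for OTLP ingestion: attributes are stored as structured metadata
  allow_structured_metadata: true
```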

mxab commented 10 months ago

Hi, really looking forward to that feature :)

I saw that service.instance.id will be considered a label; doesn't this have the potential to be a high-cardinality value?

Also, will it be possible to customize the "labels" list? In our case we run Nomad, so the k8s.... resource attributes wouldn't really work for us, but we do have resource attributes like nomad.job.name, which would make sense for us as labels.

bouk commented 9 months ago

@sandeepsukhani looks good, I'll give it a try next week.

One immediate suggestion is that I'd like to be able to configure the indexed labels so I can add/remove items from the list. Perhaps it should default to the list you have in the docs and then the user can provide their own list to override it.

Also, I see that span_id and trace_id are currently metadata; shouldn't trace_id at least be indexed so I can correlate logs with traces?

Another suggestion: the conversion adds a 'severity_number' metadata attribute, which is not very useful; instead, it should map it to a 'level' field like the OpenTelemetry Collector translator does: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/translator/loki/logs_to_loki.go.
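As a possible Collector-side workaround (not something Loki does itself), the contrib transform processor can copy the severity text into a `level` attribute before export; a minimal sketch, assuming the opentelemetry-collector-contrib distribution:

```yaml
processors:
  transform/severity-to-level:
    log_statements:
      - context: log
        statements:
          # Copy the OTLP severity text into a conventional "level" attribute
          - set(attributes["level"], severity_text)
```

The processor would then be added to the logs pipeline's `processors` list alongside any existing processors.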

DengYiPeng commented 5 months ago

Hi, can I ask if there are plans to support gRPC? Or maybe I missed some documentation and it's actually supported now?

gkaskonas commented 4 months ago

Is this supported in Loki v3? I get a 404 error when calling the endpoint.

alextricity25 commented 3 months ago

Does anyone know if this is now possible?

I have an OTel Collector running on a k8s cluster, and I would like to gather logs from that cluster and send them to a remote Loki stack running on another k8s cluster. I'm hoping to achieve this via OTLP HTTP, and there is documentation that seems to indicate this is possible. However, after following the documentation I haven't had any success. Sending logs from the OTel Collector to a remote Loki instance should be possible through OTLP HTTP, right?

jpkrohling commented 3 months ago

Yes, Loki v3 includes an OTLP endpoint to ingest OTLP logs natively.
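Concretely, as later comments in this thread confirm, a Collector's `otlphttp` exporter should point at the `/otlp` prefix (the exporter appends `/v1/logs` itself); the address below is a placeholder:

```yaml
exporters:
  otlphttp:
    # The Collector completes the path, so do not append /v1/logs here
    endpoint: http://loki.example.internal:3100/otlp
```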

alextricity25 commented 3 months ago

> Does anyone know if this is now possible?
>
> I have an OTel Collector running on a k8s cluster, and I would like to gather logs from that cluster and send them to a remote Loki stack running on another k8s cluster. I'm hoping to achieve this via OTLP HTTP, and there is documentation that seems to indicate this is possible. However, after following the documentation I haven't had any success. Sending logs from the OTel Collector to a remote Loki instance should be possible through OTLP HTTP, right?

A quick update on this: I was able to receive logs via OTLP HTTP successfully. It turned out to be a mistake in my config.

leonhma commented 3 months ago

**Solved**

Well, never mind. I eventually found out I was just using the wrong version of Loki (grafana/loki instead of grafana/loki:3.0.0), and the OTLP endpoint wasn't ready yet. So if you have the issue described below, just upgrade 🤷
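For docker-compose users, the fix amounts to pinning a 3.x image tag instead of an older floating tag; a minimal sketch (the service name and port mapping are illustrative):

```yaml
services:
  loki:
    image: grafana/loki:3.0.0   # OTLP ingestion requires Loki 3.x
    ports:
      - "3100:3100"
```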


**Original Problem**

> A quick update on this: I was able to receive logs via OTLP HTTP successfully. It turned out to be a mistake in my config.

@alextricity25 Would you mind sharing how you got this to work? I am currently stuck at a stage where the collector gives me this error:

2024-06-23T17:30:51.116Z    error   exporterhelper/queue_sender.go:90   Exporting failed. Dropping data.    {"kind": "exporter", "data_type": "logs", "name": "otlphttp", "error": "not retryable error: Permanent error: rpc error: code = Unimplemented desc = error exporting items, request to http://loki.telemetry.svc.cluster.local:3100/otlp/v1/logs responded with HTTP Status Code 404", "dropped_items": 10}
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
    go.opentelemetry.io/collector/exporter@v0.103.0/exporterhelper/queue_sender.go:90
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
    go.opentelemetry.io/collector/exporter@v0.103.0/internal/queue/bounded_memory_queue.go:52
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
    go.opentelemetry.io/collector/exporter@v0.103.0/internal/queue/consumers.go:43

For reference, this is the config for the collector I am currently deploying using the operator:

```yaml
# OpenTelemetry Operator
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: telemetry
spec:
  image: otel/opentelemetry-collector-contrib:0.103.0
  serviceAccount: otel-collector
  mode: daemonset
  volumeMounts:
    # Mount the volumes to the collector container
    - name: varlogpods
      mountPath: /var/log/pods
      readOnly: true
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
  volumes:
    # Typically the collector will want access to pod logs and container logs
    - name: varlogpods
      hostPath:
        path: /var/log/pods
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
      filelog:
        include_file_path: true
        include:
          - /var/log/pods/*/*/*.log
        exclude:
          - /var/log/pods/telemetry_otel-collector*/*/*.log
        operators:
          - id: container-parser
            type: container
    processors:
      batch: {}
    exporters:
      logging:
        loglevel: debug
      otlphttp:
        endpoint: http://loki.telemetry.svc.cluster.local:3100/otlp
        compression: none
        tls:
          insecure: true
      prometheus:
        endpoint: "0.0.0.0:8889"
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp, filelog]
          processors: [batch]
          exporters: [logging, otlphttp]
```

And the Loki config:

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2024-04-01
      object_store: s3
      store: tsdb
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
  aws:
    s3: s3://minioadmin:minioadmin@minio-service.minio.svc.cluster.local:9000/loki-data
    s3forcepathstyle: true

limits_config:
  retention_period: 744h
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_global_streams_per_user: 5000
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  allow_structured_metadata: true

# chunk_store_config:
#   max_look_back_period: 744h

table_manager:
  retention_deletes_enabled: true
  retention_period: 744h

ruler:
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  # alertmanager_url: http://alertmanager:9093 TODO deploy alertmanager
  ring:
    kvstore:
      store: inmemory
  enable_api: true

query_scheduler:
  max_outstanding_requests_per_tenant: 2048

frontend:
  max_outstanding_per_tenant: 2048
  compress_responses: true
```

This looks suspiciously like the OTLP collector is still using gRPC, but this is exactly what the docs tell me to do, so I am clueless here. Any help would be appreciated.

leahneukirchen commented 3 months ago

The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

astrojuanlu commented 3 months ago

> Well, never mind. I eventually found out I was just using the wrong version of Loki (grafana/loki instead of grafana/loki:3.0.0), and the OTLP endpoint wasn't ready yet. So if you have the issue described below, just upgrade 🤷

The official docker-compose.yaml still contains the old version:

https://github.com/grafana/loki/blob/0a7e9133590ffb361b9c4eb6c4b8a5b772d83676/production/docker-compose.yaml#L6-L8

> The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

Indeed, I found the /otlp reference in https://grafana.com/docs/loki/latest/send-data/. From https://grafana.com/docs/loki/latest/reference/loki-http-api/#ingest-logs-using-otlp:

> When configuring the OpenTelemetry Collector, you must use endpoint: http://<loki-addr>:3100/otlp, as the collector automatically completes the endpoint. Entering the full endpoint will generate an error.

So there's some inconsistency somewhere in the docs or the examples.

leahneukirchen commented 3 months ago

Using Alloy, endpoint = "http://localhost:3100/otlp" works, but if you want to send logs directly (e.g., from an application's OTLP exporter), you need OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:3100/otlp/v1/logs.

LookOuta commented 2 months ago

> The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.

I ran into the same problem. Have you solved it yet?

Zagrebelin commented 1 month ago

> The documentation lists http://<loki-addr>:3100/otlp as the endpoint for the otlphttp exporter, but the actual endpoint is http://<loki-addr>:3100/otlp/v1/logs; this explains the 404.
>
> I ran into the same problem. Have you solved it yet?

I solved this problem by upgrading the Loki docker container to version 3.1.1.

Jayclifford345 commented 1 month ago

Closing this issue with the introduction of the native OTLP endpoint. Please reopen if required :)