Integration with external logs storage when displaying traces

yurishkuro commented 6 years ago

Many organizations have log aggregation setups, such as shipping all logs to Elasticsearch. For logs corresponding to traced transactions, it may not be feasible or desired to direct them all to the tracing backend. As long as the logs are labeled with trace/span IDs, it is possible to reassemble the logs from the logging backend when viewing traces in Jaeger UI.

This ticket is to track the design and implementation details of such a solution.

The high level design is something like this:

+--------------+        +--------------+         +--------------+                    
|              |        |              |         |              |                    
|  Jaeger UI   |------->| jaeger-query |-------->|  logs query  |                    
|              |        |              |         |              |                    
+--------------+        +--------------+         +--------------+                    
                                                        |                            
                                     +------------------+--------------------+       
                                     |                  |                    |       
                             +--------------+    +--------------+    +--------------+
                             | logs storage |    | logs storage |    | logs storage |
                             | Elasticsearch|    |    Kafka     |    |    HDFS      |
                             |              |    |              |    |              |
                             +--------------+    +--------------+    +--------------+

The logs query service is meant to serve as an abstraction over the different log storage types.
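To make the abstraction concrete, here is a minimal Go sketch of what a logs query API could look like. `LogRecord`, `LogsQuery`, `FindLogs` and the in-memory adapter are hypothetical names, not an existing Jaeger API; a real Elasticsearch, Kafka or HDFS adapter would implement the same interface against its own backend. Taking an optional span ID lets one call cover both whole-trace and per-span retrieval.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// LogRecord is a hypothetical storage-neutral log entry, shaped like a span
// log: a timestamp plus key/value fields, tagged with trace and span IDs.
type LogRecord struct {
	TraceID   string
	SpanID    string
	Timestamp time.Time
	Fields    map[string]string
}

// LogsQuery is a hypothetical interface that each storage adapter
// (Elasticsearch, Kafka, HDFS, ...) would implement.
type LogsQuery interface {
	// FindLogs returns the logs of one trace; an empty spanID means all spans.
	FindLogs(ctx context.Context, traceID, spanID string) ([]LogRecord, error)
}

// memoryLogs is a toy adapter backed by a slice, standing in for a real backend.
type memoryLogs struct {
	records []LogRecord
}

func (m *memoryLogs) FindLogs(ctx context.Context, traceID, spanID string) ([]LogRecord, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // respect cancellation, as a networked adapter would
	}
	var out []LogRecord
	for _, r := range m.records {
		if r.TraceID == traceID && (spanID == "" || r.SpanID == spanID) {
			out = append(out, r)
		}
	}
	return out, nil
}

// demoCounts queries the toy adapter for a whole trace and for a single span.
func demoCounts() (wholeTrace, oneSpan int) {
	var q LogsQuery = &memoryLogs{records: []LogRecord{
		{TraceID: "t1", SpanID: "s1", Timestamp: time.Unix(0, 0), Fields: map[string]string{"event": "cache miss"}},
		{TraceID: "t1", SpanID: "s2", Timestamp: time.Unix(1, 0), Fields: map[string]string{"event": "db query"}},
		{TraceID: "t2", SpanID: "s1", Timestamp: time.Unix(2, 0), Fields: map[string]string{"event": "other trace"}},
	}}
	all, _ := q.FindLogs(context.Background(), "t1", "")
	one, _ := q.FindLogs(context.Background(), "t1", "s2")
	return len(all), len(one)
}

func main() {
	fmt.Println(demoCounts()) // 2 1
}
```

Whether jaeger-query calls this once per trace or lazily per span is then a caller decision rather than part of the adapter contract.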

Open questions

What is the API of the logs-query service?
Does jaeger-query retrieve logs when returning the trace to the UI, or on demand per span?

phal0r commented 6 years ago

@yurishkuro We will go down this road with our centralized logging and are currently drafting an architecture. Based on the ideas you have at the moment, do you think it is necessary to stick to the current log format of a span:

"logs": [
  {
    "timestamp": 1400,
    "fields": [
      {
        "key": "error",
        "type": "string",
        "value": "something bad happened"
      }
    ]
  }
]

Or would it be enough to store the spanId and implement an abstraction layer with a specified interface that does the transformation? For now we only need to know whether the format is fixed or whether we are free to log whatever we consider necessary (including the spanId).

You wrote that only a span ID would be necessary to reassemble the logs, but I am thinking about visualizing different JSON objects with different keys and different hierarchies if no schema is specified. That would be quite a generic JSON view, wouldn't it?

Also, at least IMHO :), the logs-query interface would need to be able to:

query logs by spanId, probably with pagination
filter by keywords (implementation would differ for every adapter)
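Pagination could be handled with an opaque page token. The sketch below uses a hypothetical `paginate` helper that encodes a plain offset in the token and slices an in-memory result set; it is only an illustration under that assumption, and a real adapter would push the offset (or a backend cursor) down to the log store instead.

```go
package main

import (
	"fmt"
	"strconv"
)

// Page is a hypothetical paginated response: items plus an opaque token for
// the next page ("" when there are no more results).
type Page struct {
	Items     []string
	NextToken string
}

// paginate returns up to limit items starting at the offset encoded in token.
func paginate(items []string, token string, limit int) (Page, error) {
	offset := 0
	if token != "" {
		n, err := strconv.Atoi(token)
		if err != nil || n < 0 || n > len(items) {
			return Page{}, fmt.Errorf("invalid page token %q", token)
		}
		offset = n
	}
	if limit <= 0 {
		return Page{}, fmt.Errorf("limit must be positive, got %d", limit)
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	p := Page{Items: items[offset:end]}
	if end < len(items) {
		p.NextToken = strconv.Itoa(end) // the client echoes this back for the next page
	}
	return p, nil
}

func main() {
	logs := []string{"l1", "l2", "l3", "l4", "l5"}
	token := ""
	for {
		p, err := paginate(logs, token, 2)
		if err != nil {
			panic(err)
		}
		fmt.Println(p.Items)
		if p.NextToken == "" {
			break
		}
		token = p.NextToken
	}
}
```

Keeping the token opaque means an adapter can later switch from offsets to a backend-native cursor without changing the interface.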

yurishkuro commented 6 years ago

I think logs must store the trace ID, not just the span ID, because it's quite easy to get clashes in span IDs with the current 64-bit length.
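To put a rough number on "quite easy to get clashes": assuming span IDs are drawn uniformly at random from 2^64 values, the birthday approximation gives p ≈ 1 - exp(-n(n-1)/2^65) for at least one collision among n IDs. The small program below evaluates it under that assumption. At around 10^9 spans in a log store this is already about 2.7%, and at 2^32 (about 4.3 billion) spans it is roughly 39%.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly random 64-bit IDs are equal (birthday bound):
// p ≈ 1 - exp(-n(n-1) / (2 * 2^64)).
func collisionProbability(n float64) float64 {
	space := math.Pow(2, 64)
	// -Expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	for _, n := range []float64{1e6, 1e8, 1e9, 1e10} {
		fmt.Printf("%.0e IDs: p(collision) ~ %.3g\n", n, collisionProbability(n))
	}
}
```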

The format of the logs storage doesn't matter at all; it is the adapter's job to understand that format when responding to the "logs query" service.
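As an illustration of the adapter's role, here is a sketch that converts one backend-specific log document into the span log shape quoted earlier in the thread (a timestamp plus key/type/value fields). The input layout (`@timestamp`, `trace_id`, `span_id` keys) is an assumption about how an application might write its logs, treating the output timestamp as microseconds since the epoch is an assumption about the UI model, and every value is flattened to a string for simplicity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// KeyValue and SpanLog mirror the span log JSON shape quoted in the thread.
type KeyValue struct {
	Key   string `json:"key"`
	Type  string `json:"type"`
	Value string `json:"value"`
}

type SpanLog struct {
	Timestamp int64      `json:"timestamp"` // assumed: microseconds since epoch
	Fields    []KeyValue `json:"fields"`
}

// toSpanLog converts one log document, in a hypothetical layout, into a SpanLog.
func toSpanLog(doc []byte) (SpanLog, error) {
	var raw map[string]any
	if err := json.Unmarshal(doc, &raw); err != nil {
		return SpanLog{}, err
	}
	tsStr, _ := raw["@timestamp"].(string)
	ts, err := time.Parse(time.RFC3339Nano, tsStr)
	if err != nil {
		return SpanLog{}, fmt.Errorf("bad @timestamp: %w", err)
	}
	// Collect displayable keys; the IDs are used for correlation, not display.
	keys := make([]string, 0, len(raw))
	for k := range raw {
		switch k {
		case "@timestamp", "trace_id", "span_id":
			continue
		}
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic field order
	out := SpanLog{Timestamp: ts.UnixMicro()}
	for _, k := range keys {
		out.Fields = append(out.Fields, KeyValue{Key: k, Type: "string", Value: fmt.Sprint(raw[k])})
	}
	return out, nil
}

func main() {
	doc := []byte(`{"@timestamp":"2018-03-01T12:00:00.0014Z","trace_id":"abc","span_id":"def","level":"error","message":"something bad happened"}`)
	l, err := toSpanLog(doc)
	if err != nil {
		panic(err)
	}
	b, _ := json.MarshalIndent(l, "", "  ")
	fmt.Println(string(b))
}
```

Nested objects in the source document would be printed with Go's default formatting here; a real adapter would need a policy for them (flatten, or render as JSON).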

Finally, the plan above was only meant to search for logs by trace/span ID, not by keywords. The latter is possible but a lot more complicated.

otisg commented 6 years ago

I'm a bit surprised logs don't already contain trace+span IDs. Is that really so? If yes, we may take a crack at that, esp. if somebody can point us to where in Jaeger we may want to start. Somewhere in Agent land, I imagine?

yurishkuro commented 6 years ago

@otisg I updated the title to make it clearer what this ticket is about. It is not about producing the logs, but rather about retrieving them from an external log storage (like ELK, which is outside the scope of Jaeger) when you view a trace in the Jaeger UI.

For that approach to work, applications do need to change how they produce logs into that third-party log storage: specifically, as I mentioned above, the application must associate the current trace/span IDs with each log message. How the application does this is, I am afraid, also outside the scope of this ticket, because it is tightly coupled to the specific logging framework the application is using. E.g. in Java it is possible to give the tracer an extended ScopeManager that not only stores the spans in a thread-local but also sets the MDC, which the logging framework can then use to enrich the log lines. In Go it requires a change in the logging API, e.g. as was done in the HotROD demo app: https://github.com/jaegertracing/jaeger/tree/master/examples/hotrod/pkg/log
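In Go, one way to do this with only the standard library (Go 1.21+ `log/slog`, a swapped-in technique rather than the zap-based approach used in HotROD) is a handler wrapper that copies the IDs from the request context onto every log record. The `SpanContext` type and context key below are hypothetical placeholders; a real application would read the active span from its tracer instead.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// SpanContext holds the IDs a log line needs to be re-associated with a span.
// Hypothetical: a real tracer exposes the active span through its own API.
type SpanContext struct {
	TraceID string
	SpanID  string
}

type spanCtxKey struct{} // hypothetical private context key

// ContextWithSpan returns a copy of ctx carrying the given span IDs.
func ContextWithSpan(ctx context.Context, sc SpanContext) context.Context {
	return context.WithValue(ctx, spanCtxKey{}, sc)
}

// TraceHandler wraps another slog.Handler and adds trace_id/span_id
// attributes to every record whose context carries a SpanContext.
type TraceHandler struct {
	slog.Handler
}

func (h TraceHandler) Handle(ctx context.Context, r slog.Record) error {
	if sc, ok := ctx.Value(spanCtxKey{}).(SpanContext); ok {
		r = r.Clone() // don't mutate state shared with other copies of the record
		r.AddAttrs(slog.String("trace_id", sc.TraceID), slog.String("span_id", sc.SpanID))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap so derived loggers keep the enrichment.
// Note: inside a group, the added IDs end up nested under that group.
func (h TraceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return TraceHandler{h.Handler.WithAttrs(attrs)}
}

func (h TraceHandler) WithGroup(name string) slog.Handler {
	return TraceHandler{h.Handler.WithGroup(name)}
}

// demoRecord logs one message inside a "span" and returns the decoded JSON line.
func demoRecord() map[string]any {
	var buf bytes.Buffer
	logger := slog.New(TraceHandler{slog.NewJSONHandler(&buf, nil)})
	ctx := ContextWithSpan(context.Background(),
		SpanContext{TraceID: "4bf92f3577b34da6", SpanID: "00f067aa0ba902b7"})
	logger.InfoContext(ctx, "order dispatched")
	var m map[string]any
	if err := json.Unmarshal(buf.Bytes(), &m); err != nil {
		panic(err)
	}
	return m
}

func main() {
	m := demoRecord()
	fmt.Println(m["trace_id"], m["span_id"])
}
```

The JSON lines produced this way can be shipped to ELK as usual, and a logs query adapter can then filter on the `trace_id`/`span_id` fields.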

otisg commented 6 years ago

Aha, aha, I see. The logs that the Jaeger client already supports and sends to the collector already have/inherit the trace/span IDs, because those logs are an integral part of the same trace structure that ends up stored in the backend together with the traces. Is there a way to route logs to a different endpoint? For example, if I have my own ES cluster, I may prefer to route all my logs there instead of storing them together with traces. Once you implement the functionality described here, Jaeger's UI will be able to get logs from such an external ES cluster. Other "logs UIs" (e.g. people's own Kibana instances) will then also be able to get those logs (the ones collected with Jaeger) from such ES clusters.

Would such log routing be a welcome addition to Jaeger?

yurishkuro commented 6 years ago

Is there a way to route logs to a different endpoint?

Conceptually nothing prevents it, as the Storage API in Jaeger collectors takes the spans and can do whatever it wants with them, so you could have an implementation of that API that only pays attention to the logs in the span and sends them somewhere else. Such a storage can be combined with normal trace storage by using a composite SpanWriter.
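A minimal sketch of that idea follows. The `Span`, `Log` and `SpanWriter` types are simplified, hypothetical stand-ins modeled loosely on Jaeger's span model and storage writer (the real ones carry many more fields), so the example compiles on its own. One writer ships span logs, tagged with trace/span IDs, to a log sink; a second strips the logs before handing the span to normal trace storage; and a small composite fans each span out to both.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for Jaeger's span model.
type Log struct {
	Timestamp int64
	Fields    map[string]string
}

type Span struct {
	TraceID, SpanID string
	Operation       string
	Logs            []Log
}

// SpanWriter has the shape of a storage writer: one call per span.
type SpanWriter interface {
	WriteSpan(ctx context.Context, span *Span) error
}

// logRouter only looks at a span's logs and ships each one, tagged with the
// span's trace/span IDs, to an external log sink.
type logRouter struct {
	sink func(traceID, spanID string, l Log) error
}

func (r logRouter) WriteSpan(_ context.Context, span *Span) error {
	for _, l := range span.Logs {
		if err := r.sink(span.TraceID, span.SpanID, l); err != nil {
			return err
		}
	}
	return nil
}

// stripLogs stores a shallow copy of the span without its logs.
type stripLogs struct{ next SpanWriter }

func (s stripLogs) WriteSpan(ctx context.Context, span *Span) error {
	c := *span
	c.Logs = nil
	return s.next.WriteSpan(ctx, &c)
}

// compositeWriter hands every span to each writer and joins their errors.
type compositeWriter []SpanWriter

func (cw compositeWriter) WriteSpan(ctx context.Context, span *Span) error {
	var errs []error
	for _, w := range cw {
		if err := w.WriteSpan(ctx, span); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when every writer succeeded (Go 1.20+)
}

// memTraces is a toy trace store used for the demo.
type memTraces struct{ spans []Span }

func (m *memTraces) WriteSpan(_ context.Context, s *Span) error {
	m.spans = append(m.spans, *s)
	return nil
}

// demoRoute writes one span and reports how many logs reached trace storage
// and how many were shipped to the log sink.
func demoRoute() (storedLogs, shippedLogs int) {
	traces := &memTraces{}
	w := compositeWriter{
		logRouter{sink: func(traceID, spanID string, l Log) error {
			shippedLogs++ // a real sink would send the log to, e.g., Elasticsearch
			return nil
		}},
		stripLogs{next: traces},
	}
	span := &Span{TraceID: "t1", SpanID: "s1", Operation: "GET /orders",
		Logs: []Log{{Timestamp: 1400, Fields: map[string]string{"error": "something bad happened"}}}}
	if err := w.WriteSpan(context.Background(), span); err != nil {
		panic(err)
	}
	return len(traces.spans[0].Logs), shippedLogs
}

func main() {
	stored, shipped := demoRoute()
	fmt.Println("logs kept with trace:", stored, "logs shipped:", shipped)
}
```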

Having said that, I do not see a lot of value in doing that, because most people do not emit logs using the Span.log API. They use normal logging APIs, so the problem is usually the reverse: how to get those normal logs either (a) associated with the current span or (b) forwarded to the tracing system (possibly in parallel with normal logging, which is what the HotROD example does).

ajbouh commented 6 years ago

Any update on this? We've found it valuable to store HTTP headers and bodies for both requests and responses.

These are certainly larger than 64 KB in some cases. We don't mind using Span.log or any other custom reporting logic, as we have complete control over the code doing the request processing.

sta-szek commented 6 years ago

Hi, can we somehow modify the format of the logs? I want to use a sidecar Filebeat container to deliver them to Logstash, but it would be nice to change the log format to fit my needs.

Is this the right issue for that question, or should I open a feature request?

yurishkuro commented 6 years ago

Which logs are you referring to?

sta-szek commented 6 years ago

Jaeger's own logs, from all components. All I could find is https://www.jaegertracing.io/docs/1.7/monitoring/#logging, which says nothing about changing the logging format.

yurishkuro commented 6 years ago

This is off-topic for this ticket. Feel free to open another one. The zap logger can be configured with a different formatter, but I don't think it's worth doing in the code. Jaeger logs are already log-integration-friendly, being encoded as JSON messages.

aclowkey commented 4 years ago

Is this similar to how Stackdriver Trace shows trace spans together with their logs?

pavolloffay commented 4 years ago

Roughly, yes: the idea is to query the log storage and attach logs containing the trace/span IDs to the spans shown in the Jaeger UI.
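On the display side, attaching the retrieved logs could be a simple join on span ID within one trace. In the sketch below, `LogRecord`, `SpanView` and `attachLogs` are hypothetical names; records from other traces are skipped because span IDs alone are not unique, and logs whose span is not in the trace are returned separately instead of being silently dropped.

```go
package main

import (
	"fmt"
	"sort"
)

// LogRecord is a hypothetical log entry fetched from the external log store.
type LogRecord struct {
	TraceID, SpanID string
	TimestampMicros int64
	Message         string
}

// SpanView is a hypothetical span as displayed in the UI.
type SpanView struct {
	SpanID string
	Logs   []LogRecord
}

// attachLogs attaches records of the given trace to their spans, ordered by
// time, and returns the records whose span ID matched no span in the trace.
func attachLogs(traceID string, spans []*SpanView, records []LogRecord) []LogRecord {
	byID := make(map[string]*SpanView, len(spans))
	for _, s := range spans {
		byID[s.SpanID] = s
	}
	var orphans []LogRecord
	for _, r := range records {
		if r.TraceID != traceID {
			continue // same span ID in a different trace: not ours
		}
		if s, ok := byID[r.SpanID]; ok {
			s.Logs = append(s.Logs, r)
		} else {
			orphans = append(orphans, r)
		}
	}
	for _, s := range spans {
		sort.SliceStable(s.Logs, func(i, j int) bool {
			return s.Logs[i].TimestampMicros < s.Logs[j].TimestampMicros
		})
	}
	return orphans
}

func main() {
	spans := []*SpanView{{SpanID: "a"}, {SpanID: "b"}}
	records := []LogRecord{
		{TraceID: "t1", SpanID: "b", TimestampMicros: 30, Message: "response sent"},
		{TraceID: "t1", SpanID: "a", TimestampMicros: 10, Message: "request received"},
		{TraceID: "t9", SpanID: "a", TimestampMicros: 15, Message: "unrelated trace"},
	}
	orphans := attachLogs("t1", spans, records)
	for _, s := range spans {
		fmt.Println(s.SpanID, len(s.Logs))
	}
	fmt.Println("orphans:", len(orphans))
}
```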

ivsokol commented 4 years ago

Hi all,

Are there any accepted solutions for this ticket?

For me, this is mostly an issue of managing storage capacity: I am sending application logs (enriched with trace and span IDs) to ELK, and Jaeger is also picking them up and storing them in the Jaeger DB (again ES).

It would be better if Jaeger could somehow query the logs for a given trace+span combination (through a standardized interface like the logs query above).

Also, is there a way to stop sending app logs to Jaeger (sending only traces and spans with metadata) while still allowing the logs to go to ELK?

I am using Jaeger in a Spring Boot app (which uses SLF4J with Logback as the logging provider).

objectiser commented 4 years ago

As we will be transitioning to an OpenTelemetry-based collector, I'm wondering whether this is something that could be supported there.

Currently there is support for metrics and tracing receivers/exporters; maybe logging could be supported as an additional category, and a processor could be used to split the logs out of the tracing data and export them to a logging system (with the appropriate context metadata).

ivsokol commented 4 years ago

If I understood correctly, your proposal is to have Jaeger act as a facade for logging? Then all logs would go through Jaeger, and Jaeger would mark them and send them to a logging receiver?

For me, a flag that says "don't send app logs to Jaeger" would be good enough to minimize storage requirements (let's think of this as an MVP).

Currently I am marking the logs that go to ELK with the trace and span IDs through MDC and the logstash-logback-encoder setting, so querying the app logs in ELK by these fields is not an issue (if I want to see the logs belonging to a trace+span in the Jaeger GUI).
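With logs indexed like that, an Elasticsearch-backed adapter could fetch a trace's logs with an ordinary `bool`/`filter` query made of `term` clauses. The Go sketch below only builds the request body. The field names `traceId` and `spanId`, and sorting on `@timestamp`, are assumptions that must match what the encoder actually writes; with Elasticsearch's default dynamic mapping, an exact-match `term` query would usually need to target the `.keyword` subfield (e.g. `traceId.keyword`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildLogsQuery returns an Elasticsearch query body matching log documents
// that carry the given trace ID and, if non-empty, the given span ID.
// Field names are assumptions and must match the MDC keys in the index.
func buildLogsQuery(traceID, spanID string, size int) ([]byte, error) {
	filters := []map[string]any{
		{"term": map[string]any{"traceId": traceID}},
	}
	if spanID != "" {
		filters = append(filters, map[string]any{"term": map[string]any{"spanId": spanID}})
	}
	body := map[string]any{
		"size":  size,
		"sort":  []map[string]any{{"@timestamp": map[string]any{"order": "asc"}}},
		"query": map[string]any{"bool": map[string]any{"filter": filters}},
	}
	return json.Marshal(body) // map keys are emitted in sorted order
}

func main() {
	body, err := buildLogsQuery("4bf92f3577b34da6", "00f067aa0ba902b7", 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Using `filter` rather than `must` skips relevance scoring, which is not needed for an exact ID lookup.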

objectiser commented 4 years ago

If I understood correctly, your proposal is to have Jaeger act as a facade for logging? Then all logs would go through Jaeger, and Jaeger would mark them and send them to a logging receiver?

Not quite - what I am suggesting is OpenTelemetry Collector adds logging as a third dimension to the information it is able to collect and export.

Then, as part of processing the tracing data (not necessarily Jaeger-specific) handled by the OpenTelemetry Collector, a processor would split off the logs received via the tracing data and export them to a logging exporter instead of a tracing exporter.

objectiser commented 4 years ago

Looks like logs may become part of OpenTelemetry at some point: https://github.com/open-telemetry/oteps/blob/master/text/0092-logs-vision.md

jpkrohling commented 4 years ago

This ticket is still valid though: this is about getting Jaeger UI to retrieve logs from an external provider. I think @rubenvp8510 had this in his queue some time ago.

alphavector commented 3 years ago

Still no news? Similar functionality has been implemented by Datadog (https://docs.datadoghq.com/tracing/connect_logs_and_traces) and Logz.io (https://docs.logz.io/user-guide/distributed-tracing/correlate-traces).

jpkrohling commented 3 years ago

Not yet, but if you have free cycles to contribute code for this feature, feel free to send a draft PR with a PoC based on @yurishkuro's proposal.

jaegertracing / jaeger

Integration with external logs storage when displaying traces #649