opensearch-project / observability

Visualize and explore your logs, traces and metrics data in OpenSearch Dashboards
https://opensearch.org/docs/latest/observability-plugin/index/
Apache License 2.0

[BLOG] Jaeger/Trace Analytics/ SSFO (DRAFT) #1444

Open derek-ho opened 1 year ago

derek-ho commented 1 year ago

Blog: Trace analytics with OpenSearch and Jaeger

As organizations move their software toward microservices-based architectures, the operational data those applications generate has become increasingly large and complex. Because the signals are spread across many services, the old approach of digging through logs does not scale.

Organizations are therefore adopting distributed tracing to get an overall picture of their systems. Traces help determine where to start investigating when issues arise and shorten root cause analysis times. A trace is an observability signal that captures the entire lifecycle of a particular request as it traverses the distributed services. Each individual service hop is called a span, and a trace can contain multiple spans that together make up the whole operation.
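To make the trace and span relationship concrete, here is a minimal sketch in Python. The `Span` class, its field names, and the sample values are hypothetical and chosen for illustration only; they do not reproduce the Jaeger or OpenTelemetry schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One service hop. Field names are illustrative, not a real schema."""
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span of a trace
    service: str
    operation: str
    duration_ms: float


def group_by_trace(spans: List[Span]) -> Dict[str, List[Span]]:
    """Collect spans that share a trace ID into one trace."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def root_span(trace_spans: List[Span]) -> Optional[Span]:
    """Return the span with no parent, i.e. the entry point of the request."""
    for span in trace_spans:
        if span.parent_span_id is None:
            return span
    return None


# Hypothetical sample: one request that touches three services.
SAMPLE_SPANS = [
    Span("t1", "a", None, "frontend", "HTTP GET /dispatch", 120.0),
    Span("t1", "b", "a", "customer", "SQL SELECT", 45.0),
    Span("t1", "c", "a", "driver", "FindNearest", 60.0),
    Span("t2", "d", None, "frontend", "HTTP GET /config", 5.0),
]

traces = group_by_trace(SAMPLE_SPANS)
```

Here `traces["t1"]` holds the three spans of the first request, and `root_span(traces["t1"])` returns the `frontend` span where that request entered the system.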

One of the most popular open source solutions for distributed tracing is Jaeger - an open source, end-to-end solution hosted by the CNCF (Cloud Native Computing Foundation). Jaeger's instrumentation SDKs are OpenTelemetry based, and Jaeger supports multiple open source data stores, such as Cassandra and OpenSearch, for storing traces. While Jaeger provides its own UI for visualizing traces, users of OpenSearch now have an additional option: Trace analytics in OpenSearch Dashboards, the native visualization tool that ships with OpenSearch.

OpenSearch is an open source solution that provides strong support for log analytics and observability use cases. Since version 1.3, it has supported analyzing distributed tracing data through its Observability plugin. Using these trace analytics capabilities, users can analyze the crucial RED (Rate, Error, Duration) metrics contained in their trace data. They can also examine individual components of their system for latency and errors and pinpoint services that need attention.
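As a rough illustration of what RED metrics mean, the sketch below computes them per service from a list of spans. The dictionary keys (`service`, `error`, `duration_ms`), the function name, and the sample data are assumptions made for this example. It is not how the Observability plugin computes these values.

```python
from collections import defaultdict
from typing import Dict, List


def red_metrics(spans: List[dict], window_seconds: float) -> Dict[str, dict]:
    """Compute Rate, Error, and Duration per service over a time window.

    Each span is a dict with hypothetical keys:
    'service' (str), 'error' (bool), 'duration_ms' (float).
    """
    by_service: Dict[str, List[dict]] = defaultdict(list)
    for span in spans:
        by_service[span["service"]].append(span)

    result = {}
    for service, items in by_service.items():
        count = len(items)
        errors = sum(1 for s in items if s["error"])
        result[service] = {
            # Rate: spans handled per second within the window
            "rate_per_s": count / window_seconds,
            # Error: fraction of spans flagged as errors
            "error_ratio": errors / count,
            # Duration: mean latency in milliseconds
            "avg_duration_ms": sum(s["duration_ms"] for s in items) / count,
        }
    return result


# Hypothetical sample covering a 10-second window.
sample = [
    {"service": "frontend", "error": False, "duration_ms": 100.0},
    {"service": "frontend", "error": True, "duration_ms": 300.0},
    {"service": "driver", "error": False, "duration_ms": 50.0},
]
metrics = red_metrics(sample, window_seconds=10)
```

A service with a high `error_ratio` or `avg_duration_ms` relative to its peers is a natural place to start an investigation.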

OpenSearch trace analytics launched with support for OpenTelemetry (OTel) compliant trace data provided by Data Prepper - another open source component of the OpenSearch project. To widen support for the popular trace formats used by developers, OpenSearch recently added support for Jaeger trace data. Jaeger is a widely used distributed tracing solution that can also use OpenSearch as its data store. With the newly added support (since OpenSearch 2.5), you can now analyze your Jaeger trace data stored in OpenSearch using the Trace analytics feature of the OpenSearch Observability plugin.

You get the same feature-rich analysis capabilities that have been available for Data Prepper trace data: RED metrics and contextual linking of traces and spans to their related logs. You can filter traces, find exactly which spans show errors, and narrow down to the relevant logs quickly.

The OpenTelemetry and Jaeger formats currently differ in several ways; the main differences are outlined here.

To try out this new feature, follow the guide here, which includes a Docker Compose file that shows how to add sample data using the Jaeger HotROD demo application and visualize it with Trace analytics. To enable the feature, the --es.tags-as-fields.all=true flag must be set on Jaeger. This is due to a limitation tracked here.
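For readers unfamiliar with the flag, the sketch below approximates its effect on a stored span document: tags kept as a list of key/value entries become a single object with one field per tag. The document shape, the `tag` field name, and the `@` dot replacement are assumptions about Jaeger's storage layout made for illustration; consult the Jaeger documentation for the authoritative behavior.

```python
def tags_as_fields(span_doc: dict, dot_replacement: str = "@") -> dict:
    """Approximate the effect of storing all tags as fields.

    Assumed input shape:  {"tags": [{"key": ..., "type": ..., "value": ...}]}
    Assumed output shape: {"tag": {"<key with dots replaced>": value}}
    """
    flat = {
        tag["key"].replace(".", dot_replacement): tag["value"]
        for tag in span_doc.get("tags", [])
    }
    # Copy every other field unchanged and attach the flattened tags.
    out = {k: v for k, v in span_doc.items() if k != "tags"}
    out["tag"] = flat
    return out


# Hypothetical span document with list-style tags.
before = {
    "operationName": "HTTP GET",
    "tags": [
        {"key": "http.status_code", "type": "int64", "value": 500},
        {"key": "error", "type": "bool", "value": True},
    ],
}
after = tags_as_fields(before)
```

With tags flattened this way, a field such as the error tag can be filtered on directly, without querying into a list of key/value entries.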

[INSERT VIDEO HERE]

Several workflows are currently available for triaging and exploring your data. On the dashboard page, you can see the service and operation combinations with the highest non-zero error rates and latency. Selecting any of them takes you to the traces page with the corresponding filters applied. You can also investigate any trace or service on your own with any filters applied.

Next Steps

Get OpenSearch or try it out in the Playground

@YANG-DB Bring this back to SSFO/need to standardize to avoid proliferation of UI mapping components.

ariamarble commented 1 year ago

@derek-ho @pajuric do we have the video link for the line reading "[INSERT VIDEO HERE]" ?