jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[jaeger-v2] Jaeger v1 vs. v2 Benchmarking #5768

Open jkowall opened 1 month ago

jkowall commented 1 month ago

Background

Jaeger is an open-source, end-to-end distributed tracing system that helps monitor and troubleshoot transactions in complex, microservices-based environments. Jaeger v2 is a major new version where we rebase all Jaeger backend components (agent, collector, ingester, and query) on top of the OpenTelemetry Collector, bringing significant improvements and changes to the platform.

The transition from v1 to v2 introduces significant architectural changes, particularly in the collector component. As part of this transition, it's crucial to understand the performance implications of these changes through comprehensive benchmarking.

Relevant links:

Project Objective

The goal of this project is to develop a comprehensive benchmarking suite that compares the performance of Jaeger v1 and v2, with a primary focus on the collector component. This benchmarking will provide valuable insights into the efficiency, scalability, and resource utilization of both versions, helping the community understand the benefits and potential trade-offs of migrating to Jaeger v2. The CNCF will provide compute resources for this project if needed; please coordinate with the mentors.

Key Features and Implementation

  1. Benchmarking Environment Setup

    • Develop a reproducible environment for running benchmarks, using tools like Docker (a launcher sketch follows this list).
    • Ensure consistent hardware and software configurations for fair comparisons.
    • Create scripts to automate the deployment of Jaeger v1 and v2 components in isolation.
    • Support multiple backends for benchmarking (Elasticsearch, OpenSearch, Cassandra).
  2. Workload Generation

    • Utilize cmd/tracegen as a workload generator that can simulate various real-world scenarios.
    • Develop mechanisms to control the rate and volume of span ingestion (see the tracegen driver sketch after this list).
  3. Performance Metrics Collection

    • Implement collection of key performance indicators, including:
      • Throughput (spans processed per second)
      • Latency (processing time per span)
      • Resource utilization (CPU, memory, network, disk I/O)
      • Dropped span rate under high load
    • Utilize Prometheus for metrics collection and storage (a query sketch follows this list).
    • Utilize Grafana for dashboards and reporting.
  4. Storage Backend Integration

    • Evaluate collector performance with different storage backends (Elasticsearch, Cassandra, OpenSearch); a backend-selection sketch follows this list.
    • Measure the impact of different storage configurations on collector performance.
  5. Data Processing and Analysis

    • Generate comprehensive dashboards and reports comparing v1 and v2 performance across different scenarios (a report-formatting sketch follows this list).
  6. Documentation and Reproducibility

    • Prepare a blog post summarizing the results.
    • Create detailed documentation of the benchmarking methodology, environment setup, and test scenarios.
    • Develop a guide for running the benchmarks, allowing community members to reproduce and verify results.
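
A minimal sketch of how the environment launcher could look. The per-version Compose file names (docker-compose-v1.yml, docker-compose-v2.yml) and the JAEGER_VERSION variable are hypothetical placeholders for this sketch; the docker compose CLI calls are standard.

```go
// benchenv.go: bring up an isolated Jaeger stack for one benchmark run.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

// composeUp starts the stack described by the given Compose file and
// blocks until its containers report healthy (--wait).
func composeUp(file string) error {
	cmd := exec.Command("docker", "compose", "-f", file, "up", "-d", "--wait")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	version := os.Getenv("JAEGER_VERSION") // "v1" or "v2"; hypothetical knob
	if version == "" {
		version = "v1"
	}
	// docker-compose-v1.yml / docker-compose-v2.yml are placeholder names.
	file := fmt.Sprintf("docker-compose-%s.yml", version)
	if err := composeUp(file); err != nil {
		log.Fatalf("starting %s stack: %v", file, err)
	}
	log.Printf("Jaeger %s stack is up", version)
}
```

Pinning image tags and CPU/memory limits inside the Compose files is what keeps runs comparable between v1 and v2.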
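For workload generation, cmd/tracegen already ships with the repository. Below is a sketch of a driver that runs it at fixed concurrency for a fixed window. The flags shown (-workers, -duration, -service) exist in tracegen, but flag sets change between releases, so verify them against the binary under test; the service name is a placeholder.

```go
// loadgen.go: drive tracegen for one load step of a benchmark run.
package main

import (
	"log"
	"os"
	"os/exec"
	"strconv"
	"time"
)

func main() {
	workers := 4                // concurrent span producers
	duration := 2 * time.Minute // length of this load step

	// Check these flags against the tracegen build being benchmarked.
	cmd := exec.Command("tracegen",
		"-workers", strconv.Itoa(workers),
		"-duration", duration.String(),
		"-service", "benchmark-load", // hypothetical service name
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("tracegen run failed: %v", err)
	}
}
```

Stepping the worker count up between runs gives the ingestion-rate control called for above.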
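After each run, throughput numbers can be pulled out of Prometheus via its HTTP API. The sketch below assumes Prometheus is reachable at localhost:9090 and uses the metric names the two collectors are expected to export (jaeger_collector_spans_received_total for v1, otelcol_receiver_accepted_spans for the OTel-based v2); confirm both on the collectors' /metrics endpoints before trusting the numbers.

```go
// metrics.go: pull a throughput figure out of Prometheus after a load run.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// promResult mirrors only the fields we need from the Prometheus
// /api/v1/query response body.
type promResult struct {
	Data struct {
		Result []struct {
			Value [2]interface{} `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// instantQuery runs a PromQL instant query against promURL.
func instantQuery(promURL, query string) (string, error) {
	resp, err := http.Get(promURL + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var r promResult
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return "", err
	}
	if len(r.Data.Result) == 0 {
		return "", fmt.Errorf("no samples for %q", query)
	}
	return fmt.Sprintf("%v", r.Data.Result[0].Value[1]), nil
}

func main() {
	// Metric names differ between versions; verify them on /metrics.
	queries := map[string]string{
		"v1 spans/sec": `rate(jaeger_collector_spans_received_total[1m])`,
		"v2 spans/sec": `rate(otelcol_receiver_accepted_spans[1m])`,
	}
	for label, q := range queries {
		v, err := instantQuery("http://localhost:9090", q)
		if err != nil {
			log.Printf("%s: %v", label, err)
			continue
		}
		fmt.Printf("%s: %s\n", label, v)
	}
}
```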
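Backend selection can be a single parameter of the harness. In the sketch below, SPAN_STORAGE_TYPE, ES_SERVER_URLS, and CASSANDRA_SERVERS are real Jaeger v1 collector settings, while the hostnames, the BENCH_BACKEND variable, and the use of the latest image tag are placeholders; v2 configures storage through its OTel-style config file instead, so the equivalent there is swapping config files.

```go
// backend.go: start a v1 collector parameterized by storage backend.
package main

import (
	"log"
	"os"
	"os/exec"
)

// backendEnv maps a backend name to the environment the Jaeger v1
// collector reads; endpoints are placeholders for this sketch.
var backendEnv = map[string][]string{
	"elasticsearch": {"SPAN_STORAGE_TYPE=elasticsearch", "ES_SERVER_URLS=http://elasticsearch:9200"},
	"opensearch":    {"SPAN_STORAGE_TYPE=opensearch", "ES_SERVER_URLS=http://opensearch:9200"},
	"cassandra":     {"SPAN_STORAGE_TYPE=cassandra", "CASSANDRA_SERVERS=cassandra"},
}

func main() {
	backend := os.Getenv("BENCH_BACKEND") // hypothetical harness knob
	env, ok := backendEnv[backend]
	if !ok {
		log.Fatalf("unknown backend %q", backend)
	}
	args := []string{"run", "--rm", "-d", "--name", "jaeger-collector"}
	for _, e := range env {
		args = append(args, "-e", e)
	}
	args = append(args, "jaegertracing/jaeger-collector:latest")
	cmd := exec.Command("docker", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("starting collector with %s backend: %v", backend, err)
	}
}
```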
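Beyond Grafana dashboards, a small amount of post-processing can turn paired measurements into the written report. A sketch of the formatting step; the numbers are dummies purely to show the report shape, and the real suite would substitute values collected from Prometheus.

```go
// report.go: emit a markdown comparison row for the v1 vs. v2 report.
package main

import "fmt"

// delta returns the percent change from v1 to v2.
func delta(v1, v2 float64) float64 {
	return (v2 - v1) / v1 * 100
}

func main() {
	// Dummy values only; replace with measured results.
	v1Throughput, v2Throughput := 10000.0, 11000.0
	fmt.Println("| metric | v1 | v2 | change |")
	fmt.Println("|---|---|---|---|")
	fmt.Printf("| throughput (spans/s) | %.0f | %.0f | %+.1f%% |\n",
		v1Throughput, v2Throughput, delta(v1Throughput, v2Throughput))
}
```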

Expected Outcome

By the end of this project, we aim to have:

  • A reproducible benchmarking suite, with automation scripts and documentation, that community members can run to verify results.
  • Comparative measurements of Jaeger v1 and v2 collector performance (throughput, latency, resource utilization, dropped spans) across the supported storage backends.
  • Dashboards, a written analysis, and a blog post summarizing the findings.

Proposal

No response

Open questions

No response

yurishkuro commented 1 month ago

Previous ticket https://github.com/jaegertracing/jaeger/issues/4869

Previous PR https://github.com/jaegertracing/jaeger/pull/5214

yurishkuro commented 1 month ago

Earlier attempt: https://github.com/jaegertracing/jaeger/pull/5214/files