Background
Jaeger is an open-source, end-to-end distributed tracing system that helps monitor and troubleshoot transactions in complex, microservices-based environments. Jaeger v2 is a major new version where we rebase all Jaeger backend components (agent, collector, ingester, and query) on top of the OpenTelemetry Collector, bringing significant improvements and changes to the platform.
The transition from v1 to v2 introduces significant architectural changes, particularly in the collector component. As part of this transition, it's crucial to understand the performance implications of these changes through comprehensive benchmarking.
Project Objective
The goal of this project is to develop a comprehensive benchmarking suite that compares the performance of Jaeger v1 and v2, with a primary focus on the collector component. This benchmarking will provide valuable insights into the efficiency, scalability, and resource utilization of both versions, helping the community understand the benefits and potential trade-offs of migrating to Jaeger v2. The CNCF can provide compute resources for this project if needed; please coordinate with the mentors.
Key Features and Implementation
Benchmarking Environment Setup
Develop a reproducible environment for running benchmarks, using tools like Docker.
Ensure consistent hardware and software configurations for fair comparisons.
Create scripts to automate the deployment of Jaeger v1 and v2 components in isolation (see the deployment sketch after this list).
Support multiple storage backends for benchmarking (Elasticsearch, OpenSearch, Cassandra).
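To make the deployment step concrete, here is a minimal sketch of what the automation could look like, shelling out to Docker so that both collectors run from pinned images with identical CPU and memory limits. The image tags, ports, config path, and the COLLECTOR_OTLP_ENABLED setting are illustrative assumptions to be checked against the Jaeger releases actually benchmarked.

```go
// deploy.go: minimal sketch of starting Jaeger v1 and v2 collectors in isolation.
// Image tags, ports, and the config path below are placeholders for illustration.
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// runDocker starts a container with pinned CPU/memory limits so that both
// versions are benchmarked under identical resource constraints.
func runDocker(args ...string) {
	out, err := exec.Command("docker", args...).CombinedOutput()
	if err != nil {
		log.Fatalf("docker %v failed: %v\n%s", args, err, out)
	}
	fmt.Printf("%s", out)
}

func main() {
	// Jaeger v1 collector with OTLP ingest enabled.
	runDocker("run", "-d", "--name", "jaeger-v1-collector",
		"--cpus", "2", "--memory", "2g",
		"-p", "14317:4317", // host 14317 -> OTLP gRPC 4317
		"-e", "COLLECTOR_OTLP_ENABLED=true",
		"jaegertracing/jaeger-collector:1.57", // placeholder v1 tag
	)

	// Jaeger v2 (OpenTelemetry Collector based) with a benchmark config file.
	runDocker("run", "-d", "--name", "jaeger-v2",
		"--cpus", "2", "--memory", "2g",
		"-p", "24317:4317",
		"-v", "/benchmarks/config-v2.yaml:/etc/jaeger/config.yaml", // placeholder config path
		"jaegertracing/jaeger:2.0.0", // placeholder v2 tag
		"--config", "/etc/jaeger/config.yaml",
	)
}
```

Pinning identical resource limits on both containers helps ensure that any throughput or latency difference can be attributed to the collector implementation rather than to the environment.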
Workload Generation
Utilize cmd/tracegen as a workload generator that can simulate various real-world scenarios.
Develop mechanisms to control the rate and volume of span ingestion.
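As a starting point, load could be ramped by sweeping tracegen's worker count and throttling span creation. The sketch below shells out to cmd/tracegen from a Jaeger checkout; the flags used (-service, -workers, -duration, -pause) are assumptions to verify against the tracegen version in use.

```go
// loadgen.go: sketch of a load sweep driving cmd/tracegen at increasing rates.
// The tracegen flags shown here should be verified against the version in use.
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// Sweep worker counts to ramp span ingestion; the pause between spans
	// gives coarse per-worker rate control.
	for _, workers := range []int{1, 5, 10, 50} {
		args := []string{
			"run", "./cmd/tracegen",
			"-service", fmt.Sprintf("bench-%dw", workers),
			"-workers", fmt.Sprint(workers),
			"-duration", "2m",
			"-pause", "1ms",
		}
		log.Printf("starting load step: %d workers", workers)
		if out, err := exec.Command("go", args...).CombinedOutput(); err != nil {
			log.Fatalf("tracegen failed: %v\n%s", err, out)
		}
	}
}
```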
Performance Metrics Collection
Implement collection of key performance indicators, including:
Throughput (spans processed per second)
Latency (processing time per span)
Resource utilization (CPU, memory, network, disk I/O)
Dropped span rate under high load
Utilize Prometheus for metrics collection and storage.
Utilize Grafana for dashboarding and reporting of the collected data.
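Dashboards cover interactive inspection, but the harness can also pull KPIs programmatically from the Prometheus HTTP API so that each run produces machine-readable results. A minimal sketch follows; the Prometheus address and the metric names (e.g. jaeger_collector_spans_received_total, a v1-style name) are assumptions and must be mapped separately for v1 and the OpenTelemetry-based v2.

```go
// metrics.go: sketch of pulling benchmark KPIs from Prometheus after a load step.
// The Prometheus address and metric names are assumptions to adapt per setup;
// v1 and v2 expose differently named metrics.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

const promAddr = "http://localhost:9090" // assumed Prometheus address

// query runs a PromQL instant query and returns the decoded JSON response.
func query(promQL string) (map[string]any, error) {
	resp, err := http.Get(promAddr + "/api/v1/query?query=" + url.QueryEscape(promQL))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var body map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body, nil
}

func main() {
	// Example KPIs: ingest throughput and process CPU; adjust names per Jaeger version.
	queries := map[string]string{
		"spans/sec (v1-style example metric)": `rate(jaeger_collector_spans_received_total[1m])`,
		"collector CPU (cores)":               `rate(process_cpu_seconds_total{job="jaeger-collector"}[1m])`,
	}
	for label, q := range queries {
		res, err := query(q)
		if err != nil {
			log.Fatalf("query %q failed: %v", q, err)
		}
		fmt.Printf("%s: %v\n", label, res["data"])
	}
}
```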
Storage Backend Integration
Evaluate collector performance with different storage backends (Elasticsearch, Cassandra, OpenSearch).
Measure the impact of different storage configurations on collector performance.
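One way to keep backend runs comparable is to parameterize a single benchmark routine over per-backend settings. The sketch below uses Jaeger v1-style environment variables (SPAN_STORAGE_TYPE, ES_SERVER_URLS, CASSANDRA_SERVERS) as examples; Jaeger v2 configures storage through its OpenTelemetry-style config file instead, and the exact option names should be confirmed against the Jaeger documentation.

```go
// backends.go: sketch of parameterizing benchmark runs over storage backends.
// Env var names follow Jaeger v1 conventions and are examples to verify;
// Jaeger v2 would get a per-backend config file instead.
package main

import "fmt"

// backendEnv maps each backend under test to the environment passed to the
// v1 collector container. OpenSearch is assumed here to use Jaeger's
// Elasticsearch-compatible storage implementation; verify before relying on it.
var backendEnv = map[string][]string{
	"elasticsearch": {"SPAN_STORAGE_TYPE=elasticsearch", "ES_SERVER_URLS=http://elasticsearch:9200"},
	"opensearch":    {"SPAN_STORAGE_TYPE=elasticsearch", "ES_SERVER_URLS=http://opensearch:9200"},
	"cassandra":     {"SPAN_STORAGE_TYPE=cassandra", "CASSANDRA_SERVERS=cassandra"},
}

func main() {
	for backend, env := range backendEnv {
		fmt.Printf("running benchmark scenario against %s with env %v\n", backend, env)
		// deployCollector(env) / runLoad() / collectMetrics() would be called here,
		// reusing the deployment and load-generation sketches above.
	}
}
```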
Data Processing and Analysis
Generate comprehensive dashboards and reports comparing v1 and v2 performance across different scenarios.
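Besides dashboards, the final report needs some offline number-crunching so v1 and v2 runs can be compared side by side. A minimal sketch follows, assuming per-span latency samples have already been exported to a plain text file (one value in milliseconds per line, a hypothetical format):

```go
// analyze.go: sketch of summarizing per-span processing latencies into the
// percentiles reported in the v1 vs v2 comparison. The input format is an
// assumption: one latency sample in milliseconds per line.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sort"
	"strconv"
)

// percentile returns the p-th percentile (0..100) of already-sorted samples.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	idx := int(p / 100 * float64(len(sorted)-1))
	return sorted[idx]
}

func main() {
	f, err := os.Open("latencies_ms.txt") // hypothetical export from one benchmark run
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var samples []float64
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if v, err := strconv.ParseFloat(sc.Text(), 64); err == nil {
			samples = append(samples, v)
		}
	}
	sort.Float64s(samples)

	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("p%.0f latency: %.2f ms\n", p, percentile(samples, p))
	}
}
```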
Documentation and Reproducibility
Prepare a blog post summarizing the results.
Create detailed documentation of the benchmarking methodology, environment setup, and test scenarios.
Develop a guide for running the benchmarks, allowing community members to reproduce and verify results.
Expected Outcome
By the end of this project, we aim to have:
A comprehensive, automated benchmarking suite for comparing Jaeger v1 and v2 collectors.
A detailed performance report highlighting the strengths and potential areas of improvement in Jaeger v2.
Clear insights into the scalability and efficiency gains (or trade-offs) in Jaeger v2.
A set of recommendations for users considering migration from v1 to v2, based on performance characteristics.
Background
Jaeger is an open-source, end-to-end distributed tracing system that helps monitor and troubleshoot transactions in complex, microservices-based environments. Jaeger v2 is a major new version where we rebase all Jaeger backend components (agent, collector, ingester, and query) on top of the OpenTelemetry Collector, bringing significant improvements and changes to the platform.
The transition from v1 to v2 introduces significant architectural changes, particularly in the collector component. As part of this transition, it's crucial to understand the performance implications of these changes through comprehensive benchmarking.
Relevant links:
Project Objective
The goal of this project is to develop a comprehensive benchmarking suite that compares the performance of Jaeger v1 and v2, with a primary focus on the collector component. This benchmarking will provide valuable insights into the efficiency, scalability, and resource utilization of both versions, helping the community understand the benefits and potential trade-offs of migrating to Jaeger v2. The CNCF will provide compute resources on this project if needed. Please coordinate with the mentors.
Key Features and Implementation
Benchmarking Environment Setup
Workload Generation
Performance Metrics Collection
Storage Backend Integration
Data Processing and Analysis
Documentation and Reproducibility
Expected Outcome
By the end of this project, we aim to have:
Proposal
No response
Open questions
No response