jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[jaeger-v2] Kubernetes Operator #5766

Open yurishkuro opened 1 month ago

yurishkuro commented 1 month ago

Part of LFX Mentorship 2024 Term 3 https://github.com/jaegertracing/jaeger/issues/5772.

Background

Jaeger is an open-source, end-to-end distributed tracing system that helps monitor and troubleshoot transactions in complex, microservices-based environments. Jaeger v2 is a major new version where we rebase all Jaeger backend components (agent, collector, ingester, and query) on top of the OpenTelemetry Collector, bringing significant improvements and changes to the platform.

Jaeger v1 has its own Kubernetes Operator, which deploys the Jaeger components according to the chosen deployment strategy, as well as the database or datastore.

Relevant links:

Project Objective

The goal of this project is to develop a new operator for Jaeger-v2 that achieves feature parity with the v1 operator while introducing improvements and new capabilities. This new operator will leverage the OpenTelemetry operator for Jaeger-v2 deployment while maintaining and enhancing the storage management features from the v1 operator.

Key Features and Implementation

  1. OpenTelemetry Integration

    • Utilize the OpenTelemetry operator for deploying and managing collectors in Jaeger v2.
    • Ensure seamless integration between Jaeger components and OpenTelemetry-managed collectors.

  2. Deployment Strategies

    • Implement various deployment strategies for Jaeger components, mirroring the flexibility offered in v1.
    • Ensure compatibility with different Kubernetes environments and versions.

  3. Configuration Management

    • Develop a robust configuration management system that allows users to easily customize Jaeger deployments.
    • Implement ConfigMap and Secret management for sensitive configuration data.

  4. Documentation and Examples

    • Create comprehensive documentation for the new operator, including installation, configuration, and usage guidelines.
    • Develop a set of example deployments and use-cases to help users get started quickly.
    • Design and implement documentation to allow for a smooth upgrade path from Jaeger v1 to v2.

Expected Outcome

By the end of this project, we aim to achieve full feature parity between the Jaeger v2 operator and the v1 operator, with the added benefits of OpenTelemetry integration. The new operator will provide a seamless experience for users, maintaining the robustness and flexibility of v1 while introducing the advantages of v2 and OpenTelemetry.

Please note that the Jaeger v2 operator will not manage or orchestrate the storage backends. Instead, we should provide documentation pointing to the Operators for Cassandra, Elasticsearch, and OpenSearch so that users can install and manage the datastores themselves.

yurishkuro commented 1 month ago

A tip for implementation from @pavolloffay https://github.com/jaegertracing/jaeger/issues/5221#issuecomment-2015154966

tmjoris commented 1 month ago

Hello @yurishkuro, my name is Tanga. I would like to help develop the new operator as a prospective LFX mentee. Any other tips as of now?

yurishkuro commented 1 month ago

All tips are in the description. If you have specific questions don't hesitate to ask.

Ankit152 commented 1 month ago

Hello @yurishkuro! 👋🏼

I have been working with operators for a while now, and I am also familiar with all the technologies that are mentioned. I would love to work on this issue. 🙂

Ali-Alnosairi commented 1 month ago

Hi @projectmaintainer,

The Jaeger v2 project looks exciting! I’m particularly interested in the OpenTelemetry integration and deployment strategies.

Are there any specific resources or documentation you recommend for getting up to speed?

Looking forward to contributing!

yurishkuro commented 1 month ago

@Ali-Alnosairi all resources are already linked in the description.

hellspawn679 commented 1 month ago

I would love to give this a try for the fall term.

Ali-Alnosairi commented 1 month ago

Hi @yurishkuro,

This text is from the project description:

Configuration Management

Develop a robust configuration management system that allows users to easily customize Jaeger deployments.

Would you please clarify what is meant by "robust configuration management system that allows users to easily customize Jaeger deployments"? Does it include developing an interface for customizing Jaeger deployments?

Thank you in advance!

yurishkuro commented 1 month ago

@Ali-Alnosairi I don't know if it needs to be an "interface" or just a process, but the business need here is that even though the operator deploys Jaeger-v2 components in broad strokes, the users still need the ability to customize the YAML configuration of Jaeger, e.g. to fine-tune some parameters, maybe configure additional pipeline components, etc. So the question is how they will achieve that while still deploying via the Operator.
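
One possible way to picture this customization need is as a recursive merge of user-supplied overrides onto defaults generated by the operator. The sketch below is only an illustration under stated assumptions: `deep_merge`, `OPERATOR_DEFAULTS`, and `USER_OVERRIDES` are hypothetical names, the defaults follow the general OpenTelemetry Collector config layout (receivers, processors, exporters, service pipelines), and nothing in this thread says the operator will work this way.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict with `overrides` recursively merged onto `base`.

    Nested dicts are merged key by key; any other value (including lists)
    in `overrides` replaces the value in `base` wholesale.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults an operator might generate (assumption, not from the issue).
OPERATOR_DEFAULTS = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["debug"],
            }
        }
    },
}

# Hypothetical user customization: fine-tune one parameter of the batch processor.
USER_OVERRIDES = {"processors": {"batch": {"send_batch_size": 2048}}}

effective = deep_merge(OPERATOR_DEFAULTS, USER_OVERRIDES)
```

Because lists are replaced rather than concatenated, a user who wants an additional pipeline component would restate the full component list for that pipeline. That is a design choice of this sketch, not a requirement from the issue.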

Ali-Alnosairi commented 1 month ago

@yurishkuro Does Kubernetes have an API to extend this feature? Maybe the Kubernetes Operator can answer the question, as it handles some configuration settings.

yurishkuro commented 1 month ago

This should already work in OTEL Collector Operator, we just need to adopt the same procedure & document it.

karan2704 commented 1 month ago

Hi @yurishkuro, I wanted to make sure my understanding of the first objective, leveraging the OpenTelemetry operator, is correct.

In the linked approach provided by @pavolloffay above, I see that a resource of kind OpenTelemetryCollector is deployed using the OTel operator, but I am not clear on how the other Jaeger components besides collectors are handled in this case, since the OTel operator is not concerned with those. It would be helpful if you could elaborate on that approach.

Alternatively, I found this thread (not sure if it’s outdated or still relevant), and it looks like embedding the OTel config in the Jaeger custom resource is reasonable, so I was thinking of building on this to come up with a solution.
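
For readers unfamiliar with the approach being discussed, the following is a rough sketch of what a resource of kind OpenTelemetryCollector that runs a Jaeger image could look like, built as a plain Python dict and printed as JSON (which kubectl also accepts). The helper name `jaeger_collector_cr`, the image tag, and the `opentelemetry.io/v1beta1` API version are assumptions made for illustration; the linked tip and the OpenTelemetry Operator documentation remain the authoritative sources.

```python
import json


def jaeger_collector_cr(name, config,
                        image="jaegertracing/jaeger:latest",  # assumed image tag
                        mode="deployment"):
    """Build an OpenTelemetryCollector custom resource as a dict.

    `config` is the collector configuration as a nested dict. The API
    version below is an assumption; older operator releases used a
    different version where the config is a YAML string instead.
    """
    return {
        "apiVersion": "opentelemetry.io/v1beta1",
        "kind": "OpenTelemetryCollector",
        "metadata": {"name": name},
        "spec": {"mode": mode, "image": image, "config": config},
    }


# Minimal placeholder config; a real Jaeger v2 config would add storage settings.
example_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {}}}},
    "exporters": {"debug": {}},
    "service": {
        "pipelines": {"traces": {"receivers": ["otlp"], "exporters": ["debug"]}}
    },
}

manifest = jaeger_collector_cr("jaeger-collector", example_config)
print(json.dumps(manifest, indent=2))
```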

yurishkuro commented 1 month ago

@karan2704

I am not clear on how other Jaeger components besides collectors

In Jaeger v2 there is only a single binary to deploy, but it can be deployed in different roles depending on its configuration. I assume OTEL Operator already allows deploying Collector at minimum in two roles, as a local host agent / sidecar and as a central tier. We do not need Jaeger to be deployed as agent anymore, but we do need three other roles: collector, ingester (when using Kafka, although technically it's also "collector"), and query-service / ui.
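
The role breakdown above can be summarized in a small sketch. It only encodes what the comment states: one binary, no agent role, and an ingester that exists only when Kafka is used. The names `RolePlan` and `plan_roles` are hypothetical, and the strings describe data flow in plain words rather than real configuration keys.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RolePlan:
    reads_from: str   # where this role gets spans from
    writes_to: str    # where this role sends spans
    serves_ui: bool   # whether this role exposes the query API / UI


def plan_roles(use_kafka: bool) -> Dict[str, RolePlan]:
    """Return the roles the single Jaeger v2 binary would be deployed in."""
    plans = {
        # The collector writes to Kafka when it is used, otherwise directly to storage.
        "collector": RolePlan("clients", "kafka" if use_kafka else "storage", False),
        # The query service reads from storage and serves the UI.
        "query": RolePlan("storage", "none", True),
    }
    if use_kafka:
        # The ingester is only needed to drain Kafka into storage.
        plans["ingester"] = RolePlan("kafka", "storage", False)
    return plans


for role, plan in plan_roles(use_kafka=True).items():
    print(role, plan)
```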