open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0

Introducing Application Telemetry Schema in OpenTelemetry - Vision and Roadmap #243

Closed lquerel closed 5 months ago

lquerel commented 7 months ago

[Easy to read version with images: https://github.com/lquerel/oteps/blob/app-telemetry-schema-vision-roadmap/text/0243-app-telemetry-schema-vision-roadmap.md]

Unlike the traditional data ecosystem (OLTP and OLAP), the world of telemetry generally does not rely on the concept of a schema. Instrumentation is deeply embedded in the code of applications and libraries, making it difficult to discover all the possible telemetry signals an application can emit. This gap prevents or limits the development of CI/CD tools for checking, reporting, documenting, and generating artifacts from telemetry signals specific to an application. This document presents a long-term vision aimed at enabling the OpenTelemetry project to address this issue and extend its impact to a broader ecosystem. It proposes extending the initiatives of Telemetry Schema and Semantic Conventions to include concepts of Application Telemetry Schema and Resolved Telemetry Schema. A series of OTEPs and Tools will be proposed in this overarching document to detail each aspect of this vision.

Similar (but proprietary) initiative from Facebook: Positional Paper: Schema-First Application Telemetry

EDIT 1: The OTel Weaver project (a PoC implementation of this OTEP and some of the others mentioned in the roadmap) is now available here.

EDIT 2: The Slack channel #otel-weaver is dedicated to this OTEP and the associated OTel Weaver project.

yurishkuro commented 7 months ago

Easier to read version with images: https://github.com/lquerel/oteps/blob/app-telemetry-schema-vision-roadmap/text/0240-app-telemetry-schema-vision-roadmap.md

tigrannajaryan commented 7 months ago

@lquerel thank you for working on this, I think schemas are very important for OTel's evolution.

I already had a chance to discuss the roadmap and I support it. The details of each step will of course be important and will need their own OTEPs or spec PRs.

lquerel commented 7 months ago

@yurishkuro @tigrannajaryan @jmacd @pyohannes @jsuereth @lmolkova

I have added the following notes to the Telemetry Schema v1.2 section to clarify a few important concepts that were omitted in the first version of this document. I don't believe there is anything controversial in these additions, but please read them and let me know if you have any disagreements or need further clarification.

Note 1: Each signal definition, where possible, reuses the existing syntax and semantics defined by the semantic conventions. Each signal definition is also identified by a unique name (or ID), making schemas easy to traverse, validate, and diff.

Note 2: This hierarchy of telemetry schemas helps large organizations collaborate on the Application Telemetry Schema. It allows different parts of an Application Telemetry Schema to be managed by different teams.

Note 3: For all the elements that make up the Application Telemetry Schema, a general mechanism of annotation or tagging will be integrated in order to attach additional traits, characteristics, or constraints, allowing vendors and companies to extend the definition of concepts defined by OpenTelemetry.

Note 4: Annotations or Tags can also be employed to modify schemas for diverse audiences. For example, the public version of a schema can exclude all signals or other metadata labeled as private. Similarly, elements can be designated as exclusively available for beta testers. These annotations can also identify attributes as PII (Personally Identifiable Information), and privacy policy enforcement can be implemented at various levels (e.g., in the generated client SDK or in a proxy).
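
As a rough illustration of Notes 1 and 4, here is a minimal sketch of how unique signal IDs could make schemas easy to diff and how tags could produce an audience-specific view. Everything in it is an assumption made for illustration: `SignalDef`, its fields, `diff_schemas`, `public_view`, and the tag names `private` and `pii` are hypothetical and are not taken from the OTEP, the semantic conventions, or OTel Weaver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    """Hypothetical signal definition; field names are illustrative only."""
    id: str                         # unique name/ID of the signal (Note 1)
    kind: str                       # e.g. "metric", "span", "event"
    attributes: tuple = ()          # attribute names carried by the signal
    tags: frozenset = frozenset()   # free-form annotations (Notes 3 and 4)


def diff_schemas(old, new):
    """Compare two schemas (lists of SignalDef) by signal ID."""
    old_by_id = {s.id: s for s in old}
    new_by_id = {s.id: s for s in new}
    common = old_by_id.keys() & new_by_id.keys()
    return {
        "added": sorted(new_by_id.keys() - old_by_id.keys()),
        "removed": sorted(old_by_id.keys() - new_by_id.keys()),
        "changed": sorted(i for i in common if old_by_id[i] != new_by_id[i]),
    }


def public_view(schema, hidden_tags=frozenset({"private"})):
    """Return only the signals that carry none of the hidden tags."""
    return [s for s in schema if not (s.tags & hidden_tags)]


# Two made-up versions of an application schema.
v1 = [
    SignalDef("http.server.request.duration", "metric", ("http.route",)),
    SignalDef("app.checkout.debug", "event", ("user.email",),
              frozenset({"private", "pii"})),
]
v2 = [
    SignalDef("http.server.request.duration", "metric",
              ("http.route", "http.request.method")),
    SignalDef("app.cart.size", "metric"),
]

changes = diff_schemas(v1, v2)
public_v1 = public_view(v1)
```

With these inputs, `changes` reports `app.cart.size` as added, `app.checkout.debug` as removed, and `http.server.request.duration` as changed, while `public_v1` keeps only the signal without the `private` tag. A real implementation would presumably operate on the schema files themselves rather than on in-memory objects.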

EDIT: added another note on the relationship between Semantic Layers, Telemetry Schemas, and Observability Query Assistant.

Note 5: This recent paper from data.world, along with the MetricFlow framework which underpins the dbt Semantic Layer, highlights the significance of adopting a schema-first approach in data modeling, especially for Generative AI-based question answering systems. Tools like Observability Query Assistants (e.g. Elastic AI Assistant and Honeycomb Query Assistant) are likely to become increasingly prevalent and efficient in the near future, thanks to the adoption of a schema-first approach.

mtwo commented 7 months ago

@jmacd can you also add the 'triaged' label?

lquerel commented 6 months ago

@jsuereth @yurishkuro @tigrannajaryan @pyohannes @jack-berg @jmacd

To celebrate the end of the year, I have prepared an update to the OTEP which I hope takes into account all the feedback I received over the last few weeks. I suggest you read through the entire document as there are many differences compared to the previous version. Happy reading. I look forward to your feedback.

lquerel commented 6 months ago

@jsuereth @yurishkuro @tigrannajaryan @pyohannes @jack-berg @jmacd Happy New Year to everyone! To start off 2024 on the right foot, I would like to finalize our first OTEP focusing on the concept of application telemetry schema. I believe all feedback has been incorporated (if not, please let me know), and it seems crucial to obtain approval before we proceed with the plan outlined at the end of the OTEP.

lquerel commented 6 months ago

> The only addition that I think would be useful is to link to potential examples of component and resolved schemas (perhaps from the prototype), just to make it more concrete for people, vs. just the diagrams that frankly look pretty similar.

@yurishkuro, I will add an appendix with example schemas to illustrate the different concepts.

lquerel commented 6 months ago

> LGTM
>
> The only addition that I think would be useful is to link to potential examples of component and resolved schemas (perhaps from the prototype), just to make it more concrete for people, vs. just the diagrams that frankly look pretty similar.

@yurishkuro I added two appendices describing the structure of the component and resolved telemetry schemas. They are given only as examples and represent my current view of things. A dedicated future OTEP will define their respective structure and format precisely.

jack-berg commented 5 months ago

I just wanted to drop in and leave a comment since I left some early comments on this OTEP: I'm pretty excited about and interested in a lot of these ideas, but don't have the capacity to review deeply or be a big part of this effort, at least for the moment. I haven't seen anything in this proposal that concerns me, so please don't wait for my approval to move forward.

lquerel commented 5 months ago

@tigrannajaryan @yurishkuro @jsuereth @jmacd @pyohannes Hello everyone. After discussing with Tigran, I have significantly updated the OTEP, mainly to simplify it, focus on the vision and roadmap, and eliminate any aspects that might limit our options in the future. No new concepts are introduced. As this is a significant change, I would like to request a new approval from you, and I apologize for the additional work this entails.

tigrannajaryan commented 5 months ago

I had some old unpublished comments remaining from reviewing the previous iteration of the OTEP, which accidentally got published when I approved this OTEP. Ignore the comments if you receive them in emails, they are no longer relevant. I also deleted the comments from this PR to avoid confusion.

tigrannajaryan commented 5 months ago

@open-telemetry/specs-approvers please review.

tigrannajaryan commented 5 months ago

@open-telemetry/specs-approvers this is a major OTEP that will impact many parts of OTel. We have enough approvals to merge it. Last call to review it.

tigrannajaryan commented 5 months ago

Thank you @lquerel for the OTEP!