open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0

Standard Telemetry Data Query API #218

Closed itaykat closed 1 year ago

itaykat commented 1 year ago

Standard Telemetry Data Query API

This change suggests a standard API definition for querying telemetry data stored in any observability backend platform, both open source (Jaeger, Tempo, Zipkin, etc.) and proprietary (Datadog, AppDynamics, etc.)

This idea of "having a standard for trace query API" has also been discussed at length by @lucasponce and others in https://github.com/open-telemetry/oteps/issues/193.

Collaborators and Reviewers: @ronyis @lucasponce

Comments, ideas, feedback, etc. are all very welcome and highly appreciated!

linux-foundation-easycla[bot] commented 1 year ago

CLA Signed

The committers listed above are authorized under a signed CLA.

jpkrohling commented 1 year ago

Are there products and projects committed to implementing such an API already?

tigrannajaryan commented 1 year ago

From what I see, this OTEP does not specifically define any Query API. Instead, it reads more like a declaration of intent and a request for comments and opinions about whether OpenTelemetry wants to have such a query API.

It is somewhat unusual but OK to have OTEPs of this kind to gauge the initial interest in a topic. I would suggest circulating this widely on the OTel Slack and in the OTel Specification SIG meeting.

I already commented on https://github.com/open-telemetry/oteps/issues/193 that an initiative like this may look more like a project than a single OTEP (unless you manage to constrain the scope so that it can be reviewed and approved as one atomic piece of work). Given that this OTEP has to be followed by at least one more that defines the actual Query API, it already seems closer to being a project :-)

My personal opinion is that this somewhat deviates from how we define OpenTelemetry's charter today. However I can also see the value of such Query API so I don't want to rule out the possibility of modifying the charter to include this. It will need to be widely supported though for this to be possible, including by vendors expressing interest in implementing this API.

reyang commented 1 year ago

> My personal opinion is that this somewhat deviates from how we define OpenTelemetry's charter today. However I can also see the value of such Query API so I don't want to rule out the possibility of modifying the charter to include this. It will need to be widely supported though for this to be possible, including by vendors expressing interest in implementing this API.

+1, OpenTelemetry's current charter is scoped to instrumentation and collection.


I think the Governance Committee needs to be involved if there is desire to change the charter.

itaykat commented 1 year ago

I agree with @tigrannajaryan about the initial need to "feel" out the OTel community on this topic.

This need also relates to @jpkrohling's question about the intent of telemetry platforms (@jaegertracing, @grafana Tempo, @signoz, etc.) to implement such a standard.

I'll add another required validation: the telemetry consumers, i.e., the platforms that consume telemetry data for internal use.

So the currently required validations are:

  1. The intent of observability backends to implement such an API
  2. The real need for such an API on the telemetry consumers' side, including more examples of such platforms.
  3. The intent of the OTel community to have such a project (generally speaking, this is not a blocker for such an initiative)

itaykat commented 1 year ago

> I think the Governance Committee needs to be involved if there is desire to change the charter.

Thanks @reyang, this is a concern I also mentioned in the open questions section, and I would love to get some feedback about it.

Can you please tag someone from the Governance Committee?

jpkrohling commented 1 year ago

I probably can't join next week's meeting, but I can add this to the agenda and am happy to introduce the topic on the call on 29 September 2022 if it's not addressed sooner.

I do "feel" like there's a community interested in having this and I did hear from users quite often about this, including from @lponce and, more recently, from @VineethReddy02. My only concern right now is investing energy in a proposal without anyone committing any engineering resources to implement this, where we could be investing this energy elsewhere.

Edit: it's still Wednesday! We have a GC meeting tomorrow; I've added this to the agenda.

reyang commented 1 year ago

>> I think the Governance Committee needs to be involved if there is desire to change the charter.
>
> Thanks @reyang, this is a concern I also mentioned in the open questions section, and I would love to get some feedback about it.
>
> Can you please tag someone from the Governance Committee?

@open-telemetry/governance-committee please advise here, thanks!

itaykat commented 1 year ago

> I do "feel" like there's a community interested in having this and I did hear from users quite often about this, including from @lponce and, more recently, from @VineethReddy02.

Hi @lponce and @VineethReddy02, can you please share your use cases in this context (the need for a vendor-agnostic standard telemetry data query API)? It would greatly benefit the discussion. Thanks!

@zirain and @kampe, the same question goes to you, as you were interested in this topic in https://github.com/open-telemetry/oteps/issues/193

@vikasmalhotra08 and @ceastman-r7, you are also welcome to share your insights.

itaykat commented 1 year ago

> My only concern right now is investing energy in a proposal without anyone committing any engineering resources to implement this, where we could be investing this energy elsewhere.

Our group at Cisco has the intent, will and engineering resources to back this initiative, we just need sufficient validation from the community around the need for such a standard.

pilhuhn commented 1 year ago

The charter already says "export". This request is about exporting from a backend. Q.E.D. :-)

itaykat commented 1 year ago

> The charter already says "export". This request is about exporting from a backend. Q.E.D. :-)

Actually, this may be a good point: querying telemetry data can be seen as exporting. @jpkrohling, this nuance is worth a mention in the GC meeting today.

This makes me think of the word "exporters" as sinks, as in a data pipeline, so we may have Sink Exporters (the current ones) and Source Exporters (where this API would go). Just thinking out loud here...

jpkrohling commented 1 year ago

I discussed this today with the other GC members, and while you can watch the recording once it's uploaded and read the meeting notes yourselves, I thought it would be worth summarizing what was discussed there.

My personal opinions are:

bhs commented 1 year ago

There are really two questions here...

  1. Is a common effort around a multi-telemetry, analytical query language worthwhile?
  2. Should such an effort be part of OpenTelemetry?

Re question 1: quite possibly!

Re question 2: OpenTelemetry is already a project that's much wider than it is deep... and even for core signal types, our OTel mission (i.e., "to enable effective observability by making high-quality, portable telemetry ubiquitous") is still very much a WIP, and also not well-aligned to an analytical language. As such, I am skeptical that such an analytical query language should be part of OTel per se.

My two cents. (And apologies that I had an immovable conflict with today's GC meeting – this is definitely an important topic!)

lucasponce commented 1 year ago

I think this comment is still valid in this context https://github.com/open-telemetry/oteps/issues/193#issuecomment-1241875146.

The observability space is getting mature, and the collection of signals is just a part of the flow.

On the other side, there are "consumers" of this information to add value to the end-users.

Starting to propose some "standard" API would help; otherwise, consumers would need to deal with different products and technologies.

As this proposal shows, there is a need; I think OpenTelemetry is the best organization that can lead, give visibility and shape this effort.

I like the comments that suggest starting by defining a "common set of operations" that could perhaps be mapped from existing solutions.

I understand this triggers more questions related to "the vision" of OpenTelemetry. Still, I think this request represents valid "actors" in this business, and increasing the scope beyond the "collection" of the signals is a reasonable evolution.

cartermp commented 1 year ago

FWIW, from a commercial vendor's standpoint (Honeycomb), we have had people ask for this kind of capability (we don't have an API to get traces like Jaeger does). But it's not been many people. Most asks around APIs tend to be around CRUD for product nouns and aggregate data for reporting purposes, neither of which would fit within the OpenTelemetry vision IMO.

bhs commented 1 year ago

I would kinda like to table the question of whether this is a valid idea (some sort of new query language) and instead focus on whether an analytical query language belongs in OTel, regardless of independent merit.

jmacd commented 1 year ago

I agree that OpenTelemetry is not the right community or place to try to standardize a data query language or API.

However, I'm also excited to see OpenTelemetry data used with existing data platforms for streaming data analysis. In particular, I would like to focus attention on this proposal to integrate Apache Arrow with OpenTelemetry which will enable us to apply the data analysis tools in the Apache Arrow ecosystem to telemetry data.

itaykat commented 1 year ago

@jmacd, thank you for sharing your OTEP. The following, taken from the OpenTelemetry mission and vision page, may also relate to your suggestion:

> Telemetry should be vendor-neutral
>
> For decades, proprietary drop-in agents from monitoring and observability vendors have been the primary source for useful telemetry from across the application stack. Unfortunately, the lack of common standards or APIs across these agents has led to vendor lock-in for customers, and inhibited innovation by tightly coupling telemetry collection with telemetry storage and analysis. With OpenTelemetry, we strive to provide a level playing field for all observability providers, avoid lock-in to any vendor, and interoperate with other OSS projects in the telemetry and observability ecosystem.

> we strive to provide a level playing field for all observability providers, avoid lock-in to any vendor, and interoperate with other OSS projects in the telemetry and observability ecosystem.

We need to ask ourselves whether we want to keep enabling tight coupling of telemetry storage with telemetry analysis. As I see it, unless we remove the lock-in to a specific observability backend (vendor) by providing a standard API for querying telemetry data and enabling vendor-agnostic telemetry consumption, we won't be aligned with this section of the vision. Please correct me if I'm missing something. Thanks 🤍

Edit note: either way, we need to find more downstream telemetry consumers to validate this need.

pilhuhn commented 1 year ago

I am with @itaykat. And an API does not need to cover every use case, but it should at least define some very basic use cases like

bhs commented 1 year ago

@itaykat, apologies if I'm sounding like a broken record, but the "mission and vision" excerpt you quoted was about removing vendor agent lock-in, with the header (which you've retained) pertaining to vendor-neutral telemetry – a narrower thing than vendor-neutral analytics and full-spectrum observability. Hope that is clear.

pilhuhn commented 1 year ago

@bhs, I think no one is asking for vendor-neutral analytics, but rather for a way to get your own data back out of the vendor's system in a vendor-neutral way.

itaykat commented 1 year ago

> with the header (which you've retained) pertaining to vendor-neutral telemetry – a narrower thing than vendor-neutral analytics and full-spectrum observability

Thanks @bhs for emphasizing this; it is correct. I mentioned analysis as a superset that includes telemetry query.

Adding to what @pilhuhn said here: this initiative is about vendor-neutral telemetry.

If we look at the downstream telemetry consumption path:

  1. Telemetry Instrumentation
  2. Telemetry Collection
  3. Telemetry Exportation & Ingestion
  4. Telemetry Storage
  5. Telemetry Query (either pure extraction of the data for enrichment purposes or further analysis of it)

We can all (hopefully) agree that stages 1-3 are indeed vendor-neutral and maintained by OTel, but further down the path, stages 4 and 5 are definitely not.

This conflicts with the vision title I quoted:

> Telemetry should be vendor-neutral

reyang commented 1 year ago

We discussed this during the Sep. 21st, 2022 Technical Committee meeting. The @open-telemetry/technical-committee agrees with the @open-telemetry/governance-committee that a data query language/API does not belong in the OpenTelemetry charter/scope.

In addition, with the current workstreams (e.g. Prometheus, Semantic Conventions, Logging), the Technical Committee is already spread thin and won't be able to effectively take on any new workstream.

reyang commented 1 year ago

The OpenTelemetry Collector project still needs a solution for its data transformation, so it's likely that the Collector SIG would be interested in providing requirements/feedback regarding a data transformation language (which might be a subset of a query language, or a completely different language).

itaykat commented 1 year ago

This conversation may now continue in the CNCF Slack channel #otel-telemetry-consumers; you are all welcome. Thank you for sharing your thoughts.

itaykat commented 1 year ago

> The OpenTelemetry Collector project still needs a solution for its data transformation, so it's likely that the Collector SIG would be interested in providing requirements/feedback regarding a data transformation language (which might be a subset of a query language, or a completely different language).

Thank you @reyang for sharing this insight; I also watched @bhs talk about that need in the last GC meeting. Can you please share more resources/issues/users around this topic so I can continue that conversation?

tigrannajaryan commented 1 year ago

FYI all, Observability TAG (not Otel) plans to work on this, see https://docs.google.com/document/d/1JRQ4hoLtvWl6NqBu_RN8T7tFaFY5jkzdzsB9H-V370A/edit

itaykat commented 1 year ago

> FYI all, Observability TAG (not Otel) plans to work on this, see https://docs.google.com/document/d/1JRQ4hoLtvWl6NqBu_RN8T7tFaFY5jkzdzsB9H-V370A/edit

Thanks @tigrannajaryan for mentioning the charter by @manolama and @vjsamuel, which also mentions this OTEP; this is highly relevant.

It is worth mentioning that there is another working group by @AloisReitbauer and @austinlparker around this topic.