open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0

Support Elastic Common Schema in OpenTelemetry #199

Closed cyrille-leclerc closed 1 year ago

cyrille-leclerc commented 2 years ago

This OTEP proposes to add support for Elastic Common Schema in OpenTelemetry, enriching the Otel Semantic Conventions. It has been prepared by @cyrille-leclerc, @alolita, @kumoroku, @jkowall, @danielkhan, and many others.

Relates to:

cyrille-leclerc commented 2 years ago

It is not clear to me what this OTEP is actually proposing. It says "to support ECS", but what does it mean in practice? Is it proposing that OTLP data model be changed to match ECS? Is it proposing a mapping of OTEL semantic conventions to/from ECS?

@yurishkuro thanks for identifying this lack of clarity. The proposal is to enrich the existing OpenTelemetry Semantic Conventions with the fields defined in the Elastic Common Schema. We are aware that there will be some overlaps, which will have to be resolved in favor of backward compatibility of the OpenTelemetry Semantic Conventions: the existing Otel Semantic Convention attribute is usually preferred unless there is a good reason to prefer the ECS field naming.

Does this clarify the proposal?

yurishkuro commented 2 years ago

This clarifies the intent, but not the proposal. The OTEP itself does not provide said mapping, so if I were to approve the OTEP, what am I approving, the intent to create another OTEP that will contain the actual mapping?

cyrille-leclerc commented 2 years ago

This clarifies the intent, but not the proposal. The OTEP itself does not provide said mapping, so if I were to approve the OTEP, what am I approving, the intent to create another OTEP that will contain the actual mapping?

You are correct @yurishkuro . If this OTEP is accepted, then we plan to:

Does it make sense?

yurishkuro commented 2 years ago

Ok. I don't have objection to such sequencing, but please make it explicit in the OTEP that this is the action plan.

On a related note, what's preventing you from doing the mapping in the same OTEP? Are you concerned that it may not be accepted after doing all that work?

And one more thing: it would be good to add a diagram (eg using mermaid) showing the proposed data flow, i.e. what is the direction of transformations, where the data comes from and goes into.

cyrille-leclerc commented 2 years ago

Ok. I don't have objection to such sequencing, but please make it explicit in the OTEP that this is the action plan.

Thanks, I'll do it asap.

On a related note, what's preventing you from doing the mapping in the same OTEP? Are you concerned that it may not be accepted after doing all that work?

Correct, it's the quantity of work.

And one more thing: it would be good to add a diagram (eg using mermaid) showing the proposed data flow, i.e. what is the direction of transformations, where the data comes from and goes into.

I'm not sure I follow. We thought of enriching the Otel Semantic Conventions with new attributes and we didn't identify changes in the data flow. We envisioned, for example, authors of OTel Collector Receivers leveraging these new attributes to add more structure to the log files they parse, but this would not change the fact that people would create OTel Collector Receivers to parse log files. Is it something we should clarify in the OTEP?

yurishkuro commented 2 years ago

We thought of enriching the Otel Semantic Conventions with new attributes and we didn't identify changes in the data flow. We envisioned, for example, authors of OTel Collector Receivers leveraging these new attributes to add more structure to the log files they parse, but this would not change the fact that people would create OTel Collector Receivers to parse log files. Is it something we should clarify in the OTEP?

Most semantic conventions are used to capture data via instrumentation, some are for converting from existing formats. I am not familiar enough with ECS, is that actually an established interchange format or just for data at rest in ES? Would you have log files written with ECS-formatted data?

Basically, my point about the diagram: if you're defining the mapping, then paint a picture where that mapping might be used.

jkowall commented 2 years ago

@yurishkuro the purpose is to be able to correlate data from different log sources which are not OpenTelemetry-instrumented data sources. Think of existing technologies like routers, switches, host operating systems, DNS logs, app server logs, and so forth. Hopefully, over time some of these will switch to an Otel format with the proper schema, but as you are aware, this can take decades to change. Meanwhile, we need to correlate data between these logs, and that's why this is important for users to have.

cyrille-leclerc commented 2 years ago

@yurishkuro FYI I clarified

weyert commented 2 years ago

It's not fully clear what the benefit of this is for a standard OpenTelemetry consumer who is not using Elastic. Is the ECS format commonly supported by other logging vendors? I can't really find many references to it.

For example, how does this help me when I am using Cloud Logging or Loggly?

jkowall commented 2 years ago

It's not fully clear what the benefit of this is for a standard OpenTelemetry consumer who is not using Elastic. Is the ECS format commonly supported by other logging vendors? I can't really find many references to it.

For example, how does this help me when I am using Cloud Logging or Loggly?

Not every logging system has parsing and schemas, but the good ones typically do. Even if it does, it makes sense to normalize the data before you send it to the system. If your logging system doesn't support schemas, you MUST map the data before you store it.

The reason is, so you can correlate the data from various sources. For example, if I am capturing logs from a Palo Alto Firewall which calls source ip something, and I'm capturing ipfw logs from a Linux host which calls source ip something different in the log data. How do I query these consistently?

If you are only capturing logs from custom software using Otel Logging then you will not have this issue, but unfortunately, we get logs from many sources, and we cannot easily correlate the data.
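
The normalization problem described above can be illustrated with a minimal Python sketch. It assumes two sources that name the source IP differently; the per-source key names (`src`, `from_addr`, and so on) are hypothetical and do not reflect the real Palo Alto or ipfw log formats. Only the target name `source.ip` is an actual ECS field.

```python
# Hypothetical per-source field names; real firewall and ipfw logs differ.
FIELD_ALIASES = {
    "paloalto": {"src": "source.ip", "dst": "destination.ip"},
    "ipfw": {"from_addr": "source.ip", "to_addr": "destination.ip"},
}


def normalize(source_type, record):
    """Rename source-specific keys to the shared names; keep unknown keys as-is."""
    aliases = FIELD_ALIASES.get(source_type, {})
    return {aliases.get(key, key): value for key, value in record.items()}


events = [
    normalize("paloalto", {"src": "10.0.0.5", "dst": "10.0.0.9", "action": "allow"}),
    normalize("ipfw", {"from_addr": "10.0.0.5", "to_addr": "192.0.2.7", "action": "deny"}),
]

# After normalization, one query on "source.ip" covers both sources.
matches = [event for event in events if event.get("source.ip") == "10.0.0.5"]
print(len(matches))
```

Without the shared name, the same query would need one condition per source, and every new source would add another.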

Mpdreamz commented 2 years ago

@jkowall @arminru and others on this thread :wave:, I'm supporting @cyrille-leclerc's efforts from Elastic's side.

Is there anything still blocking merging this PR? Very keen to hear if there are still open ends that need clarification on our end!

jkowall commented 2 years ago

I don't think so; we just need reviews from others to permit merging. I still see the need for this regularly in user discussions.

Mpdreamz commented 2 years ago

Thanks @jkowall :+1:! As discussed with @tigrannajaryan today during the Logs SIG meeting, the next step for us would be to open it up to the wider group in the Specification SIG to get a stronger consensus around the intent behind this OTEP.

That would open us up to focus on the mechanics moving forward too.

kumoroku commented 2 years ago

It's not fully clear what the benefit of this is for a standard OpenTelemetry consumer who is not using Elastic. Is the ECS format commonly supported by other logging vendors? I can't really find many references to it. For example, how does this help me when I am using Cloud Logging or Loggly?

Not every logging system has parsing and schemas, but the good ones typically do. Even if it does, it makes sense to normalize the data before you send it to the system. If your logging system doesn't support schemas, you MUST map the data before you store it.

The reason is, so you can correlate the data from various sources. For example, if I am capturing logs from a Palo Alto Firewall which calls source ip something, and I'm capturing ipfw logs from a Linux host which calls source ip something different in the log data. How do I query these consistently?

If you are only capturing logs from custom software using Otel Logging then you will not have this issue, but unfortunately, we get logs from many sources, and we cannot easily correlate the data.

A little late on this comment, but I just wanted to point out that I represent a vendor (Sumo Logic) that does not use Elastic technology, and yet we are still very much excited about moving towards a standard schema. ECS is, in our minds, the clear frontrunner. We will be happy to support this path towards a standard as part of OT and are committed to implementing support to simplify life for our users.

cyrille-leclerc commented 2 years ago

@arminru said

Thank you for the explanation. The expectation is not that such vendors would pick up ECS or OTel semconv as their logging format in any foreseeable future yet (and certainly not for already deployed appliances) but that we would be able to define transformation/mapping rules that are then implemented in the respective OTel collector receivers or processors, right?

I see a future where vendors will be interested in publishing structured logs through OTLP, adopting the OTel Semantic Conventions. We already see some vendors publishing JSON or logfmt logs. In the meantime, you are right, ingestion pipelines to parse/map/enrich will be needed.

Do such mapping rules for common (log) sources/formats already exist today in ECS so they can then be used in OTel or are they yet to be defined?

Yes, Elastic publishes the parsing/mapping/enrichment rules of 175+ integrations (NGINX, Apache HTTPD, Cisco, F5, MySQL, PostgreSQL...) to ECS at https://github.com/elastic/integrations/tree/main/packages . They are written as Elasticsearch ingest pipelines that use Grok patterns (rather than plain regular expressions). They could help define the Otel collector ingestion pipelines.
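
As a rough illustration of what such a parsing/mapping rule does, here is a minimal Python sketch. It is not taken from the Elastic integrations repository: the regular expression is a simplified stand-in for a Grok pattern, the sample line is made up, and the timestamp is skipped for brevity. The target names are ECS fields.

```python
import re

# Simplified pattern for a "combined"-style web access log line.
# Named groups play the role of Grok captures.
ACCESS_LOG = re.compile(
    r'(?P<source_ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/(?P<version>[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Capture name -> ECS field name.
GROUP_TO_ECS = {
    "source_ip": "source.ip",
    "method": "http.request.method",
    "url": "url.original",
    "version": "http.version",
    "status": "http.response.status_code",
    "bytes": "http.response.body.bytes",
}


def parse_access_log(line):
    """Return a dict keyed by ECS field names, or None if the line does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    fields = {GROUP_TO_ECS[name]: value for name, value in match.groupdict().items()}
    # Convert numeric captures; a "-" byte count means no body was sent.
    fields["http.response.status_code"] = int(fields["http.response.status_code"])
    raw_bytes = fields["http.response.body.bytes"]
    fields["http.response.body.bytes"] = 0 if raw_bytes == "-" else int(raw_bytes)
    return fields


line = '203.0.113.4 - - [10/Oct/2022:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
parsed = parse_access_log(line)
print(parsed)
```

An equivalent rule in an OTel Collector pipeline would have the same shape: extract captures from the raw line, then rename and type them according to the shared convention.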

cyrille-leclerc commented 2 years ago

@arminru said

@jkowall @cyrille-leclerc Can you please address the questions above?

Sorry for the delay @arminru , did I answer your pending questions? Did I miss a question?

arminru commented 2 years ago

@cyrille-leclerc Thank you for the explanation! That answers all the questions I raised.

Let's see what the other @open-telemetry/specs-approvers say to the proposal 🙂

reyang commented 2 years ago

@jsuereth I want to get your opinion here.

Specifically, I wonder if we should continue to take new semantic convention PRs when the area is already covered by ECS. For example, https://github.com/open-telemetry/opentelemetry-specification/pull/2824 (process/system uptime) seems to be a common thing. If we envision that ECS and the OpenTelemetry semantic conventions would align AND ECS has already covered it, should we stop this PR and set the expectation, or should we continue to let these PRs in and create more work to smooth them out later?

@astencel-sumo FYI

ruflin commented 2 years ago

@reyang It would be great to have these fields in ECS / single place. It was always the goal to get metrics into ECS especially some of these fundamentals. See https://github.com/elastic/ecs/issues/474#issuecomment-1240633062 for more discussions.

tigrannajaryan commented 2 years ago

I think there is a great deal of value that we all can derive from this initiative. That said, I think there are a few things that the OTEP needs to address but doesn’t.

  1. It needs to make clear that "merging" does not result in one final set of semantic conventions that both OpenTelemetry and ECS use. "Merging" means adding new conventions to OpenTelemetry by borrowing their definitions from ECS.

  2. As a result of "merging" we will end up with OpenTelemetry semantic conventions that as a whole will still be different from ECS. OpenTelemetry semantic conventions and ECS will share a (possibly large) common subset, but they won’t be exactly the same as a whole.

  3. After the "merging" is complete OpenTelemetry semantic conventions and ECS will not be cemented. They will continue to evolve. The OTEP does not say whether this evolution will be done in any sort of synchronized manner or we should expect that the OpenTelemetry semantic conventions and ECS will gradually drift further apart over time.

  4. The OTEP does not address the topic of co-existence. Will it somehow enable the following scenario: ECS-compatible data sources to send data to OpenTelemetry-compatible backends (and vice versa)? Do we expect an ability for (unambiguous?) runtime transformation from ECS to OpenTelemetry (and vice versa) that can be done for example in the OpenTelemetry Collector processor or elsewhere in the collection pipeline or at query time in the backend? Unsurprisingly, this looks a lot like the problem that Telemetry Schemas solve for different versions of Schemas and solutions may look similar as well. It is unclear if this is considered at all.

Since this is more of a vision OTEP that is expected to be followed by a more specific "how do we do that" proposal I don’t expect this OTEP to necessarily have all detailed answers to these questions, but it needs to at least clarify that these are concerns that need to be addressed (the OTEP touches these tangentially in the very last paragraph but it is not very explicitly articulated).

I also generally feel that the "How would Otel users practically use" section is very sparse and would benefit from being more elaborate.
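
The runtime transformation raised in point 4 could look roughly like the following Python sketch, which is similar in spirit to renaming attributes between schema versions. The attribute pairs are illustrative assumptions only (they pair ECS field names with OpenTelemetry attribute names in use around the time of this discussion); the authoritative mapping is exactly what the follow-up work would have to define. The sketch also shows why "unambiguous" matters: the reverse direction exists only if no two source names map to the same target.

```python
# Illustrative pairs only; not an agreed ECS <-> OpenTelemetry mapping.
ECS_TO_OTEL = {
    "http.request.method": "http.method",
    "http.response.status_code": "http.status_code",
    "user_agent.original": "http.user_agent",
}


def invert(mapping):
    """Build the reverse mapping, refusing mappings that are not one-to-one."""
    inverse = {}
    for source_name, target_name in mapping.items():
        if target_name in inverse:
            raise ValueError(
                f"ambiguous: {inverse[target_name]!r} and {source_name!r} "
                f"both map to {target_name!r}"
            )
        inverse[target_name] = source_name
    return inverse


def rename_attributes(attributes, mapping):
    """Rename keys found in the mapping; leave all other attributes untouched."""
    return {mapping.get(key, key): value for key, value in attributes.items()}


OTEL_TO_ECS = invert(ECS_TO_OTEL)

ecs_event = {"http.request.method": "GET", "http.response.status_code": 200, "custom.field": "x"}
otel_attributes = rename_attributes(ecs_event, ECS_TO_OTEL)
round_trip = rename_attributes(otel_attributes, OTEL_TO_ECS)
print(otel_attributes)
print(round_trip == ecs_event)
```

A pure rename table like this only covers the shared subset; fields that exist on one side only, or that differ in type or structure, would need more than a rename.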

reyang commented 2 years ago

3. After the "merging" is complete OpenTelemetry semantic conventions and ECS will not be cemented. They will continue to evolve. The OTEP does not say whether this evolution will be done in any sort of synchronized manner or we should expect that the OpenTelemetry semantic conventions and ECS will gradually drift further apart over time.

Good point @tigrannajaryan. I think if this ended up with OpenTelemetry semantic conventions and ECS evolving independently after the initial "merge", it is kind of defeating the purpose here.

Mpdreamz commented 1 year ago

@tigrannajaryan thanks for the feedback!

I updated the "Proposed process to contribute ECS to OpenTelemetry Semantic Conventions" section to include more details on the contribution and what coexistence could look like. It now also highlights in stronger terms that this is in fact a contribution and not a merger.

We still feel there is a massive benefit to closing the gap and ensuring each other's success by aligning the two specifications more closely.

cyrille-leclerc commented 1 year ago

As I am no longer part of Elastic and I am no longer connected to decisions on Elastic Common Schema (ECS), would it be better if I closed this PR and let @jamiehynds, @Mpdreamz, @AlexanderWert and @ruflin create a new PR?

cyrille-leclerc commented 1 year ago

I'm closing this PR to prevent misunderstandings as I'm no longer working with Elastic. I'll let @jamiehynds, @Mpdreamz, @AlexanderWert and @ruflin progress on this topic the way they want.

AlexanderWert commented 1 year ago

A new PR has been created for this proposal at https://github.com/open-telemetry/oteps/pull/222. We would appreciate it if the discussion continued there and if the approvals from this PR were "transferred" to https://github.com/open-telemetry/oteps/pull/222.