open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0
326 stars 157 forks source link

Support Elastic Common Schema (ECS) in OpenTelemetry #222

Closed AlexanderWert closed 1 year ago

AlexanderWert commented 1 year ago

Reopening the OTEP (as the new owner of the OTEP) on supporting the Elastic Common Schema in OpenTelemetry (as a continuation of #199).

Would love to see the discussion from #199 to continue on this PR and approvals that have been already made on #199 being 'transitioned' / given here again.

trask commented 1 year ago

hi all!

we are trying to stabilize the HTTP semantic conventions in the next 12 weeks, so there's some urgency in deciding how this OTEP affects that stabilization effort.

I've outlined briefly the impact it could have in https://github.com/open-telemetry/opentelemetry-specification/issues/3163#issuecomment-1413011607.

it would be really helpful to get input from this group whether the changes I've outlined are the kind of thing you were imagining would take place if this OTEP is accepted.

jsuereth commented 1 year ago

I want to reiterate some of @tigrannajaryan's concerns from the previous PR here. I think we need to address them, but possibly not 100% within this PR as some will take time to figure out the details.

Particularly:

What happens after Elastic Common Schema is donated into OpenTelemetry semantic conventions?

ECS and OTEL realistically won't have 100% compatibility. So there needs to be some affordance for migration and realistic expectations for what existing ECS users would do.

We should avoid a scenario where we have two independently evolving specifications with no plans of convergence.

In this case, I would propose our long term plan is that while ECS + OTel would be in a "joint" existence state for some time, we would be working towards a sunset period to migrate users to one standard. This approach is similar to what we did for OpenTracing / OpenCensus. Compatibility would be provided for users to migrate from ECS to OTEL Semconv over time, and there is a sunsetting period for ECS such the new activity / continued work would happen in the newly shared semantic conventions.

A few important points:

trask commented 1 year ago

These are the specification changes that would be required to align OTel HTTP semantic conventions with ECS prior to declaring the OTel HTTP semantic conventions stable:

I don't think we should merge this OTEP unless we are committed to making the above changes and "actually aligning in practice".

reyang commented 1 year ago

I don't think we should merge this OTEP unless we are committed to making the above changes and "actually aligning in practice".

+1 👍

reyang commented 1 year ago

Update

Folks from Elastic.co and OpenTelemetry met at 8:30AM-8:52AM Mar. 10th 2023 and verbally agreed on the following things:

  1. There were feedback from the OpenTelemetry GC/TC on this OTEP, @reyang is going to add concrete comments and we (@AlexanderWert, @jsuereth, @reyang and @trask) are targeting next Fri (Mar. 17th) to address these comments and get the OTEP merged.
  2. Both Elastic.co and OpenTelemetry will have public announcements/posts about the merge, we expect both parties to prepare the draft doc and do peer review, once we got thumbs up from both sides, we'll coordinate and publish. @reyang volunteered to drive the OpenTelemetry announcement, @AlexanderWert to work with Craig Conboy to identify the Elastic.co announcement owner. Note that this does not block the OTEP.
  3. Elastic.co will provide a small group of schema experts, @jsuereth will work on getting them as OpenTelemetry semantic convention / schema approvers.
  4. @trask will work towards changing the OpenTelemetry Semantic Convention towards ECS (here is one example https://github.com/open-telemetry/opentelemetry-specification/issues/3181).
Oberon00 commented 1 year ago

Some key foo.bar should not ever have a different meaning in traces as it does in metrics for example.

FWIW, OTels's current semconv tooling should enforce uniqueness of attribute keys across signals already since the beginning. Although metric support has been added only relatively recently.

dyladan commented 1 year ago

Some key foo.bar should not ever have a different meaning in traces as it does in metrics for example.

FWIW, OTels's current semconv tooling should enforce uniqueness of attribute keys across signals already since the beginning. Although metric support has been added only relatively recently.

It appears you're right. I still think defining keys outside signals makes more sense. Even more so if the tooling already enforces this.

Oberon00 commented 1 year ago

I still think defining keys outside signals makes more sense.

To that end, @lmolkova has added attribute_groups recently: https://github.com/open-telemetry/build-tools/pull/124, I think the use case was actually sharing attributes between traces & metrics. So it looks like we have been already moving into that direction recently.

reyang commented 1 year ago

I am blocking this since I don't support this OTEP in the current form. I still support the idea of converging the ECS and Otel semantic conventions but not without addressing this comment.

Furthermore it seems the existence of the OTEP is cited as a reason for proposing changes to existing conventions (for example), which is premature. It seems like the fact of an open OTEP with some approvals is signalling that it is guaranteed to be accepted and merged, which is not the case.

I am blocking the OTEP to signal it is not ready and should not be used a justification for specification changes yet.

I will remove my block when the open questions are address and we agree to move forward with the OTEP.

@tigrannajaryan I think this is resolved, could you take a look?

dyladan commented 1 year ago

How are fields which don't map naturally onto existing OpenTelemetry signals going to be handled? For example the Rule, Related, or Risk fields. I'm not asking for specific answers for these particular fields, but are we in general expected to include all ECS fields even if they're not expected to be used by OpenTelemetry?

tedsuo commented 1 year ago

@dyladan I suggest we use these fields as guides, and adopt them whenever OpenTelemetry expands into the relevant domain.

dyladan commented 1 year ago

@dyladan I suggest we use these fields as guides, and adopt them whenever OpenTelemetry expands into the relevant domain.

The reason I ask is because the security domain fields are specifically mentioned several times and some of them don't have obvious OTel applications.

Excited this is finally happening! It's great to have ECS become part of the OpenTelemetry community.

I agree this has been a long time coming and it's good to see some alignment.


Want to be clear that I'm not trying to be a thorn here, just want to make sure everyone is on the same page. I don't want to end up in the situation in the future where elastic folks feel like we reneged on the agreement if some of the fields without obvious applications take a long time to be integrated, but also don't want to end up in a situation where OTel is taking on the maintenance burden for fields which we're not really using.

reyang commented 1 year ago

I am blocking this since I don't support this OTEP in the current form. I still support the idea of converging the ECS and Otel semantic conventions but not without addressing this comment. Furthermore it seems the existence of the OTEP is cited as a reason for proposing changes to existing conventions (for example), which is premature. It seems like the fact of an open OTEP with some approvals is signalling that it is guaranteed to be accepted and merged, which is not the case. I am blocking the OTEP to signal it is not ready and should not be used a justification for specification changes yet. I will remove my block when the open questions are address and we agree to move forward with the OTEP.

@tigrannajaryan I think this is resolved, could you take a look?

@tigrannajaryan I'll merge the OTEP now given we got a good set of approvals and your concerns have been addressed (I believe). Let's follow up offline once you are back from vacation and see if you have additional comments what we should follow up in another PR.