open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Add CosmosDb Otel Specification #715

Open sourabh1007 opened 1 year ago

sourabh1007 commented 1 year ago

Context

The Cosmos DB SDK has two modes: Gateway (HTTP) and Direct (RNTBD). In these modes, a single operation call can result in multiple network calls behind the scenes because of failover, retries, and replica selection; this logic runs in the SDK itself. The Cosmos DB SDK is going to generate an Activity at the operation level and at the network level, each carrying attributes with information.
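To make the operation-level vs. network-level split concrete, here is a minimal sketch in plain Python (the SDK in question is .NET, so this is illustrative only). The `Span` class, the `record_operation` helper, the `status_code` key, and the sample status codes are all hypothetical stand-ins, not part of the Cosmos DB SDK or the OpenTelemetry API.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Minimal stand-in for a tracing span (not an OpenTelemetry type)."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def record_operation(name, attempt_status_codes):
    """Build one operation-level span with one child span per network call."""
    operation = Span(name)
    for attempt, status in enumerate(attempt_status_codes, start=1):
        # Each retry or failover attempt becomes its own network-level span.
        child = Span(f"{name} network call {attempt}", {"status_code": status})
        operation.children.append(child)
    return operation


# Hypothetical example: a read that is retried once before succeeding.
op = record_operation("ReadItem", [410, 200])
```

The point is only that one operation-level span can own several network-level spans, which is why the two levels are proposed with separate attribute sets.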

Proposed Semantic conventions for SDK operation level calls

https://github.com/Azure/azure-cosmos-dotnet-v3/issues/3058

Proposed Semantic conventions for SDK network (RNTBD) calls to CosmosDb

| Attribute | Value | Comment |
| --- | --- | --- |
| rntbd.url | | rntbd url with partitionid and replicaid |
| rntbd.operation_type | | Operation Type |
| rntbd.resource_type | | Resource Type |
| rntbd.status_code | 201/200/204 | network status code |
| rntbd.sub_status_code | 1000/1002 | Cosmos Db SubStatus Code |
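As a rough illustration of the attributes listed above, they could be assembled as a flat key/value map before being attached to a network-level span. This is only a sketch in plain Python; the `RntbdCall` class, the `to_attributes` helper, and all sample values (including the URL) are hypothetical placeholders and do not come from any Cosmos DB SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RntbdCall:
    """Hypothetical record of one network-level (RNTBD) call."""
    url: str              # RNTBD URL including partition ID and replica ID
    operation_type: str
    resource_type: str
    status_code: int      # network status code, e.g. 200, 201, 204
    sub_status_code: int  # Cosmos DB sub-status code, e.g. 1000, 1002


def to_attributes(call: RntbdCall) -> dict:
    """Map a call record onto the proposed rntbd.* attribute names."""
    return {
        "rntbd.url": call.url,
        "rntbd.operation_type": call.operation_type,
        "rntbd.resource_type": call.resource_type,
        "rntbd.status_code": call.status_code,
        "rntbd.sub_status_code": call.sub_status_code,
    }


# Placeholder values for illustration only.
example = RntbdCall(
    url="rntbd://example.invalid/partitions/p1/replicas/r1",
    operation_type="Create",
    resource_type="Document",
    status_code=201,
    sub_status_code=1000,
)
attributes = to_attributes(example)
```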

Status

Work is in progress to instrument the .NET Cosmos DB SDK with OpenTelemetry support.

What I need

I need approval from the OpenTelemetry community to include the Cosmos DB specification in the official OpenTelemetry specifications. As soon as I get approval here, I will open a PR with the specifications.

Note: I am new to this community, so please let me know if I need to schedule a call or something similar to discuss this.

joaopgrassi commented 1 year ago

I think you could open up your proposal PR right away and the community will move on from there. I don't think there will be any "approval" in an issue. Most discussions happen on PRs directly, so feel free to do so!

Oberon00 commented 1 year ago

@joaopgrassi I think our on-paper contribution workflow was changed, but the actual workflow never followed.

https://github.com/open-telemetry/opentelemetry-specification/blob/main/CONTRIBUTING.md#proposing-a-change

Follow the issue workflow and make sure the issue is accepted with a "Yes" response. If the response to the issue is not "Yes" then do not create a PR that implements the change since it will be rejected.

I think this simply does not match reality. CC @open-telemetry/technical-committee

joaopgrassi commented 1 year ago

Thanks for pointing that out! I missed this somehow. But I agree: in the issues/PRs I have looked at, I haven't seen this workflow in place.

sourabh1007 commented 1 year ago

Thanks for the replies, @joaopgrassi and @Oberon00. I was only trying to follow that workflow. As you suggested, I am going to create a PR with the proposal.

trask commented 8 months ago

@sourabh1007 was this issue resolved by https://github.com/open-telemetry/opentelemetry-specification/pull/3097? thx

sourabh1007 commented 8 months ago

There are 2 sections above: operation-level and network-level activities. The attribute names for the operation-level activity were approved by the community as part of the mentioned PR. A PR to get approval on the attributes for the network-level activity is yet to be raised, but we have internally closed on them; ref. https://github.com/Azure/azure-cosmos-dotnet-v3/issues/4232. They will be:

`rntbd.*`: where * can be any Cosmos DB specific attribute name
`rntbd.timeline.*`: where * can be any request pipeline specific information

lmolkova commented 8 months ago

@sourabh1007

My understanding is that rntbd is effectively a network protocol and is not strictly related to database conventions. Is that correct? If so, I suggest sending a PR to add it to the RPC spec.

Also, small comment on:

rntbd.url

Any reason not to use url.full?

It's also a good question what we want to stabilize for databases. For messaging semconv we explicitly decided to focus on logical operations and not the transport-layer for the time being.

jcocchi commented 8 months ago

@lmolkova rntbd is an internal network protocol built by Cosmos DB on top of TCP. Since rntbd is unique to Cosmos DB, does it make sense to keep it as part of the db spec itself? The thinking is that it will be easier for customers to look at our semantic convention page as a single source of truth for all Cosmos DB instrumentation.

trask commented 5 months ago

@jcocchi should this issue be tracked as part of database semconv stabilization effort, or can it be addressed post-stability? Thanks

jcocchi commented 5 months ago

@trask this can be tracked post-stability. We decided not to expose rntbd network traces for now and will reconsider if we hear customer feedback. Thanks!