open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Fields definitions for attributes registry #1073

Open trisch-me opened 4 months ago

trisch-me commented 4 months ago

In a couple of recent PRs there have been discussions about the scope of existing namespaces and the attributes that belong to them.

For example, in #731 there was a discussion about the user.domain field (Name of the directory the user is a member of. For example, an LDAP or Active Directory domain name.), which has a narrower scope related to auth/identity.

In #903 we discussed the naming rule itself and noted that while some fields (rule.id, rule.name) are generic enough, other fields (rule.reference, rule.license) are used in specific use cases and are not applicable to all rules.

I want to discuss and define the purpose of the registry here. From the discussions we have had about it in the semconv meeting, my understanding is that the registry is a standard, common dictionary of all possible attributes for a particular namespace.

This means that the registry should include all attributes for that namespace, even if they are not closely related. For example, the registry should include both rule.fieldX and rule.fieldY, where fieldX will be used in one use case and fieldY will be used by another. They all belong to the rule namespace as they are semantically connected to the concept of a rule. In very specific, narrow use cases, we might also add additional sub-namespacing, such as rule.security.malware_score.

Therefore, I see the registry as a superset of all attributes, from which each specific use case selects only the fields it needs.
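To make the "select only the needed fields" idea concrete, here is a minimal sketch. It is only an illustration: the attribute names come from this thread, but the briefs are placeholders rather than the registry's actual wording, and the `select` helper and the `alerting` and `catalog` use cases are hypothetical.

```python
# Hypothetical registry for the rule namespace. Attribute names come from
# the discussion; the briefs are placeholders, not real registry text.
REGISTRY = {
    "rule.id": "placeholder brief for rule.id",
    "rule.name": "placeholder brief for rule.name",
    "rule.reference": "placeholder brief for rule.reference",
    "rule.license": "placeholder brief for rule.license",
    "rule.security.malware_score": "placeholder brief for a security-specific field",
}


def select(registry, names):
    """Return the subset of the registry that one use case refers to.

    Raises KeyError when a use case references an attribute the registry
    does not define, since the registry is the single common dictionary.
    """
    missing = [name for name in names if name not in registry]
    if missing:
        raise KeyError(f"not in registry: {missing}")
    return {name: registry[name] for name in names}


# Two made-up use cases picking different subsets of the same namespace.
alerting = select(REGISTRY, ["rule.id", "rule.name", "rule.security.malware_score"])
catalog = select(REGISTRY, ["rule.id", "rule.name", "rule.reference", "rule.license"])
```

Neither use case needs every attribute, yet both draw from the same rule namespace.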

We already have examples of this in the registry. For instance, we have common DB attributes and then specific attributes for particular databases, but they are all defined under the same parent namespace db. The same goes for the gcp and aws namespaces, where we have completely different entities such as aws.ecs, aws.dynamodb, etc., but they are all merged under the common global aws name.

The http namespace also has multiple fields, and different http metrics use different subsets of the http attributes.

So, taking this into account, here are the questions:

  1. Is the registry a superset of all possible attributes within the same global namespace?

  2. When should we introduce a sub-namespace (such as aws.ecs), and when should attributes remain at the root of the namespace despite having slightly different meanings? For example, in the rule namespace, fields like rule.reference and rule.license are not applicable to every rule but are also not specific to any domain. I propose that such fields remain at the top level, as they belong to the common part of the namespace even though they might not be used by every rule.

joaopgrassi commented 4 months ago

> Is the registry a superset of all possible attributes within the same global namespace?

My view and understanding is also this. I think the registry should be agnostic of the usage and should contain all things. The exception is when something is really very specific, which is easy to see in the database attributes (they have a namespace for each particular db system).

trisch-me commented 4 months ago

> I think the registry should be agnostic of the usage and should contain all things.

So do you agree that if we have multiple fields in the same namespace that are not specific to any use case, they should live under the root? In the example above, we discussed that the user.domain field has meaning only from an auth perspective, so not every usage of user would have it. Still, I don't feel we need a specific sub-namespace for it such as user.auth.domain (though we might add one); I feel this field is generic enough to live under the top-level namespace.

The same goes for the rule.* namespace: the only specific field is malware_score, which could indeed be introduced under a security sub-namespace. The others, such as rule.reference, are agnostic to specific usage. If another use case has additional fields for rule, they could be introduced later under the root as well.

lmolkova commented 4 months ago

My position on all of these comments is that, regardless of the registry, the naming should be precise and descriptive.

These are the naming principles we use when designing APIs for Azure SDKs. The litmus test for me: if I'm looking at the attribute name alone, I want to be able to correctly guess what it means. If it's scoped to a specific domain, I'd like to be able to tell which domain (e.g. security_rule|alert or identity.issuer).

Let's discuss it during semconv meeting?

[Update] Google SDKs have the same design principles outlined here https://cloud.google.com/apis/design/naming_convention

> Avoid name overloading. Use different names for different concepts. Avoid overly general names that are ambiguous within the context of the API and the larger ecosystem of Google APIs. They can lead to misunderstanding of API concepts. Rather, choose specific names that accurately describe the API concept.

lmolkova commented 4 months ago

One more thing: if we follow the principle that a namespace contains all attributes applicable to that namespace, we'd end up with attributes from different domains, e.g. rule.malware_score, rule.workflow.name, rule.filter.type, and rule.foobar.
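As a rough illustration of that mix, the sketch below groups the example attribute names by the segment that follows the root. The `group_by_subnamespace` helper is hypothetical and exists only for this example; it is not part of any semconv tooling.

```python
from collections import defaultdict


def group_by_subnamespace(names, root):
    """Group dotted attribute names by the segment right after `root`.

    Names with only two segments (e.g. rule.foobar) sit directly at the
    root and are collected under the "(root)" key.
    """
    groups = defaultdict(list)
    for name in names:
        parts = name.split(".")
        if parts[0] != root:
            continue  # belongs to another namespace
        key = parts[1] if len(parts) > 2 else "(root)"
        groups[key].append(name)
    return dict(groups)


names = [
    "rule.malware_score",
    "rule.workflow.name",
    "rule.filter.type",
    "rule.foobar",
]
groups = group_by_subnamespace(names, "rule")
```

Here `groups` ends up with three keys, `(root)`, `workflow`, and `filter`, which shows how quickly one generic root can collect attributes from unrelated domains.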

We should be able to write a descriptive brief for the namespace. If "The namespace foo describes foo" is the only thing we can tell, then it's not specific enough and namespace foo should not be added to the registry.
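One way to picture this test is a crude check for briefs that only restate the namespace name. This is a sketch under stated assumptions: the `is_tautological` function and the `FILLER` word list are made up for illustration, and a real review would rely on human judgment rather than a word filter.

```python
import re

# Hypothetical list of words that carry no information in a brief.
FILLER = {
    "the", "a", "an", "this", "namespace", "describes", "describe",
    "attributes", "attribute", "for", "of",
}


def is_tautological(namespace, brief):
    """Return True if the brief says nothing beyond the namespace name."""
    words = re.findall(r"[a-z0-9_]+", brief.lower())
    informative = [
        word for word in words
        if word not in FILLER and word != namespace.lower()
    ]
    return not informative


weak = is_tautological("foo", "The namespace foo describes foo")
better = is_tautological("db", "Attributes describing database client calls")
```

With these inputs, `weak` is `True` and `better` is `False`: the first brief contains nothing except filler and the namespace name itself.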

trisch-me commented 4 months ago

@lmolkova +1 to discuss it during semconv meeting. I wanted to do it yesterday but it was canceled due to holidays.

I also understand and partially support your ideas here. I am thinking from another POV: I come with some domain knowledge and want to check how rules are defined in semconv, so I would go and check the rule namespace, if there is one. I have no objection to introducing a security sub-namespace for rule, i.e. rule.security.*, but the fields we have discussed in the rule PR are not security-specific. We do use some of them in our use cases, but that doesn't mean they belong to the security domain.

Attributes in the namespace should be descriptive and make sense. Outside of the identity/auth context, user.domain (i.e. an issuer of the user identity) does not make sense. For an e-shop, user identity is important but only during auth. The rest of the telemetry should probably have a hashed identifier of the user.

And this is where I believe the registry should have its superset power: we define all possible values for user, and any particular use case uses the fields it needs. In one use case it is a hash, in another it is a domain. You don't need to use all the attributes from the registry to define a particular entity. That said, I'm not against adding a sub-namespace (user.auth.domain) for this particular field to describe it better.

> We should be able to write a descriptive brief for the namespace. If "The namespace foo describes foo" is the only thing we can tell, then it's not specific enough and namespace foo should not be added to the registry.

To be honest, almost all of our registry namespaces are defined as "X: describes X attributes".

lmolkova commented 4 months ago

> I come with some domain knowledge and want to check how rules are defined in semconv, so I would go and check the rule namespace, if there is one. I have no objection to introducing a security sub-namespace for rule, i.e. rule.security.*, but the fields we have discussed in the rule PR are not security-specific. We do use some of them in our use cases, but that doesn't mean they belong to the security domain.

What I'm saying is that rule is not part of any domain. I'd never expect Azure Service Bus rules to be described by the rule namespace; it's not specific enough. My rules don't even have an id or name, so they should exist in the azure.servicebus.rule namespace. Having both domain-specific rules and generic rules that can describe anything is not great.

> And this is where I believe the registry should have its superset power: we define all possible values for user, and any particular use case uses the fields it needs.

This creates the problem that a single namespace has a mixture of unrelated attributes, and that the attribute name is not descriptive ("In one use case it is a hash, in another it is a domain.").

> To be honest, almost all of our registry namespaces are defined as "X: describes X attributes".

I hope it's because we are lazy (don't write good descriptions because we never render them) and not because we cannot write better ones.