Open trisch-me opened 6 months ago
Is registry a superset of all possible attributes within the same global namespace?
My view and understanding is also this. I think the registry should be agnostic of the usage and should contain all things. Unless it's really just very specific, which is something that is easily seen in the database ones. (have a namespace for a particular db system)
I think the registry should be agnostic of the usage and should contain all things.
So do you agree, then, that if we have multiple unspecified fields for the same namespace, they should be under the root? In the example above we discussed that the user.domain field has meaning only from an auth perspective, which means not every usage of user would have it. But I don't feel we need a specific sub-namespace for it such as user.auth.domain (though we might); I feel this one is generic enough that it should be under the top-level namespace.
The same goes for the rule.* namespace - the only specific thing is malware_score, which could indeed be introduced under a security sub-namespace. But others, such as rule.reference etc., are agnostic to specific usage. If another use case has additional fields for rule, they could be introduced later under the root as well.
My position on all of the comments is that, regardless of the registry, the naming should be precise and descriptive.
rule can mean absolutely anything depending on the context - a generic filter, a security alert, part of a business process, etc. There are no attributes that'd make sense for each of these domains. I.e. there should not be a generic rule namespace; instead we can do security_alert, azure.servicebus.rule, workflow.rule, etc. We could also have rule.security, rule.workflow, etc. namespaces, but that does not really make sense, since generic rule.id or rule.name are so vague that it's not possible to meaningfully describe what they are. Phrases like "rule.id is the unique id of the rule" can as well be applied to anything else - "object.id is the unique id of the object".
Attributes in the namespace should be descriptive and make sense. Outside of the identity/auth context, user.domain (i.e. an issuer of the user identity) does not make sense. For e-shop, user identity is important but only during auth. The rest of the telemetry should probably have a hashed identifier of the user. These are the naming principles we use when designing APIs for Azure SDKs.
The litmus test for me - if I'm looking at the attribute name alone, I want to be able to correctly guess what it means. If it's scoped to a specific domain, I'd like to be able to tell which domain (e.g. security_rule|alert or identity.issuer).
Let's discuss it during semconv meeting?
[Update] Google SDKs have the same design principles outlined here https://cloud.google.com/apis/design/naming_convention
Avoid name overloading. Use different names for different concepts. Avoid overly general names that are ambiguous within the context of the API and the larger ecosystem of Google APIs. They can lead to misunderstanding of API concepts. Rather, choose specific names that accurately describe the API concept.
One more thing: if we follow the principle that a namespace contains all attributes applicable to that namespace, we'd end up with attributes from different domains. E.g. rule.malware_score and rule.workflow.name and rule.filter.type and rule.foobar.
We should be able to write a descriptive brief for the namespace. If "The namespace foo describes foo" is the only thing we can tell, then it's not specific enough and namespace foo should not be added to the registry.
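As a rough illustration of the "descriptive brief" test above, here is a minimal Python sketch that flags briefs which only restate the namespace name. The function name `is_tautological_brief` and the filler-word list are assumptions made for this example; nothing like this exists in the semconv tooling as far as this thread shows.

```python
import re

# Words assumed to carry no information about what a namespace covers.
# This list is a guess for illustration, not an agreed-upon convention.
FILLER_WORDS = {
    "the", "a", "an", "this", "namespace", "describes", "describe",
    "attributes", "attribute", "for", "of", "about",
}


def is_tautological_brief(namespace: str, brief: str) -> bool:
    """Return True if the brief adds nothing beyond the namespace name itself."""
    brief_words = set(re.findall(r"[a-z0-9]+", brief.lower()))
    name_parts = set(re.findall(r"[a-z0-9]+", namespace.lower()))
    # Whatever remains after dropping filler and the name itself is "real" content.
    informative = brief_words - FILLER_WORDS - name_parts
    return not informative


# The example from the comment above is flagged; a more specific
# (hypothetical) brief is not.
print(is_tautological_brief("foo", "The namespace foo describes foo"))
print(is_tautological_brief("foo", "Attributes describing message filters on a queue"))
```

A check like this could only catch the most obvious cases; whether a brief is specific enough is still a judgment call for reviewers.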
@lmolkova +1 to discuss it during semconv meeting. I wanted to do it yesterday but it was canceled due to holidays.
I also understand and partially support your ideas here. I am thinking from another POV - I'm coming with some domain knowledge and want to check how rules are defined in semconv. I would go to check rule namespace if any. I have no objections to introduce security subnamespace for rule i.e. rule.security.* but all the fields we have discussed in the rule PR are not security specific. We do use some of them in our use cases but it doesn't mean they belong to security domain.
Attributes in the namespace should be descriptive and make sense. Outside of the identity/auth context, user.domain (i.e. an issuer of the user identity) does not make sense. For e-shop, user identity is important but only during auth. The rest of the telemetry should probably have a hashed identifier of the user.
And this is where I believe registry should have its superset power - we define all possible values for user. And any particular use case is using fields it needs. In one use case it is a hash, in another it is a domain. You don't need to use all the attributes from registry to define particular entity. Although I'm not against adding a sub-namespace (user.auth.domain) for this particular field to describe it better.
We should be able to write a descriptive brief for the namespace. If "The namespace foo describes foo" is the only thing we can tell, then it's not specific enough and namespace foo should not be added to the registry.
to be honest almost all of our registry namespaces are defined as X: describes X attributes.
I'm coming with some domain knowledge and want to check how rules are defined in semconv. I would go to check rule namespace if any. I have no objections to introduce security subnamespace for rule i.e. rule.security.* but all the fields we have discussed in the rule PR are not security specific. We do use some of them in our use cases but it doesn't mean they belong to security domain.
What I'm saying is that rule is not part of any domain. I'd never expect Azure Service Bus rules to be described by the rule namespace - it's not specific enough. My rules don't even have an id or name; they should exist in the azure.servicebus.rule namespace. Having both domain-specific rules and generic rules that can describe anything is not great.
And this is where I believe registry should have its superset power - we define all possible values for user. And any particular use case is using fields it needs.
This creates a problem: a single namespace has a mixture of unrelated attributes, and the attribute name is not descriptive ("In one use case it is a hash, in another it is a domain.").
to be honest almost all of our registry namespaces are defined as X: describes X attributes.
I hope it's because we are lazy (don't write good descriptions because we never render them) and not because we cannot write better ones.
In a couple of recent PRs there have been discussions about the scope of existing namespaces and the different attributes belonging to them.
For example, in #731 there was a discussion about the user.domain field (Name of the directory the user is a member of. For example, an LDAP or Active Directory domain name.), which has a narrower scope related to auth/identity.
In #903 we discussed the naming of rule itself and noted that while some fields (rule.id, rule.name) are generic enough, other fields (rule.reference, rule.license) are used in specific use cases and are not applicable to all rules.
I want to discuss and define the purpose of the registry here. From all the discussions we are having in the semconv meeting about it, my understanding is that the registry is a standard common dictionary of all possible attributes for a particular namespace.
This means that the registry should include all attributes for that namespace, even if they are not closely related. For example, the registry should include both rule.fieldX and rule.fieldY, where fieldX will be used in one use case and fieldY will be used by another. They all belong to the rule namespace as they are semantically connected to the concept of a rule. In very specific, narrow use cases, we might also add additional sub-namespacing, such as rule.security.malware_score.
Therefore, I see a registry as a superposition of all attributes where each specific use case might select only the needed fields.
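To make the "superset" idea concrete, here is a minimal Python sketch in which the registry is modeled as a plain set of attribute names and each use case picks only the names it needs. The attribute names are the ones mentioned in this thread; `REGISTRY` and `select_attributes` are hypothetical names for this illustration and do not reflect how the semconv registry is actually stored or queried.

```python
# Hypothetical registry for the rule namespace, modeled as a set of names.
# The names come from this discussion; the data structure is an assumption.
REGISTRY = frozenset({
    "rule.id",
    "rule.name",
    "rule.reference",
    "rule.license",
    "rule.security.malware_score",
})


def select_attributes(registry, wanted):
    """Return the wanted attribute names, failing if any is not in the registry."""
    unknown = sorted(set(wanted) - set(registry))
    if unknown:
        raise KeyError(f"not in registry: {unknown}")
    return sorted(wanted)


# Two use cases draw different subsets from the same namespace.
generic_rule = select_attributes(REGISTRY, ["rule.id", "rule.name"])
security_rule = select_attributes(
    REGISTRY, ["rule.id", "rule.reference", "rule.security.malware_score"]
)
print(generic_rule)
print(security_rule)
```

Under this model, no use case is required to emit every attribute in the namespace; the registry only guarantees that each name has one definition.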
We already have examples of this in the registry. For instance, we have common DB attributes and then specific attributes for particular databases, but they are all defined under the same parent namespace db. The same goes for the gcp and aws namespaces, where we have completely different entities such as aws.ecs, aws.dynamodb etc., but they are all merged under the common global aws name.
The http namespace also has multiple fields, and different http metrics use different subsets of http attributes.
So, taking this into account, here are the questions:
Is registry a superset of all possible attributes within the same global namespace?
When should we introduce a sub-namespace (such as aws.ecs), and when should attributes remain at the root of the namespace despite having slightly different meanings? For example, in the rule namespace, fields like rule.reference and rule.license are not applicable to every rule but are also not specific to any domain. I propose that such fields remain at the top level, as they belong to the common part of the namespace even though they might not be used by every rule.
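The root-versus-sub-namespace split proposed here can be sketched in Python as follows. The helper `group_by_sub_namespace` is a hypothetical name invented for this example; it only shows how the attribute names from this issue would be laid out under the proposal and is not part of any semconv tooling.

```python
from collections import defaultdict


def group_by_sub_namespace(root, attribute_names):
    """Group attribute names by whether they sit at the root or in a sub-namespace."""
    prefix = root + "."
    groups = defaultdict(list)
    for name in attribute_names:
        if not name.startswith(prefix):
            raise ValueError(f"{name!r} is not in the {root!r} namespace")
        rest = name[len(prefix):].split(".")
        # One remaining segment means a root-level field; more means the
        # first segment names a sub-namespace.
        key = root if len(rest) == 1 else f"{root}.{rest[0]}"
        groups[key].append(name)
    return dict(groups)


layout = group_by_sub_namespace(
    "rule",
    ["rule.id", "rule.reference", "rule.license", "rule.security.malware_score"],
)
print(layout)
```

In this layout the generic fields stay at the root of rule, and only the narrowly scoped field lands under rule.security.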