Decide how to organize "sub namespaces" on registry YAML model files

joaopgrassi commented 6 months ago

Context

In some cases, the yaml model file in the attributes registry contains multiple "levels" of attributes. One example is the Database one: https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/db.yaml.

The top id is registry.db, and all attributes go into that. Because there are multiple db systems, each of them has its name included in the attribute ids, like cassandra.* or mongodb.*.

When generating the markdown for these attributes in the registry, we rely on tags to render the individual db system attribute tables.

Problems with this approach:

The yaml file is large, and finding a "group" of attributes (say for cassandra) is hard, as they are all together under the same group id: registry.db
We have to rely on tags to be able to render the markdown table for each individual "thing". Because of this, we have to repeat the same tag for each attribute in yaml, like so: https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/db.yaml#L14
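
To make the second problem concrete, here is a minimal Python sketch of tag-based table rendering. It is only an illustration: the attribute ids and the tech-specific-* tag names are assumptions rather than values copied from db.yaml, and tables_by_tag is a hypothetical helper, not part of the actual markdown generator.

```python
from collections import defaultdict

# Hypothetical, simplified view of the single registry.db group.
# Attribute ids and tag names are illustrative assumptions.
registry_db = {
    "id": "registry.db",
    "attributes": [
        {"id": "name", "tag": "db-generic"},
        {"id": "cassandra.table", "tag": "tech-specific-cassandra"},
        {"id": "cassandra.page_size", "tag": "tech-specific-cassandra"},
        {"id": "mongodb.collection", "tag": "tech-specific-mongodb"},
    ],
}


def tables_by_tag(group):
    """Bucket attribute ids by tag; each bucket would become one markdown table."""
    tables = defaultdict(list)
    for attr in group["attributes"]:
        tables[attr["tag"]].append(attr["id"])
    return dict(tables)


tag_tables = tables_by_tag(registry_db)
```

Every attribute has to carry its own tag for this to work, which is the repetition described above.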

An alternative to this

Instead of relying on tags, in the model for the registry we can simply organize each individual group under its own id. For example:

id: registry.db.cassandra
id: registry.db.mongodb
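
As a rough sketch of what this buys us, the Python below models the split groups as plain dictionaries and derives one table per group id, with no tags involved. The attribute names and the tables_by_group helper are hypothetical and only meant to illustrate the idea, not the real model or tooling.

```python
# Hypothetical, simplified stand-in for the registry model after the split.
# Attribute names are illustrative and not copied from db.yaml.
groups = [
    {"id": "registry.db", "attributes": ["db.system", "db.name"]},
    {"id": "registry.db.cassandra", "attributes": ["db.cassandra.table"]},
    {"id": "registry.db.mongodb", "attributes": ["db.mongodb.collection"]},
]


def tables_by_group(groups):
    """Map each group id to its sorted attribute names: one table per group."""
    return {group["id"]: sorted(group["attributes"]) for group in groups}


tables = tables_by_group(groups)
```

Here the group id alone decides which table an attribute lands in.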

Pros of this option

Don't need to use tags
It's easier to find the attributes per group in the yaml files
It's easier/clearer to generate markdown tables for each group

An example of this approach can be found in this PR: https://github.com/open-telemetry/semantic-conventions/pull/848/files#diff-3efbd7bfaa9b1122d4421e83e19833ead514f4c41ef2c72450bb8abc725f35e1

What to do

We need to decide how we want to go forward and make it consistent across the repo.

AlexanderWert commented 6 months ago

I like the proposal!

We need to have meaningful guidelines on the following though:

When to split into sub-groups (which also implies sub-sections in the corresponding registry readme) vs. keeping it in one table / group. I think we should do the split only in cases when the overall table gets too large otherwise, as one of the original purposes of the registry view is to have a flat, ordered list of attributes (that is easily navigable). I agree though that in some cases (like the DB, AWS) it makes sense to split it into sub-namespaces.
When we do split into sub-groups, I think the splitting should be solely based on the sub-namespace! We should avoid semantic grouping of attributes in the registry (i.e. splitting a set of attributes into a separate group though the attributes have different sub-namespaces), because it would make navigation and discoverability difficult again.

trisch-me commented 6 months ago

Maybe we should add it to the guidelines? So new contributions will follow the process and semantic meaning of splitting the groups?

trisch-me commented 6 months ago

For the second option there will be no defined registry.db group, so we will not be able to generate a list of all db attributes without grouping if the need arises. With tags this is possible, but I'm not sure if this case is relevant.

joaopgrassi commented 6 months ago

@trisch-me registry.db is already defined today, and it contains the general attributes :).

> Maybe we should add it to the guidelines? So new contributions will follow the process and semantic meaning of splitting the groups?

Yeah, once we agree I will add it to the guidelines.

trisch-me commented 6 months ago

Yes, it is defined and has all sub attributes under it, with the grouping done through tags, so generic attributes have the tag db-generic. If we change to different ids, we will not have all attributes under the main category. I'm not against the second option. I just want to bring to our attention that in that case generating all sub attributes for a given main category (db, aws, process, etc.) will not be possible (or I'm not aware how to do so).
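
One possible way around this, sketched in Python under the assumption that every sub-group id starts with its parent id followed by a dot: the full list for a main category could be recovered by matching on the id prefix. The data and the attributes_under helper are hypothetical, and this says nothing about what the current tooling supports.

```python
# Hypothetical, simplified stand-in for the registry model after the split.
# Attribute names are illustrative and not copied from the real yaml files.
groups = [
    {"id": "registry.db", "attributes": ["db.system", "db.name"]},
    {"id": "registry.db.cassandra", "attributes": ["db.cassandra.table"]},
    {"id": "registry.db.mongodb", "attributes": ["db.mongodb.collection"]},
    {"id": "registry.dbus", "attributes": ["dbus.example"]},
]


def attributes_under(groups, root_id):
    """Collect attributes of root_id and of every group nested under it."""
    prefix = root_id + "."  # the dot keeps registry.dbus out of registry.db
    names = []
    for group in groups:
        if group["id"] == root_id or group["id"].startswith(prefix):
            names.extend(group["attributes"])
    return sorted(names)


all_db = attributes_under(groups, "registry.db")
```

The made-up registry.dbus group is only there to show why the match uses the id plus a trailing dot instead of a bare prefix.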

lmolkova commented 6 months ago

I'd prefer to focus on the markdown and the final representation of the attributes. So far the yaml organization was not important.

Authors can split into subgroups, or use one group with tags when it helps them produce better markdown.

If we see that some groups have become too big and we'd like to change that, let's do it, but I don't understand the benefit of having any rigid guidelines on yaml organization unless we need it for something very specific (like auto-generating the registry).

lmolkova commented 5 months ago

I think we can provide soft guidance (e.g. in contrib.md?) to use one yaml group per table to be rendered in the MD.

E.g.:

if http.request and http.response are rendered in the same registry table, they should be in the same yaml. If we ever feel like splitting them, we should be able to do it.
messaging.kafka and messaging.rabbitmq should probably appear in different tables, so they should be defined in different yaml groups

Usually this would mean that system-specific attributes should be defined in individual groups. Since the registry will be auto-generated, tags will be useless, and all of this will keep registry groups from growing too large.

See #952 for the implementation on db/messaging.

joaopgrassi commented 5 months ago

So I think the initial idea of using groups of general + specific attributes is the way to go. I will try to add some guidance to the docs for this. Assigning to me.
