Add `embed` concept for attributes reuse (in addition to `ref`)

AlexanderWert commented 10 months ago

This issue describes a concrete proposal on how to realise attributes reuse through yaml syntax and tooling.

Related issues / PRs:

Proposal

I'm proposing to introduce a new, additional type of attribute reference called embed (alternatively: nest) in the syntax that would behave similarly to ref property BUT with the following difference: When embed is used the referenced attribute will be nested under the namespace of the attribute group in which the attribute is being referenced.

Example

See the following example that illustrates the idea and the difference to ref:

We would define geo attributes in a geo.yaml file:

groups:
  - id: geo
    prefix: geo
    type: attribute_group
    brief: Describes the geo location
    attributes:
      - id: location.lon
        type: double
        brief: Longitude of the geo location in [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84).
        examples: [ -73.614830 ]
      - id: location.lat
        type: double
        brief: Latitude of the geo location in [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System#WGS84).
        examples: [ 45.505918 ]

In client and server files we would embed the attributes. ! Note: Here I consciously embed the geo.location.lon attribute BUT ref the geo.location.lat attribute to demonstrate the difference:

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: ...
    attributes:
      - id: address
        ...
      - embed: geo.location.lon            <<-- THAT IS NEW
      - ref: geo.location.lat

When rendering (or code generating) the above it will result in the following attributes:

client.geo.location.lon
geo.location.lat

Note the difference that geo.location.lon is now nested under the client namespace because it was embedded (embed) while geo.location.lat keeps it's original full name because it was referenced (ref).

Extended proposal

We could even think about an embed_namespaces property on the attribute_group level to simplify embeddings of entire namespaces into other namespace. For example for geo we could do something like the following to embed ALL of the geo.* fields into the client:

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: ...
    embed_namespaces:
      - id: geo                                        <<-- EMBED the entire geo namespace under client
    attributes:
      - id: address
        ...

AlexanderWert commented 10 months ago

@open-telemetry/specs-semconv-approvers @open-telemetry/specs-semconv-maintainers would love your thoughts on the above!

ChrsMark commented 10 months ago

Most probably embed_namespaces is more handy because it would save us time from having to embed all of the attributes one by one. However I would like to have support for both so as to have the flexibility to embed only specific attributes if needed.

I cannot think of any particular pitfall here (maybe others can spot sth) but in general I like this proposal +1.

trisch-me commented 7 months ago

Bumping up this issue and add more information.

I propose to add following keywords:

1. includes

This one will make the whole namespace re-usable and adds it to the current group on the same level - without adding a prefix to the group. It should allow to add multiple namespaces

Example: in the current schema we do have browser.yaml inside models

groups:
  - id: browser
    prefix: browser
    type: resource
    brief: >
        The web browser in which the application represented by the resource is running.
        The `browser.*` attributes MUST be used only for resources that represent applications
        running in a web browser (regardless of whether running on a mobile or desktop device).
    attributes:
      - ref: browser.brands
      - ref: browser.platform
      - ref: browser.mobile
      - ref: browser.language
      - ref: user_agent.original

using new keyword it will be written as following:

groups:
  - id: browser
    prefix: browser
    type: resource
    includes: [browser] <--- NEW keyword
    brief: >
        The web browser in which the application represented by the resource is running.
        The `browser.*` attributes MUST be used only for resources that represent applications
        running in a web browser (regardless of whether running on a mobile or desktop device).
    attributes:
      - ref: user_agent.original

where in md file we could either unpack namespace into specific fields or just give a link to the registry for the whole namespace:

| [`browser.*`](../attributes-registry/browser.md) | Namespace | Browser namespace | | Opt-In | <---- includes

2. embeds

This one will make the whole namespace re-usable and adds it to the current group UNDER the group using its prefix. It should allow to add multiple namespaces

Example: we would like to add whole geo.* namespace to the client and server to make it possible to attach additional attributes to the both client and server.

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: >
      These attributes may be used to describe the client in a connection-based network interaction
      .......
    attributes:
      - id: address
        stability: stable
        type: string

using new keyword it will be written as following:

groups:
  - id: client
    prefix: client
    type: attribute_group
    embeds: [geo] <---- **NEW**
    brief: >
      These attributes may be used to describe the client in a connection-based network...
    attributes:
      - id: address
      - .....

which resolves into following construction in the md file:

| [`client.geo.*`](../attributes-registry/geo.md) | Namespace | Geo namespace | | Opt-In | <---- embeds

3. embed

This one will make possible to re-use only specific attributes of the namespace, similar to the ref but with a prefix

Example: we don't want to add the whole namespace to the client but only a few attributes.

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: >
      These attributes may be used to describe the client in a connection-based network...
    attributes:
      - ref: client.geo.location.lon
      - ...
      - embed: geo.location.lat

which resolves into following construction in the md file:

| [`geo.location.lon`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- ref
| [`client.geo.location.lat`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- embed

To summarise we would have following representation of all different types (I have changed real browser example to artificial one for similarity with others):

| [`geo.*`](../attributes-registry/geo.md) | Namespace | Geo namespace | | Opt-In | <---- includes whole namespace
| [`client.geo.*`](../attributes-registry/geo.md) | Namespace | Geo namespace | | Opt-In | <---- embeds whole namespace under client
| [`geo.location.lon`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- referenced attribute
| [`client.geo.location.lat`](../attributes-registry/geo.md) | type | Description | | Opt-In | <---- embedded attribute under client

Please let me know what do you think about this idea, I would like to start to work on it soon.

trisch-me commented 4 months ago

A new proposal for embedded namespaces. I have started to implement this approach and found out that not all use cases will be covered by proposed design. The problem is following - in current approach we embed one namespace directly under another one i.e. if we embed geo under client we will have structure client.geo.*, where * means all attributes of geo namespace. This might work for this particular use case but this will not work when a namespace should embed multiple namespaces under different names. Se following example: We do have a process namespace. Process has multiple fields, including process.parent which should be of type process namespace. Process could also have fields like real_user, effective_user, saved_user all of the type of user namespace.

In most cases we do have the same name resolution as namespace, i.e. client.geo is of type geo namespace or file.hash could be of type hash namespace and that could lead to false assumption that just embeded namespace would be sufficient. But what we need to have to resolve this is to create a new attribute with provided name and new type as a namespace

I propose following concept:

we introduce an embed keyword (proposal 3 from previous comment) for particular fields from other namespaces that should be embedded under existing namespace

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: >
      These attributes may be used to describe the client in a connection-based network...
    attributes:
      - ref: geo.location.lon <- referenced attribute
      - ...
      - embed: geo.location.lat <- embedded attribute

is resolved to

| [`geo.location.lon`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- ref
| [`client.geo.location.lat`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- embed

2a. We introduce a new type for id declaration, i.e. in addition to string, enum, etc we will add a namespace

groups:
  - id: process
    prefix: process
    type: attribute_group
    brief: ...
    attributes:
      - id: parent
        type: process <- this should be existing namespace in registry
        stability: experimental
      - id: real_user
        type: user <- this should be existing namespace in registry
        stability: experimental

2b. We introduce a new keyword for attributes

groups:
  - id: process
    prefix: process
    type: attribute_group
    brief: ...
    attributes:
      - embed_ns: parent
        type: process <- this should be existing namespace in registry
        stability: experimental
      - embed_ns: real_user
        type: user <- this should be existing namespace in registry
        stability: experimental

| [`process.parent.*`](../attributes-registry/parent.md) | Namespace | Parent namespace | | Opt-In | <---- includes whole namespace
| [`process.real_user.*`](../attributes-registry/user.md) | Namespace | User namespace | | Opt-In | <---- includes whole namespace

@lquerel I think from the implementation POW the second option 2b looks better, WDYT?

AlexanderWert commented 4 months ago

Option 2b is cleaner than 2a in my opinion. 2a is very misleading as indicates that it's a single attribute but actually is resolved to a sub-namespace of multiple attributes.

How about the original proposal with a small extension to make the above-mentioned scenario work?. Something like:

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: ...
    embed_namespaces:
      - id: user   
        as: real_user
      - id: user   
        as: saved_user

That would avoid the confusion that we would define an entire structure of attributes in the attributes property (as it is the case with 2a and 2b).

trisch-me commented 4 months ago

that makes sense @AlexanderWert, I like the clear splitting between usual attributes and namespaces

trisch-me commented 4 months ago

Another thing we have discussed today is that due to similarity in ref and embed we might go another way for a embed option, i.e. there will be no new option but a new flag instead

groups:
  - id: client
    prefix: client
    type: attribute_group
    brief: >
      These attributes may be used to describe the client in a connection-based network...
    attributes:
      - ref: geo.location.lon <- referenced attribute
      - ...
      - ref: geo.location.lat <- embedded attribute
        prefix: true

Those will be resolved into

| [`geo.location.lon`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- ref
| [`client.geo.location.lat`](../attributes-registry/geo.md) | double | Description | | Opt-In | <---- embed

Do you have any comments for this approach?

open-telemetry / build-tools