open-telemetry / weaver

OTel Weaver lets you easily develop, validate, document, and deploy semantic conventions
Apache License 2.0

Support multiple semantic convention registries #215

Open lquerel opened 4 months ago

lquerel commented 4 months ago

We'd like to be able to combine multiple registries into a single master resolved semantic convention registry. For example, combining the official OpenTelemetry semantic convention with one or several vendor/enterprise registries.

MadVikingGod commented 3 months ago

Here are some thoughts on how this might look. Don't take any of this as how it should be done.

Base Registry

I think there should be a base registry: this is what dictates which other registries, or parts of registries, should be included and what should be overwritten. It is assumed to be written not by OTel but by the customers using this feature.

Includes (Dependency Management)

Here there is a trade-off. We could require that any dependency be fully resolved (into a single JSON/YAML file), or we could allow a more ephemeral reference (to a directory or git repository). The former will be a usability hurdle, because it puts more work up front to use this feature, while the latter will introduce problems like availability and circular dependencies. I would probably suggest the former, at least for prototypes.

What is Rendered

When rendering documentation or generating code, we can either include everything from the included registries or only the portions that are referenced. We should make it easy to include groups from a different registry.

References

I think everything but an include (the high-level name I chose for a repository reference) should be a referenceable type, so right now groups and attributes. These should be referenced by both their repository name and their name in that repository, and any field should be overwritable.

An example:

includes:
- name: "otel_1.27.0"
  file: "includes/otel-v1.27.0.json"
groups:
- ref_repo: otel_1.27.0
  ref: metric.http.server.request.duration
  unit: "ms"
  attributes:
  - id: environment
    type: string
    brief: This is an additional attribute included.
    examples:
    - dev
    - prod
  - ref_repo: otel_1.27.0
    ref: server.address
    brief: This overwrites an included attribute.  It should be the service name of the server.
    examples:
    - foo
    - bar

This should yield one metric group and a few attribute_groups that contain the different attributes in metric.http.server.request.duration.
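A minimal Python sketch of how a `ref_repo`/`ref` entry could be resolved under this proposal: copy the referenced definition from the included registry, then let every other field of the local entry overwrite it. The `INCLUDES` dictionary is a hypothetical, abridged stand-in for the loaded `includes/otel-v1.27.0.json`, and `resolve_ref`/`resolve_group` are illustrative helpers, not Weaver's API.

```python
from copy import deepcopy

# Hypothetical, abridged stand-in for "includes/otel-v1.27.0.json",
# reduced to the two items referenced in the YAML example above.
INCLUDES = {
    "otel_1.27.0": {
        "metric.http.server.request.duration": {
            "type": "metric",
            "unit": "s",
            "brief": "Duration of HTTP server requests.",
        },
        "server.address": {"type": "string", "brief": "Server address."},
    },
}

REF_KEYS = ("ref_repo", "ref")


def resolve_ref(entry: dict, includes: dict) -> dict:
    """Copy the referenced definition, then overlay every local field on top of it."""
    repo, ref = entry["ref_repo"], entry["ref"]
    try:
        resolved = deepcopy(includes[repo][ref])  # never mutate the included registry
    except KeyError:
        raise KeyError(f"unknown reference {repo}:{ref}") from None
    resolved.update({k: v for k, v in entry.items() if k not in REF_KEYS})
    return resolved


def resolve_group(entry: dict, includes: dict) -> dict:
    """Resolve a group reference, then each attribute (local definition or reference)."""
    group = resolve_ref({k: v for k, v in entry.items() if k != "attributes"}, includes)
    group["attributes"] = [
        {"id": attr["ref"], **resolve_ref(attr, includes)} if "ref_repo" in attr else dict(attr)
        for attr in entry.get("attributes", [])
    ]
    return group


local_group = {
    "ref_repo": "otel_1.27.0",
    "ref": "metric.http.server.request.duration",
    "unit": "ms",
    "attributes": [
        {"id": "environment", "type": "string", "examples": ["dev", "prod"]},
        {"ref_repo": "otel_1.27.0", "ref": "server.address", "examples": ["foo", "bar"]},
    ],
}
resolved = resolve_group(local_group, INCLUDES)
```

Overrides here are shallow replacements: a local `examples` list replaces the inherited one rather than extending it, which is only one of the possible semantics.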

lquerel commented 3 months ago

I'm very pleased to see this topic being explored again (for reference, the concept of app telemetry schema touches on the same subject with a few minor differences). I believe we are getting closer to the moment when Weaver will finally be able to support this type of use case, which, in my view, is the original use case I had in mind when starting this project.

Before diving into the details of the dependency-based registry format, I would like to take some time to outline the overall vision that seems interesting to pursue. To me, Weaver is a platform that can be utilized not only by OTel SemConv core maintainers but also by any vendors or enterprises that wish to leverage these registries and define their own, so they can be reused by others. The goal is to rationalize the use of signals among all these actors and also to enable developers to generate their SemConv SDK code to match exactly what they are using in terms of signals and attributes defined by OTel, vendors, or internal teams within their organization. In the long term, I believe we should go much further by using the same mechanism to generate not just SemConv SDKs as defined today, but type-safe OTel Client SDKs that semantically expose the signals described in the registries attached to a particular application (an example of this exists in the Weaver repo, see crates/weaver_codegen_test). On the same principle, we should also be able to generate many other types of assets such as pre-configured dashboards, database schemas, configurations for data pipelines, and more.

A typical hierarchy of registries would have the following structure:

[Figure: registry_hierarchy]

This hierarchy is fully distributed and operated by different actors with different priorities and release cycles. Weaver plays a crucial role in this by enabling 1) unified loading of registries whether they are accessible locally, via a git repo, or as a local or remote archive, 2) parsing and interpreting them, 3) resolving them individually, 4) resolving the dependency graph, 5) merging these registries to create a coherent whole, 6) enforcing policies to maintain a high level of quality in these registries and ensuring that backward compatibility rules are observed, and 7) generating documentation, code, stats, and other assets.
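Steps 3 to 5 imply that every registry's dependencies are resolved before the registry itself. One way to sketch step 4 is a topological sort with cycle detection, here using Python's standard `graphlib` module. The registry names loosely mirror the hierarchy above but are placeholders, and `resolution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency declarations: registry -> registries it builds on.
DEPENDS_ON = {
    "app_b": ["team_a", "vendor", "otel"],
    "team_a": ["otel"],
    "vendor": ["otel"],
    "otel": [],
}


def resolution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order registries so that every dependency is resolved before its dependents."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"circular registry dependency: {err.args[1]}") from err


order = resolution_order(DEPENDS_ON)
```

Merging (step 5) could then walk `order` front to back, so that a registry's local overrides are applied after the definitions it depends on.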

Weaver must be able to be integrated into any CI/CD pipeline and at any level of this hierarchy. For example, in the diagram below, Weaver is used to generate the SemConv SDK for a specific application (App B) that defines its own signals and inherits signals from another team within the company developing this application, as well as signals defined by a vendor and by OTel.

[Figure: app_registry]

To conclude, I think that indeed, as you pointed out, we need to be able to reference other local or remote registries from the components of a registry. We need to support local overrides. We need to be able to reuse policies established by OTel or other actors. We need to be able to reuse codegen packages to generate the local SDK in the preferred language (same for the documentation), etc.

All of this comes with some challenges to resolve, to name a few:

Ideally, this thread should lead to the creation of a design document on which we can base our implementation within Weaver.

jsuereth commented 3 months ago

My only thought here is that we MAY want to take some time to remove wonkiness in the core model.

E.g., we want to add multiple registries, but I think this fundamentally requires that we understand what it means to "re-use" a metric or event.

There may be a set of core-modelling tasks we could outline and try to own before reaching the full "another registry source" solution.

lquerel commented 3 months ago

Yes I agree, we should start to identify these core-modeling tasks in a separate GH issue and add a link to this one.

lmolkova commented 3 months ago

Super excited about this! I'm also going to be an active customer of this feature: we have some conventions for Azure SDKs defined here, where I'm referencing the general semconv spec but can't reuse any tooling.

Some thoughts:

  1. Maybe we should namespace group IDs? E.g., have otel.registry.attributes.url... and recommend that others use your-company.registry....

    It allows us to

    • import external semconv once and not qualify each attribute
    • enforce group/attribute version consistency - you can't import two different versions of otel (at least in one file)

    I.e.

    includes:
    - schema_url: https://opentelemetry.io/schemas/1.27.0
      namespaces: # list specific ones, or import all
      - otel.registry.attributes.url # or `otel.registry.attributes` to import whole attributes registry
      - otel.client.sdk.imaginary
    
    groups:
    - id: azure.sdk.storage.blobs
      ...
      extends: otel.client.sdk.imaginary... # fully qualified and unique across all imported semconv
      attributes:
      - ref: url.full # auto-resolved from `otel.registry.attributes.url`
      - 
  2. we can start with attributes where we already have registry, referencing mechanism, and clear understanding of what it all means.
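The namespace-import idea in point 1 can be sketched in Python: an import keeps every group whose ID equals, or is nested under, a listed namespace, and an unqualified `ref` such as `url.full` resolves only if exactly one imported group defines it. The flattened `OTEL_1_27_0` mapping and the helper names are hypothetical, chosen to match the YAML above.

```python
# Hypothetical, flattened view of a resolved OTel registry: group id -> attribute names.
OTEL_1_27_0 = {
    "otel.registry.attributes.url": ["url.full", "url.scheme", "url.query"],
    "otel.registry.attributes.server": ["server.address", "server.port"],
    "otel.client.sdk.imaginary": [],
}


def import_namespaces(registry: dict[str, list[str]], namespaces: list[str]) -> dict[str, list[str]]:
    """Keep groups whose id equals, or is nested under, one of the listed namespaces."""
    return {
        gid: attrs
        for gid, attrs in registry.items()
        if any(gid == ns or gid.startswith(ns + ".") for ns in namespaces)
    }


def resolve_short_ref(ref: str, imported: dict[str, list[str]]) -> str:
    """Map an unqualified `ref` to the single imported group that defines it."""
    owners = [gid for gid, attrs in imported.items() if ref in attrs]
    if len(owners) != 1:
        raise LookupError(f"{ref!r} must be defined exactly once, found in {owners}")
    return owners[0]


imported = import_namespaces(
    OTEL_1_27_0, ["otel.registry.attributes.url", "otel.client.sdk.imaginary"]
)
```

With this selection, `url.full` resolves to `otel.registry.attributes.url`, while `server.address` fails because its namespace was not imported.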

lquerel commented 3 months ago

@lmolkova It's always nice to know that what we're building will be used on a large scale.

Several questions regarding your comment:

Otherwise, I like the idea of starting small with a focus on attributes, as it is the most mature part of SemConv.

lmolkova commented 3 months ago

Can we currently retrieve a semconv registry archive/repo from a schema_url

The best we can do now is https://github.com/open-telemetry/semantic-conventions/archive/v1.27.0.zip, but that's the source code. Agreed, we'll need to figure out where and how to host the resolved version.

Ideally, it should be available under the schema URL, but this would need a v2(?) of the schema file format: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/schemas/file_format_v1.1.0.md

In your example, to import all, do you specify a namespace?

I'd like to be able to import everything (from a certain source like OTel) in addition to being able to specify a subset. I'm not sure I understand how disallowing import-all helps with duplicates. Explicitly importing two namespaces with duplicates should fail, and implicitly importing them should fail too.

So group IDs and attribute/metric/event names must be unique across all imported namespaces. If that's not enough, we can introduce some ways to qualify them, but I hope it never comes to that.
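The uniqueness rule is straightforward to check mechanically, and the same check applies whether namespaces were listed explicitly or imported wholesale. A Python sketch, assuming each import has already been flattened into a list of identifiers (group IDs and attribute/metric/event names) per source; the source labels and the `find_clashes`/`check_unique` helpers are hypothetical.

```python
from collections import defaultdict


def find_clashes(imports: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every identifier defined by more than one imported source."""
    owners: dict[str, list[str]] = defaultdict(list)
    for source, identifiers in imports.items():
        for identifier in identifiers:
            owners[identifier].append(source)
    return {ident: sources for ident, sources in owners.items() if len(sources) > 1}


def check_unique(imports: dict[str, list[str]]) -> None:
    """Fail the whole import set if any identifier is defined twice."""
    clashes = find_clashes(imports)
    if clashes:
        raise ValueError(f"identifiers imported more than once: {clashes}")
```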

MadVikingGod commented 3 months ago

This continues a few different threads:

Schema_URL

I think we should treat the schema URL as something fundamentally different from a registry. It may be how we have chosen to represent registries, but the content at the URL describes how a registry has evolved, not what is available to be used. There have also been attempts to evolve it to include inheritance and other features, but it has always lacked substantial content. If we add anything to the schema, I think it should be some way to reference where the content is, e.g. a URL to a resolved schema.

Namespaces

I think that ALL namespaces should be local. This allows multiple imports to contain the same attribute while still letting us reference both. Doing this prevents any clashes between attribute names from different imports, and leaves only the current error where two attributes have the same name.

include:
  - name: "otel_1270"
    file: "..."
  - name: "vendor_a"
    file: "..."
groups:
  - name: "example"
    attributes:
    - ref_repository: "otel_1270"
      ref: "geo.lat"
      rename: "latitude"
    - ref_repository: "vendor_a"
      ref: "geo.lon"
      rename: longitude

This does mean that all referenceable items need to have some namespace, and I think the default one for local items should be the empty string "".

You might also notice that I have not been concatenating the namespace with the name of the attribute. This is to avoid repeating what happened with prefix: the attribute's name will be used differently from the (namespace, name) tuple that identifies it.
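A Python sketch of these local namespaces: each reference is looked up by its `(namespace, name)` tuple, local items live under the empty namespace `""`, and `rename` only changes the emitted name. Two imports can therefore define `geo.lon` without clashing, and an error remains only when two attributes would be emitted under the same name. The `REGISTRIES` contents and helper names are hypothetical.

```python
LOCAL = ""  # default namespace for items defined in the local registry

# Hypothetical resolved contents: local import name -> item name -> definition.
REGISTRIES = {
    "otel_1270": {"geo.lat": {"type": "double"}, "geo.lon": {"type": "double"}},
    "vendor_a": {"geo.lon": {"type": "double", "brief": "Vendor A longitude."}},
    LOCAL: {"environment": {"type": "string"}},
}


def resolve_attribute(entry: dict, registries: dict) -> tuple[tuple[str, str], dict]:
    """Look an attribute up by its (namespace, name) tuple and apply an optional rename."""
    key = (entry.get("ref_repository", LOCAL), entry["ref"])
    namespace, name = key
    definition = dict(registries[namespace][name])
    definition["name"] = entry.get("rename", name)  # emitted name, not the lookup key
    return key, definition


def resolve_group_attributes(attributes: list[dict], registries: dict) -> dict:
    """Resolve a group's attributes; only the emitted names have to be unique."""
    resolved, emitted = {}, set()
    for entry in attributes:
        key, definition = resolve_attribute(entry, registries)
        if definition["name"] in emitted:
            raise ValueError(f"two attributes would be emitted as {definition['name']!r}")
        emitted.add(definition["name"])
        resolved[key] = definition
    return resolved


example = resolve_group_attributes(
    [
        {"ref_repository": "otel_1270", "ref": "geo.lat", "rename": "latitude"},
        {"ref_repository": "vendor_a", "ref": "geo.lon", "rename": "longitude"},
    ],
    REGISTRIES,
)
```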

Dependencies

These should be transported as a form of resolved convention that only includes the attributes/groups that were referenced. I strongly suggest this so we don't have to build a graph solver à la pip, Go, etc. This does mean that users will have to vendor in the resolved configuration of their dependencies.
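Producing such a resolved dependency could amount to pruning a fully resolved registry down to the referenced items and whatever they pull in. A Python sketch, assuming a simplified resolved registry in which groups list their attribute IDs; `RESOLVED_DEPENDENCY` and `prune` are hypothetical, and the printed JSON stands in for the vendored file.

```python
import json

# Hypothetical, simplified resolved registry of a dependency: item id -> definition.
RESOLVED_DEPENDENCY = {
    "metric.http.server.request.duration": {
        "type": "metric",
        "attributes": ["http.request.method", "server.address"],
    },
    "http.request.method": {"type": "string"},
    "server.address": {"type": "string"},
    "url.full": {"type": "string"},
}


def prune(resolved: dict[str, dict], roots: list[str]) -> dict[str, dict]:
    """Keep the referenced items plus everything they transitively reference."""
    keep: set[str] = set()
    stack = list(roots)
    while stack:
        item_id = stack.pop()
        if item_id in keep:
            continue
        if item_id not in resolved:
            raise KeyError(f"unresolved reference {item_id!r}")
        keep.add(item_id)
        stack.extend(resolved[item_id].get("attributes", []))
    return {item_id: item for item_id, item in resolved.items() if item_id in keep}


vendored = prune(RESOLVED_DEPENDENCY, ["metric.http.server.request.duration"])
print(json.dumps(vendored, indent=2, sort_keys=True))  # content of the vendored file
```

Because the vendored file is already resolved, the consuming registry only needs a lookup, not a dependency solver.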

Implicit References

When resolving a registry anything that is

Semantic Version

Semantic versioning does NOT help us in any of the endeavors we are undertaking in this design, and it shouldn't be a first-class citizen here. If a repository used v1.a, v1.b, and v1.c, but had the appropriate upgrade paths in its schema (from the schema_url) to get from 1.a to 1.c, it should work just as well as if it used v1.2.0, v1.2.1, and v1.3.0. There does need to be space to record the version, but that can already be achieved through either the name of the import or the file/URL used.
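To illustrate why only the upgrade path matters, here is a Python sketch that treats version labels as opaque strings in release order and applies each version's attribute renames in sequence, roughly what the schema file format's `rename_attributes` changes encode. The `VERSIONS` list and its rename maps are hypothetical.

```python
# Hypothetical, simplified view of a schema file's versions, in release order.
# Each entry lists the attribute renames introduced by that version.
# The labels are opaque strings: nothing here parses them as semver.
VERSIONS = [
    ("1.a", {}),
    ("1.b", {"net.peer.name": "server.address"}),
    ("1.c", {"http.method": "http.request.method"}),
]


def upgrade(attributes: dict[str, object], from_version: str, to_version: str,
            versions=VERSIONS) -> dict[str, object]:
    """Apply every rename introduced after `from_version`, up to and including `to_version`."""
    labels = [label for label, _ in versions]
    start, end = labels.index(from_version), labels.index(to_version)
    if start > end:
        raise ValueError("only forward upgrades are sketched here")
    for _, renames in versions[start + 1 : end + 1]:
        attributes = {renames.get(name, name): value for name, value in attributes.items()}
    return attributes
```

Relabeling the entries as 1.2.0, 1.2.1, and 1.3.0 would not change the result, since only their order is used.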

lmolkova commented 3 months ago

I think that ALL namespaces should be local. This allows us to have multiple imports that have the same attribute, but be able to reference them both.

What would be the use case for it? Can we start without allowing clashes? I'd start by prohibiting them and would be very cautious about introducing them.

MadVikingGod commented 3 months ago

Sure, let's talk about Laurent's example, specifically from the point of view of the "Enterprise Registry". Suppose Vendor B happens to export the semconv v1.25.0 http.server.traces group alongside its custom attributes, and of course OTel exports http.server.traces. If we can't reference them independently, we can't actually combine them in a sane way.

If we implicitly export all of the fields then we either have duplicates (and can reference both of them in some way) or we have to resolve the duplicates, giving rise to questions like: Should we follow semconv? Should the latter one take priority? How do we incorporate modifications done by Vendor B?

In my proposal we would have the Enterprise Registry say "Reference otel_v1.27.0 http.server.traces, and add vendor_B attributes a-f"

lmolkova commented 3 months ago

If Vendor B happens to export semconv v1.25.0 http.server.traces group alongside its custom attributes, and of course Otel exports http.server.traces. If we can't reference them independently we can't actually combine them in a sane way.

Only if Vendor B has duplicates. I'm saying they should not have duplicates. They can remove them, or not import the corresponding namespace and then leverage the benefits of the OTel schema.

The reason: telemetry consumers don't see any group-id/import registry magic - they only see schema url and attribute name. If these two don't uniquely identify a specific attribute, nobody can sanely consume this telemetry.

It does not matter how things are defined and how tooling processes them, the attribute uniqueness is a requirement on produced telemetry.

MadVikingGod commented 2 months ago

Vendor B and their libraries would ultimately produce attributes with a scope that uses the Vendor B schema_url. I'm suggesting that those libraries use HTTP attributes, so they must export them.

Enterprise libraries would use the Enterprise schema_url. If Enterprise wants to use newer versions of the OTel HTTP data, those would conflict with the HTTP data from Vendor B. My proposal says that Enterprise can reference from either source and combine them, but would only export what was referenced for inclusion in the final output. It should still be an error to reference the same attribute from both sources.

While I agree that the exact mechanism might not matter, I'm trying to iron out a feasible way to actually do this, so we need to come to some agreement on how these things look and how they operate. I'm trying to do this while steering away from pitfalls and very large scopes.

lmolkova commented 2 months ago

If Vendor B does not want to use HTTP attributes from OTel, they should explicitly import specific OTel namespaces and avoid importing otel.http.

Someone who wants to import full otel, should be able to import everything from otel.

Allowing url.full to be imported from OTel vN, url.scheme from OTel vM, and url.query from custom_schema vK is not meaningful, and I think we should not build something that allows it.
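One way to rule out mixing versions is to reject an import set that names the same schema family at two different versions, before any attribute is resolved. A Python sketch, assuming (as with OTel schema URLs such as https://opentelemetry.io/schemas/1.27.0) that the version is the last path segment; `check_single_version` and the example.com URL are hypothetical.

```python
def check_single_version(schema_urls: list[str]) -> dict[str, str]:
    """Reject importing the same schema family at two different versions."""
    versions: dict[str, str] = {}
    for url in schema_urls:
        # Assumes the version is the last path segment, as in .../schemas/1.27.0.
        family, _, version = url.rstrip("/").rpartition("/")
        previous = versions.setdefault(family, version)
        if previous != version:
            raise ValueError(f"{family} imported at both {previous} and {version}")
    return versions
```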

lquerel commented 2 months ago

@MadVikingGod @lmolkova @jsuereth It seems we’ve reached a point where continuing this discussion as a long thread is becoming unwieldy. To streamline our efforts, I’ve initiated a spec-oriented PR in Weaver. This will allow us to iteratively refine a series of documents that represent our latest thinking on the concept of multi-registry.

I’ve tried to incorporate most of the elements discussed here, proposed alternatives where applicable, created a small/concrete use case example, and started identifying the changes needed within the OTEL ecosystem to support this idea.

For everyone following this thread, please review the documents in the PR and provide your feedback in PR #348. This way, we can leverage GitHub’s native review process to manage the discussion more effectively.