open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
218 stars 142 forks source link

Code generation: how to avoid naming collisions #1118

Open lmolkova opened 3 weeks ago

lmolkova commented 3 weeks ago

As we discovered in #1031, certain semantic conventions changes (together with default code generator behavior) result in ambiguous constant names generated in the code.

For example, when foo.bar_baz is renamed to foo.bar.baz, the code generator produces the same constant name for both -FOO_BAR_BAR , FooBarBaz, etc depending on the preferred casing.

We don't remove attributes/enum members/metrics from the semantic conventions anymore (old property is deprecated), so both constants would exist in the same version.

So, the semantic conventions together with the codegen should prevent such collisions from happening:

Please comment/vote on specific options listed in the comments.

Note: there could be edge cases when ambiguity is fine and tolerable or another collision resolution approach could be used. Here we want to pick a default behavior - it does not prevent someone from implementing a different approach.

lmolkova commented 3 weeks ago

Option 1: foo.bar_baz is generated as FOO_BAR__BAZ, FooBar_Baz, ...

Cons:

lmolkova commented 3 weeks ago

Option 1.5: foo.bar_baz is generated as FOO__BAR_BAZ, Foo_BarBaz, ...

(same as Option 1 but a different format is used)

lmolkova commented 3 weeks ago

Option 2: foo.bar_baz is generated as FOO_BARBAZ, FooBarbaz, ...

Cons:

lmolkova commented 3 weeks ago

Option 3: do nothing

Cons:

lmolkova commented 3 weeks ago

Option 4: foo.bar_baz is generated as FOO_BAR_BAZ, FooBarBaz. When foo.bar.baz is added, the collision is detected, so the new attribute is called FOO_BAR_BAZ_NEW, FooBarBazNew.

Cons:

The rename can be done for the old attribute, resulting in Option 3.

marcalff commented 3 weeks ago

When 2 different symbols in semantic conventions generate the same symbol on the generated code, that is, when there is a collision, one of the name mapping has to change, and this change is by definition a breaking change: code instrumented that refers to a given name has to be adjusted to use the adjusted name instead.

To minimize the overall impact, it is preferable that only symbols that are the least frequent gets to be broken.

Any solution that changes the output for foo.bar to be different, which is the most common case, should be discarded.

So, solution 1.5 is definitively out in my opinion, it will break everything.

Assuming semantic conventions containing the _ character are relatively less represented (in all the semconv that exists), the change should be on how _ is represented, to minimize the overall impact, without touching ..

~Option 2 is out, it collides on foo.bar_baz and foo.barbaz.~

[Edit] Looks like I misunderstood option 2. Sounds viable if these collisions are forbidden, and it avoids causing renames with the current generated code. +1 then

Option 4 does not scale in my opinion, after several renames, and it assumes there is some "state" stored somewhere to remember if a collision ever existed in the past or not. Does not sound viable.

Option 3 does not resolve the issue, this is bound to be repeated. For new semantic conventions, a desirable goal is to be able to decide the most natural name for a given semconv, without having to look at possible collisions, that would prevent to use a good name ... especially when the colliding semconv is deprecated and will eventually be removed over time.

The only viable solution looks like option 1 or option 2.

[EDIT 2024-06-11]

See newer proposal as option 5 in this thread, which is better to 1 and 2 in my opinion.

codeboten commented 3 weeks ago

Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct?

I'm specifically trying to understand what this looks like in the context of the message client id change, does option 2 mean that the client ID change will be rolled back?

austinlparker commented 3 weeks ago

Option 2: foo.bar_baz is generated as FOO_BARBAZ, FooBarbaz, ...

  • renames that only remove or add _ are not allowed even for experimental attributes.
  • _ is removed during code generation

Cons:

  • limited rename options. Partially mitigated by the fact that we didn't recall making such renames in the past.

just to check, does that mean that messaging.client_id -> messaging.client.id gets rolled back and there's no change?

lmolkova commented 3 weeks ago

According to the Option2, there will be no collision for messaging.client_id -> messaging.client.id:

So I don't think we need to roll it back.

Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct?

That's correct. Any option we pick (except opt3) will result in breaking changes in generated code. In most languages this is tolerable since semconv artifact is experimental. So far we've identified JS and PHP which shipped stable semconv artifacts and JS was going to make some breaking changes anyway. It will be a problem for PHP.

joaopgrassi commented 3 weeks ago

Another point that in this specific case: The new attribute messaging.client.id needs to produce a breaking change. We do not want to have the old const name carry the new attribute key value. That would cause instrumentations to send data in a mixed "schema" where some attributes are old and some are using the new keys.

To me, option 2 is the one that makes the most sense.

marcalff commented 3 weeks ago

Another point that in this specific case: The new attribute messaging.client.id needs to produce a breaking change.

It will not.

Existing code using MESSAGING_CLIENT_ID or MessagingClientId, corresponding to messaging.client_id, will need to be fixed to use MESSAGING_CLIENTID or MessagingClientid instead, once code generation is fixed.

joaopgrassi commented 3 weeks ago

Ah yes, that for sure. I think I didn't do a good job explaining, but what I was after is: whatever we do, the old const name can't carry the new attribute name. Because as you said, that wouldn't break, and be even worse because it would start sending the new attribute name.

marcalff commented 3 weeks ago

See my previous comment:

When 2 different symbols in semantic conventions generate the same symbol on the generated code, that is, when there is a collision, one of the name mapping has to change, and this change is by definition a breaking change: code instrumented that refers to a given name has to be adjusted to use the adjusted name instead.

I don't think there is any way around this.

Failure to update the name will use the new semantic convention, so using MESSAGING_CLIENT_ID or MessagingClientId will mean messaging.client.id while it meant messaging.client_id before.

joaopgrassi commented 3 weeks ago

You're right, seems I also misunderstood option 2. I somehow thought the new one would be changed and not be used automatically, I see it now.

I think this is rather bad :(. It will be hard to not overlook something and end up with this wrong situation of old instrumentation producing/using the new attributes while not fully yet converted to the stable conventions.

The only way to not have this, would be to break both old and new generated consts. But breaking the mapping of . to _ in code gen is also bad.

dyladan commented 3 weeks ago

So far we've identified JS and PHP which shipped stable semconv artifacts and JS was going to make some breaking changes anyway.

In JS we're just marking everything currently exported by the package as deprecated and keeping it until we go to 2.0. We're exporting all the new names with a new style anyway to be friendlier to tree shakers and code minifiers (important in JS) so any breaking name changes due to a change in code generation are fine. If it happens again we'll just rev the major version as this package is entirely separate from the rest of the JS client.

In short: option 2 is fine for JS and is my preference

dyladan commented 3 weeks ago

Here's what the option 2 codegen looks like in JS https://github.com/open-telemetry/opentelemetry-js/pull/4690/commits/dbb8328ddc072adf5bd32768fe28529328135f53

note: we aren't merging any changes until this issue is resolved, this is just to prototype and make sure we're ready to move quickly when a final decision is made.

lmolkova commented 3 weeks ago

There is an overwhelming support for option 2:

We're working on the tooling updates and have a couple of draft PRs which show how things would look like:

Before we ship tooling update, we'd like to get any last-minute feedback from everyone interested, specifically @lzchen, @jack-berg, @trask on the above PRs.

jack-berg commented 3 weeks ago

Discussed at today's Java SIG and option 2 was the preference. Thanks!

lmolkova commented 3 weeks ago

Option 2 does not look good in the real life:

https://github.com/open-telemetry/semantic-conventions-java/pull/75

AWS__ELASTIC__BEAN__STALK, etc would be more readable.

I'd like to go back and explore Option 1 (__) or Option 3 (do nothing - _ and . are the same in the code).

Option 3+ may look like:

We can also prohibit . <-> _ renames for all attributes including experimental.

lmolkova commented 3 weeks ago

Here's more details on the Option 3.5:

Pros:

Cons:

Prototypes:

dyladan commented 2 weeks ago

I think option 3.5 with prohibiting . <-> _ for all including experimental would be my preferred solution as it generates the nicest code and retains compatibility with the current generation. I think it is important to include that restriction in order to avoid telemetry shape changing without user intervention and the mentioned name and schema version mismatch.

The biggest question IMO would be then what to do about the collisions that have already happened. We can probably just accept the renames that have already happened as an accident of history.

trisch-me commented 2 weeks ago

i'm feeling prohibiting . <-> _ will limit our option to extract a namespace out of existing attributes in the future.

consider x.user_id and x.user_name which should become x.user.id and x.user.name This is especially relevant for embedded feature where we could embed existing namespaces/fields under other namespaces

marcalff commented 2 weeks ago

Here's more details on the Option 3.5:

* allow collisions between deprecated and non-deprecated attributes, ignoring deprecated.

I don't understand how this would work, what is the code generation supposed to do then:

To expand on this, if the plan is to allow deprecated and non-deprecated semconv to coexist when there is a collision, "ignoring the deprecated", how about removing (instead of deprecating) the old semconv ... removal will surely resolve the collision problem.

trask commented 2 weeks ago

I'll take this (back) to the Java SIG on Thursday.

I suspect that we may prefer this trade off:

note that Java already recommends that libraries make copies of experimental attributes in order to avoid the diamond dependency problem:

Generated code for experimental semantic conventions.
NOTE: This artifact has the -alpha and comes with no compatibility guarantees. Libraries can use this for testing, but should make copies of the attributes to avoid possible runtime errors from version conflicts.

see https://github.com/open-telemetry/semantic-conventions-java/issues/50#issuecomment-1998332270 for a bit more background on this recommendation:

  • opentelemetry-semconv-incubating contains stable and incubating semantic conventions
    • Classes live in io.opentelemetry.semconv.incubating package. Note this is different than stable artifact to make java module system work.
    • Stable attributes will be annotated @deprecated with javadoc pointing to the equivalent location in the stable artifact
    • The semantic-conventions repo will keep incubating attributes around indefinitely (or for a long time) and mark as deprecated. These deprecated, experimental attributes will be annotated @deprecated.
    • This artifact will always be marked alpha, since we may eventually remove some deprecated experimental attributes, and since the types of experimental attributes are subject to change.
    • We will discourage library authors from using this artifact. They can use it for making assertions in tests, but should make copies of any experimental attributes they depend on in instrumentation.
lmolkova commented 2 weeks ago

Discussed at Semconv and Maintainers calls:

One of the possible solutions includes a grace period during which _ <-> . are allowed.

lmolkova commented 2 weeks ago
List of existing (non-deprecated) attributes with _
  • k8s.container.restart_count
  • k8s.container.status.last_terminated_reason
  • aws.request_id
  • aws.dynamodb.table_names
  • aws.dynamodb.consumed_capacity
  • aws.dynamodb.item_collection_metrics
  • aws.dynamodb.provisioned_read_capacity
  • aws.dynamodb.provisioned_write_capacity
  • aws.dynamodb.consistent_read
  • aws.dynamodb.attributes_to_get
  • aws.dynamodb.index_name
  • aws.dynamodb.global_secondary_indexes
  • aws.dynamodb.local_secondary_indexes
  • aws.dynamodb.exclusive_start_table
  • aws.dynamodb.table_count
  • aws.dynamodb.scan_forward
  • aws.dynamodb.total_segments
  • aws.dynamodb.scanned_count
  • aws.dynamodb.attribute_definitions
  • aws.dynamodb.global_secondary_index_updates
  • aws.lambda.invoked_arn
  • aws.s3.copy_source
  • aws.s3.upload_id
  • aws.s3.part_number
  • gen_ai.system
  • gen_ai.request.model
  • gen_ai.request.max_tokens
  • gen_ai.request.temperature
  • gen_ai.request.top_p
  • gen_ai.response.id
  • gen_ai.response.model
  • gen_ai.response.finish_reasons
  • gen_ai.usage.prompt_tokens
  • gen_ai.usage.completion_tokens
  • gen_ai.token.type
  • gen_ai.prompt
  • gen_ai.completion
  • gen_ai.operation.name
  • user.full_name
  • opentracing.ref_type
  • container.image.repo_digests
  • container.command_line
  • container.command_args
  • aspnetcore.rate_limiting.policy
  • aspnetcore.rate_limiting.result
  • aspnetcore.routing.is_fallback
  • aspnetcore.request.is_unhandled
  • aspnetcore.routing.match_status
  • log.file.name_resolved
  • log.file.path_resolved
  • android.os.api_level
  • http.request.method_original
  • http.request.resend_count
  • http.response.status_code
  • session.previous_id
  • url.registered_domain
  • url.top_level_domain
  • otel.status_code
  • otel.status_description
  • tls.client.certificate_chain
  • tls.client.not_after
  • tls.client.not_before
  • tls.client.server_name
  • tls.client.supported_ciphers
  • tls.next_protocol
  • tls.server.certificate_chain
  • tls.server.not_after
  • tls.server.not_before
  • db.cassandra.consistency_level
  • db.cassandra.page_size
  • db.cassandra.speculative_execution_count
  • db.cosmosdb.client_id
  • db.cosmosdb.connection_mode
  • db.cosmosdb.operation_type
  • db.cosmosdb.request_charge
  • db.cosmosdb.request_content_length
  • db.cosmosdb.status_code
  • db.cosmosdb.sub_status_code
  • db.elasticsearch.path_parts
  • user_agent.original
  • user_agent.name
  • user_agent.version
  • cloud.resource_id
  • cloud.availability_zone
  • feature_flag.key
  • feature_flag.provider_name
  • feature_flag.variant
  • faas.max_memory
  • faas.invoked_name
  • faas.invoked_provider
  • faas.invoked_region
  • faas.invocation_id
  • heroku.release.creation_timestamp
  • rpc.connect_rpc.error_code
  • rpc.connect_rpc.request.metadata
  • rpc.connect_rpc.response.metadata
  • rpc.grpc.status_code
  • rpc.jsonrpc.error_code
  • rpc.jsonrpc.error_message
  • rpc.jsonrpc.request_id
  • rpc.message.compressed_size
  • rpc.message.uncompressed_size
  • process.parent_pid
  • process.session_leader.pid
  • process.group_leader.pid
  • process.command_line
  • process.command_args
  • process.real_user.id
  • process.real_user.name
  • process.saved_user.id
  • process.saved_user.name
  • process.context_switch_type
  • process.paging.fault_type
  • cloudevents.event_id
  • cloudevents.event_source
  • cloudevents.event_spec_version
  • cloudevents.event_type
  • cloudevents.event_subject
  • gcp.cloud_run.job.execution
  • gcp.cloud_run.job.task_index
  • os.build_id
  • system.cpu.logical_number
  • messaging.batch.message_count
  • messaging.destination_publish.anonymous
  • messaging.destination_publish.name
  • messaging.message.conversation_id
  • messaging.rabbitmq.destination.routing_key
  • messaging.rabbitmq.message.delivery_tag
  • messaging.rocketmq.client_group
  • messaging.rocketmq.consumption_model
  • messaging.rocketmq.message.delay_time_level
  • messaging.rocketmq.message.delivery_timestamp
  • messaging.gcp_pubsub.message.ordering_key
  • messaging.gcp_pubsub.message.ack_id
  • messaging.gcp_pubsub.message.ack_deadline
  • messaging.gcp_pubsub.message.delivery_attempt
  • messaging.servicebus.message.delivery_count
  • messaging.servicebus.message.enqueued_time
  • messaging.servicebus.destination.subscription_name
  • messaging.servicebus.disposition_status
  • messaging.eventhubs.message.enqueued_time

Some examples where _ -> . could make sense:

...

I believe we don't always know whether more attributes about something (e.g. request) are expected and default to adding an attribute with _.

Some vague semconv guidance could be:

marcalff commented 2 weeks ago

Changing code generation rules in general for _ will affect a lot of existing, non deprecated, non colliding semantic conventions, which is not desirable.

OPTION 5

How about:

For example:

Taking C++ as an example, code generation will look like:

  kMessagingClientId = "messaging.client.id"; // new semconv
  kMessagingClientIdDeprecated = "messaging.client_id"; // old semconv.

[EDIT 2024-06-11]

Alternate example, if we want to change the new name instead:

Taking C++ as an example, code generation will look like:

  kMessagingClientId2 = "messaging.client.id"; // new semconv
  kMessagingClientId = "messaging.client_id"; // old semconv.

The major benefit I see is that existing semconv like:

are unchanged.

Only symbols that actually collide need special treatment, and only user code that depends on these symbols needs to change (to use kMessagingClientIdDeprecated if this is really what the instrumentation wants).

What generate_as provides, is to decouple symbol names from semantic conventions values, so resolving a conflict in values (due to differences only with _ and .) does not force to revise code generation rules names for everything, impacting many unrelated symbols (all the existing semconv that contains a _ today).

marcalff commented 2 weeks ago

@lmolkova Please see proposal for option 5 above.

lmolkova commented 2 weeks ago

@marcalff thanks!

It does not make send to generate the deprecated attribute differently - nobody will update their code to point to a deprecated attribute - we don't want them to.

I.e. having generate_as would only make sense on the new attribute. But also we don't want to change how the new attribute is generated in the stable conventions.

But I really like the idea of manual resolution:

More context on this here: https://github.com/open-telemetry/semantic-conventions-java/pull/75#discussion_r1630446636

and here's the prototype: https://github.com/open-telemetry/semantic-conventions-java/pull/76 - effectively there is a way to remap constant name and/or remove attribute.

TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision.

marcalff commented 2 weeks ago

TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision.

I don't understand.

I was hopping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise.

Now it looks like each SIG, in their code generation scripts, it to add code for special cases.

Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ?

In this case, opentelemetry-cpp already complies: it drops the deprecated messaging.client_id.

lmolkova commented 2 weeks ago

I was hopping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise.

Assuming there is an alternative constant name in the schema, would you drop the attribute or would you generate both attributes (with different names) in C++?

The prescriptive solution coming from semconv would be:

Jinja scripts would need to be changed to either drop based one collides_with or use constant name from the unque_const_name (converting it to proper case).

Now it looks like each SIG, in their code generation scripts, it to add code for special cases.

The templates need to be changed now to support dropping/renaming. If a collision happens again, the change would be to add another attribute name to the list of excluded attributes (or to a manually maintained map). E.g. this change is java https://github.com/open-telemetry/semantic-conventions-java/pull/76 adds all you need to either drop or rename constants by making a small config change.

Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ?

I think it's a path forward and not a conclusion. We need need to add checks to semconv to know when they happen before we release. We should probably block such renames until we have a reasonable conclusion.

Manual resolution seems like an easy and cheap thing to do to solve this and similar collisions in the same way. We can automate it, but it's still not clear to me what are all the options we need to provide.

Note: We are migrating tooling from build-tools to weaver. Code generation migration which would involve some jinja updates and this would be a good time to add the `final' solution.

TL;DR:

lmolkova commented 1 week ago

Discussed at Semconv WG 6/17

So: