open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
266 stars 169 forks source link

What to do about config options & schema versions #1483

Open hannahramadan opened 1 week ago

hannahramadan commented 1 week ago

Looking for some guidance.

The OTel Ruby agent is adding a configuration option (and I believe already has some) that are tied to older semconv naming conventions. When introducing new configs options and perhaps cleaning up old ones, I'm looking for guidance on what the config should be named/ if both should be recognized.

For example, a config would exist if users wanted to receive information on the table name of sql queries. The old semantic convention for this is db.sql.table, and the new name is db.collection.name. The configuration might look like db_sql_table and db_collection_name, which the option to omit or include.

Should we support both db_sql_table and db_collection_name and have them named as such or is there a better path forward.

trask commented 1 week ago

hi @hannahramadan!

it sounds like OTEL_SEMCONV_STABILITY_OPT_IN is what you're look for, check out https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/README.md:

Existing database instrumentations that are using v1.24.0 of this document (or prior):

  • SHOULD NOT change the version of the database conventions that they emit by default until the database semantic conventions are marked stable. Conventions include, but are not limited to, attributes, metric and span names, and unit of measure.
  • SHOULD introduce an environment variable OTEL_SEMCONV_STABILITY_OPT_IN in the existing major version which is a comma-separated list of values. If the list of values includes:
    • database - emit the new, stable database conventions, and stop emitting the old experimental database conventions that the instrumentation emitted previously.
    • database/dup - emit both the old and the stable database conventions, allowing for a seamless transition.
    • The default behavior (in the absence of one of these values) is to continue emitting whatever version of the old experimental database conventions the instrumentation was emitting previously.
    • Note: database/dup has higher precedence than database in case both values are present
  • SHOULD maintain (security patching at a minimum) the existing major version for at least six months after it starts emitting both sets of conventions.
  • SHOULD drop the environment variable in the next major version.
hannahramadan commented 1 week ago

Thank you for the quick response @trask ◡̈

Outside of controlling which conventions to use for all database conventions, we want to provide users with a config option whether to send a specific attribute or not. The concern is around tying config options to a specific schema version.

Should users be able to configure, for example, both db_sql_table: omit or db_collection_name: omit and have the attribute omitted, no matter the schema version? We were weary of needing to create so many configs for both schema versions.

trask commented 1 week ago

We were weary of needing to create so many configs for both schema versions.

yeah, I'd personally suggest sticking with the simple OTEL_SEMCONV_STABILITY_OPT_IN, and users can always customize via span processors if they have really specific needs. this is what we're doing in Java at least.

joaopgrassi commented 4 days ago

It seems like OTEL_SEMCONV_STABILITY_OPT_IN is the way to go, and also agree with @trask about using others ways to filter that the user have to control/configure instead of the SDK.

@hannahramadan does that help you? Can we close this issue?

hannahramadan commented 2 days ago

Hi @trask, @joaopgrassi - We may be talking about different things or I'm misunderstanding—thanks for bearing with me!

From what I understand, OTEL_SEMCONV_STABILITY_OPT_IN is a way for users to decide which conventions they want to send: new, old, or both.

I'm curious about what should happen if someone wants to omit one specific attribute from an instrumentation via config.

In my example, collecting db.sql.table/ db.collection.name can have an impact on performance, so users should be able to turn on/off its collection. Downstream options are unideal because of the performance hit that would've already been taken.

I may be complicating the question, but what I'm wondering is it makes sense to create configs that reflect both semantic conventions. For example, both the following options should work to include this attribute (will be omitted by default), no matter what OTEL_SEMCONV_STABILITY_OPT_IN is configured like.

OpenTelemetry::SDK.configure do |c| 
  c.use 'OpenTelemetry::Instrumentation::Mysql2', { db_sql_table: :include }
end 
OpenTelemetry::SDK.configure do |c| 
  c.use 'OpenTelemetry::Instrumentation::Mysql2', { db_collection_name: :include }
end 
trask commented 2 days ago

In my example, collecting db.sql.table/ db.collection.name can have an impact on performance, so users should be able to turn on/off its collection. Downstream options are unideal because of the performance hit that would've already been taken.

is this because you are parsing the sql statements?

if so, Java also has that concern also, and so we have a configuration property that can be used to opt-out of parsing (and sanitization): otel.instrumentation.jdbc.statement-sanitizer.enabled

I would definitely support something standardized for opt-ing out of database query parsing / sanitization

hannahramadan commented 1 day ago

@trask - yes exactly. We are parsing sql statements.

To confirm, Java has a single config called statement-sanitizer, and in addition to sql sanitization, it also controls the parsing of sql for db.sql.table/ db.collection.name (and presumably anything else that needs to parse sql)?

trask commented 1 day ago

To confirm, Java has a single config called statement-sanitizer, and in addition to sql sanitization, it also controls the parsing of sql for db.sql.table/ db.collection.name (and presumably anything else that needs to parse sql)?

correct (though not saying it's necessarily the best way of doing it, I think there's a bit of historical bias there b/c we started with it only affecting statement sanitization, and later we started extracting table and operation names at the same time as sanitization)