Open alanwest opened 12 months ago
In the messaging space we have a similar scenario, there we settled on the more or less generic term destination, which covers both queues and topics.
We also ended up making messaging.destination
a namespace. There's messaging.destination.name
, and then there are further attributes to describe the destination. Maybe something similar makes sense if one decides to go with the db.target
approach.
Perhaps parsing
db.statement
can be an opt-in feature?
FWIW the javaagent already does that. We have a query sanitization feature that replaces all the query variables with ?
. It uses a performant tokenizer that scans the query looking for the interesting parts: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api-semconv/src/main/jflex/SqlSanitizer.jflex
Since we're already iterating over the query once, we figured we might as well try to extract the table name while we're at it.
Currently, there exists database system specific attributes for describing this. Examples include:
Cosmos DB also has a custom attribute for this called db.cosmosdb.container
. Standardizing on a term for the semantic convention makes sense to me.
Perhaps parsing
db.statement
can be an opt-in feature?
Wouldn't the source library be responsible for populating the db.target
, so each library would presumably only need to know how to parse its own db.statement
(vs. the APM tool doing it)? Cosmos DB currently doesn't populate db.statement
at all, and it doesn't apply to most of our operations.
This is one of our biggest barriers from moving from app insights to otel. App insights includes the whole statement, we use mainly stored procs though, and where we don't its parameterised query strings. It would cause cardinality issues if people are injecting values into the query. But I think people shouldn't be doing this anyway.
I don't see the value of the target begin the table alone. Think about what you need to debug a problem. If you put only a table then these two queries will be grouped together
SELECT * FROM users
SELECT name FROM users WHERE userid = @userid
They will have drastically different performance right? So why group them?
I think adding the whole query is the way to go imo.
It would cause cardinality issues if people are injecting values into the query. But I think people shouldn't be doing this anyway.
I believe even properly parameterized statements are too high-cardinality for metrics, e.g. it's not uncommon to have 100's of tables
If you don't have the query. I would question the use of the metric. What are you measuring?
Naming idea from semconv meeting: db.entity
Related: "entity" term may be used for identifying resource attributes
Compiling the query syntax docs from a sampling of the different db systems we reference:
from_item
table_references
index
collection
TableName
table_name
index
container
db.target
seems like a decent choice.
I also like db.collection
, since it implies a collections of rows, columns, records, documents, entries in an index, etc.
You could also be calling a stored procedure too
If you don't have the query. I would question the use of the metric. What are you measuring?
One could measure all db.operation
in a certain db.target
- say "Give me a count of all SELECT
in target MYTABLE
. Or "Give me a count of all EXEC
in target MY_SPROC
". If that's useful, that's another question.
The statement in a metric makes not much sense, but makes total sense in spans.
Assuming I do cross-table query, would db.operation
and db.target
be arrays?
Also, it'd be great to have it as opt-in on metrics, but array attribute on metrics is questionable.
I.e.
db.target
and db.operation
, but it's still far from perfect, especially if there is more than one operation/targetAn 'ideal' solution as I see it would be if users could optionally give queries a nick-name which would appear on spans and metrics.
We don't have a common way to achieve it, but if there was one and we could get this nick-name, we would put it into attribute like db.query.name
.
The propagation mechanism could look like this:
Context dbScope = Context.current();
try(Scope scope = dbScope.with(ContextKey.named("db.query.name"), "get user by id").makeCurrent()) {
// SELECT name FROM users WHERE userid = @userid
}
It could be fragile in some languages/cases and probably needs a spec change to make it possible in all languages.
What we can do now is to define such attribute, add it to metrics as conditionally_required: when available
(even though it never is). This way semconv can go stable, default experience would not be great, but we'll have means in the semconv to create a good one.
We have similar cases (messaging.destination.template, http.route on client)
Another option could be that:
assuming db.statement
can be templated, user can opt-in into collecting a limited number of db.statement
on metrics.
We could potentially extend metrics advisory params api to configure an explicit allow-list for db.statement
values.
this way cardinality would be fully in the user hands.
I think this is resolved by #870?
Proposal for new
db.target
attributeThe type of targets available for a database operation is dependent on a database system. For example, the target of a SQL database operation may be a table, function, or stored procedure. Other database systems may refer to their targets as collections or indexes, etc.
Currently, there exists database system specific attributes for describing this. Examples include:
db.sql.table
db.redis.database_index
db.mongodb.collection
db.cassandra.table
Introducing a general-purpose attribute would enable backends to provide a generic experience for faceting database operations by their target without requiring the backend have knowledge of all the database system specific attributes. It would also enable end users to more easily explore their data at a high-level without having to know the specific database systems in use.
I propose a new call-level attribute
db.target
for describing the target of a database operation. The term "target" may not be the greatest, but I'm using it for lack of a better term for the moment. I considereddb.operation.target
but of coursedb.operation
is already an attribute. Maybedb.operation
could becomedb.operation.type
thereby allowingdb.operation.target
.Issues to consider
db.operation
, the value of thedb.target
may not be readily available without parsing the value ofdb.statement
, e.g.SELECT * FROM Users
thedb.operation
isSELECT
and thedb.target
isUsers
. The specification advises against parsingdb.statement
to derive other attributes. I consider this a limitation particularly once we have a database client operation metric (see #512). Thedb.statement
attribute is fine for spans but because of its high-cardinality will not work for metrics. Therefore, havingdb.operation
anddb.target
attributes on metrics will be useful.SELECT * FROM Users U JOIN Company C ON ...
.Perhaps parsing
db.statement
can be an opt-in feature?Various backends have prior art for addressing these issues. For example:
/* nrhint:Company */ SELECT * FROM Users U JOIN Company C ON ...
.