Open lmolkova opened 7 months ago
Option 2: capture both db.operation.name and db.operation.names (same for collections and namespaces).
I wonder if we are over-engineering here. Taking SQL statements as an example, they can be parsed into an AST which has a top-level statement, which tells you whether you `SELECT`, `INSERT`, `DROP`, `CREATE` ... It's important to have this top-level statement in `db.operation.name`, as it tells you about the effect the operation has.
I'm not sure about `db.collection.name`. However, we should avoid a situation where we define a field as an array, which in 95% of all cases only has a single value.
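To make the "top-level statement" idea concrete, here is a minimal sketch. It is not an AST parser: it only reads the first keyword of the statement, and the set of recognized keywords is an illustrative assumption, not something taken from the conventions.

```python
import re

# Keywords treated as top-level operations. This set is an assumption made
# for illustration; a real instrumentation would rely on a proper SQL parser.
TOP_LEVEL = {"SELECT", "INSERT", "UPDATE", "DELETE", "DROP", "CREATE"}


def top_level_operation(sql):
    """Return the leading keyword of the statement if it is a known operation."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    if not match:
        return None
    keyword = match.group(1).upper()
    return keyword if keyword in TOP_LEVEL else None


# A SELECT with a JOIN still reports a single top-level operation.
print(top_level_operation("SELECT * from foo JOIN bar ON baz"))
```

With this reading, `JOIN` never ends up in `db.operation.name`, because it is not the statement's leading keyword.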
Another thought I have is that what we actually want is to capture duration per query template. Operation/collection/db names on the measurement are a proxy to identify a subset of queries.
So we want a metric for something that has high cardinality in the general case (the query template), which seems to be impossible.
Ways to limit the cardinality explored above:
- `SELECT * from foo` is represented exactly as `SELECT * from foo JOIN bar on baz`
What if we give up on general-case support in the default experience?
Option 4: `db.query.text`. We can templatize even further to something like `SELECT ? from users where ?` or even `SELECT ? from users where user_id = ?`
Pros: explicit choice for users who don't expect high cardinality
Cons: perf, still no solution for apps that expect high-cardinality queries
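As a rough illustration of the templatizing step in Option 4, the sketch below replaces literals with `?`. It covers only the `where user_id = ?` level of templatizing, not the more aggressive `where ?` form. The regular expressions are simplifying assumptions (for example, double-quoted text is treated as a string literal, as in the `bar="baz"` example in the issue description) and would not survive every SQL dialect.

```python
import re


def templatize(query):
    """Replace string and numeric literals in a query with '?' placeholders."""
    # Single-quoted strings, allowing '' as an escaped quote.
    query = re.sub(r"'(?:[^']|'')*'", "?", query)
    # Double-quoted text, assumed here to be a string literal.
    query = re.sub(r'"[^"]*"', "?", query)
    # Standalone numbers; digits inside identifiers such as t1 are left alone.
    query = re.sub(r"(?<![\w.])\d+(?:\.\d+)?\b", "?", query)
    return query


print(templatize("SELECT * from users where user_id = 42"))
```

The regex passes run on every query, which is where the "perf" concern listed under Cons comes from.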
Option 5: `db.query.name` (e.g. SQL commenter, or in some other way) like `get user by user id` or `get user address by user name`
Pros:
Cons:
So the proposal (Options 3, 4, and 5 combined):
- `db.query.name|template|alias` attribute that'd capture low-to-medium-cardinality name when it's available
- `{operation[i]} ? from {namespace[i]}.{collection[i]}` (maybe with `where` if present). It'd result in the same cardinality as array attributes, but much better experience
- `db.operation.name`, `db.collection.name`, `db.collection.namespace` ONLY if there is one operation (table or db involved). E.g.: `select * from mydb.foo join mydb.bar on baz` would result in only the `db.collection.namespace=mydb` collected
Assuming we could collect operation/collection/namespace by default, we could consider the `{operation[i]} ? from {namespace[i]}.{collection[i]}` joining trick for the default experience as well.
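A literal reading of the `{operation[i]} ? from {namespace[i]}.{collection[i]}` joining trick could look like the sketch below. It assumes the operation/namespace/collection triples have already been extracted from the query; the `Target` type, the function name, and joining the fragments with a single space are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """One operation and the collection it touches (hypothetical helper type)."""
    operation: str
    namespace: Optional[str]
    collection: str


def synthetic_template(targets):
    """Join one '{operation} ? from {namespace}.{collection}' fragment per target."""
    parts = []
    for target in targets:
        if target.namespace:
            name = f"{target.namespace}.{target.collection}"
        else:
            name = target.collection
        parts.append(f"{target.operation} ? from {name}")
    # Joining with a space is an assumption; the thread does not pin this down.
    return " ".join(parts)


print(synthetic_template([Target("SELECT", "mydb", "foo")]))
```

Because the result is built only from operation, namespace, and collection names, its cardinality matches that of the array attributes while staying a single string.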
It seems DynamoDB (which btw doesn't live under the `db.` namespace 😓) uses an array attribute for the tables: `aws.dynamodb.table_names`
Taking SQL statements as an example, they can be parsed into an AST which has a top-level statement, which tells you whether you SELECT, INSERT, DROP, CREATE ... It's important to have this top-level statement in db.operation.name, as it tells you about the effect the operation has.
I agree. What's the rationale for considering `JOIN` an operation?
More complicated queries involve multiple operations, tables, or even databases. E.g. in `SELECT * from foo JOIN bar ON baz` we have two operations (`SELECT` and `JOIN`), two tables (`foo` and `bar`), and just one database.
I agree with this proposal, but think that we should make a best effort to populate `db.operation.name` with a single operation even for the complex case. Fundamentally, selects with joins, or selects to insert, are all ultimately doing one thing: either selecting data to return, or updating, or inserting, or deleting.
@lmolkova is this resolved now that `db.collection.name` and `db.operation.name` are specified as the "first found in the query" (when parsed from `db.query.text`)?
Reopening based on the community feedback (thanks a lot @pellared for bringing it up).
Discussed and tentatively agreed on the following approach
Non-query operations:
- `findAndModify foo`
- `findAndModify`
- `foo`
Query-based operations:
- multiple operations
  - `{ db.query.synthetic }`
  - `SELECT foo, DELETE bar, INSERT baz` - recommended on metrics
  - `db.collection.name` or `db.operation.name` on spans or metrics
- single operation
  - `{ db.query.synthetic }` or `{operation} {collection}`, which are the same
  - `SELECT foo` - recommended
  - `SELECT`
  - `foo`
For simple database queries (such as `SELECT * from foo where bar="baz"`), we'd like to capture the following attributes when possible (on spans):
- `db.operation.name = SELECT`
- `db.collection.name = foo` (aka `db.sql.table` and other similar system-specific attributes)
- `db.collection.namespace = mydb` (aka `db.name` along with `db.instance.id` and similar)
- `db.query.text = SELECT * from foo where bar=?` (aka `db.statement`)
- `db.query.parameter.bar=baz`
(attribute names are being discussed and are not final).
This simple case is supported by the current version of the DB semantic conventions. It's also a common one in the NoSQL world if we exclude bulk operations (or non-homogeneous batch operations). Attributes (except `db.query.*` ones) have reasonable cardinality and can be used on traces and metrics.
More complicated queries involve multiple operations, tables, or even databases. E.g. in `SELECT * from foo JOIN bar ON baz` we have two operations (`SELECT` and `JOIN`), two tables (`foo` and `bar`), and just one database.
`db.query.text` and `db.query.parameter.*` for such queries are still relevant and make sense on spans (cardinality is still a problem for metrics).
DB WG is considering multiple options:
Option 1: always capture `db.operation.names`, `db.collection.names`, `db.collection.namespaces` as arrays
Pros: consistent understandable model
Cons:
db.query.text
Option 2: capture both `db.operation.name` and `db.operation.names` (same for collections and namespaces).
The array attributes are only captured when more than one operation is performed. In this case we can entertain different options for `db.operation.name` - it may contain the first operation, operations joined as string, or shouldn't be reported at all.
Pros:
- `db.operation.name` contains joined list (`SELECT JOIN`)
Cons:
db.query.text
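The three possibilities Option 2 mentions for `db.operation.name` can be compared side by side with a small sketch. The function name and the `single_value` switch are hypothetical; only the attribute names come from the discussion.

```python
def operation_attributes(operations, single_value="first"):
    """Build Option 2 attributes from a list of parsed operation names.

    single_value selects what db.operation.name holds when there are several
    operations: "first", "joined", or "omit" (not reported at all).
    """
    if not operations:
        return {}
    if len(operations) == 1:
        # Simple case: only the scalar attribute is captured.
        return {"db.operation.name": operations[0]}
    attrs = {"db.operation.names": list(operations)}
    if single_value == "first":
        attrs["db.operation.name"] = operations[0]
    elif single_value == "joined":
        attrs["db.operation.name"] = " ".join(operations)
    elif single_value != "omit":
        raise ValueError(f"unknown single_value: {single_value!r}")
    return attrs


print(operation_attributes(["SELECT", "JOIN"], single_value="joined"))
```

The same shape would apply to collections and namespaces; whichever variant is picked, consumers have to know which of the two attributes to query.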
Option 3: don't capture multiple operation names, collection names, namespaces
Pros: simple case is easy
Cons: nothing distinguishes different operations in the complex case
There could be other options including opting into collecting templatized query string on metrics, but none of those is perfect. Still, we'd like to provide a default experience which could be improved with users providing query nick-names (see https://github.com/open-telemetry/semantic-conventions/issues/521 for the context).
Additional context
`{[\"SELECT\", \"JOIN\"]}` making them even harder to use. Space (or comma) delimited list would be nicer (`[SELECT, JOIN]`)