open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
268 stars 173 forks source link

How to record multi-operation/table/dbs operations on DB metrics #805

Open lmolkova opened 7 months ago

lmolkova commented 7 months ago

For a simple database queries (such as SELECT * from foo where bar="baz"), we'd like to capture the following attributes when possible (on spans):

(attribute names are being discussed and are not final).

This simple case is supported by current version of DB semantic conventions. It's also a common one in the NoSQL world if we exclude bulk operations (or non-homogeneous batch operations). Attributes (except db.query.* ones) have reasonable cardinality and can be used on traces and metrics.


More complicated queries involve multiple operations, tables, or even databases. E.g. in SELECT * from foo JOIN bar ON baz we have two operations (SELECT and JOIN), two tables (foo and `bar), and just one database.

DB WG is considering multiple options:


Option 1: always capture db.operation.names, db.collection.names, db.collection.namespaces as arrays

Pros: consistent understandable model

Cons:


Option 2: capture both db.operation.name and db.operation.names (same for collections and namespaces).

The array attributes are only captured when more than one operation is performed. In this case we can entertain different options for db.operation.name - it may contain the first operation, operations joined as string, or shouldn't be reported at all.

Pros:

Cons:


Option 3: don't capture multiple operation names, collection names, namespaces

Pros: simple case is easy Cons: nothing distinguishes different operations in the complex case

There could be other options including opting into collecting templatized query string on metrics, but none of those is perfect. Still, we'd like to provide a default experience which could be improved with users providing query nick-names (see https://github.com/open-telemetry/semantic-conventions/issues/521 for the context).


Additional context

pyohannes commented 7 months ago

Option 2: capture both db.operation.name and db.operation.names (same for collections and namespaces).

I wonder if we are over-engineering here. Taking SQL statements as an example, they can be parsed into an AST which has a top-level statement, which tells you whether you SELECT, INSERT, DROP, CREATE ... It's important to have this top-level statement in db.operation.name, as it tells you about the effect the operation has.

I'm not sure about db.collection.name. However, we should avoid a situation where we define a field as an array, which in 95% of all cases only has a single value.

lmolkova commented 7 months ago

Another thought I have is that what we actually want is to capture duration per query template. Operation/collection/db names on the measurement is a proxy to identify a subset of queries.

So we want to have a metric for a thing that has high-cardinality in general case (query template) which seems to be impossible.

Ways to limit the cardinality explored above:

What if we give up on a general case support in default experience?

Option 4:

Pros: explicit choice for users who don't expect high cardinality Cons: perf, still no solution for apps that expect high-cardinality queries

Option 5:

Pros:

Cons:

lmolkova commented 7 months ago

so the proposal (Options 3, 4, and 5 combined):

Assuming we could collect operation/collection/namespace by default, we could consider {operation[i]} ? from {namespace[i]}.{collection[i]} joining trick for the default experience as well

joaopgrassi commented 7 months ago

It seems the dynamodb (which btw don't live under the db. namespace 😓) use an array attribute for the tables aws.dynamodb.table_names

jack-berg commented 7 months ago

Taking SQL statements as an example, they can be parsed into an AST which has a top-level statement, which tells you whether you SELECT, INSERT, DROP, CREATE ... It's important to have this top-level statement in db.operation.name, as it tells you about the effect the operation has.

I agree. What's the rational is for considering JOIN a operation?

More complicated queries involve multiple operations, tables, or even databases. E.g. in SELECT * from foo JOIN bar ON baz we have two operations (SELECT and JOIN), two tables (foo and bar), and just one database.

I agree with this proposal, but think that we should best effort populate db.operation.name with a single operation even for the complex case. Fundamentally, selects with joins, or selects to insert are all ultimately doing one thing: either selecting data to return, or updating, or inserting, or deleting.

trask commented 6 months ago

@lmolkova is this resolved now that db.collection.name and db.operation.name are specified as the "first found in the query" (when parsed from db.query.text)?

lmolkova commented 2 weeks ago

Reopening based on the community feedback (thanks a lot @pellared for bringing it up).

Discussed and tentatively agreed on the following approach

Non-query operations:

Query-based operations: