open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
231 stars 146 forks source link

DB metrics: add query-level metrics #723

Open dnaik-sfx opened 5 months ago

dnaik-sfx commented 5 months ago

Currently, there are some conventions for connection-level metrics defined on the Database Metrics page. Proposing that query-level metrics are added (in a different section on the same page) to define conventions for the stats and info that can be captured for individual queries.

Some examples could include:

db.query.count (number of times a query was executed) db.query.plan (execution plan for a query) db.query.duration (amount of time taken by the query) db.query.rows_returned (number of rows returned by a query)

trask commented 5 months ago

related to #512

dnaik-sfx commented 5 months ago

@trask Any objections to me putting up a PR for this, or is this being actively discussed by the DB Client working group? I did see there was some discussion around common DB attributes, which I imagine might be relevant for these metrics as well (i.e, do we create common query metrics conventions like the examples I gave, or database-specific query metrics).

trask commented 5 months ago

@dnaik-sfx does #735 address your needs?

dnaik-sfx commented 5 months ago

We'd like to be able to collect database metrics like duration, count, rows returned, etc... at the query/statement level. That PR does add a new metric on duration, but it seems that db.statement is being left out of the initial set of attributes; to get the duration information for a given query, that attribute would be needed, but the metric itself would likely suit our needs if that attribute was added. We would also want to add the other metrics mentioned above and some others though (these would also need db.statement or a similar attribute).

I realize there are concerns about the high cardinality when adding db.statement as an attribute on metrics, but I was thinking that the metrics themselves would either be optional (if there is no value in collecting them without tying them to a query/statement), or the db.statement attribute itself would be optional.

dnaik-sfx commented 4 months ago

@trask Sorry to bother you again, but is this (or related topics) being, or planned to be, discussed by the working group? And if so, is there somewhere I can go to follow the discussion (if not here)? Also happy to put up a PR for this myself and help drive it if this would be beyond the scope of what the working group was intended to accomplish.

trask commented 3 months ago

hey @dnaik-sfx, sorry for the slow response, this issue is on the working group's project board, we are planning to discuss it prior to stabilization, but we may push it post stabilization if we think it can be added after stabilization without introducing any breaking changes

a couple thoughts/questions in the meantime...

would you still want these two:

if db.statement is an optional attribute on db.client.operation.duration? (since it's a histogram, you can get counts)

I could see db.query.rows_returned being added, e.g. db.client.operation.rows_returned, it's similar to http.client.response.body.size

for db.query.plan, where would you get the query plan? is this something you're thinking to collect in client instrumentation or some other way?

seonsfx commented 3 months ago

Hi @trask , thanks for responding. Since we just learned there's an upcoming change to add db.client.operation. metrics, the more important question would be "Are db.query. and db.client.operation. the same thing?" or more precisely, "Is db.query a subset of db.client.operation.?".

If the answer is yes, we could be adding more metrics in the db.client.operation. namespace instead of creating another. If not, replacing db.query.duration with db.client.operation.duration wouldn't make sense and make the metric set incomplete.

I feel pretty confident that there's no reason why we cannot go with db.client.operation. metrics but let us go through the PR and the proposed metric. Also please feel free to let us know what you think.

trask commented 2 months ago

@seonsfx db.client.operation.duration includes all database operations, including queries

trask commented 2 months ago

Moving this to post-stability since we can add new metrics later, but we can continue discussing and clarifying the issue in the meantime.

dnaik-sfx commented 2 months ago

@trask thanks! What does the current timeline look like for stability? Wondering what the timeframe might look like for these query-level metric conventions to go in.

Given the db.client.operation convention, I think the following would be the full list of metric conventions we would want to add (and eventually collect):

Some of these are kind of specific to a given type of database, but it could still be worth capturing them as a shared convention in case smaller or future technologies share the underlying concepts. Open to feedback about where these conventions should be added though. As far as collection goes, we envision typically collecting them from system tables depending on the database implementation. For example, MySQL has the Performance Schema with tables providing statistics about queries that ran in the past.

I'm happy to put up a PR for these metrics, as well as the attributes that would be desired on them, to discuss specifics and in further depth.

dnaik-sfx commented 1 month ago

@trask I know work towards stability is still ongoing, but any thoughts about the metrics and collection method I listed above and any possible conflicts with patterns/metrics being added now?

dnaik-sfx commented 3 weeks ago

Put up a PR for the sake of discussion (with the understanding that work towards stability is still underway and these additional metrics would come after)