open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.1k stars 2.39k forks source link

[receiver/sqlserverreceiver] Add query-level data that contain query text and execution plan #36462

Open XSAM opened 2 days ago

XSAM commented 2 days ago

Component(s)

receiver/sqlserver

Is your feature request related to a problem? Please describe.

Currently, this receiver only produces metrics from the instance perspective, like the number of batch requests received by the entire SQL Server instance.

But, there are no query-level metrics or logs to allow users to debug their queries. If a user wants to investigate the slow query issue, instance-level metrics are not gonna help a lot, as it does not show how a certain query is processed by the server. In this case, query-level metrics for a query that runs slowly could bring more context and greatly help the process of debugging and optimizing.

For instance, we can obtain physical reads of a single query consumed. This metric indicates disk I/O activity, and having a high number on this may suggest the query consumed too much I/O that the instance can serve, so we know where the bottleneck is and how to fix it.

Describe the solution you'd like

SQL server has an internal table sys.dm_exec_query_stats that stores query-level data. We can update the SQL server receiver to fetch the data from this table and produce corresponding metrics. Considering reducing the load when the SQL server is handling such a query from the receiver, it may be good to limit the number of queries returned, which says 200.

Concerns

However, the length of the sanitized query text and the execution plan can be very long, which exceeds the length limit of metric storage. The execution plan has a high cardinality value that does not fit into the metric storage.

To solve this problem, I want to propose sending query texts and execution plans in logs.

The query-level metrics would

For the corresponding log, it would

So, every query result fetched from the SQL server would populate a metric and a corresponding log.

Describe alternatives you've considered

No response

Additional context

We (a Splunk team) are interested in implementing this feature. But, since using logs like this is not what a common receiver would do, we would like to know whether the community can accept this or not before we work on the implementation.

Other receivers that have a similar approach:

Related

It is possible to fetch the trace_id and span_id of a query from the SQL server, but it may be in the future plan. Not in the current scope.

github-actions[bot] commented 2 days ago

Pinging code owners:

XSAM commented 2 days ago

By the way, is an integration test required for such a change?

dmitryax commented 2 days ago

Sounds reasonable to me. A number of top slow queries should be configurable. 200 may be too much for the default given that they are prone to churn and can cause big cardinality.

cc @djaglowski, we discussed this at KubeCon.

dmitryax commented 1 day ago

By the way, is an integration test required for such a change?

It’s not required, but it would be great to have them. That way, we can ensure the component’s stability. Otherwise, we can miss if some change on the receiver or SQL server itself (new version) breaks this particular functionally. This can be part of a separate issue.