percona / mongodb_exporter

A Prometheus exporter for MongoDB including sharding, replication and storage engines
Apache License 2.0

Reconsider inclusion of opid in mongodb_currentop_query_uptime #746

Closed nick-jones closed 5 months ago

nick-jones commented 10 months ago

Describe the bug

PR #706 introduced mongodb_currentop_query_uptime, which is derived from the results of db.currentOp(). The labels used here include opid. I don't know how opid is generated, but from what I've observed so far, opid appears to be distinct for every single executed operation. The general recommendation is to avoid high cardinality labels, as the inclusion of these can be problematic - https://prometheus.io/docs/practices/naming/#labels

connid may also cause problems, depending on how it is generated on the MongoDB side and on general connection patterns.

To Reproduce

Run the exporter against a MongoDB instance that sees at least some traffic, and observe how the number of unique mongodb_currentop_query_uptime label combinations grows over time. Something like sum(count_over_time(mongodb_currentop_query_uptime[30d])) may help visualise the growth in datapoints.

Expected behavior

Prometheus documentation recommends avoiding high cardinality labels, so opid should perhaps not be included.

Logs N/A

Environment

Additional context N/A

tregubov-av commented 9 months ago

Hello. This metric is not intended for plotting. The data depends on the scrape interval: you will not see operations that started and completed between scrapes. The metric is intended exclusively for alerting on operations that have been in progress for too long. For example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: kube-prometheus-stack
  name: mongodb-rules
  namespace: ${NAMESPACE}
spec:
  groups:
  - name: mongodb.rules
    rules:
    ## Slow query alerts
    # 5 minutes trigger, E-mail + Slack (critical)
    - alert: MongodbCurrentQueryTime
      labels:
        severity: critical
        database: mongo
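      # uptime is reported in microseconds (it comes from microsecs_running), so 3e+8 µs = 300 s = 5 min; the /1000 renders the annotation value in ms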
      expr: (mongodb_currentop_query_uptime > 3e+8) / 1000
      annotations:
        summary: "{{ $labels.endpoint }}: slow query detected"
        description_email: |
          Opid: {{ $labels.opid }}<br/>
          Uptime: {{ $value }}ms<br/>
          Ns: <strong>{{ $labels.ns }}</strong><br/>
          Desc: {{ $labels.desc }}<br/>
          Op: {{ $labels.op }}<br/>
        description_slack: |
          *Opid:* `{{ $labels.opid }}`
          *Uptime:* `{{ $value }}ms`
          *Ns:* {{ $labels.ns }}
          *Desc:* `{{ $labels.desc }}`
          *Op:* {{ $labels.op }}
    # 1 minute trigger, E-mail (warning)
    - alert: MongodbCurrentQueryTime
      labels:
        severity: warning
        database: mongo
      expr: (mongodb_currentop_query_uptime > 6e+7) / 1000
      annotations:
        summary: "{{ $labels.endpoint }}: slow query detected"
        description_email: |
          Opid: {{ $labels.opid }}<br/>
          Uptime: {{ $value }}ms<br/>
          Ns: <strong>{{ $labels.ns }}</strong><br/>
          Desc: {{ $labels.desc }}<br/>
          Op: {{ $labels.op }}<br/>

Unfortunately, there is no other way to track a request that is in progress.

If you are worried about an overabundance of data in the Prometheus database, then I think the following solution may be appropriate: introduce a variable (for example, CurrentOpSlowTime) that limits the exported entries to only those running times we actually want to react to. https://github.com/percona/mongodb_exporter/blob/main/exporter/currentop_collector.go

for example:

if microsecs_running <= CurrentOpSlowTime {
  continue // skip this operation; it has not been running long enough to be interesting
}
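
A slightly fuller sketch of the same idea (the type, variable and function names here are invented for illustration; they are not existing exporter code):

package exporter

import "time"

// currentOp is a pared-down stand-in for one document returned by
// db.currentOp(); only the field relevant to the filter is shown here.
type currentOp struct {
	MicrosecsRunning int64 // the microsecs_running field of the document
	// opid, ns, desc, op, ... omitted
}

// currentOpSlowTime is the hypothetical cutoff: operations running for less
// than this are never exported, which caps label cardinality.
var currentOpSlowTime = time.Minute

// filterSlowOps keeps only the operations that have been running long enough
// to be worth alerting on.
func filterSlowOps(inprog []currentOp) []currentOp {
	var slow []currentOp
	for _, op := range inprog {
		if time.Duration(op.MicrosecsRunning)*time.Microsecond < currentOpSlowTime {
			continue // short-lived operation: skip it instead of emitting a series
		}
		slow = append(slow, op)
	}
	return slow
}
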
tregubov-av commented 9 months ago

Of course, you could drop opid and desc, but in that case you would have to manually look for the stuck query in the output of db.currentOp().

nick-jones commented 9 months ago

The data depends on the scrape interval: you will not see operations that started and completed between scrapes.

My original concern relates to the number of time series that would be recorded even with regular scrapes; at the moment we scrape every 30 seconds.

In 1 week that amounts to 20,160 scrapes in total. Say every scrape we observe 500 "fresh" queries running, that amounts to 10,080,000 unique time series recorded over that time. According to https://www.robustperception.io/why-does-prometheus-use-so-much-ram/ that would amount to around 30GB of RAM usage.
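
As a rough back-of-the-envelope check (taking the ~3 KB per series figure from that article as an assumption):

package main

import "fmt"

func main() {
	const (
		scrapeIntervalSec  = 30
		secondsPerWeek     = 7 * 24 * 60 * 60
		newSeriesPerScrape = 500      // assumed number of "fresh" operations seen per scrape
		bytesPerSeries     = 3 * 1024 // rough per-series memory cost from the article above
	)
	scrapesPerWeek := secondsPerWeek / scrapeIntervalSec // 20,160
	seriesPerWeek := scrapesPerWeek * newSeriesPerScrape // 10,080,000
	ramGiB := float64(seriesPerWeek) * bytesPerSeries / (1 << 30)
	fmt.Printf("%d scrapes, %d series, ~%.1f GiB of RAM\n", scrapesPerWeek, seriesPerWeek, ramGiB)
}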

Of course, you could drop opid and desc, but in that case you would have to manually look for the stuck query in the output of db.currentOp().

I think my original suggestion on this was somewhat misguided, as removing these labels means you somehow have to merge these values together, and just naively summing the values is not going to be helpful. Perhaps using histograms could help, but then you'd need some control over the bucket sizes.
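
Roughly what I had in mind, as a sketch only (the metric name, bucket bounds, and structure here are made up; this is not how the current collector is built):

package exporter

import "github.com/prometheus/client_golang/prometheus"

// currentOpUptimeHist would replace the per-opid gauge with a histogram of
// running times observed at scrape time. Buckets are in seconds and would
// presumably need to be configurable.
var currentOpUptimeHist = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "mongodb_currentop_query_uptime_seconds",
	Help:    "Running time of in-progress operations observed at scrape time.",
	Buckets: []float64{1, 5, 15, 30, 60, 300, 900},
})

// observeCurrentOps would be called once per scrape with the microsecs_running
// values taken from db.currentOp().
func observeCurrentOps(microsecsRunning []int64) {
	for _, us := range microsecsRunning {
		currentOpUptimeHist.Observe(float64(us) / 1e6) // microseconds to seconds
	}
}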

In any case, I'm happy to close this if the consensus is towards this being a non-issue - I just understood it to be best practice to avoid high cardinality metrics.

tregubov-av commented 9 months ago

In 1 week that amounts to 20,160 scrapes in total. Say every scrape we observe 500 "fresh" queries running, that amounts to 10,080,000 unique time series recorded over that time. According to https://www.robustperception.io/why-does-prometheus-use-so-much-ram/ that would amount to around 30GB of RAM usage.

This problem would only occur if you are querying historical data. I would not recommend building charts on this metric; graphs of it will not be informative. It is designed to identify a problem that is occurring right now, for example a query on a field without an index in a very large collection.

BupycHuk commented 9 months ago

Hi @nick-jones, I'm re-opening this issue because I think your points make sense and we need to discuss within the team how to provide the best experience to our users without blowing up their storage.

tregubov-av commented 9 months ago

Hi @nick-jones, I'm re-opening this issue because I think your points make sense and we need to discuss within the team how to provide the best experience to our users without blowing up their storage.

Hi @BupycHuk I think an acceptable solution might be to introduce a variable that limits the recording to requests whose execution time we are actually interested in. In this context we are unlikely to care about queries that complete faster than the threshold; in our case, requests that take less than a minute to complete are not interesting. Such queries, although they load the database, are easy to track through the profiler or through total execution time, and they are very difficult to catch in currentOp(). This metric is aimed at identifying very heavy requests that are still in the process of execution. I see little point in processing requests that take less than a minute: there should be very few of them in a normally working database, and viewing the history of such requests should not cause problems.

tregubov-av commented 6 months ago

Possible Solution https://github.com/percona/mongodb_exporter/pull/812

nick-jones commented 5 months ago

Given #812 has been merged, I'm closing this now. Many thanks @tregubov-av