justwatchcom / sql_exporter

Flexible SQL Exporter for Prometheus.
MIT License
391 stars 106 forks source link

Ability to run queries in batch size Postgres #134

Closed rezonick closed 1 week ago

rezonick commented 1 month ago

Hi,

We use the following database discovery support for Postgres (https://github.com/justwatchcom/sql_exporter/pull/124). We have many databases and some queries, and executing them simultaneously for all the defined databases causes a high CPU load on our cluster, and impacts performance and QoS. Currently, we're working on optimizing these queries. can we limit the exporter to not run all the queries simultaneously, but in some batch sizes?

e.g:

    - name: "partition-count"
      cron_schedule: "40 2 * * *"  # Every day at 2:40 AM
      connections:
      - 'postgres://{{POSTGRES_USER}}:{{POSTGRES_PASSWORD}}@{{POSTGRES_URL_PROD}}:5432/include:tenant_*?sslmode=disable'
      queries:
      - name: "partition_count"
        help: "Count of partitions for each partitioned table"
        labels:
          - "parent_table"
        values:
          - "partition_count"
        query:  |
                SELECT
                    pc.parent_table,
                    COALESCE((
                        SELECT count(partition_tablename)
                        FROM partman.show_partitions(p_parent_table => pc.parent_table, p_include_default => true)
                    ), 0) AS partition_count
                FROM
                    partman.part_config pc;

We have many databases tenant_* and we don't want to query all of these at once.

marevers commented 1 month ago

Hi @rezonick , my open PR #132 kind of does what you're asking. It allows you to retrieve a list of values using a certain query and then iterate over those values executing a query for each one. In your case you could retrieve your list of databases through the iterator query and then run it for each database separately.

rezonick commented 1 month ago

Hi @marevers,

Thank you for your help. Just to clarify, I am currently able to query all the databases using the include feature that supports the regex syntax *:

'postgres://{{POSTGRES_USER}}:{{POSTGRES_PASSWORD}}@{{POSTGRES_URL_PROD}}:5432/include:tenant_*?sslmode=disable'

However, this approach fires all the queries simultaneously for all open connections towards the tenant_* databases.

From what I understand, using the iterate feature mentioned in the following PR #132, the queries will be executed one by one. Is that correct?

marevers commented 1 month ago

Correct. Instead of using the include feature you would pick for instance the postgres database instead in the connection string and in your query, in the FROM statement, you will want to include the database name using the placeholder.

In your case you could call it {{DATABASE_NAME}} which needs to be configured as the placeholder. As your iterator query, you could use something like below. This will return a list of databases matching your pattern.

SELECT datname FROM pg_database WHERE datname LIKE 'tenant_%';

The exporter will then generate an individual query for each database, replacing the {{DATABASE_NAME}} placeholder with each database name. These queries are then executed one by one and the results are merged into one metric series using the placeholder as a label to ensure non-duplicate results.

rezonick commented 1 week ago

The feature presented in PR #132 works, and we were able to use it. However, it didn't work for us in the specific use-case of running queries from postgres to tenant_% databases, as our Postgres cluster doesn't allow cross-database queries.