timescale / promscale_extension

[DEPRECATED] Tables, types and functions supporting Promscale

Maintenance jobs metrics without backend local side channel #555

Closed: sumerman closed this 2 years ago

sumerman commented 2 years ago

Description

When I started down the backend-buffer approach, I didn't realize just how much transaction control happens in our background jobs. Carefully threading instrumentation through all of it turned into an unreadable mess. While doing so, however, I realized that separating maintenance jobs by type wouldn't require as large a refactoring as I had initially assumed, and would let us implement 90% of what we wanted. For example:

SELECT
    -- Jobs whose config lacks a 'signal' key are reported as traces jobs,
    -- and those lacking a 'type' key as compression jobs.
    coalesce(j.config ->> 'signal', 'traces') AS signal_type,
    coalesce(j.config ->> 'type', 'compression') AS job_type,
    MAX(js.last_run_duration) AS last_duration,
    SUM(js.total_failures) AS failures_count,
    SUM(js.total_runs) AS total_runs_count
FROM timescaledb_information.job_stats js
JOIN timescaledb_information.jobs j ON j.job_id = js.job_id
-- Only jobs whose procedures live in the Promscale schemas.
WHERE j.proc_schema IN ('_prom_catalog', '_ps_trace')
GROUP BY 1, 2;

This covers everything still unchecked in #492 except the breakdown by failure type. Admittedly, this approach won't let us go more granular in the future, nor can we ship logs this way, but overall it seems to be the better option after all.
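The query above only runs against a TimescaleDB instance, so here is a minimal Python sketch that reproduces its grouping semantics over in-memory rows. It makes explicit how jobs without a `signal` or `type` key in their config fall into the traces/compression bucket, and how the inner join plus the `proc_schema` filter drop non-Promscale jobs. The `Job`/`JobStats` classes model only the view columns the query touches; the `summarize` helper and the sample config values (e.g. `{"signal": "metrics", "type": "maintenance"}`) are hypothetical and illustrative, not the extension's actual job configs.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Any, Dict, List, Optional, Tuple

# Mirrors the WHERE clause: only jobs whose procedures live in these schemas.
PROMSCALE_SCHEMAS = {"_prom_catalog", "_ps_trace"}


@dataclass
class Job:
    """A row of timescaledb_information.jobs (only the columns the query uses)."""
    job_id: int
    proc_schema: str
    config: Optional[Dict[str, Any]]  # the jsonb config; None models SQL NULL


@dataclass
class JobStats:
    """A row of timescaledb_information.job_stats (only the columns the query uses)."""
    job_id: int
    last_run_duration: timedelta
    total_failures: int
    total_runs: int


def _coalesce(value: Any, default: str) -> str:
    # coalesce(config ->> key, default): only a missing/NULL value falls back.
    return default if value is None else value


def summarize(jobs: List[Job], stats: List[JobStats]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    """Hypothetical helper: group job stats by (signal_type, job_type) like the SQL."""
    jobs_by_id = {job.job_id: job for job in jobs}
    groups: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for row in stats:
        job = jobs_by_id.get(row.job_id)
        # Inner JOIN on job_id, then the proc_schema filter.
        if job is None or job.proc_schema not in PROMSCALE_SCHEMAS:
            continue
        config = job.config or {}
        key = (
            _coalesce(config.get("signal"), "traces"),
            _coalesce(config.get("type"), "compression"),
        )
        agg = groups.get(key)
        if agg is None:
            agg = groups[key] = {
                "last_duration": row.last_run_duration,
                "failures_count": 0,
                "total_runs_count": 0,
            }
        agg["last_duration"] = max(agg["last_duration"], row.last_run_duration)  # MAX(...)
        agg["failures_count"] += row.total_failures  # SUM(total_failures)
        agg["total_runs_count"] += row.total_runs  # SUM(total_runs)
    return groups


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample_jobs = [
        Job(1, "_prom_catalog", {"signal": "metrics", "type": "maintenance"}),
        Job(2, "_ps_trace", None),
        Job(3, "_timescaledb_internal", {"signal": "metrics"}),
    ]
    sample_stats = [
        JobStats(1, timedelta(seconds=12), 1, 10),
        JobStats(2, timedelta(seconds=5), 2, 4),
        JobStats(3, timedelta(seconds=99), 7, 7),
    ]
    for (signal, job_type), agg in sorted(summarize(sample_jobs, sample_stats).items()):
        print(signal, job_type, agg)
```

Job 3 in the sample is excluded by the schema filter, and job 2, having a NULL config, lands in the traces/compression group, just as it would in the SQL version.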

Merge requirements

Please take into account the following non-code changes that you may need to make with your PR: