When I started down the backend buffer approach, I didn't realize just how much transaction control happens in our background jobs. Carefully threading instrumentation through all of it turned the code into an unreadable mess. While doing so, however, I realized that separating maintenance jobs by type wouldn't require as big a refactoring as I had initially presumed, and it would let us implement 90% of what we wanted. For example:
    SELECT
        coalesce(j.config ->> 'signal', 'traces') AS signal_type,
        coalesce(j.config ->> 'type', 'compression') AS job_type,
        MAX(js.last_run_duration) AS last_duration,
        SUM(js.total_failures) AS failures_count,
        SUM(js.total_runs) AS total_runs_count
    FROM timescaledb_information.job_stats js
    JOIN timescaledb_information.jobs j ON j.job_id = js.job_id
    WHERE j.proc_schema = '_prom_catalog' OR j.proc_schema = '_ps_trace'
    GROUP BY 1, 2;
This gets us everything still unchecked in #492 except the breakdown by failure type. Of course, this approach won't let us go more granular in the future, nor can we ship logs this way, but overall it seems to be the better option after all.
Merge requirements
Please take into account the following non-code changes that you may need to make with your PR: