Open ruckc opened 9 years ago
@sodabrew I've been trying to modify my prune job to perform this query. If it weren't for tracking deletion_count and batching, this would be extremely simple.
Also, found this: http://stackoverflow.com/questions/21662726/delete-using-left-outer-join-in-postgres so maybe the delete should work differently.
This is the lowest-cost way of pruning in Postgres I've found:
DELETE FROM metrics WHERE NOT EXISTS (SELECT 1 FROM reports WHERE metrics.report_id = reports.id);
DELETE FROM report_logs WHERE NOT EXISTS (SELECT 1 FROM reports WHERE report_logs.report_id = reports.id);
DELETE FROM resource_statuses WHERE NOT EXISTS (SELECT 1 FROM reports WHERE resource_statuses.report_id = reports.id);
DELETE FROM resource_events WHERE NOT EXISTS (SELECT 1 FROM resource_statuses WHERE resource_events.resource_status_id = resource_statuses.id);
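For anyone who wants to try the NOT EXISTS deletes without touching a real dashboard database, here is a minimal sketch in Python. It runs against an in-memory SQLite database as a stand-in for PostgreSQL, and the two-column tables are a simplified assumption, not the real puppet-dashboard schema. The function names `prune_orphans` and `demo` are made up for the example.

```python
import sqlite3

# Order matters: resource_statuses is pruned first so that events whose
# status was just removed are caught by the second statement.
PRUNE_STATEMENTS = [
    "DELETE FROM resource_statuses WHERE NOT EXISTS "
    "(SELECT 1 FROM reports WHERE resource_statuses.report_id = reports.id)",
    "DELETE FROM resource_events WHERE NOT EXISTS "
    "(SELECT 1 FROM resource_statuses "
    "WHERE resource_events.resource_status_id = resource_statuses.id)",
]


def prune_orphans(conn):
    """Run the orphan deletes in dependency order, then commit."""
    for sql in PRUNE_STATEMENTS:
        conn.execute(sql)
    conn.commit()


def demo():
    """Build a tiny assumed schema, prune it, and return the surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reports (id INTEGER PRIMARY KEY);
        CREATE TABLE resource_statuses (id INTEGER PRIMARY KEY, report_id INTEGER);
        CREATE TABLE resource_events (id INTEGER PRIMARY KEY, resource_status_id INTEGER);
        INSERT INTO reports (id) VALUES (1);
        -- status 11 points at report 2, which does not exist
        INSERT INTO resource_statuses (id, report_id) VALUES (10, 1), (11, 2);
        -- event 101 hangs off status 11; event 102 off a status that never existed
        INSERT INTO resource_events (id, resource_status_id)
            VALUES (100, 10), (101, 11), (102, 99);
    """)
    prune_orphans(conn)
    statuses = [r[0] for r in conn.execute("SELECT id FROM resource_statuses ORDER BY id")]
    events = [r[0] for r in conn.execute("SELECT id FROM resource_events ORDER BY id")]
    return statuses, events
```

Calling `demo()` leaves only the status and event that still trace back to an existing report. Query plans and costs in SQLite say nothing about PostgreSQL, so this only checks that the statements delete the right rows.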
I'm investigating runaway queries in my PostgreSQL dashboard database.
It appears my nightly prune cron job is running almost indefinitely (taking over 24 hours) in the "delete from resource_events" portion of the prune rake task.
When investigating the query, it appears that puppet-dashboard first counts the records to delete (expensive) then actually tries deleting them (expensive query again).
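One way to avoid paying for the expensive query twice is to skip the up-front count and add up the row counts the database reports for each delete. The sketch below does that in batches. It is an assumption about how the task could be restructured, not how the rake task works today. It uses SQLite through Python's `sqlite3` as a stand-in for PostgreSQL, and `prune_events_in_batches`, `build_demo_db`, and the trimmed-down tables are hypothetical.

```python
import sqlite3

# Delete at most one batch of orphaned events per statement.
# The "?" placeholder is sqlite3 style; a PostgreSQL driver such as
# psycopg would use "%s" instead.
BATCH_DELETE = """
    DELETE FROM resource_events
    WHERE id IN (
        SELECT e.id
        FROM resource_events AS e
        WHERE NOT EXISTS (
            SELECT 1 FROM resource_statuses AS s
            WHERE s.id = e.resource_status_id
        )
        LIMIT ?
    )
"""


def prune_events_in_batches(conn, batch_size=1000):
    """Delete orphaned events batch by batch; return the total removed."""
    deletion_count = 0
    while True:
        removed = conn.execute(BATCH_DELETE, (batch_size,)).rowcount
        conn.commit()  # keep each batch in its own short transaction
        if removed == 0:
            return deletion_count
        deletion_count += removed


def build_demo_db():
    """Assumed minimal schema: 3 statuses, 10 events, 4 of them orphaned."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource_statuses (id INTEGER PRIMARY KEY);
        CREATE TABLE resource_events (id INTEGER PRIMARY KEY, resource_status_id INTEGER);
        INSERT INTO resource_statuses (id) VALUES (1), (2), (3);
    """)
    # Events cycle through status ids 1..5; ids 4 and 5 do not exist.
    rows = [(i + 1, (i % 5) + 1) for i in range(10)]
    conn.executemany("INSERT INTO resource_events VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

With `batch_size=3` on the demo data the loop deletes 3 rows, then 1, then stops, and the returned total doubles as the deletion_count without a separate counting query.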
It appears that in PostgreSQL, NOT IN isn't optimal for this query:
I believe an optimal query would use a join (since resource_events is hopefully smaller). The join below would make this query trivial and should work on any ANSI SQL-compatible database.
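The original query is not reproduced in this thread, so the following is only one possible shape of a join-based delete, not necessarily the one meant above. It finds orphaned events with a LEFT JOIN anti-join and deletes them by id. As before, it is a sketch run against in-memory SQLite from Python, and the function names and cut-down tables are assumptions for illustration.

```python
import sqlite3

# Anti-join: rows from resource_events with no matching resource_statuses
# row come back with s.id NULL, and those are the orphans to delete.
DELETE_ORPHANS_VIA_JOIN = """
    DELETE FROM resource_events
    WHERE id IN (
        SELECT e.id
        FROM resource_events AS e
        LEFT JOIN resource_statuses AS s ON s.id = e.resource_status_id
        WHERE s.id IS NULL
    )
"""


def delete_orphan_events(conn):
    """Delete events without a parent status; return how many were removed."""
    removed = conn.execute(DELETE_ORPHANS_VIA_JOIN).rowcount
    conn.commit()
    return removed


def build_join_demo_db():
    """Assumed minimal schema with two valid events and two orphans."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource_statuses (id INTEGER PRIMARY KEY);
        CREATE TABLE resource_events (id INTEGER PRIMARY KEY, resource_status_id INTEGER);
        INSERT INTO resource_statuses (id) VALUES (1), (2);
        INSERT INTO resource_events (id, resource_status_id)
            VALUES (100, 1), (101, 2), (102, 3), (103, NULL);
    """)
    return conn
```

Note that an event with a NULL resource_status_id also counts as an orphan here, because the join finds no match for it. Whether this form beats NOT EXISTS on a real PostgreSQL database is something to check with EXPLAIN on actual data.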