sodabrew / puppet-dashboard

The Puppet Dashboard is a web interface providing node classification and reporting features for Puppet, an open source system configuration management tool
Other
540 stars 184 forks source link

Prune Dashboard Performance #314

Open ruckc opened 9 years ago

ruckc commented 9 years ago

I'm investigating runaway queries in my PostgreSQL dashboard database.

It appears my nightly prune cron is going near infinite (taking over 24 hours) in the "delete from resource_events" portion of the prune rake task.

When investigating the query, it appears that puppet-dashboard first counts the records to delete (expensive) then actually tries deleting them (expensive query again).

It appears in PostgreSQL the NOT IN isn't optimal for this query:

dashboard=# explain select count(*) from resource_events where resource_status_id not in (select id from resource_statuses);

                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=28055423930.51..28055423930.52 rows=1 width=0)
   ->  Seq Scan on resource_events  (cost=0.00..28055423886.85 rows=17462 width=0)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..1538250.21 rows=27380614 width=4)
                 ->  Seq Scan on resource_statuses  (cost=0.00..1294391.14 rows=27380614 width=4)
(6 rows)

I believe an optional query would use a join (since resource_events is hopefully smaller). The below join would make this query extremely trivial and should support any ANSI SQL compatible database.

dashboard=# explain select count(*) from resource_events e LEFT JOIN resource_statuses s ON (e.resource_status_id = s.id) WHERE s.id IS NULL;

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=293234.43..293234.44 rows=1 width=0)
   ->  Nested Loop Anti Join  (cost=0.44..293234.42 rows=1 width=0)
         ->  Seq Scan on resource_events e  (cost=0.00..1279.23 rows=34923 width=4)
         ->  Index Only Scan using resource_statuses_pkey on resource_statuses s  (cost=0.44..8.35 rows=1 width=4)
               Index Cond: (id = e.resource_status_id)
(5 rows)
ruckc commented 9 years ago

@sodabrew I've been trying to modify my prune job to perform this query. If it wasn't for tracking deletion_count and batching this would be extremely simple.

Also, found this: http://stackoverflow.com/questions/21662726/delete-using-left-outer-join-in-postgres so maybe the delete should work differently.

ruckc commented 9 years ago

This is the lowest Cost way of pruning in postgres i've found...

delete from metrics WHERE NOT EXISTS (SELECT 1 FROM reports WHERE metrics.report_id = reports.id) delete from report_logs WHERE NOT EXISTS (SELECT 1 FROM reports WHERE report_logs.report_id = reports.id) delete from resource_statuses where not exists ( select 1 from reports where resource_statuses.report_id = reports.id ); delete from resource_events where not exists (SELECT 1 FROM resource_statuses where resource_events.resource_status_id = resource_statuses.id)