Closed: jpmckinney closed this issue 4 years ago
I am not sure the problem is really in select id from selected_collections,
as this just equates to another join, which will make things slower, but should only do so marginally.
I think it is due to the query planner getting things very wrong.
Having looked into this for a while, I ran ANALYZE
on the database. I also set random_page_cost
to 2.0 for the database; this is the advice given for an SSD.
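For reference, the two steps amount to something like the following (the database name here is a placeholder, not the real one):

```sql
-- Refresh the planner's statistics for all tables in the current database:
ANALYZE;

-- Lower the estimated cost of random page reads to suit SSD storage.
-- This persists for all future sessions on the named database
-- ("kingfisher" is a placeholder name):
ALTER DATABASE kingfisher SET random_page_cost = 2.0;
```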
Anyway, with the above, the query plan is now:
Nested Loop (cost=1.85..3879803.29 rows=303894 width=207)
-> Nested Loop Left Join (cost=1.28..2512221.68 rows=303894 width=638)
-> Nested Loop (cost=0.85..1192288.65 rows=303894 width=63)
-> Nested Loop (cost=0.28..5.31 rows=1 width=8)
-> Seq Scan on selected_collections (cost=0.00..1.01 rows=1 width=4)
-> Index Only Scan using collection_pkey on collection c (cost=0.28..4.29 rows=1 width=4)
Index Cond: (id = selected_collections.id)
-> Index Scan using release_collection_id_idx on release r (cost=0.57..1185010.83 rows=727251 width=63)
Index Cond: (collection_id = c.id)
-> Index Scan using package_data_pkey on package_data pd (cost=0.44..4.34 rows=1 width=579)
Index Cond: (id = r.package_data_id)
-> Index Scan using data_pkey on data d (cost=0.57..4.48 rows=1 width=696)
Index Cond: (id = r.data_id)
(13 rows)
The only seq scan is on selected_collections (which is fine, as it only has 1 row in it!)
Also, the run is now:
(.ve) ocdskfp@process1:~/ocdskingfisherviews$ python ocdskingfisher-views-cli refresh-views collection_534
[I 200316 13:01:50 refresh_views:39] running script: 001-functions
[I 200316 13:01:50 refresh_views:47] running time: 0.023357978090643883s
[I 200316 13:01:50 refresh_views:39] running script: 002-tmp_release_and_parties
[I 200316 13:02:13 refresh_views:47] running time: 23.020595516078174s
[I 200316 13:02:13 refresh_views:39] running script: 003-buyer-procuringentity-tenderer
[I 200316 13:02:16 refresh_views:47] running time: 2.988911972846836s
[I 200316 13:02:16 refresh_views:39] running script: 004-planning
[I 200316 13:02:17 refresh_views:47] running time: 0.7676320909522474s
[I 200316 13:02:17 refresh_views:39] running script: 005-tender
[I 200316 13:02:22 refresh_views:47] running time: 5.147628736216575s
[I 200316 13:02:22 refresh_views:39] running script: 006-awards
[I 200316 13:02:37 refresh_views:47] running time: 14.928037466015667s
[I 200316 13:02:37 refresh_views:39] running script: 007-contracts
[I 200316 13:02:59 refresh_views:47] running time: 21.61525326874107s
[I 200316 13:02:59 refresh_views:39] running script: 008-release
[I 200316 13:03:12 refresh_views:47] running time: 13.56462255306542s
[I 200316 13:03:12 refresh_views:49] total running time: 82.05779977981001s
Which is much better.
My reservation with hard-coding the collection_id into queries is that it will require some kind of templating or PL/pgSQL
in order to do it, making the scripts more complicated.
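For what it's worth, one lightweight form of templating would be psql variables; this is just a sketch, and the file and variable names are made up:

```sql
-- Invoked as, e.g.: psql -v collection_id=534 -f some_view.sql
-- psql substitutes :collection_id before the query reaches the server,
-- so the planner sees a literal value rather than a subquery:
SELECT r.*
FROM release r
WHERE r.collection_id = :collection_id;
```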
The above changes should also make all other queries faster too.
Let’s do your change, since it’s suggested in PostgreSQL docs.
I knew it was a planning issue, but didn’t know which variables could be safely changed without worsening performance elsewhere.
If you’ve already made the change, please add it to Salt.
Also, in what scenarios is it appropriate to run ANALYZE on the database?
We should also add a note in the docs about this tuning for SSDs.
Added it to Salt.
ANALYZE is run by autovacuum,
but as the selected_collections
table is so small, it does not appear to be analyzed, since autovacuum
is not triggered on small tables.
I was thinking of running ANALYZE selected_collections
in the views SQL on that table but thought that was an odd thing to do.
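The odd-but-workable option mentioned above would look something like this at the top of the views SQL:

```sql
-- Refresh statistics for just this tiny table so the planner knows
-- it contains only one row before planning the big joins:
ANALYZE selected_collections;
```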
Will add it to docs before this is closed.
Salt work is in https://github.com/open-contracting/deploy/commit/54be5983f060b4ebffcd823bcee12292c904569d .
My reading of the conversation above is that the only remaining TODO is: add the random_page_cost
value chosen to the docs.
Incidentally, I had a look at the random_page_cost
stuff as I was looking at this issue, and I think that we could set it lower in the future; the difference between sequential and random reads for NVMe is even smaller than for a regular SSD.
Re: docs, let's instead add a comment to the Salt state to explain how we arrived at that page cost, with a link to https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS and any other relevant links.
Performance reported in https://github.com/open-contracting/kingfisher-views/issues/92#issuecomment-599508625 is much improved. All good in that respect.
@jpmckinney will add a quick update to the Views docs to suggest to anyone running their own Views that they consider this if they encounter similar issues.
Done!
Thanks to #93, we have timing logs, and I see that we have bad performance again from this query, even though random_page_cost is 2 (try with the schema in the issue description, or with dominican_republic).
I ran SET random_page_cost = 1;
(the minimum value that makes sense) before the query, and I still get a bad plan. So, I think we need to use another option.
TL;DR: Don't use a
select id from selected_collections
subquery. Instead, get that result in one query, and substitute the explicit values into the big query. The subquery form produces a bad plan, while explicit values produce a good one.
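The substitution being suggested looks roughly like this; the table and column names are taken from the plan earlier in this thread, and the query bodies are illustrative:

```sql
-- Bad: the planner can't see which ids the subquery will return,
-- so it misestimates row counts for the outer joins:
SELECT r.*
FROM release r
WHERE r.collection_id IN (SELECT id FROM selected_collections);

-- Better: fetch the ids first...
SELECT id FROM selected_collections;

-- ...then substitute the resulting literal(s), so the planner can use
-- per-collection statistics (534 is an example id from this thread):
SELECT r.*
FROM release r
WHERE r.collection_id IN (534);
```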
In a recent run, 97% of the time was spent doing table scans.
Resource: https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan
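The resource above walks through reading plans; the kind of command used to get that timing breakdown is, roughly (query body illustrative):

```sql
-- EXPLAIN ANALYZE actually executes the query and reports per-node timings;
-- BUFFERS adds block I/O counts, which helps spot heavy scans.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.*
FROM release r
WHERE r.collection_id IN (SELECT id FROM selected_collections);
```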