ingalls closed this 4 years ago
Posting the query plans/execution times for the current feature history query vs. the proposed new one:
```
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3942.07..3942.08 rows=1 width=32) (actual time=99.199..99.199 rows=1 loops=1)
   ->  Subquery Scan on t  (cost=2608.76..3940.75 rows=264 width=49) (actual time=25.665..99.173 rows=3 loops=1)
         Filter: (((t.feat ->> 'id'::text))::bigint = 123)
         Rows Removed by Filter: 11756
         ->  ProjectSet  (cost=2608.76..2886.75 rows=52700 width=57) (actual time=24.044..68.504 rows=11759 loops=1)
               ->  Result  (cost=2608.76..2615.35 rows=527 width=45) (actual time=0.462..0.467 rows=3 loops=1)
                     ->  Sort  (cost=2608.76..2610.08 rows=527 width=45) (actual time=0.461..0.463 rows=3 loops=1)
                           Sort Key: deltas.id DESC
                           Sort Method: quicksort  Memory: 25kB
                           ->  Hash Join  (cost=1310.40..2584.93 rows=527 width=45) (actual time=0.445..0.452 rows=3 loops=1)
                                 Hash Cond: (deltas.uid = users.id)
                                 ->  Bitmap Heap Scan on deltas  (cost=1308.09..2575.37 rows=527 width=36) (actual time=0.413..0.416 rows=3 loops=1)
                                       Recheck Cond: (affected @> '{123}'::bigint[])
                                       Heap Blocks: exact=3
                                       ->  Bitmap Index Scan on deltas_affected_idx  (cost=0.00..1307.95 rows=527 width=0) (actual time=0.408..0.408 rows=3 loops=1)
                                             Index Cond: (affected @> '{123}'::bigint[])
                                 ->  Hash  (cost=2.14..2.14 rows=14 width=17) (actual time=0.024..0.024 rows=14 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                       ->  Seq Scan on users  (cost=0.00..2.14 rows=14 width=17) (actual time=0.008..0.015 rows=14 loops=1)
 Planning Time: 0.218 ms
 Execution Time: 99.391 ms
```
> The new feature history query would then be able to simply select the feature history out of a single table with no additional parsing

This is not exactly true-- it still needs to pull data from the `deltas` and `users` tables and convert everything to JSON. Here's the query I'm using now to replicate our existing functionality:
```sql
SELECT json_agg(
    json_build_object(
        'id', deltas.id,
        'uid', deltas.uid,
        'feat', json_build_object(
            'id', geo_history.id,
            'action', geo_history.action,
            'key', geo_history.key,
            'type', 'Feature',
            'version', geo_history.version,
            'geometry', ST_AsGeoJSON(geo_history.geom)::JSON,
            'properties', geo_history.props
        ),
        'username', users.username
    )
)
FROM geo_history
JOIN deltas ON geo_history.delta = deltas.id
JOIN users ON deltas.uid = users.id
WHERE geo_history.id = 366672;
```
I'd welcome suggestions on how to make this more performant.
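One behavioral difference worth noting while tuning: the old query sorted results (the `Sort Key: deltas.id DESC` node in the first plan), whereas `json_agg` on its own makes no ordering guarantee. If that ordering matters to consumers, an aggregate-level `ORDER BY` restores it. A minimal sketch against the same tables, with the `feat` object abridged for brevity:

```sql
-- json_agg otherwise emits rows in whatever order the plan happens to
-- produce them; ORDER BY inside the aggregate pins the output order.
SELECT json_agg(
    json_build_object(
        'id', deltas.id,
        'uid', deltas.uid,
        'username', users.username
    )
    ORDER BY deltas.id DESC
)
FROM geo_history
JOIN deltas ON geo_history.delta = deltas.id
JOIN users ON deltas.uid = users.id
WHERE geo_history.id = 366672;
```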
```
                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=188344.67..188344.68 rows=1 width=32) (actual time=3.987..3.987 rows=1 loops=1)
   ->  Hash Join  (cost=1474.94..113572.67 rows=29760 width=192) (actual time=3.918..3.925 rows=2 loops=1)
         Hash Cond: (geo_history.delta = deltas.id)
         ->  Bitmap Heap Scan on geo_history  (cost=1411.48..112342.15 rows=46312 width=152) (actual time=0.023..0.026 rows=2 loops=1)
               Recheck Cond: (id = 366672)
               Heap Blocks: exact=2
               ->  Bitmap Index Scan on geo_history_pkey  (cost=0.00..1399.90 rows=46312 width=0) (actual time=0.018..0.018 rows=2 loops=1)
                     Index Cond: (id = 366672)
         ->  Hash  (cost=61.85..61.85 rows=129 width=48) (actual time=3.887..3.887 rows=1500 loops=1)
               Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 99kB
               ->  Hash Join  (cost=1.32..61.85 rows=129 width=48) (actual time=0.042..2.890 rows=1500 loops=1)
                     Hash Cond: (deltas.uid = users.id)
                     ->  Seq Scan on deltas  (cost=0.00..52.36 rows=1836 width=16) (actual time=0.009..0.955 rows=1500 loops=1)
                     ->  Hash  (cost=1.14..1.14 rows=14 width=40) (actual time=0.026..0.026 rows=14 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on users  (cost=0.00..1.14 rows=14 width=40) (actual time=0.005..0.014 rows=14 loops=1)
 Planning time: 0.180 ms
 Execution time: 4.046 ms
```
It does still improve query time by a factor of ~25: 99.391 ms vs 4.046 ms :taco:
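One thing that stands out in the new plan: the planner estimates 46312 rows for `id = 366672` on `geo_history` but only 2 actually match. An estimate that far off usually means the table's statistics are stale (e.g. it was just bulk-loaded and `ANALYZE` hasn't run), which can push the planner toward worse plans as data grows. A sketch of the check, using the table from the query above:

```sql
-- Refresh planner statistics for the new table; wildly off row estimates
-- (rows=46312 estimated vs rows=2 actual) often mean ANALYZE hasn't run
-- since the table was populated.
ANALYZE geo_history;

-- Then re-inspect the plan to see whether the estimate tightens:
EXPLAIN ANALYZE
SELECT 1 FROM geo_history WHERE id = 366672;
```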
Context
We currently support full feature history retrieval based on the stored FeatureCollection within a given delta.
Although this allows us to return a full geometry for any given feature at any point in time, it is very slow because we have to iterate through all the features in a given delta to find the requested feature.
This PR breaks the features in a given delta out into their own table.
The new feature history query would then be able to simply select the feature history out of a single table with no additional parsing.
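The per-feature table this implies would look roughly like the sketch below. The column names are taken from the replication query in this thread; the types, nullability, keys, and indexes are assumptions, not the actual migration:

```sql
-- Hypothetical sketch of the broken-out feature history table.
-- Column names come from the query above; everything else is a guess.
CREATE TABLE geo_history (
    id      BIGINT NOT NULL,                 -- feature id
    delta   BIGINT NOT NULL REFERENCES deltas(id),
    version BIGINT NOT NULL,                 -- feature version
    action  TEXT,                            -- e.g. create/modify/delete
    key     TEXT,
    geom    GEOMETRY,                        -- PostGIS geometry
    props   JSONB,                           -- feature properties
    PRIMARY KEY (id, version)                -- plans show multiple rows per id
);

-- For joining history rows back to their delta, as the query does.
CREATE INDEX ON geo_history (delta);
```

A composite key on `(id, version)` would be consistent with the second plan, where the lookup on `geo_history_pkey` for a single `id` returns two rows.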
- Feature History
- Delta Retrieval
cc/ @ingalls