peterdudfield closed this issue 1 year ago.
I've added some logging in api 0.0.21 to see what is taking so long.
Running your query directly in psql
takes about 500ms, which is still awfully slow.
This is not surprising given that no indexes have been defined on the database.
This is the `EXPLAIN ANALYZE` output:

```
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=16917.77..20807.38 rows=32695 width=146) (actual time=304.545..333.378 rows=9332 loops=1)
   ->  Gather Merge  (cost=16917.77..20725.64 rows=32695 width=146) (actual time=304.544..327.944 rows=38976 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=15917.74..15951.80 rows=13623 width=146) (actual time=284.618..287.307 rows=12992 loops=3)
               Sort Key: datetime_intervals.start_utc, forecast_values.created_utc DESC
               Sort Method: quicksort  Memory: 3886kB
               Worker 0:  Sort Method: external merge  Disk: 2440kB
               Worker 1:  Sort Method: quicksort  Memory: 3010kB
               ->  Hash Join  (cost=518.44..14982.27 rows=13623 width=146) (actual time=33.265..258.976 rows=12992 loops=3)
                     Hash Cond: (forecast_values.datetime_interval_uuid = datetime_intervals.datetime_interval_uuid)
                     ->  Hash Join  (cost=97.62..14515.31 rows=17575 width=106) (actual time=17.753..228.061 rows=12992 loops=3)
                           Hash Cond: (forecast_values.forecast_uuid = forecasts.forecast_uuid)
                           ->  Parallel Seq Scan on forecast_values  (cost=0.00..13633.72 rows=298355 width=60) (actual time=13.703..134.736 rows=221755 loops=3)
                                 Filter: (created_utc >= '2023-02-02 00:00:00'::timestamp without time zone)
                                 Rows Removed by Filter: 30528
                           ->  Hash  (cost=94.47..94.47 rows=252 width=46) (actual time=2.076..2.077 rows=231 loops=3)
                                 Buckets: 1024  Batches: 1  Memory Usage: 26kB
                                 ->  Seq Scan on forecasts  (cost=0.00..94.47 rows=252 width=46) (actual time=0.027..1.879 rows=231 loops=3)
                                       Filter: (site_uuid = '725a8670-d012-474d-b901-1179f43e7182'::uuid)
                                       Rows Removed by Filter: 3712
                     ->  Hash  (cost=287.27..287.27 rows=10683 width=40) (actual time=12.546..12.547 rows=10668 loops=3)
                           Buckets: 16384  Batches: 1  Memory Usage: 879kB
                           ->  Seq Scan on datetime_intervals  (cost=0.00..287.27 rows=10683 width=40) (actual time=0.872..4.645 rows=10668 loops=3)
                                 Filter: (start_utc >= '2023-02-02 00:00:00'::timestamp without time zone)
                                 Rows Removed by Filter: 3017
 Planning Time: 0.348 ms
 Execution Time: 334.118 ms
```
I'm not a pro at interpreting these, but in general "Seq Scan"s are bad. Adding indexes on these columns should fix that:
- `forecast_values.created_utc`
- `datetime_intervals.start_utc`
- `forecasts.site_uuid`
Those basically correspond to the columns in the `WHERE` clause.
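As a sketch, assuming standard PostgreSQL and no existing indexes, the three indexes above could be created like this (index names are illustrative, not from the actual schema):

```sql
-- B-tree indexes on the columns filtered in the WHERE clause,
-- so the planner can replace the Seq Scans with index scans.
CREATE INDEX idx_forecast_values_created_utc ON forecast_values (created_utc);
CREATE INDEX idx_datetime_intervals_start_utc ON datetime_intervals (start_utc);
CREATE INDEX idx_forecasts_site_uuid ON forecasts (site_uuid);
```

On a live database you'd probably want `CREATE INDEX CONCURRENTLY` instead, so the builds don't block writes to these tables.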
Indexes are probably needed on most of the foreign keys, too.
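For example, the two join conditions in the plan above hash on foreign-key columns of `forecast_values`; indexing those could look like this (again, names are illustrative):

```sql
-- FK/join columns from the Hash Cond lines in the plan.
-- PostgreSQL does not index FK columns automatically.
CREATE INDEX idx_forecast_values_forecast_uuid
    ON forecast_values (forecast_uuid);
CREATE INDEX idx_forecast_values_datetime_interval_uuid
    ON forecast_values (datetime_interval_uuid);
```

Re-running `EXPLAIN ANALYZE` afterwards would show whether the planner actually picks them up.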
Ah, I just realized I misread your post, sorry! The index suggestions are still valid, I think, but they won't fix your 30-second issue!
Once all the datetimes are in 5-minute blocks, this should speed things up.
Currently, pulling one forecast takes 30 seconds via the API call: http://pvsite-development-api-sites.eu-west-1.elasticbeanstalk.com/docs#/default/get_pv_forecast_sites_pv_forecast__site_uuid__get
The SQL itself only takes ~1 second in pgAdmin.
The slowness might be due to the Swagger docs.