peterdudfield opened 1 year ago
The above didn't really help.
I also tried

```sql
CREATE INDEX ix_forecasts_timestamp_utc_desc ON forecasts (timestamp_utc DESC);
```

but it didn't really help.
Trying

```sql
CREATE INDEX ix_forecast_values_start_utc_horizon_minutes
    ON forecast_values (start_utc, horizon_minutes);
```

It took 1.26 minutes to make.
Want to try clustering on horizon_minutes:

```sql
CREATE INDEX ix_forecast_values_horizon_minutes ON forecast_values (horizon_minutes);

ALTER TABLE forecast_values
    CLUSTER ON ix_forecast_values_horizon_minutes;
```

This took 1.11 seconds to make.
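One thing to be aware of (and probably why it only took ~1 second): `ALTER TABLE ... CLUSTER ON` just records which index future clustering should use; it doesn't reorder any rows. To physically rewrite the table in index order you'd also need to run:

```sql
-- Rewrites the whole table in index order; takes an ACCESS EXCLUSIVE lock
CLUSTER forecast_values USING ix_forecast_values_horizon_minutes;
```

And PostgreSQL doesn't maintain the ordering for rows inserted afterwards, so it would need re-running periodically.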
Can you share the query you are trying to optimize?
The past-forecast one.
I thought we could partition the table into horizon_minutes=0 and horizon_minutes>0
```sql
SELECT DISTINCT ON (forecasts.site_uuid, forecasts.timestamp_utc) *
FROM forecasts
JOIN forecast_values ON forecasts.forecast_uuid = forecast_values.forecast_uuid
WHERE forecasts.site_uuid IN ('04d67e7f-7bcf-4b55-8ced-14d311e0a5f5',
                              '0bd3dc11-8160-46a0-9ff7-36e125750aeb')
  AND forecasts.timestamp_utc >= '2023-06-15'
  AND forecasts.timestamp_utc < '2023-06-16 15:45'
  AND forecast_values.horizon_minutes = 0
  AND forecast_values.start_utc >= '2023-06-15'
  AND forecast_values.start_utc < '2023-06-16 15:45'
ORDER BY forecasts.site_uuid, forecasts.timestamp_utc;
```
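To see where the time actually goes, it might help to run the slow query under `EXPLAIN (ANALYZE, BUFFERS)`, e.g.:

```sql
-- Show the executed plan with per-node timings and buffer hits/reads
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (forecasts.site_uuid, forecasts.timestamp_utc) *
FROM forecasts
JOIN forecast_values ON forecasts.forecast_uuid = forecast_values.forecast_uuid
WHERE forecasts.site_uuid IN ('04d67e7f-7bcf-4b55-8ced-14d311e0a5f5',
                              '0bd3dc11-8160-46a0-9ff7-36e125750aeb')
  AND forecasts.timestamp_utc >= '2023-06-15'
  AND forecasts.timestamp_utc < '2023-06-16 15:45'
  AND forecast_values.horizon_minutes = 0
  AND forecast_values.start_utc >= '2023-06-15'
  AND forecast_values.start_utc < '2023-06-16 15:45'
ORDER BY forecasts.site_uuid, forecasts.timestamp_utc;
```

That would show whether the new indexes are actually being used, or whether the planner is falling back to a sequential scan.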
You mean the 2 queries called when this is called? https://github.com/openclimatefix/pv-site-api/blob/411277ea1585bb4a41982f363561afece86ee6d1/pv_site_api/_db_helpers.py#L138
If so I guess the query on past forecasts is the slow one? It's worth noting that the data returned by this query never changes (because it's in the past). As time goes by, the only thing that changes is the latest forecast (which is faster to query; that's the second of our two queries).
My point is that instead of calling both queries systematically, we could call the first query once (slow, but it can be done async and that's typically not what the user is looking for) and then update it with the latest forecast (which is fast) as new forecasts become available. We could even keep the data in the browser cache so that if they refresh it's already there. And if they close their window and come back 30 minutes later, we could just query the missing bit, which would be faster.
I'm not sure it's an easy problem to fix otherwise.
I agree, though they might not want that immediately.
I'm going to try some partitioning, and see if it helps
Cool! It could be a good idea to make a separate toy database filled with a lot of (random) data and do tests there, to remove possible external factors, and probably iterate faster.
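For the toy database, `generate_series` can produce a lot of rows quickly. A sketch (column names beyond those in the query above are assumptions; `gen_random_uuid()` is built in from PostgreSQL 13, or available via the pgcrypto extension before that):

```sql
-- Hypothetical: fill a scratch forecast_values table with 1M random rows
INSERT INTO forecast_values (forecast_value_uuid, forecast_uuid, start_utc, horizon_minutes)
SELECT
    gen_random_uuid(),
    gen_random_uuid(),
    timestamp '2023-01-01' + (i * interval '15 minutes'),
    (i % 48) * 15                     -- horizons 0..705 minutes in 15-minute steps
FROM generate_series(1, 1000000) AS i;
```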
Partitioning the table could be done like:

```sql
-- drop table forecast_values_new2;

-- Copy of the table, with a primary key that includes the partition key
-- (a partitioned table's primary key must contain the partition column)
CREATE TABLE forecast_values_temp
    (LIKE forecast_values INCLUDING ALL);

ALTER TABLE forecast_values_temp
    DROP CONSTRAINT forecast_values_temp_pkey,
    ADD PRIMARY KEY (forecast_value_uuid, horizon_minutes);

CREATE TABLE forecast_values_new
    (LIKE forecast_values_temp INCLUDING ALL)
    PARTITION BY RANGE (horizon_minutes);

CREATE TABLE forecast_values_0 PARTITION OF forecast_values_new
    FOR VALUES FROM (0) TO (1);
CREATE TABLE forecast_values_g0 PARTITION OF forecast_values_new
    FOR VALUES FROM (1) TO (10000000);

-- This can take some time > 10
INSERT INTO forecast_values_new
SELECT * FROM forecast_values;

ALTER TABLE forecast_values RENAME TO forecast_values_old;
ALTER TABLE forecast_values_new RENAME TO forecast_values;
```
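One thought: since DDL is transactional in PostgreSQL, the final swap could be done in a single transaction so readers never observe the table missing between the two renames (sketch):

```sql
BEGIN;
ALTER TABLE forecast_values RENAME TO forecast_values_old;
ALTER TABLE forecast_values_new RENAME TO forecast_values;
COMMIT;
```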
And then do it in the datamodel
I want to try

```sql
ALTER TABLE forecast_values
    DROP CONSTRAINT forecast_values_pkey,
    ADD PRIMARY KEY (forecast_value_uuid, horizon_minutes);
```

first.
It sounds dangerous to try those in the Dev/Prod databases. You could end up in a state that alembic doesn't recognize by mistake no?
```sql
ALTER TABLE forecast_values
    DROP CONSTRAINT forecast_values_pkey,
    ADD PRIMARY KEY (forecast_value_uuid, horizon_minutes);
```

This didn't work, I'll roll it back.
I'm doing it in 'dev'; the above means I can always roll back, never deleting anything.
I made a partition table and did the renaming; I'll have to leave it over the weekend to see if it speeds things up or not.
I rolled the partition back
It might be useful to add an index that is descending in time, because we often load the most recent data.
I've tried adding
It took 24 seconds to make
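A descending-time index would look something like this (the index name and exact column are assumptions, since the statement above wasn't included):

```sql
-- Hypothetical: lets 'most recent first' queries read the index front-to-back
CREATE INDEX ix_forecast_values_start_utc_desc
    ON forecast_values (start_utc DESC);
```

Worth noting that a plain ascending btree can already be scanned backwards for a single-column `ORDER BY ... DESC`; `DESC` indexes mostly pay off in multicolumn indexes with mixed sort directions.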