Closed peterdudfield closed 2 years ago
SQL for national ME
select
avg(forecast_value_latest.expected_power_generation_megawatts - gsp_yield.solar_generation_kw / 1000) as err
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where forecast_value_latest.created_utc < '2022-06-21'
and forecast_value_latest.created_utc >= '2022-06-20'
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id = 0
MAE of latest forecast | Date | MAE |
---|---|---|
2022-06-15 | 294 | |
2022-06-16 | 293 | |
2022-06-17 | 278 | |
2022-06-18 | 162 | |
2022-06-19 | 273 | |
2022-06-20 | 615 |
Looks good! Well done for computing these! Does the code filter out "nighttime" before computing the metrics? (No worries if not, just keen to understand!)
Looks good! Well done for computing these! Does the code filter out "nighttime" before computing the metrics? (No worries if not, just keen to understand!)
No this is for full day. So I think there is bit of a gap between training MAE and production. Although in training the metric is through the year, so would be good to know the 'summer' MAE for training
update sql
select
target_time::TIMESTAMP::DATE as date,
avg(abs(forecast_value_latest.expected_power_generation_megawatts - gsp_yield.solar_generation_kw / 1000)) as err
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where target_time < '2022-06-22'
and target_time >= '2022-06-15'
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id = 0
and regime = 'day-after'
group by date
order by date
MAE of latest forecast | Date | MAE |
---|---|---|
2022-06-15 | 296 | |
2022-06-16 | 292 | |
2022-06-17 | 271 | |
2022-06-18 | 166 | |
2022-06-19 | 266 | |
2022-06-20 | 606 | |
2022-06-21 | 436 | |
2022-06-23 | 238 | |
2022-06-24 | 156 | |
2022-06-25 | 202 | |
2022-06-26 | 260 | |
2022-06-27 | 312 |
Very usfeul - I've been wondering what the error statistics are looking like, and also on individual GSPs, as I have a suspicion that some are serial offenders.
Regarding hours of the day - I think a few metrics could be good to calculate - I.e. midday, as well as all daylight hours.
Is it possible to easily get a csv extract of the GSP and national errors so we can generate a range of stats? I'm thinking that we might have a number on the internal dashboard. But we will likely want to play around as we work out which are the key ones we need to monitor and even put KPIs on.
Very usfeul - I've been wondering what the error statistics are looking like, and also on individual GSPs, as I have a suspicion that some are serial offenders. Regarding hours of the day - I think a few metrics could be good to calculate - I.e. midday, as well as all daylight hours. Is it possible to easily get a csv extract of the GSP and national errors so we can generate a range of stats? I'm thinking that we might have a number on the internal dashboard. But we will likely want to play around as we work out which are the key ones we need to monitor and even put KPIs on.
Yea I can give you a csv of the last week, and you can play around. As always, itll probably be good to implement a few simple ones first, and then start adding to them
select target_time, expected_power_generation_megawatts, forecast_value_latest.gsp_id, label, solar_generation_kw
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where target_time < '2022-06-28'
and target_time >= '2022-06-15'
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id = 0
and regime = 'day-after'
order by target_time, forecast_value_latest.gsp_id
This is to get National gsp values
@dantravers
I've started a list of suggested metrics against which we should compare forecast methodologies in a google doc which we used earlier for metrics discussions.
to explore what can be done
This can be used to get gsps levels
select
target_time::TIMESTAMP::DATE as date,
avg(abs(forecast_value_latest.expected_power_generation_megawatts - gsp_yield.solar_generation_kw / 1000)) as err
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where target_time < '2022-08-05'
and target_time >= '2022-08-01'
and gsp_yield.solar_generation_kw +1 > gsp_yield.solar_generation_kw
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id != 0
and regime = 'day-after'
group by date
order by date
Make stats for database truth and forecast
Would be great to run it once a day, and save it in the database The data table could be