openclimatefix / nowcasting_datamodel

Datamodel for the nowcasting project
6 stars 6 forks source link

Get MAE from database #80

Closed peterdudfield closed 2 years ago

peterdudfield commented 2 years ago

Make stats for database truth and forecast

  1. Metrics
    • MAE
    • RMSE
  2. Forecast horizonGood to look at different hoizons
  3. Good to look at average GSP, and national

Would be great to run it once a day, and save it in the database The data table could be

peterdudfield commented 2 years ago

SQL for national ME

select 
avg(forecast_value_latest.expected_power_generation_megawatts - gsp_yield.solar_generation_kw / 1000) as err
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where forecast_value_latest.created_utc < '2022-06-21'
and forecast_value_latest.created_utc >= '2022-06-20'
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id = 0
MAE of latest forecast Date MAE
2022-06-15 294
2022-06-16 293
2022-06-17 278
2022-06-18 162
2022-06-19 273
2022-06-20 615
JackKelly commented 2 years ago

Looks good! Well done for computing these! Does the code filter out "nighttime" before computing the metrics? (No worries if not, just keen to understand!)

peterdudfield commented 2 years ago

Looks good! Well done for computing these! Does the code filter out "nighttime" before computing the metrics? (No worries if not, just keen to understand!)

No this is for full day. So I think there is bit of a gap between training MAE and production. Although in training the metric is through the year, so would be good to know the 'summer' MAE for training

peterdudfield commented 2 years ago

update sql

select 
target_time::TIMESTAMP::DATE as date,
avg(abs(forecast_value_latest.expected_power_generation_megawatts - gsp_yield.solar_generation_kw / 1000)) as err
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where target_time < '2022-06-22'
and target_time >= '2022-06-15'
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id = 0
and regime = 'day-after'
group by date
order by date
MAE of latest forecast Date MAE
2022-06-15 296
2022-06-16 292
2022-06-17 271
2022-06-18 166
2022-06-19 266
2022-06-20 606
2022-06-21 436
2022-06-23 238
2022-06-24 156
2022-06-25 202
2022-06-26 260
2022-06-27 312
dantravers commented 2 years ago

Very usfeul - I've been wondering what the error statistics are looking like, and also on individual GSPs, as I have a suspicion that some are serial offenders.
Regarding hours of the day - I think a few metrics could be good to calculate - I.e. midday, as well as all daylight hours.
Is it possible to easily get a csv extract of the GSP and national errors so we can generate a range of stats? I'm thinking that we might have a number on the internal dashboard. But we will likely want to play around as we work out which are the key ones we need to monitor and even put KPIs on.

peterdudfield commented 2 years ago

Very usfeul - I've been wondering what the error statistics are looking like, and also on individual GSPs, as I have a suspicion that some are serial offenders. Regarding hours of the day - I think a few metrics could be good to calculate - I.e. midday, as well as all daylight hours. Is it possible to easily get a csv extract of the GSP and national errors so we can generate a range of stats? I'm thinking that we might have a number on the internal dashboard. But we will likely want to play around as we work out which are the key ones we need to monitor and even put KPIs on.

Yea I can give you a csv of the last week, and you can play around. As always, itll probably be good to implement a few simple ones first, and then start adding to them

peterdudfield commented 2 years ago
select target_time, expected_power_generation_megawatts, forecast_value_latest.gsp_id, label, solar_generation_kw
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where target_time < '2022-06-28'
and target_time >= '2022-06-15'
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id = 0
and regime = 'day-after'
order by target_time, forecast_value_latest.gsp_id

This is to get National gsp values

peterdudfield commented 2 years ago

@dantravers

dantravers commented 2 years ago

I've started a list of suggested metrics against which we should compare forecast methodologies in a google doc which we used earlier for metrics discussions.

peterdudfield commented 2 years ago

Screenshot 2022-07-29 at 09 35 33

to explore what can be done

peterdudfield commented 2 years ago

This can be used to get gsps levels

select 
target_time::TIMESTAMP::DATE as date,
avg(abs(forecast_value_latest.expected_power_generation_megawatts - gsp_yield.solar_generation_kw / 1000)) as err
from forecast_value_latest
join location on location.gsp_id = forecast_value_latest.gsp_id
join gsp_yield on gsp_yield.location_id = location.id
where target_time < '2022-08-05'
and target_time >= '2022-08-01'
and gsp_yield.solar_generation_kw +1 > gsp_yield.solar_generation_kw
and gsp_yield.datetime_utc = forecast_value_latest.target_time
and forecast_value_latest.gsp_id != 0
and regime = 'day-after'
group by date
order by date