openclimatefix / open-source-quartz-solar-forecast

Open Source Solar Site Level Forecast
MIT License

Eval: missing real generation_power values #110

Open JasonFengGit opened 7 months ago

JasonFengGit commented 7 months ago

Describe the bug

In evaluation, some of the real/expected values of generation_power are missing.

To Reproduce

Steps to reproduce the behavior:

  1. Run python scripts/run_evaluation.py with the following testset.csv (a small test to illustrate the bug):
    pv_id,timestamp
    9531,2021-05-08 10:00:00
  2. Some values are missing from the generation_power column of results.csv. Example results.csv:
    ,forecast_power,horizon_hour,pv_id,timestamp,generation_power
    0,0.5382338261787198,0,9531,2021-05-08 10:00:00,
    1,0.6805504837540712,1,9531,2021-05-08 11:00:00,
    2,0.6950511506600507,2,9531,2021-05-08 12:00:00,
    3,0.7507192765284325,3,9531,2021-05-08 13:00:00,
    4,0.6222327619232007,4,9531,2021-05-08 14:00:00,
    5,0.46010747864610435,5,9531,2021-05-08 15:00:00,
    6,0.2792985706278065,6,9531,2021-05-08 16:00:00,
    7,0.11883538094408863,7,9531,2021-05-08 17:00:00,0.19273080444335938
    8,0.03377143967258781,8,9531,2021-05-08 18:00:00,0.05239992141723633
    9,0.004003063439732276,9,9531,2021-05-08 19:00:00,0.0
    10,0.0,10,9531,2021-05-08 20:00:00,0.0
    11,0.0,11,9531,2021-05-08 21:00:00,0.0
    12,0.0,12,9531,2021-05-08 22:00:00,0.0
    13,0.0,13,9531,2021-05-08 23:00:00,0.0
    14,0.0,14,9531,2021-05-09 00:00:00,0.0
    15,0.0,15,9531,2021-05-09 01:00:00,0.0
    16,0.0,16,9531,2021-05-09 02:00:00,0.0
    17,0.0,17,9531,2021-05-09 03:00:00,0.0
    18,0.0006960749166189652,18,9531,2021-05-09 04:00:00,0.0
    19,0.021830932182701164,19,9531,2021-05-09 05:00:00,0.002466707944869995
    20,0.04920016630787139,20,9531,2021-05-09 06:00:00,0.12896760559082032
    21,0.16425460389406232,21,9531,2021-05-09 07:00:00,0.22877279663085937
    22,0.2536578989915163,22,9531,2021-05-09 08:00:00,0.8414171752929688
    23,0.3202140667660062,23,9531,2021-05-09 09:00:00,0.6911544189453125
    24,0.6471341332970747,24,9531,2021-05-09 10:00:00,0.8355504150390625
    25,0.7728203006501675,25,9531,2021-05-09 11:00:00,1.15409765625
    26,0.6856276972650501,26,9531,2021-05-09 12:00:00,0.6737999877929688
    27,0.7735971877911895,27,9531,2021-05-09 13:00:00,1.11731640625
    28,0.6681219518935074,28,9531,2021-05-09 14:00:00,0.20179200744628906
    29,0.49810158614186933,29,9531,2021-05-09 15:00:00,0.45828359985351563
    30,0.3536980181332593,30,9531,2021-05-09 16:00:00,0.35039999389648435
    31,0.19379396872601617,31,9531,2021-05-09 17:00:00,0.2593247985839844
    32,0.05294271353381089,32,9531,2021-05-09 18:00:00,0.17835600280761718
    33,0.00577927292344424,33,9531,2021-05-09 19:00:00,0.07551947784423828
    34,0.0,34,9531,2021-05-09 20:00:00,3.235164058423834e-09
    35,0.0,35,9531,2021-05-09 21:00:00,0.0
    36,0.0,36,9531,2021-05-09 22:00:00,0.0
    37,0.0,37,9531,2021-05-09 23:00:00,0.0
    38,0.0,38,9531,2021-05-10 00:00:00,0.0
    39,0.0,39,9531,2021-05-10 01:00:00,0.0
    40,0.0,40,9531,2021-05-10 02:00:00,0.0
    41,0.0,41,9531,2021-05-10 03:00:00,0.0
    42,0.0016835594981394644,42,9531,2021-05-10 04:00:00,0.0
    43,0.04807132423975142,43,9531,2021-05-10 05:00:00,0.01917263984680176
    44,0.2019059924841576,44,9531,2021-05-10 06:00:00,0.20261639404296874
    45,0.4591377241020738,45,9531,2021-05-10 07:00:00,0.33280679321289064
    46,0.7547477658079034,46,9531,2021-05-10 08:00:00,0.34174200439453123
    47,1.068172900817906,47,9531,2021-05-10 09:00:00,0.9841751708984375

Expected behavior

No missing values (or maybe some fallbacks to handle missing values).
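As a rough illustration of one possible fallback, the sketch below counts the rows with an empty generation_power and computes the mean absolute error over the remaining rows only. It uses four rows copied from the results.csv above. The `summarize` helper and the choice to skip rows without ground truth are assumptions for illustration, not what scripts/run_evaluation.py currently does.

```python
import csv
import io

# Four rows copied from the results.csv in this issue.
SAMPLE = """,forecast_power,horizon_hour,pv_id,timestamp,generation_power
0,0.5382338261787198,0,9531,2021-05-08 10:00:00,
6,0.2792985706278065,6,9531,2021-05-08 16:00:00,
7,0.11883538094408863,7,9531,2021-05-08 17:00:00,0.19273080444335938
8,0.03377143967258781,8,9531,2021-05-08 18:00:00,0.05239992141723633
"""


def summarize(csv_text):
    """Count rows with an empty generation_power and compute MAE on the rest.

    Hypothetical helper: skipping rows without ground truth is one possible
    fallback, not the behavior of the evaluation script.
    """
    missing = 0
    abs_errors = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = row["generation_power"].strip()
        if actual == "":
            missing += 1  # no ground truth for this timestamp
            continue
        abs_errors.append(abs(float(row["forecast_power"]) - float(actual)))
    mae = sum(abs_errors) / len(abs_errors) if abs_errors else None
    return missing, len(abs_errors), mae


missing, scored, mae = summarize(SAMPLE)
print(f"missing={missing} scored={scored} mae={mae}")
```

Reporting the missing count next to the metric makes it visible how much of the test set actually contributed to the score.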

peterdudfield commented 7 months ago

Thanks @JasonFengGit for this

We'll have to think about how to create a new test dataset that doesn't have any missing generation values.

JasonFengGit commented 7 months ago

We could filter out timestamps with missing values, but that would introduce some biases that are hard to analyze.

peterdudfield commented 7 months ago

> We could filter out timestamps with missing values, but that would introduce some hard to explain bias.

I think we could filter out the missing ones and introduce new ones. As long as we then do some analysis on the new test set and check it's not biased, it should be OK.

What biases were you thinking about?
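A minimal sketch of the "filter out the missing ones" step, to make the idea concrete. It assumes each results row's timestamp equals the test-set timestamp plus horizon_hour hours, which matches the example results.csv above. The `incomplete_test_entries` helper is hypothetical, and picking replacement entries is left out.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def incomplete_test_entries(rows):
    """Return the (pv_id, start timestamp) pairs whose window has a missing value.

    Assumption: timestamp = test-set timestamp + horizon_hour hours.
    """
    flagged = set()
    for row in rows:
        if row["generation_power"].strip() == "":
            start = datetime.strptime(row["timestamp"], FMT) - timedelta(
                hours=int(row["horizon_hour"])
            )
            flagged.add((row["pv_id"], start.strftime(FMT)))
    return flagged


# Two rows from the results.csv in this issue, plus the test set that produced them.
rows = [
    {"pv_id": "9531", "timestamp": "2021-05-08 16:00:00",
     "horizon_hour": "6", "generation_power": ""},
    {"pv_id": "9531", "timestamp": "2021-05-08 17:00:00",
     "horizon_hour": "7", "generation_power": "0.19273080444335938"},
]
testset = [("9531", "2021-05-08 10:00:00")]

flagged = incomplete_test_entries(rows)
kept = [entry for entry in testset if entry not in flagged]
print("flagged:", sorted(flagged))
print("kept:", kept)
```

Flagged entries are the ones that would need replacing with new (pv_id, timestamp) pairs before re-checking the test set for bias.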

JasonFengGit commented 7 months ago

For example, the missing values might be due to similar reasons and could share some patterns that are either easier or harder to predict, thereby making the evaluation biased.

peterdudfield commented 7 months ago

> For example, the missing values might be due to similar reasons and could share some patterns that are either easier or harder to predict, thereby making the evaluation biased.

Ah, I see. From what I've seen, they are normally quite random, as they are all random PV panels throughout the UK. But we can check this.
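One simple way to run that check is to look at the fraction of missing generation_power values per hour of day. A roughly flat rate would be consistent with random gaps, while a strong peak would suggest a pattern. The sketch below uses a handful of rows from the results.csv in this issue. The `missing_rate_by_hour` helper is hypothetical, and a real check would need the full test set and probably a per-pv_id breakdown as well.

```python
from collections import defaultdict
from datetime import datetime


def missing_rate_by_hour(rows):
    """Map hour of day -> fraction of rows with no generation_power."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for timestamp, generation_power in rows:
        hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        total[hour] += 1
        if generation_power == "":
            missing[hour] += 1
    return {hour: missing[hour] / total[hour] for hour in sorted(total)}


# (timestamp, generation_power) pairs taken from the results.csv in this issue.
rows = [
    ("2021-05-08 10:00:00", ""),
    ("2021-05-08 16:00:00", ""),
    ("2021-05-08 17:00:00", "0.19273080444335938"),
    ("2021-05-09 10:00:00", "0.8355504150390625"),
    ("2021-05-09 16:00:00", "0.35039999389648435"),
    ("2021-05-09 17:00:00", "0.2593247985839844"),
]

rates = missing_rate_by_hour(rows)
for hour, rate in rates.items():
    print(f"{hour:02d}:00 missing rate = {rate:.2f}")
```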

JasonFengGit commented 7 months ago

Oh OK! That would make it easier.

zakwatts commented 6 months ago

@JasonFengGit Nice spot! Thanks for this.