openclimatefix / open-source-quartz-solar-forecast

Open Source Solar Site Level Forecast
MIT License

Benchmark #27

Open peterdudfield opened 10 months ago

peterdudfield commented 10 months ago

Detailed Description

It would be great to benchmark the model.

Context

Always good to benchmark

Possible Implementation

felipewhitaker commented 8 months ago

Hi! Could I take this one? Is there any deadline? I would aim at doing it in the following weeks.

peterdudfield commented 8 months ago

Thanks @felipewhitaker, there is no deadline, so I really appreciate you taking this on.

ombhojane commented 8 months ago

Hello, can anyone please guide me on how to do this correctly? I'm thinking of evaluating the model with Mean Absolute Error, comparing its predictions against the train and validation PV values. Is this a correct approach? It would be great if you could explain it.

felipewhitaker commented 8 months ago

@ombhojane, it is quite common to use Mean Absolute Error (MAE) for evaluating models, including in weather research. Another common metric is the Continuous Ranked Probability Score (CRPS), which generalizes MAE to forecasts made of several scenarios (properscoring has an implementation of it).
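
To make the relationship between the two metrics concrete, here is a minimal sketch in plain Python. The names `mae` and `crps_ensemble` and the sample numbers are made up for illustration. This is not the properscoring implementation, only the usual ensemble estimator E|X - y| - 0.5 E|X - X'|. With a single scenario the spread term vanishes and CRPS collapses to the absolute error, which is the sense in which it generalizes MAE.

```python
from itertools import product


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def crps_ensemble(observation, scenarios):
    """CRPS of one observation against an ensemble of scenarios.

    Uses the estimator E|X - y| - 0.5 * E|X - X'|, where X and X' are
    drawn independently from the ensemble.
    """
    n = len(scenarios)
    accuracy = sum(abs(s - observation) for s in scenarios) / n
    spread = sum(abs(a - b) for a, b in product(scenarios, repeat=2)) / (n * n)
    return accuracy - 0.5 * spread


# Hypothetical PV generation values in kW, for illustration only.
observed_kw = [0.0, 1.0, 2.0]
predicted_kw = [0.5, 1.0, 1.0]
example_mae = mae(observed_kw, predicted_kw)

# One scenario: CRPS equals the absolute error of that scenario.
single_member = crps_ensemble(2.0, [1.0])
# Several scenarios spread around the observation.
multi_member = crps_ensemble(2.0, [1.0, 2.0, 3.0])
```

Lower is better for both scores, and both are expressed in the units of the forecast (kW here).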

Independent of the metric, what would you consider a correct way? When comparing models, it is important that both are evaluated on a dataset that neither has used to learn (a test dataset), and that the comparison is fair (it doesn't make much sense to compare two models that predict different things).
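
One way to get such a test dataset is to hold out the most recent part of the data and score every model on that same slice. A minimal sketch, assuming timestamped PV readings; `chronological_split`, the 80/20 ratio and the sample records are illustrative choices, not something taken from this repository.

```python
from datetime import datetime, timedelta


def chronological_split(records, test_fraction=0.2):
    """Split (timestamp, value) records into train and test by time.

    The latest `test_fraction` of the records becomes the test set, so no
    model sees data from the evaluation period while it is being fitted.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(records, key=lambda record: record[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Ten hypothetical hourly readings (timestamp, generation in kW).
start = datetime(2021, 6, 1, 8)
records = [(start + timedelta(hours=i), float(i)) for i in range(10)]

train, test = chronological_split(records)
```

Splitting by time rather than at random avoids leaking information from the evaluation period into training, which matters for time series such as PV generation.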

felipewhitaker commented 8 months ago

After exploring psp, my next step is to use the dataset available on Hugging Face (linked in the first comment of #30) to build a historical average model. What interface should it support? The current model has some attributes (e.g. _config, _nwp_tolerance, _nwp_dropout): should every model include these?

peterdudfield commented 8 months ago

I think ideally it would be similar to this https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?

peterdudfield commented 8 months ago

Or perhaps something like this https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecasts/v1.py#L12

It would be good to be able to switch it into the evaluation script more easily too, here - https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/eval/forecast.py#L19
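
A hedged sketch of what "easy to switch" could look like: the evaluation accepts any callable with the same signature, so trying a new model is a one-argument change. Everything here is hypothetical (the `ForecastFn` signature, `HistoricalAverageModel`, `evaluate` and the sample data); the real `run_forecast` and evaluation script in the repository may take different arguments.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Iterable, Tuple

# Hypothetical interface: a forecast maps a timestamp to generation in kW.
ForecastFn = Callable[[datetime], float]


class HistoricalAverageModel:
    """Predicts the mean generation seen at the same hour of day."""

    def __init__(self) -> None:
        self._hourly_mean: dict = {}

    def fit(self, history: Iterable[Tuple[datetime, float]]) -> "HistoricalAverageModel":
        by_hour = defaultdict(list)
        for timestamp, generation_kw in history:
            by_hour[timestamp.hour].append(generation_kw)
        self._hourly_mean = {h: sum(v) / len(v) for h, v in by_hour.items()}
        return self

    def __call__(self, timestamp: datetime) -> float:
        # Hours never seen in training fall back to zero generation.
        return self._hourly_mean.get(timestamp.hour, 0.0)


def evaluate(forecast: ForecastFn, test_set: Iterable[Tuple[datetime, float]]) -> float:
    """Return the MAE of `forecast` over (timestamp, observed kW) pairs."""
    errors = [abs(forecast(ts) - observed) for ts, observed in test_set]
    return sum(errors) / len(errors)


# Hypothetical data: two training days and one test day, all at noon.
history = [(datetime(2021, 6, 1, 12), 2.0), (datetime(2021, 6, 2, 12), 3.0)]
test_set = [(datetime(2021, 6, 3, 12), 3.5)]

model = HistoricalAverageModel().fit(history)
historical_mae = evaluate(model, test_set)
```

Because `evaluate` only relies on the call signature, model-specific attributes such as `_config` or `_nwp_tolerance` would not need to exist on every model under this design.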

felipewhitaker commented 8 months ago

> I think ideally it would be similar to this https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?

It does help, thanks! I might've missed some details there. Moreover, is there a file showing how the current model was trained (which I believe is in psp)? It would be nice to be able to follow roughly the same steps.

ombhojane commented 7 months ago

> After exploring psp, my next step is to use the dataset available on Hugging Face (linked in the first comment of #30) to build a historical average model. What interface should it support? The current model has some attributes (e.g. _config, _nwp_tolerance, _nwp_dropout): should every model include these?

peterdudfield commented 7 months ago

> > I think ideally it would be similar to this https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecast.py#L11. Does this answer your question?
>
> It does help, thanks! I might've missed some details there. Moreover, is there a file showing how the current model was trained (which I believe is in psp)? It would be nice to be able to follow roughly the same steps.

The code that runs the model is in here - https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/quartz_solar_forecast/forecasts/v1.py. I'm hoping we can make v2, v3, etc. The actual model is in pv-site-prediction, but I'm not sure it's worth going into that code as it might be a bit dense. The train script is here though

A really simple benchmark could be a model whose prediction is always half the capacity; then run the evaluation on it. Obviously it would be a very bad model, but it helps give an impression of what the MAE numbers mean.
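
That baseline fits in a few lines. A sketch with made-up numbers: `half_capacity_forecast`, the 4 kWp capacity and the observations are illustrative and not taken from the evaluation dataset.

```python
def half_capacity_forecast(capacity_kwp, n_steps):
    """Predict half of the site capacity at every time step."""
    return [0.5 * capacity_kwp] * n_steps


def mean_absolute_error(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical site: 4 kWp capacity, five observations over one day in kW.
capacity_kwp = 4.0
observed_kw = [0.0, 1.0, 3.0, 2.0, 0.0]

baseline_kw = half_capacity_forecast(capacity_kwp, len(observed_kw))
baseline_mae = mean_absolute_error(observed_kw, baseline_kw)
# Dividing by capacity makes the score comparable across sites.
normalised_mae = baseline_mae / capacity_kwp
```

A real model evaluated on the same data should score well below this baseline, and the size of the gap gives the MAE numbers some scale.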