openclimatefix / nowcasting_datamodel

Datamodel for the nowcasting project

Probabilistic forecasts #172

Closed peterdudfield closed 1 year ago

peterdudfield commented 1 year ago

Detailed Description

Thought it was worth putting a few thoughts down on how to make a probabilistic forecast. Of course, there are lots of different options:

  1. Add to gradboost_pv - https://github.com/openclimatefix/gradboost_pv/issues/66. This probably can be done with PV levels of [10,90]. Of course, we can discuss what levels to make
  2. Create a new model with mixture-of-Gaussians density outputs.
  3. Could add to PVNet2.0 and save the results there
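For reference, quantile regression (option 1) is usually trained against the pinball loss. Below is a minimal sketch in plain Python, assuming the [10,90] levels mean the 10th and 90th percentiles. The function name and the toy data are illustrative only and are not taken from gradboost_pv.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y_true, candidates, q):
    """Pick the constant forecast that minimises the pinball loss."""
    return min(
        candidates,
        key=lambda c: pinball_loss(y_true, [c] * len(y_true), q),
    )


# Toy observations (made up for illustration): 0.0, 1.0, ..., 10.0
observed = [float(v) for v in range(11)]

# Minimising the loss at q=0.1 and q=0.9 recovers a low and a high quantile.
best_p10 = best_constant(observed, observed, q=0.1)
best_p90 = best_constant(observed, observed, q=0.9)
```

The asymmetric weighting is what pushes the fitted value towards the requested quantile rather than the mean.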

Context

Possible Implementation

peterdudfield commented 1 year ago

Thoughts @JackKelly @jacobbieker @dantravers

JackKelly commented 1 year ago

On the topic of how to represent the distribution in the API, and how to display in the UI, I'd propose that we should ask NG what they want. Although they may say that they don't know!

I agree that adding a column to the existing table to store JSON is nice and flexible. And flexibility is important while we test things out. And I guess we'll never SELECT based on that column, so we needn't worry that JSON isn't exactly optimal wrt performance!
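As a rough illustration of the JSON-column idea, the quantile values could be serialised as a small level-to-value mapping. The key format, the example values, and the helper names below are hypothetical and are not the actual nowcasting_datamodel schema.

```python
import json


def encode_properties(quantiles):
    """Serialise a {percent_level: value} mapping to a JSON string.

    JSON object keys must be strings, so the integer levels are converted.
    """
    ordered = {str(level): quantiles[level] for level in sorted(quantiles)}
    return json.dumps(ordered)


def decode_properties(payload):
    """Inverse of encode_properties: restore integer percent levels."""
    return {int(level): value for level, value in json.loads(payload).items()}


# Hypothetical forecast values at the 10th and 90th percentiles.
payload = encode_properties({90: 2100.0, 10: 1200.5})
restored = decode_properties(payload)
```

Because the payload is an open mapping, further levels can be added later without a schema change.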

In general, my preferred approach for getting probs out of a neural net is a mixture density network (MDN). It's easy to train, stable, expressive, compact, simple to implement, simple to plot, simple to explain. We can sample ensembles from the MDN if users want. Or sample to get quantiles. And I'd be keen to use a distribution that can express multi-modal distributions.
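To make "sample to get quantiles" concrete, here is a small sketch that draws samples from a Gaussian mixture and reads off quantiles. It assumes the network has already produced mixture weights, means, and standard deviations for one time step. The numbers are invented, and no actual MDN is trained here.

```python
import random
import statistics


def sample_mixture(weights, means, stds, n, seed=0):
    """Draw n samples from a 1-D Gaussian mixture."""
    rng = random.Random(seed)
    components = range(len(weights))
    samples = []
    for _ in range(n):
        # Pick a component according to its weight, then sample from it.
        k = rng.choices(components, weights=weights)[0]
        samples.append(rng.gauss(means[k], stds[k]))
    return samples


# Hypothetical bimodal output for a single forecast step.
weights = [0.7, 0.3]
means = [2.0, 5.0]
stds = [0.3, 0.5]

samples = sample_mixture(weights, means, stds, n=5000)

# statistics.quantiles with n=10 returns the 9 decile cut points.
cuts = statistics.quantiles(samples, n=10)
p10, p50, p90 = cuts[0], cuts[4], cuts[8]
```

The same samples can also be served directly as an ensemble if users prefer that to quantiles.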

jacobbieker commented 1 year ago

Yeah, agree on asking NG what they want. If it's easy, it seems like doing the quantile regression could be good for experimenting with how to show the probabilistic forecasts while working on an MDN or whatever NG prefers.

peterdudfield commented 1 year ago

There's definitely some balance here, and we should work out what's right. Quantile regression will take a few weeks of work. An MDN might take 6+ months, and would then only be available for late autumn.

peterdudfield commented 1 year ago

Also I would recommend we come up with a way to show them, and see what they think. e.g.

https://robjhyndman.com/hyndsight/visualization-of-probabilistic-forecasts/

JackKelly commented 1 year ago

(I've updated my comment above... Sorry, my first version of the comment was a bit rushed!)

peterdudfield commented 1 year ago

Thanks @JackKelly and @jacobbieker

JackKelly commented 1 year ago

> Also I would recommend we come up with a way to show them, and see what they think. e.g. https://robjhyndman.com/hyndsight/visualization-of-probabilistic-forecasts/

That's a great idea. And that web page has some great examples!

> There's definitely some balance here, we should work out what's right, quantile regressions will take a few weeks of work, MDN might take 6 months +

Yeah, I agree, quantile regression is probably the right thing to start with.

BTW, I remember that we've briefly discussed probability stuff with Alex Carter & Lyndon. I searched through our meeting notes with ESO. All I've found (from this doc) is that ESO currently use deciles for their wind power forecast. And that ESO feed a "low" and a "high" forecast into their downstream systems.

So, yeah, maybe we could start with deciles?

dantravers commented 1 year ago

Coming late to this. In terms of model complexity, I think we should start with the simplest reasonable model and start to do some validation of that (which is quite hard in itself for probabilistic forecasts).

For visualisation, I like the Hyndman thing. We used to have a display like this in a tool for stochastic variable distributions, and it was pretty popular (ignore the wiggly line, it's an example Brownian motion path). It's very similar. We can run that past them and see what they think. We could do this in a couple of shades of yellow.
[image]
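A fan-style display like this is typically drawn as nested shaded bands. Here is a minimal sketch of preparing those bands from quantile forecasts, assuming the forecast comes as symmetric percent levels (for example 10/90 and 25/75). The function name and the data layout are assumptions for illustration.

```python
def fan_bands(quantiles):
    """Pair symmetric quantile levels into nested bands, widest first.

    quantiles maps a percent level to a list of values, one per time step.
    Returns a list of (coverage_percent, lower_series, upper_series).
    """
    bands = []
    for level in sorted(quantiles):
        upper_level = 100 - level
        # Only levels below the median with a matching upper level form a band.
        if level < 50 and upper_level in quantiles:
            bands.append(
                (upper_level - level, quantiles[level], quantiles[upper_level])
            )
    return bands


# Made-up values for two time steps.
example = {
    10: [1.0, 2.0],
    25: [2.0, 3.0],
    75: [4.0, 5.0],
    90: [5.0, 6.0],
}
bands = fan_bands(example)
```

Drawing the widest band first in the lightest shade, then the narrower bands in darker shades on top, gives the layered look.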

dantravers commented 1 year ago

Given the discussion yesterday, where they highlighted that it's the big errors that cost the most, it would be nice if we could show them 90% and 99%. But I'm conscious that 99% is the hardest to be confident of, so I would suggest we don't commit to the percentiles we will show yet, but work that out once we've done some internal validation.
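One simple internal validation is to check empirical coverage: how often the observation falls at or below a forecast quantile. The sketch below uses invented data and hypothetical function names. It also shows why 99% is harder to validate, since far fewer exceedances are expected for the same amount of data.

```python
def empirical_coverage(observed, quantile_forecast):
    """Fraction of observations at or below the forecast quantile."""
    hits = sum(1 for y, q in zip(observed, quantile_forecast) if y <= q)
    return hits / len(observed)


def expected_exceedances(n_samples, level_percent):
    """Number of observations expected above a well-calibrated quantile."""
    return n_samples * (100 - level_percent) / 100


# Toy data: observations 0..99 against a constant "90%" forecast of 89.
observed = list(range(100))
coverage_p90 = empirical_coverage(observed, [89] * len(observed))

# With 1000 validation points, a calibrated forecast is exceeded about
# 100 times at the 90% level but only about 10 times at the 99% level.
exceed_p90 = expected_exceedances(1000, 90)
exceed_p99 = expected_exceedances(1000, 99)
```

A well-calibrated 90% quantile should give coverage close to 0.9; large deviations suggest the quantile is too tight or too loose.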

peterdudfield commented 1 year ago

Task list is here - https://github.com/openclimatefix/nowcasting_api/issues/213