openclimatefix / power_perceiver

Machine learning experiments using the Perceiver IO model to forecast the electricity system (starting with solar)
MIT License

Recall similar days from the history #174

Open JackKelly opened 2 years ago

JackKelly commented 2 years ago

To help forecast individual PV power, and GSP power, find a set of "similar" periods from the history, and feed those "similar periods" into the ML model at inference time, along with the recent history.

For training to run as quickly as necessary, this probably requires that we can fit a representation of the history into RAM. (In production, we probably have time to load history from disk).

Could maybe use contrastive learning (#155) to get an encoder to map from, say, NWPs and satellite to a vector, such that vectors are similar when the resulting PV power is similar (and similarity of PV power could be judged with NMAE).
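A minimal sketch of how "similarity of PV power judged with NMAE" could be scored, e.g. to decide which historical periods count as positives for a contrastive encoder. This assumes NMAE means the mean absolute error divided by the installed capacity; the function names `nmae` and `most_similar_periods` are hypothetical and not part of power_perceiver.

```python
import numpy as np


def nmae(pv_a, pv_b, capacity):
    """Mean absolute error between two PV power series, normalised by capacity.

    Assumption: normalisation is by installed capacity (same units as the power).
    """
    pv_a = np.asarray(pv_a, dtype=float)
    pv_b = np.asarray(pv_b, dtype=float)
    return float(np.mean(np.abs(pv_a - pv_b)) / capacity)


def most_similar_periods(target, history, capacity, k=3):
    """Return indices and NMAE scores of the k historical periods closest to target.

    `history` is a sequence of PV power series, each the same length as `target`.
    Lower NMAE means more similar.
    """
    scores = np.array([nmae(target, period, capacity) for period in history])
    order = np.argsort(scores)[:k]
    return order, scores[order]


# Toy example: three historical periods, one of which matches the target exactly.
target = [0.0, 1.0, 2.0, 1.0]
history = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
]
indices, scores = most_similar_periods(target, history, capacity=4.0, k=2)
```

Pairs with a low NMAE could be treated as "similar" when training the encoder, so that their NWP and satellite inputs map to nearby vectors.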

jacobbieker commented 2 years ago

We could use FAISS? It's what HuggingFace uses for similarity search over their vectors, and it's used in a lot of other places. It's quick, can run on the GPU, and scales well to millions of vectors. We chatted about it a little here: https://github.com/openclimatefix/satflow/issues/65
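As a rough illustration of the kind of lookup this would involve, here is an exact brute-force cosine-similarity search in plain NumPy. It is only a sketch: the helper names `l2_normalise` and `top_k_similar` are hypothetical, the vectors are made up, and a library such as FAISS would replace the matrix product with an index built for much larger histories.

```python
import numpy as np


def l2_normalise(x):
    """Scale each row to unit length so that a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def top_k_similar(history_vectors, query_vectors, k=5):
    """For each query vector, return the k most similar history vectors.

    Returns (scores, indices), both of shape (num_queries, k), sorted from
    most to least similar.
    """
    hist = l2_normalise(history_vectors)
    query = l2_normalise(query_vectors)
    sims = query @ hist.T  # (num_queries, num_history) cosine similarities
    indices = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, indices, axis=1)
    return scores, indices


# Toy example: three encoded historical periods and one encoded "now".
history_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vectors = [[1.0, 0.1]]
scores, indices = top_k_similar(history_vectors, query_vectors, k=2)
```

The returned indices would then be used to pull the matching historical periods out of RAM (or off disk in production) and feed them to the model.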

JackKelly commented 2 years ago

SGTM! Thanks for digging out that discussion! I had a memory that we'd talked about this before but I couldn't find the discussion! So thanks for the link!

I'm in two minds about how important it'll be to recall "similar" days from the history. Right now I'm definitely thinking this is a fairly low priority, not least because I suspect it'll be a fair chunk of work. But I am super-curious to see how well it works!

Please do shout if this is something you might be interested in exploring? And/or something we could ask a student to look into?

jacobbieker commented 2 years ago

Yeah, I'd be interested in tackling it! I think it could be interesting to see how well it works. Even on its own, the similarity search could be helpful for finding examples similar to the ones where the models fail more often.

JackKelly commented 2 years ago

Fab! That'd be awesome! I've put it on the agenda for next week's meeting(s)!

jacobbieker commented 2 years ago

For the contrastive encoding, there is a PyTorch implementation here: https://github.com/rschwarz15/CPCV2-PyTorch

JackKelly commented 1 year ago

This paper might be relevant (I've only read the abstract):

Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

JackKelly commented 1 year ago

Also worth noting that Alex Carter at NG-ESO says he'd quite like our web UI to display days from the last few years, and to automatically suggest "similar" days.