Open JackKelly opened 2 years ago
We could use FAISS? It's what HuggingFace uses for similarity search over their vectors, and a lot of other places use it too. It's quick, can run on the GPU, and scales well to millions of vectors. We chatted about it a little here: https://github.com/openclimatefix/satflow/issues/65
SGTM! Thanks for digging out that discussion! I had a memory that we'd talked about this before but I couldn't find the discussion! So thanks for the link!
I'm in two minds about how important it'll be to recall "similar" days from the history. Right now I'm definitely thinking this is a fairly low priority, not least because I suspect it'll be a fair chunk of work. But I am super-curious to see how well it works!
Please do shout if this is something you might be interested in exploring? And/or something we could ask a student to look into?
Yeah, I'd be interested in tackling it! I think it could be interesting to see how well it works, and even just having the similarity search could be helpful for finding examples similar to the ones where the models fail more often too.
Fab! That'd be awesome! I've put it on the agenda for next week's meeting(s)!
For the contrastive encoding, there is a PyTorch implementation here: https://github.com/rschwarz15/CPCV2-PyTorch
This paper might be relevant (I've only read the abstract):
Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
Also worth noting that Alex Carter at NG-ESO says he'd quite like our web UI to display days from the last few years, and to automatically suggest "similar" days.
To help forecast individual PV power, and GSP power, find a set of "similar" periods from the history, and feed those "similar periods" into the ML model at inference time, along with the recent history.
For training to run as quickly as necessary, this probably requires that we can fit a representation of the history into RAM. (In production, we probably have time to load history from disk).
Could maybe use contrastive learning (#155) to get an encoder to map from, say, NWPs and satellite to a vector, such that vectors are similar when the resulting PV power is similar (and similarity of PV power could be judged with NMAE).
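To make the "similarity of PV power judged with NMAE" idea concrete, here is a rough sketch of how historical periods could be ranked against a query period directly on PV power. This is not the contrastive encoder itself, only the target notion of similarity such an encoder might be trained to reproduce. The names `nmae` and `most_similar_periods`, the normalisation by installed capacity, and the toy numbers are all assumptions for illustration (NMAE is sometimes normalised differently).

```python
from typing import Dict, List, Sequence, Tuple


def nmae(pv_a: Sequence[float], pv_b: Sequence[float], capacity: float) -> float:
    """Mean absolute error between two PV power series, divided by capacity.

    Assumption: normalising by installed capacity; other conventions exist.
    """
    if len(pv_a) != len(pv_b) or len(pv_a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    mae = sum(abs(a - b) for a, b in zip(pv_a, pv_b)) / len(pv_a)
    return mae / capacity


def most_similar_periods(
    query: Sequence[float],
    history: Dict[str, Sequence[float]],
    capacity: float,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k historical periods with the lowest NMAE to the query."""
    scored = [(label, nmae(query, series, capacity)) for label, series in history.items()]
    scored.sort(key=lambda item: item[1])  # lower NMAE means more similar
    return scored[:k]


# Toy data (hypothetical): PV power in kW for a system with 5 kW capacity.
history = {
    "2020-06-01": [0.0, 2.0, 4.0, 2.0],
    "2020-12-01": [0.0, 0.5, 1.0, 0.5],
    "2021-06-03": [0.0, 2.5, 3.0, 2.0],
}
query = [0.0, 2.0, 3.5, 2.0]
top2 = most_similar_periods(query, history, capacity=5.0, k=2)
print(top2)
```

This brute-force ranking only works when PV power for the query period is already known, so it is useful for building training pairs rather than at inference time. At inference, the encoder's vectors (from NWPs and satellite) would stand in for the PV series, and a vector index would replace the linear scan.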