openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting data
https://nowcasting-dataset.readthedocs.io/en/stable/
MIT License

Create a proportion of examples without PV data, outside the UK #93

Open JackKelly opened 3 years ago

JackKelly commented 3 years ago

We currently only have PV data for the UK. We will at some point want to get PV data for elsewhere but, in the meantime, we'll need nowcasting_dataset to optionally output examples from outside the UK (to train the "image prediction" part of the model on the entire geospatial extent of the satellite imagery).
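The "optionally output examples from outside the UK" idea could be sketched as follows. This is a minimal illustration, not nowcasting_dataset's actual code: it assumes a hypothetical `fraction_outside_uk` setting, approximates the UK and the satellite imagery extent as simple lon/lat bounding boxes (the values are rough and purely illustrative), and rejection-samples non-PV locations that fall inside the satellite extent but outside the UK box. All function and constant names here are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in lon/lat degrees (illustrative approximation only)."""
    min_x: float
    max_x: float
    min_y: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


# Hypothetical, rough extents; the real satellite extent and UK shape differ.
UK_BOX = BoundingBox(min_x=-8.0, max_x=2.0, min_y=49.9, max_y=60.9)
SATELLITE_EXTENT = BoundingBox(min_x=-20.0, max_x=20.0, min_y=35.0, max_y=70.0)


def sample_location_outside(extent, excluded, rng, max_tries=10_000):
    """Rejection-sample a point inside `extent` but outside `excluded`."""
    for _ in range(max_tries):
        x = rng.uniform(extent.min_x, extent.max_x)
        y = rng.uniform(extent.min_y, extent.max_y)
        if not excluded.contains(x, y):
            return x, y
    raise RuntimeError("could not find a location outside the excluded box")


def choose_example_locations(n_examples, fraction_outside_uk, pv_locations,
                             extent=SATELLITE_EXTENT, uk_box=UK_BOX, seed=0):
    """Return a shuffled list of (x, y, has_pv) tuples.

    round(n_examples * fraction_outside_uk) locations are sampled outside the
    UK box (no PV); the rest are drawn from the known UK PV locations.
    """
    rng = random.Random(seed)
    n_outside = round(n_examples * fraction_outside_uk)
    locations = []
    for _ in range(n_examples - n_outside):
        x, y = rng.choice(pv_locations)
        locations.append((x, y, True))
    for _ in range(n_outside):
        x, y = sample_location_outside(extent, uk_box, rng)
        locations.append((x, y, False))
    rng.shuffle(locations)
    return locations
```

Setting `fraction_outside_uk=0.0` would reproduce the current UK-only behaviour, so the option stays opt-in.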

Maybe we should create two sets of batches on disk: one set which always has PV data (and is over the UK), and another set which is always from outside the UK (and doesn't have PV). Then the ML training script can mix and match examples on the fly to vary the ML training curriculum. To keep each batch balanced, the ML training script will need to load at least two batches from disk at once (one with PV data, the other without) and combine them into a single batch containing a mixture of examples.
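The mixing step in the training script might look roughly like this. It is a minimal sketch assuming each loaded batch is a list of per-example dicts; that layout, the function name `mix_batches`, and the keys `pv`, `has_pv` and `satellite` are all hypothetical (real batches on disk would more likely be arrays). Non-PV examples get an explicit `pv: None` placeholder and a `has_pv` flag so that every example in the mixed batch has the same keys, which a loss function could use to skip the PV term.

```python
import random
from typing import Any, Dict, List, Optional

# Hypothetical layout: one example = dict of named fields.
Example = Dict[str, Any]


def mix_batches(pv_batch: List[Example], non_pv_batch: List[Example],
                fraction_with_pv: float, seed: Optional[int] = None) -> List[Example]:
    """Combine two same-sized batches into one batch of the same size.

    round(batch_size * fraction_with_pv) examples come from `pv_batch`;
    the remainder come from `non_pv_batch`. Input examples are not mutated.
    """
    if len(pv_batch) != len(non_pv_batch):
        raise ValueError("both batches must contain the same number of examples")
    if not 0.0 <= fraction_with_pv <= 1.0:
        raise ValueError("fraction_with_pv must be in [0, 1]")

    rng = random.Random(seed)
    batch_size = len(pv_batch)
    n_pv = round(batch_size * fraction_with_pv)

    mixed = [{**example, "has_pv": True} for example in rng.sample(pv_batch, n_pv)]
    # Placeholder PV field keeps the example schema uniform across the batch.
    mixed += [{**example, "pv": None, "has_pv": False}
              for example in rng.sample(non_pv_batch, batch_size - n_pv)]
    rng.shuffle(mixed)
    return mixed
```

Varying `fraction_with_pv` from epoch to epoch is one way the curriculum could be adjusted without regenerating any batches on disk.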

peterdudfield commented 2 years ago

Not sure this is essential. For WP1, there are some GSPs with very few or no PV systems, e.g. in Scotland.

JackKelly commented 2 years ago

@jacobbieker in order to train your models in SatFlow, do you think it's essential for the dataset to include training examples from outside the UK? (these examples wouldn't have any PV data yet...)

jacobbieker commented 2 years ago

It's probably not essential, and for a model that will primarily be focused on the UK for now anyway, it probably doesn't matter as much!

peterdudfield commented 2 years ago

I'll remove this from the NG project, just to keep only the really high-priority items in there.

JackKelly commented 2 years ago

This could probably be done as part of #202