Open jacobbieker opened 3 years ago
Sounds good!
A related trick up our sleeves would be to train on the ~10 years of data available from EUMETSAT: https://github.com/openclimatefix/nowcasting_dataset/issues/81
(Training on more data isn't exactly "pre-training" :) But it might be worth trying. What do you think the priority should be: training on ~10 years of data, or pre-training using 'auxiliary' tasks? It'll likely take a while to download & prepare ~10 years of data, so maybe we should get that going 'in the background' soonish?)
I think yeah, getting it started in the background would be good. Having all that data could also help if we want to try the similarity idea from https://github.com/openclimatefix/satflow/issues/65. I think the extra data is probably a higher priority, but while that's running, trying the auxiliary tasks would be helpful.
For the simulated clouds/optical flow, more data could also help with getting real clouds that we could possibly "copy/paste" for the simulated optical flow? As in, get the cloud pixel values by subtracting the base ground data for real clouds, save out those clouds, and then paste random combos or crops of those clouds and generate the optical flow from that?
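A rough sketch of what that copy/paste idea could look like, assuming a clear-sky "base ground" image is available for the same location. The function names (`extract_cloud_layer`, `paste_with_flow`), the brightness `threshold`, the single rigid shift per cloud, and the `(dy, dx)` flow convention are all illustrative assumptions, not anything that exists in satflow:

```python
import numpy as np


def extract_cloud_layer(image, clear_sky, threshold=0.1):
    """Subtract the clear-sky background; keep pixels brighter than it by > threshold.

    Returns (cloud_values, mask). The threshold is an arbitrary assumption.
    """
    diff = image - clear_sky
    mask = diff > threshold
    return np.where(mask, diff, 0.0), mask


def paste_with_flow(background, cloud, mask, top_left, shift):
    """Paste a cloud crop into two frames, displaced by `shift` between them.

    Returns (frame0, frame1, flow), where flow has shape (H, W, 2) and holds
    (dy, dx) on the cloud pixels of frame0 and zeros elsewhere.
    Assumes the crop fits inside the background in both frames.
    """
    h, w = cloud.shape
    y, x = top_left
    dy, dx = shift
    H, W = background.shape
    assert 0 <= y and y + h <= H and 0 <= x and x + w <= W
    assert 0 <= y + dy and y + dy + h <= H and 0 <= x + dx and x + dx + w <= W

    frame0 = background.copy()
    frame1 = background.copy()
    flow = np.zeros((H, W, 2), dtype=np.float32)

    # Add the cloud on top of the ground in frame 0 (slices are views, so this writes in place).
    region0 = frame0[y:y + h, x:x + w]
    region0[mask] += cloud[mask]

    # Same cloud, shifted, in frame 1.
    region1 = frame1[y + dy:y + dy + h, x + dx:x + dx + w]
    region1[mask] += cloud[mask]

    # Ground-truth flow: every cloud pixel moved by (dy, dx).
    flow[y:y + h, x:x + w][mask] = (dy, dx)
    return frame0, frame1, flow


# Toy example: a synthetic rectangle stands in for a real cloud.
clear_sky = np.full((32, 32), 0.2)
real = clear_sky.copy()
real[5:10, 8:14] += 0.5

cloud_full, mask_full = extract_cloud_layer(real, clear_sky)
cloud = cloud_full[5:10, 8:14]  # crop out the saved cloud
mask = mask_full[5:10, 8:14]

frame0, frame1, flow = paste_with_flow(
    clear_sky, cloud, mask, top_left=(3, 4), shift=(2, 1)
)
```

With real data, the crops would come from a library of saved clouds and the shifts (or a smoother motion field) would be sampled randomly, giving image pairs with known optical flow for pre-training.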
> get the cloud pixel values by subtracting the base ground data for real clouds, save out those clouds, and then paste random combos or crops of those clouds and generate the optical flow from that
Sounds great to me!
> I think the extra data is probably a higher priority
Cool, in our next meeting we can chat a bit about getting more data! I agree, it feels like a priority to grab more data!
The HuggingFace PerceiverIO has pretrained weights for the optical flow task and others, so we can start from those and then pre-train some more on the historical satellite imagery.
Various ideas
Some from @JackKelly: