openclimatefix / satflow

Satellite Optical Flow with machine learning models

https://satflow.readthedocs.io/en/stable/

MIT License

Pre-train PerceiverIO #85

Open jacobbieker opened 3 years ago

jacobbieker commented 3 years ago

Various ideas

Some from @JackKelly:

[ ] Pretrain predicting next frame from past two
[ ] Simulated clouds/optical flow
[ ] Try AutoFlow like in Perceiver paper
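For the first idea, a minimal sketch of how (past two frames, next frame) training pairs could be cut from an image sequence. The function name `make_next_frame_pairs` and the `(T, H, W)` array layout are assumptions for illustration, not anything from satflow itself:

```python
import numpy as np


def make_next_frame_pairs(frames: np.ndarray, n_past: int = 2):
    """Split a time-ordered stack of frames into (inputs, targets).

    frames: array of shape (T, ...), e.g. (T, H, W) or (T, C, H, W).
    Returns inputs of shape (N, n_past, ...) and targets of shape (N, ...),
    where N = T - n_past and targets[i] is the frame right after inputs[i].
    """
    if frames.shape[0] <= n_past:
        raise ValueError("need more than n_past frames to build a pair")
    n_pairs = frames.shape[0] - n_past
    # Sliding window over time: frames i .. i+n_past-1 predict frame i+n_past.
    inputs = np.stack([frames[i:i + n_past] for i in range(n_pairs)])
    targets = frames[n_past:]
    return inputs, targets
```

With five frames and `n_past=2` this gives three pairs; the model would then be trained to regress `targets` from `inputs`.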

JackKelly commented 3 years ago

Sounds good!

A related trick up our sleeves would be to train on the ~10 years of data available from EUMETSAT: https://github.com/openclimatefix/nowcasting_dataset/issues/81

(Training on more data isn't exactly "pre-training" :) But it might be worth trying. What do you think the priority should be: training on ~10 years of data, or pre-training using 'auxiliary' tasks? It'll likely take a while to download and prepare ~10 years of data, so maybe we should get that going 'in the background' soonish?)

jacobbieker commented 3 years ago

Yeah, I think getting it started in the background would be good. Having all that data could also help if we want to try the similarity idea mentioned in https://github.com/openclimatefix/satflow/issues/65. I think the extra data is probably a higher priority, but while that's running, trying the auxiliary tasks would be helpful.

For the simulated clouds/optical flow, more data could also help with getting real clouds that we could possibly "copy/paste" for the simulated optical flow? As in, get the cloud pixel values by subtracting the base ground data for real clouds, save out those clouds, and then paste random combos or crops of those clouds and generate the optical flow from that?
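A rough sketch of that copy/paste idea, assuming a clear-sky "base ground" image is available for each real image. The function names, the threshold value, and the `(dx, dy)` flow convention are all made up for illustration, and the only motion simulated here is a uniform translation of the pasted cloud:

```python
import numpy as np


def extract_cloud_layer(image, clear_sky, threshold=0.05):
    """Return (cloud, mask): the residual above the clear-sky background.

    Pixels whose residual is at or below `threshold` are treated as ground.
    """
    residual = image - clear_sky
    mask = residual > threshold
    return np.where(mask, residual, 0.0), mask


def paste_shifted_pair(background, cloud, mask, top, left, dy, dx):
    """Paste `cloud` onto `background` twice to simulate motion.

    Frame 0 has the cloud at (top, left); frame 1 has it at
    (top + dy, left + dx). The returned flow holds (dx, dy) at every
    cloud pixel of frame 0 and zero elsewhere.
    """
    h, w = cloud.shape
    big_h, big_w = background.shape
    if min(top, left, top + dy, left + dx) < 0:
        raise ValueError("paste position must be non-negative")
    if max(top, top + dy) + h > big_h or max(left, left + dx) + w > big_w:
        raise ValueError("cloud crop does not fit inside the background")

    frame0 = background.copy()
    frame1 = background.copy()
    flow = np.zeros(background.shape + (2,), dtype=np.float32)

    # Basic slices are views, so in-place adds write through to the frames.
    region0 = frame0[top:top + h, left:left + w]
    region0[mask] += cloud[mask]
    region1 = frame1[top + dy:top + dy + h, left + dx:left + dx + w]
    region1[mask] += cloud[mask]

    flow_region = flow[top:top + h, left:left + w]
    flow_region[mask] = (dx, dy)
    return frame0, frame1, flow
```

Random crops or combinations of saved clouds could be passed through `paste_shifted_pair` with random offsets to build (frame pair, flow) samples with known ground truth.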

JackKelly commented 3 years ago

> get the cloud pixel values by subtracting the base ground data for real clouds, save out those clouds, and then paste random combos or crops of those clouds and generate the optical flow from that

Sounds great to me!

JackKelly commented 3 years ago

> I think the extra data is probably a higher priority

Cool, in our next meeting we can chat a bit about getting more data! I agree, it feels like a priority to grab more data!

jacobbieker commented 2 years ago

The HuggingFace PerceiverIO implementation has pretrained weights for the optical flow task and others, so we can start from those and then pre-train further on the historical satellite imagery.
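As a starting point, a hedged sketch of loading those optical flow weights with the `transformers` library. It assumes `transformers` and `torch` are installed and that the `deepmind/optical-flow-perceiver` checkpoint is the one meant here; the helper names are illustrative. The input layout follows the library's documented example (two frames, a 3 x 3 patch of 3 channels per pixel, 368 x 496 resolution), so how satellite channels would be mapped onto it is an open question this sketch does not answer:

```python
# Assumed checkpoint name on the HuggingFace Hub.
CHECKPOINT = "deepmind/optical-flow-perceiver"

# (batch, frames, 3 * 3 patch * 3 channels, height, width), as in the
# transformers documentation example for PerceiverForOpticalFlow.
PATCH_SHAPE = (1, 2, 27, 368, 496)


def load_flow_model(checkpoint: str = CHECKPOINT):
    """Download and return the pretrained optical flow Perceiver."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import PerceiverForOpticalFlow

    return PerceiverForOpticalFlow.from_pretrained(checkpoint)


def predict_flow(model, patches):
    """Run the model on a patch tensor and return the flow logits.

    For inputs of PATCH_SHAPE the documented output shape is
    (batch, height, width, 2).
    """
    import torch

    with torch.no_grad():
        return model(inputs=patches).logits
```

Calling `predict_flow(load_flow_model(), torch.randn(*PATCH_SHAPE))` would exercise the pretrained weights; further pre-training on satellite imagery would continue from the same model object with a training loop of its own.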