openclimatefix / nowcasting_dataset

Prepare batches of data for training machine learning solar electricity nowcasting models
https://nowcasting-dataset.readthedocs.io/en/stable/
MIT License

Add EUMETSAT cloud masks #83

Open JackKelly opened 3 years ago

JackKelly commented 3 years ago

@jacobbieker are you still using cloud masks when training ML models?! :)

jacobbieker commented 3 years ago

Not really. I'm still waiting on the Optimum Cloud masks, which carry a lot more information and which I think would be more helpful for the model. The binary ones didn't seem to make a difference when training the current models. I do think the other derived products could be helpful, since they give more detail on how likely clouds are to form, etc., but satpy doesn't support them yet (https://github.com/pytroll/satpy/issues/1768).

jacobbieker commented 3 years ago

We could still use the binary ones to select the cloud pixels for the optical flow aux task in https://github.com/openclimatefix/satflow/issues/85, so it could still be good to include them? They are quite small and might be useful in other ways, e.g. if we get the optimum cloud masks and they turn out to be helpful, we could use the basic cloud mask to help interpolate between the optimum cloud masks, which I think are only available every 15-45 min.
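
A rough sketch of what "selecting the cloud pixels" for the optical flow task could look like, purely as an illustration. The function name `masked_flow_error`, the array shapes, and the choice of mean endpoint error are all assumptions made for this example; none of them come from satflow or nowcasting_dataset.

```python
import numpy as np


def masked_flow_error(pred_flow, target_flow, cloud_mask):
    """Mean endpoint error of a flow field, computed over cloudy pixels only.

    Hypothetical helper, not part of satflow or nowcasting_dataset.
    pred_flow, target_flow: arrays of shape (H, W, 2) holding (dx, dy) per pixel.
    cloud_mask: array of shape (H, W), nonzero where the binary mask says "cloud".
    """
    cloud = cloud_mask.astype(bool)
    if not cloud.any():
        # No cloudy pixels in this image, so there is nothing to score.
        return 0.0
    # Boolean indexing keeps only the cloudy pixels, giving shape (N, 2).
    diff = pred_flow[cloud] - target_flow[cloud]
    return float(np.linalg.norm(diff, axis=-1).mean())


# Toy 4x4 example: a 2x2 block of cloud in the middle of the image.
cloud_mask = np.zeros((4, 4), dtype=np.uint8)
cloud_mask[1:3, 1:3] = 1

target_flow = np.zeros((4, 4, 2))
pred_flow = np.zeros((4, 4, 2))
pred_flow[..., 0] = 3.0
pred_flow[..., 1] = 4.0
pred_flow[0, 0] = [100.0, 100.0]  # clear-sky pixel, ignored by the mask

error = masked_flow_error(pred_flow, target_flow, cloud_mask)
```

The clear-sky pixel with the large error does not affect the result, which is the point of using the binary mask here: the aux task would only be scored where there is actually cloud to track.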

JackKelly commented 3 years ago

Sounds good! In general, and consistent with your comment, I'd advocate for including too much data in the pre-prepared dataset, rather than too little (as long as it doesn't slow down training too much) :) Our ML models can always ignore data :)