Open JackKelly opened 3 years ago
Not really, I'm still waiting on the Optimum Cloud masks, which have a lot more information and I think would be more helpful for the model. The binary ones didn't seem to make a difference when training the current models. I do think the other derived products could be helpful, but satpy doesn't support them yet (https://github.com/pytroll/satpy/issues/1768). They give more detail on how likely clouds are to form, etc.
We could still use the binary ones to select the cloud pixels for the optical flow aux task in https://github.com/openclimatefix/satflow/issues/85, so it could be good to include them? They are quite small and might be useful. E.g. if we get the optimum cloud masks and they turn out to be helpful, we could use the basic cloud mask to help interpolate between the optimum cloud masks, which are every 15-45min I think?
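A rough sketch of the "select cloud pixels for the optical flow aux task" idea, in case it helps the discussion. This is only an illustration under assumptions: the function name `masked_flow_error`, the `(H, W, 2)` flow layout and the mean-absolute-error metric are all hypothetical and are not taken from satflow or satpy.

```python
import numpy as np


def masked_flow_error(pred_flow, target_flow, cloud_mask):
    """Mean absolute flow error, computed over cloudy pixels only.

    Hypothetical helper, not part of satflow.
    pred_flow, target_flow: arrays of shape (H, W, 2) holding (dx, dy) per pixel.
    cloud_mask: array of shape (H, W); non-zero / True marks a cloudy pixel.
    """
    mask = np.asarray(cloud_mask).astype(bool)
    if not mask.any():
        # No cloudy pixels in this frame, so there is nothing to score.
        return 0.0
    # Boolean indexing keeps only the cloudy pixels: result has shape (N, 2).
    diff = np.abs(np.asarray(pred_flow) - np.asarray(target_flow))[mask]
    return float(diff.mean())


# Tiny example: the only flow error sits on the single cloudy pixel.
pred = np.zeros((4, 4, 2))
target = np.zeros((4, 4, 2))
target[0, 0] = [2.0, 4.0]
mask = np.zeros((4, 4), dtype=bool)
mask[0, 0] = True
error = masked_flow_error(pred, target, mask)
```

With something like this, clear-sky pixels would not contribute to the aux loss at all, which is the only thing the binary mask is needed for here.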
Sounds good! In general, and consistent with your comment, I'd advocate for including too much data in the pre-prepared dataset, rather than too little (as long as it doesn't slow down training too much) :) Our ML models can always ignore data :)
@jacobbieker are you still using cloud masks when training ML models?! :)