openclimatefix / ocf-data-sampler

A repo for sampling from weather data for renewable energy prediction
MIT License

Add a higher level function which wraps the dropout functions #17

Open dfulu opened 2 months ago

dfulu commented 2 months ago

This should help keep the code cleaner, while still allowing us to use simultaneous dropout when needed.

Saswatsusmoy commented 1 month ago

@dfulu @peterdudfield

Can we approach this by creating a new function named apply_dropout?

It takes the input tensor x and a dropout_rate (defaulting to a custom rate). It has a boolean flag for each dropout type: spatial, channel, and temporal. It applies the corresponding dropout function if its flag is set to True.

So we could use it like this:

# Apply only spatial dropout
x = apply_dropout(x, dropout_rate=0.3, spatial=True)

# Apply both channel and temporal dropout
x = apply_dropout(x, dropout_rate=0.4, channel=True, temporal=True)

# Apply all three dropout techniques
x = apply_dropout(x, dropout_rate=0.5, spatial=True, channel=True, temporal=True)

dfulu commented 1 month ago

Hi @Saswatsusmoy, thanks for the message. Just to add some motivation: we intend our dropout to mimic what happens in a live production service. From our experience running live solar forecasts, we know that dropout of the data sources only happens temporally, i.e. the satellite data feed goes down or a solar farm temporarily stops reporting its live generation. Therefore, our current dropout functions only consider temporal dropout. We don't tend to see one channel from the satellite going missing, or some geographic region going missing, as you've described.

This is different to the kind of dropout that might be applied inside a neural network, where channel or spatial dropout makes sense to aid in model training. In this library, we simply try to mimic the input data that our model will see in a production setting.

This issue is just to wrap our two dropout functions into a single function which selects a dropout time and applies it immediately. We do want to keep the two individual functions, since sometimes we want data sources to drop out simultaneously. For example, we may use two different input products from the same satellite, which drop out at the same time.
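As a rough illustration of the wrapper described above, here is a minimal sketch in plain Python. The function names (draw_dropout_time, apply_dropout_time, draw_and_apply_dropout), their parameters, and the list-of-(time, value) data layout are all assumptions made for this example, not the library's actual API. It only aims to show the split between "select a dropout time" and "apply it", and how a wrapper can combine the two.

```python
import random
from datetime import datetime, timedelta


def draw_dropout_time(t0, dropout_timedeltas, dropout_frac, rng=random):
    """Hypothetical helper: pick a dropout time, or None for no dropout.

    With probability `dropout_frac`, return t0 plus a randomly chosen
    (typically negative) timedelta; otherwise return None.
    """
    if rng.random() < dropout_frac:
        return t0 + rng.choice(dropout_timedeltas)
    return None


def apply_dropout_time(samples, dropout_time):
    """Hypothetical helper: blank out every value after `dropout_time`.

    `samples` is a list of (timestamp, value) pairs. Values timestamped
    later than `dropout_time` are replaced with NaN.
    """
    if dropout_time is None:
        return list(samples)
    return [
        (t, v if t <= dropout_time else float("nan"))
        for t, v in samples
    ]


def draw_and_apply_dropout(samples, t0, dropout_timedeltas, dropout_frac, rng=random):
    """Hypothetical wrapper: draw a dropout time and apply it immediately."""
    dropout_time = draw_dropout_time(t0, dropout_timedeltas, dropout_frac, rng)
    return apply_dropout_time(samples, dropout_time)


t0 = datetime(2024, 1, 1, 12)
times = [t0 + timedelta(minutes=m) for m in (-60, -30, 0)]
sat_a = [(t, 1.0) for t in times]
sat_b = [(t, 2.0) for t in times]
delays = [timedelta(minutes=-30), timedelta(minutes=-60)]

# Independent dropout: the wrapper draws a separate time for this source
sat_a_independent = draw_and_apply_dropout(sat_a, t0, delays, dropout_frac=0.5)

# Simultaneous dropout: draw once, then apply the same time to both sources
shared_time = draw_dropout_time(t0, delays, dropout_frac=0.5)
sat_a_dropped = apply_dropout_time(sat_a, shared_time)
sat_b_dropped = apply_dropout_time(sat_b, shared_time)
```

Keeping the two helpers separate is what makes the simultaneous case possible: the wrapper covers the common independent case, while callers who need several sources to go missing together can still draw one time and reuse it.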

Hope that makes sense

Saswatsusmoy commented 4 weeks ago

@dfulu Understood....

It seems, then, that this issue might not be a concern right now, since the data sources are sometimes required to drop out simultaneously. It may be better to close this issue, as it might create some confusion later for new contributors like me.

Cheers

dfulu commented 4 weeks ago

Well, we still do want a version of the function which wraps the other two, i.e. for the case where a data source drops out independently. This still needs to be done, so I'm going to leave this issue live. It just means this issue is quite small.