openclimatefix / uk-pvnet-app

Application for running PVNet in production
MIT License

Limit how we infill the satellite data with NaNs #135

Open dfulu opened 5 days ago

dfulu commented 5 days ago

Currently, when the most recent satellite images are missing, we infill these timestamps with NaNs. This allows us to run models that use satellite data but were trained with dropout of the most recent satellite images.
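For context, a minimal sketch of what infilling missing recent timestamps with NaNs can look like. It assumes pandas, satellite data held in a DataFrame indexed by timestamp, and a 5-minute image cadence. `infill_missing_recent` is a hypothetical helper and is not the code in `pvnet_app/data/satellite.py`.

```python
import pandas as pd


def infill_missing_recent(sat: pd.DataFrame, t0: pd.Timestamp, freq: str = "5min") -> pd.DataFrame:
    """Hypothetical helper: make sure every timestamp up to t0 exists.

    ``sat`` is indexed by timestamp (one column per pixel/channel here, as an
    assumption). Timestamps after the last available image are added as rows
    of NaN, mimicking the dropout of recent images seen during training.
    """
    # Assumed cadence of one image every ``freq``
    full_index = pd.date_range(sat.index.min(), t0, freq=freq)
    # reindex inserts NaN rows for any timestamp missing from the original index
    return sat.reindex(full_index)


# Example: two real images, then three missing 5-minute steps up to t0
sat = pd.DataFrame(
    {"pixel_0": [1.0, 2.0]},
    index=pd.date_range("2024-10-09 12:00", periods=2, freq="5min"),
)
filled = infill_missing_recent(sat, pd.Timestamp("2024-10-09 12:20"))
```

The number of NaN rows grows linearly with how long the feed has been down, which is what makes an unbounded infill risky.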

Recently, the satellite feed went down for approximately 2 days, so this step was infilling up to 2 days' worth of satellite images with NaNs. Even though some of our models don't use satellite data and should have been able to run, the app failed. I think this was likely a memory error, since our inference VM is small. The logs contain messages such as:

[2024-10-09 20:50:44,566] {/app/pvnet_app/data/satellite.py:229} INFO - Filling most recent 1 days 11:20:00 with NaNs
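A rough back-of-the-envelope sketch of how large that infill gets, using the duration from the log line above. The 5-minute cadence and the image shape (11 channels on a 300 x 300 float32 grid) are assumptions chosen for illustration, not the real dimensions used by the app.

```python
from datetime import timedelta

# Duration reported in the log line ("1 days 11:20:00")
infill_duration = timedelta(days=1, hours=11, minutes=20)

# Assumption: one satellite image every 5 minutes
image_interval = timedelta(minutes=5)
n_infilled = infill_duration // image_interval  # timedelta // timedelta gives an int

# Hypothetical image shape: 11 channels, 300 x 300 pixels, float32 (4 bytes per value)
channels, height, width, bytes_per_value = 11, 300, 300, 4
bytes_per_image = channels * height * width * bytes_per_value
infill_bytes = n_infilled * bytes_per_image

print(f"{n_infilled} NaN images, about {infill_bytes / 1e9:.2f} GB")
```

Under these assumptions this prints `424 NaN images, about 1.68 GB`, which would plausibly be enough to exhaust a small VM once copies are made during processing.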

We should avoid infilling the satellite data if the most recent satellite timestamp is older than some threshold. We could derive this threshold from the models we intend to use by finding the maximum delay that any one of these models was trained with. This would resemble some of the logic we have here, but perhaps it would be best to refactor it for this new usage.
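One way the model-derived threshold could work is sketched here. `ModelSpec`, `max_sat_delay`, `max_infill_threshold` and `should_infill` are hypothetical names; the repository's existing delay logic referenced above is not reproduced.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ModelSpec:
    """Hypothetical summary of what a model was trained with."""

    name: str
    uses_satellite: bool
    # Largest satellite delay (dropout of recent images) seen during training
    max_sat_delay: timedelta = timedelta(0)


def max_infill_threshold(models: List[ModelSpec]) -> timedelta:
    """Largest satellite delay that any satellite-using model was trained with."""
    delays = [m.max_sat_delay for m in models if m.uses_satellite]
    # If no model uses satellite data, no infill is ever needed
    return max(delays, default=timedelta(0))


def should_infill(latest_sat_time: datetime, t0: datetime, models: List[ModelSpec]) -> bool:
    """Only infill with NaNs if the gap is one the models were trained to handle."""
    gap = t0 - latest_sat_time
    return gap <= max_infill_threshold(models)
```

When `should_infill` returns `False`, the app could skip the satellite models for that run and still produce forecasts from the models that do not use satellite data.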

peterdudfield commented 1 day ago

Or just add a maximum infill of x hours, e.g. 2 hours, since we don't expect any model to have a longer delay than that. We could even use a maximum infill of 1 hour.
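A sketch of the fixed-cap alternative from the comment above. `MAX_SAT_INFILL`, `SAT_INTERVAL` and `timestamps_to_infill` are hypothetical names, and the 5-minute image cadence is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical cap following the suggestion above; 1 hour would also work
MAX_SAT_INFILL = timedelta(hours=2)

# Assumption: one satellite image every 5 minutes
SAT_INTERVAL = timedelta(minutes=5)


def timestamps_to_infill(
    latest_sat_time: datetime,
    t0: datetime,
    max_infill: timedelta = MAX_SAT_INFILL,
    interval: timedelta = SAT_INTERVAL,
) -> Optional[List[datetime]]:
    """Return the missing timestamps to fill with NaNs, or None if the gap is too long.

    None signals that infilling should be skipped entirely, so only models that
    do not use satellite data would be run.
    """
    gap = t0 - latest_sat_time
    if gap > max_infill:
        return None
    n_missing = gap // interval
    return [latest_sat_time + interval * i for i in range(1, n_missing + 1)]
```

A fixed cap keeps the app decoupled from the model configs, at the cost of having to keep the cap at least as large as the longest delay any deployed model was trained with.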