Closed HF7weatherman closed 5 days ago
Hi @HF7weatherman. I think it is worth the effort if it is affecting us in effectively generating the reports. We can perhaps keep the changes limited by using a cache (https://docs.python.org/dev/library/functools.html#functools.lru_cache) in functions such as hifs.get_forecast
and _get_rolling_daily_climatology
.
Hi @HF7weatherman. I think it is worth the effort if it is affecting us in effectively generating the reports. We can perhaps keep the changes limited by using a cache (https://docs.python.org/dev/library/functools.html#functools.lru_cache) in functions such as
hifs.get_forecast
and_get_rolling_daily_climatology
.
This could be done by decorating as follows:
@lru_cache
def get_forecast(
self,
variable: str,
briefing_time: pd.Timestamp,
lead_hours: str,
current_time: pd.Timestamp,
forecast_type: str = "oper",
differentiate: bool = False,
differentiate_unit: str = "s",
) -> tuple[pd.Timestamp, xr.DataArray]:
...
This would keep a cache with the timestamp and dataarray generated on previous calls and return them if the arguments are the same.
Hi @sortega87! I wasn't aware that python offers such a cache possibility. It seems like that's exactly what we need in our case and thus I agree that we should definitely implement it. Just as a conformation for urgently implementing this change, it took more than an hour for @karstenpeters to generate the briefing this morning...
One question regarding the implementation: You wrote
This would keep a cache with the timestamp and dataarray generated on previous calls and return them if the arguments are the same.
In the case of _get_forecast()
, the arguments wouldn't be exactly the same since lead_hours
changes with each call. Do you know whether it would be possible to keep the whole forecast in the cache? This would then further reduce the necessity to reload data. If I see it correctly, _load_forecast_dataset()
loads all lead times, so perhaps it would make sense to apply @lru_cache
before that function?
And one last question: Who of us should implement it?
Hi @HF7weatherman. Actually, now I think neither _load_forecast_dataset
nor get_forecast
loads the data. Xarray should load the data just before the plots are generated. Thus, we must also force xarray to return a in memory dataset to effectively implement the cache: this is done by calling dataset.load()
or dataset.compute()
.
One problem with _load_forecast_dataset
is that the dataset it loads includes all variables and timestamps. Thus, including the cache in _load_forecast_dataset
and forcing it to actually load the data might result in slower times. We can however modify it to just load the variables that will be used in the briefings.
I can implement these changes once in Barbados. Is this a good timeline for the implementation?
Yes, I agree in that case it would be nice to at least implement the storage of the specific variables in memory. Doing that change on Barbados is totally sufficient. Today was the last official briefing and they won't resume before the 7th on Barbados. So implementing this before the 10th of September sounds like a reasonable timeline for me.
Furthermore, with these changes we also partly address #144. The briefing might become fast enough that #144 isn't necessary anymore.
From #160:
@sortega87 Do you thin this would make sense and is worth the effort? I think it would change the current infrastructure of the briefing quite a lot, so we should put some thought into it before we start implementing such changes... We can also discuss in person on Barbados before we start working on this.