opengridcc / opengrid-dev

Open source building monitoring, analysis and control
Apache License 2.0

Caching weather data #126

Closed JrtPec closed 7 years ago

JrtPec commented 8 years ago

I would like to cache DataFrames with weather data, because I only have 1000 requests to Forecast per day.

@saroele Is it an easy addition to the caching library to allow it to cache and update weather data?

JrtPec commented 8 years ago

@saroele I want to get rid of this part in caching.py:

# The df.index does not have a freqstr attribute for some reason. 
# verify the frequency manually
interval = np.round((df.index[-1] - df.index[0]).total_seconds()/(len(df.index)-1))
if not interval == 86400:
    print("Wrong frequency of the index: mean interval = {}s (instead of 86400)".format(interval))
    return False

Because the switch from/to DST induces days with one hour more/less, with intervals that differ slightly from 86400...

JrtPec commented 8 years ago

Maybe I should just make a WeatherCache(Cache) class that is a bit more permissive.

But in principle I think all daily aggregates should be localised too, because right now we are generating 'daily' data that is in fact 2 hours behind the 'real days' in Belgium. So we have to account for DST, and that means that you can't just assume that all days have a fixed length...
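To illustrate the point about localised daily aggregates, here is a minimal sketch that groups hourly UTC data by local calendar day in Belgium. The series and its values are made up for illustration, and this is not opengrid's own aggregation code.

```python
import pandas as pd

# Hypothetical hourly data in UTC, starting at local midnight 2016-03-26 in Brussels
# (23:00 UTC the day before) and covering three local calendar days.
utc_hours = pd.date_range("2016-03-25 23:00", periods=71,
                          freq=pd.Timedelta(hours=1), tz="UTC")
hourly = pd.Series(1.0, index=utc_hours)

# Group by the local calendar date instead of the UTC date.
local_days = hourly.index.tz_convert("Europe/Brussels").date
daily = hourly.groupby(local_days).sum()

# Each value is the number of hours that fell into that local day.
hours_per_day = daily.tolist()
print(hours_per_day)
```

The middle day (2016-03-27, when clocks went forward) only contains 23 hourly samples, so a local "day" cannot be assumed to be 86400 seconds long.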

If there is anything I hate more than timezones, it is DST.

saroele commented 8 years ago

Jan, to answer the first question: it should indeed be easy to add caching for weather data. For the second question: we need to check somehow if the dataframe that should be cached has a daily index. Do you see another way to check that?

JrtPec commented 8 years ago

'Daily' data does not have a daily index: midnight 2016-03-26 and midnight 2016-03-27 are only 23 hours apart...
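A small sketch of the effect, assuming the index holds local midnights in the Europe/Brussels timezone around the 2016 spring DST switch (the dates are picked for illustration only):

```python
import pandas as pd

# Local midnights in Brussels around the 2016 spring DST switch.
idx = pd.to_datetime(
    ["2016-03-26", "2016-03-27", "2016-03-28", "2016-03-29"]
).tz_localize("Europe/Brussels")

# Elapsed hours between consecutive index entries.
gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(idx[:-1], idx[1:])]
print(gaps_h)
```

One of the gaps comes out as 23 hours rather than 24, which is why a strict 86400-second check rejects such an index.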

saroele commented 8 years ago

The main reason for the check is that you don't accidentally try to merge hourly or monthly data into the daily dataframe that gets cached. So we can keep the idea of the check, and allow a margin of 1/24th?

JrtPec commented 8 years ago
# Mean interval, plus variants where the total span is one hour longer/shorter (DST switch)
span = (df.index[-1] - df.index[0]).total_seconds()
n = len(df.index) - 1
interval1 = np.round(span / n)
interval2 = np.round((span + 3600) / n)
interval3 = np.round((span - 3600) / n)

if not (interval1 == 86400 or interval2 == 86400 or interval3 == 86400):
    print("Wrong frequency of the index: mean interval = {}s (instead of 86400)".format(interval1))
    return False

You mean something like this? Because the margin can never be bigger than 1 hour.

saroele commented 8 years ago

yep, seems as explicit as it can get. Should not break the unittests either.
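The check agreed on in this thread could be sketched as a standalone function. This is only a sketch: the name `has_daily_index` and the `tolerance_s` parameter are hypothetical and not part of caching.py, and like the original check it only looks at the total span of the index, not at each individual interval.

```python
import pandas as pd

def has_daily_index(index, tolerance_s=3600):
    """Return True if the index spans whole days, give or take `tolerance_s` seconds.

    Hypothetical helper: the default tolerance of one hour covers a single
    net DST shift between the first and last timestamp.
    """
    if len(index) < 2:
        return False
    span_s = (index[-1] - index[0]).total_seconds()
    expected_s = 86400 * (len(index) - 1)
    return abs(span_s - expected_s) <= tolerance_s

# Daily index across the 2016 spring DST switch in Belgium: accepted.
daily = pd.to_datetime(
    ["2016-03-26", "2016-03-27", "2016-03-28", "2016-03-29"]
).tz_localize("Europe/Brussels")

# Hourly index: rejected, so it cannot be merged into the daily cache by accident.
hourly = pd.date_range("2016-03-26", periods=4, freq=pd.Timedelta(hours=1))

print(has_daily_index(daily), has_daily_index(hourly))
```

Comparing the total span against a tolerance avoids the three separate rounded intervals while keeping the same one-hour margin.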
