opengridcc / opengrid-dev

Open source building monitoring, analysis and control

CSV's for older data #48

Closed JrtPec closed 9 years ago

JrtPec commented 9 years ago

I'm trying to make some graphs using daily data for an entire year. TMPO doesn't have data from before November 2014, and for some reason CSVs are also only available for the last few months... Can anybody help me obtain this data?

saroele commented 9 years ago

I don't think we have detailed data for a full year yet. There have been issues with the job that pulls the data from the Flukso server. If you don't need minute data, maybe you can still query the Flukso server for a full year of hourly values? I don't know how long they are stored.

But I guess you want minute values, and if that's the case, I'm afraid we don't have them.

JrtPec commented 9 years ago

For this experiment daily data is fine.

JrtPec commented 9 years ago

I just found an older folder with CSVs that had been synchronised on 5/02/2015. One sensor had data going back to 18/10/2014. I ran the synchronise script today and that same sensor now returns only data since 23/12/2014.

So I guess there is a limitation somewhere in the fluksoapi on the amount of data returned, but I can't find it.

Ryton commented 9 years ago

The availability of TMPO data depends on the FLM firmware: TMPO data is only available from firmware version r244 onwards. Most opengrid FLMs were upgraded at the end of October 2014, see #15 (18-23/10/2014). Maybe this sensor was upgraded later? The CSV data was generated from daily API calls to the Flukso server, before TMPO was online.

About your question about a 1-year dataset at 1-minute resolution: you could contact @gebhardm. He developed (and ran) a Raspberry Pi script about 1.5 years ago that logs Flukso data every minute (see https://github.com/gebhardm/flmdisplay).

JrtPec commented 9 years ago

I found the problem. It seems something went wrong when downloading/handling the zip files (I was on the train, so I blame a bad connection); apparently the synchronise script doesn't download older zips when there are already newer ones in the folder (or something along those lines). The fix was clearing all zips and CSVs and downloading everything again (on a stable connection).
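In case anybody else runs into this, the workaround amounts to something like the following (a rough sketch; the data folder path is hypothetical and depends on where your synchronise script stores its files):

#rough sketch of the workaround: wipe the local zip/csv cache, then re-sync
#'data' is a hypothetical path; point it to wherever the synchronise script keeps its files
import glob
import os

for f in glob.glob('data/*.zip') + glob.glob('data/*.csv'):
    os.remove(f)
#afterwards, re-run the synchronise script on a stable connection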

For this experiment I'm only looking at 1-day resolution, over as long a period as possible. The biggest set I could download looked like this: [screenshot from 2015-02-27 showing the downloaded daily data]. There seems to be no data logged between April 2014 and July 2014, on any of the sensors. Does somebody have a way to get this missing data?

saroele commented 9 years ago

The missing data is due to a server shut-down that we did not notice: we currently do not have this data in opengrid. However, if you only need daily totals, I guess the flukso server still has these for a full year.

You can try this out yourself by querying the flukso server for daily data and seeing how far back you can get. Let me know if you don't immediately know how to do this query; I can point you to the right documentation and example code.
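For reference, the raw request is roughly the following. This is only a sketch: it uses the requests library as a curl equivalent, the sensor id and token are placeholders, and the endpoint and headers follow the Flukso API v1.0 docs as far as I remember them, so check them against the documentation. It also only gives you raw JSON; turning that into a DataFrame is still up to you.

#rough sketch of a direct Flukso API query for daily values over a full year (untested)
import requests

sensor = 'your_sensor_id'    #placeholder
token = 'your_sensor_token'  #placeholder

r = requests.get(
    'https://api.flukso.net/sensor/' + sensor,
    params={'interval': 'year', 'resolution': 'day', 'unit': 'watt'},
    headers={'Accept': 'application/json', 'X-Version': '1.0', 'X-Token': token},
)
r.raise_for_status()
data = r.json()  #typically a list of [timestamp, value] pairs, with 'nan' strings for gaps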

JrtPec commented 9 years ago

I know how to do the query via curl, but it would spare me a lot of work if you have some code that does it in Python, even more so if it returns a DataFrame; I really don't feel like digging into JSON parsing etc.

saroele commented 9 years ago

In that case, I have good news for you :) Have a look at the module fluksoapi.py, here https://github.com/opengridcc/opengrid/blob/develop/library/fluksoapi.py

The function pull_api() does the HTTP call and returns the response object. The function parse() converts a JSON object to a timeseries, so you have to convert the response to JSON first.

We use these functions in the job that extracts all data from the Flukso server and stores it in CSVs, so you can get an idea of how this is done here: https://github.com/opengridcc/opengrid/blob/develop/scripts/extract_flukso_api.py

I don't think we have tested resolutions other than a minute, so you may have to search a bit to get it working for a day. Check the Flukso documentation for all the options.
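Putting the pieces together, the minimal pattern looks roughly like this. It is only a sketch: the sensor id and token are placeholders, the sys.path line is a guess at how to make the library importable, and depending on the fluksoapi version parse() may want the response object itself or its JSON.

#minimal sketch of pulling one sensor through fluksoapi (untested)
import sys
sys.path.append('path/to/opengrid/library')  #hypothetical path to the opengrid library folder
import fluksoapi

r = fluksoapi.pull_api(sensor='your_sensor_id', token='your_sensor_token',
                       unit='watt', resolution='day', interval='year')
#depending on the fluksoapi version, pass the response itself or r.json() to parse()
ts = fluksoapi.parse(r)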

Good luck, and let us know if this works! roel

JrtPec commented 9 years ago

This did it:

#assumes fluksoapi has been imported from the opengrid library and a houseprint
#object `hp` has been initialised, as in the opengrid scripts
import pandas as pd

#request parameters
type_ = 'gas'
unit = 'watt'
interval = 'year'
resolution = 'day'

#get all flukso sensors from the houseprint
sensors = hp.get_all_fluksosensors()

series = []
for flukso_id, flukso_sensors in sensors.items():
    for sensor_id, s in flukso_sensors.items():
        #skip empty sensor entries and sensors of another type
        if not s or s['Type'] != type_:
            continue
        #pull the data from the flukso api
        r = fluksoapi.pull_api(sensor=s['Sensor'], token=s['Token'],
                               unit=unit, resolution=resolution, interval=interval)
        #parse the answer into a timeseries
        ts = fluksoapi.parse(r)
        #filter out 'nan' strings returned by the api :-/
        ts = ts[ts != 'nan']
        #use the sensor id as column name
        ts.name = s['Sensor']
        series.append(ts)

#combine everything into a single DataFrame
df = pd.concat(series, axis=1)
#convert the strings to numbers (convert_objects is deprecated in newer pandas;
#df.apply(pd.to_numeric, errors='coerce') does roughly the same there)
df = df.convert_objects(convert_numeric=True)
#drop sensors without any data
df = df.dropna(axis=1, how='all')
#set datetimes to midnight (useful if the result gets joined with other series)
df.index = pd.DatetimeIndex(df.index).normalize()
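And with the DataFrame in place, the yearly graphs are just a plot call away, for example:

#quick usage example: plot the daily values for all sensors
import matplotlib.pyplot as plt

df.plot(figsize=(12, 6))
plt.show()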

saroele commented 9 years ago

Great, glad it worked!
