nilmtk / nilm_metadata

A schema for modelling meters, measurements, appliances, buildings etc
http://nilm-metadata.readthedocs.org
Apache License 2.0

Proposal for a 'simple' NILM Metadata schema #16

Open JackKelly opened 9 years ago

JackKelly commented 9 years ago

NILM Metadata tries to make it possible to capture pretty much any conceivable scenario. But, as more datasets become available, it appears that a large proportion of datasets could be described using a simpler metadata schema. It would be great to discuss the design of a "Simple NILM Metadata" schema which could exist alongside "NILM Metadata". CSV is perhaps easier than YAML to read in Matlab, Java, etc., so it might be nice if we could use CSV. We'd have a checklist to help people decide whether they require the full expressive power of "NILM Metadata" or whether they can get by with "Simple NILM Metadata".

The simple schema could also be used for adding metadata to the output of disaggregation algorithms (hence helping to simplify NILMTK disaggregation algorithm implementation); and for describing the training dataset and the responses for any future NILM competition or validation tool (I'm working with a group of MSc students who aim to produce a proof-of-concept NILM validation tool by the end of this term; here's the project spec.)

So, here's an initial proposal, using REDD as an example:

building1_labels.csv

This looks a little like labels.dat in the REDD format except that:

meter instance,label
1,site meter
2,site meter
3,electric oven#1
4,electric oven#1
5,fridge#1
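To illustrate how little code a consumer would need, here is a minimal Python sketch that reads a labels file like the one above using only the standard library. The function name `load_labels` is hypothetical, and the grouping assumes that rows sharing a label (e.g. meters 3 and 4) refer to the same appliance, which the proposal does not state explicitly.

```python
import csv
import io
from collections import defaultdict

# Contents of the proposed building1_labels.csv, inlined so the sketch runs as-is.
LABELS_CSV = """meter instance,label
1,site meter
2,site meter
3,electric oven#1
4,electric oven#1
5,fridge#1
"""


def load_labels(text):
    """Return {label: [meter instances]} from the labels CSV text.

    Assumption: rows that share a label describe the same appliance.
    """
    # skipinitialspace tolerates a header written as "meter instance, label".
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    meters_by_label = defaultdict(list)
    for row in reader:
        meters_by_label[row["label"].strip()].append(int(row["meter instance"]))
    return dict(meters_by_label)


labels = load_labels(LABELS_CSV)
```

With a real file, `io.StringIO(text)` would be replaced by an open file handle.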

meter_devices.csv

We also need to specify what is measured in each data file. In NILM Metadata this is done in meter_devices.yaml. In "Simple NILM Metadata" this could be done in a meter_devices.csv file. The file would contain three columns, and each row would be a <meter device name>,<key>,<value> tuple, e.g.:

meter device name,key,value
site meters,sample period,1
site meters,measurements,active power;apparent power
submeters,sample period,3
submeters,measurements,active power
submeters,model,eMonitor
submeters,manufacturer,Powerhouse Dynamics
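As a rough sketch of how these key/value rows could be consumed, the Python snippet below folds them into one dictionary per meter device. The function name `load_meter_devices` is hypothetical. Splitting `measurements` on `;` and converting `sample period` to a number are assumptions based on the example rows, not part of any agreed schema.

```python
import csv
import io

# Contents of the proposed meter_devices.csv, inlined so the sketch runs as-is.
METER_DEVICES_CSV = """meter device name,key,value
site meters,sample period,1
site meters,measurements,active power;apparent power
submeters,sample period,3
submeters,measurements,active power
submeters,model,eMonitor
submeters,manufacturer,Powerhouse Dynamics
"""


def load_meter_devices(text):
    """Return {device name: {key: value}} from the key/value CSV text."""
    devices = {}
    for row in csv.DictReader(io.StringIO(text)):
        key, value = row["key"], row["value"]
        if key == "measurements":
            # Assumption: multiple measurements are separated by ';'.
            value = value.split(";")
        elif key == "sample period":
            # Assumption: the sample period is always numeric.
            value = float(value)
        devices.setdefault(row["meter device name"], {})[key] = value
    return devices


devices = load_meter_devices(METER_DEVICES_CSV)
```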

The assumption would be that all meters with the label site meter would take attributes from site meters and all other meters would take attributes from submeters. If this is not the case (e.g. if there are several types of submeter), then we could add the following file, listing only the meters for which the default assumption does not hold:

meter_devices_mapping.csv

building instance,meter instance,meter device name
1,1,Current Cost
1,2,SCPM
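The lookup rule described above (use an explicit mapping if one exists, otherwise fall back to site meters or submeters based on the label) could be sketched in Python as follows. The names `load_overrides` and `device_name_for` are hypothetical, and the sketch assumes that every device name in the mapping file also appears in meter_devices.csv.

```python
import csv
import io

# Contents of the proposed meter_devices_mapping.csv, inlined for the sketch.
MAPPING_CSV = """building instance,meter instance,meter device name
1,1,Current Cost
1,2,SCPM
"""


def load_overrides(text):
    """Return {(building, meter): device name} for explicitly mapped meters."""
    overrides = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (int(row["building instance"]), int(row["meter instance"]))
        overrides[key] = row["meter device name"]
    return overrides


def device_name_for(building, meter, label, overrides):
    """Pick the meter device name for one meter.

    An explicit mapping wins; otherwise the default assumption applies:
    'site meter' labels use 'site meters', everything else uses 'submeters'.
    """
    explicit = overrides.get((building, meter))
    if explicit is not None:
        return explicit
    return "site meters" if label == "site meter" else "submeters"


overrides = load_overrides(MAPPING_CSV)
```

For example, `device_name_for(1, 3, "electric oven#1", overrides)` falls through to the default because meter 3 of building 1 has no row in the mapping file.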

Any thoughts? If you use Matlab / Java / Scala / Julia / C++ etc, would you find it easier to load metadata described using CSV files rather than YAML files? If you maintain a dataset, is there anything in your own dataset that the proposal above cannot express?

JackKelly commented 9 years ago

Some feedback from Peter Davies over email:

My vote would be csv. Very universal and everyone understands it.

eleijonmarck commented 9 years ago

Hey, I would also prefer CSV files. It would be great to have a universal way of interacting with NILM data as it seems to just hit the market and start fresh.

JackKelly commented 9 years ago

Cool, thanks for the reply! Just to clarify: we do have the NILM Metadata schema already, which does work. It's just that it can be a little over-complex for simple domestic installations, which is why we're considering building a simpler schema. Also, of course, we have NILMTK for loading and playing with NILM data.

Artform commented 9 years ago

+1 for CSV