Think about making metadata flatter

Some problems with the present design: the metadata is hard to navigate manually; it is unclear what happens if we want to combine objects from multiple houses (e.g. with group by); and we have a really ugly problem where we repeat meter metadata even though most meters are the same (we have to do this because we can't inherit from objects within the actual metadata).

Each object (meter, appliance, building, etc.) would have an ID like "/REDD/building1/utility/electric/fridge1". We could also have "/UK-DALE/prototypes/meters/CurrentCostEnviR", which each instance in "/UK-DALE/buildingX/utility/electric/meterX" would reference (not inherit from).

One advantage of the flat setup is that dataset objects no longer need to be aware of the 'type' of the objects they contain. They are completely ignorant of what they 'contain'; the contained items just reference the container.

Advantages of splitting metadata into smaller chunks: easier for humans to parse. We don't want to have to scroll up and down a long way; we want an entire object to fit on screen. It also means the schema for an object can be totally ignorant of objects further down the hierarchy. The problem is that we then have two representations: a hierarchical representation in memory and a flattish representation on disk. But maybe that's fine. The representation on disk isn't actually flat (because we use lots of refs).
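
To make the "flattish on disk, hierarchical in memory" idea concrete, here is a minimal Python sketch that nests objects keyed by slash-separated IDs into a tree. The function name `build_tree` and the `fields` / `children` node layout are assumptions made for illustration only; nothing here is existing nilm_metadata code.

```python
def build_tree(flat):
    """Nest objects keyed by slash-separated IDs into a tree of dicts."""
    root = {"fields": {}, "children": {}}
    for obj_id, fields in flat.items():
        node = root
        # Walk (and create, if needed) one tree level per ID component.
        for part in obj_id.strip("/").split("/"):
            node = node["children"].setdefault(part, {"fields": {}, "children": {}})
        node["fields"] = dict(fields)
    return root


flat = {
    "/REDD": {"type": "dataset", "name": "Reference Energy Disaggregation Dataset"},
    "/REDD/buildings/1": {"type": "building", "geo_location": "London"},
    "/REDD/buildings/1/electric": {"type": "electric"},
}
tree = build_tree(flat)
building = tree["children"]["REDD"]["children"]["buildings"]["children"]["1"]
print(building["fields"]["geo_location"])  # London
```

Note that the container never declares what it contains: the "buildings" level exists only because some ID mentions it.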

Get rid of manufacturer data from NILM Metadata; i.e. appliance metadata in a dataset just uses the appliance name as the key and then supplies all model-related info in place.


- type: dataset
  id: /REDD
  name: Reference Energy Disaggregation Dataset

- type: building
  id: /REDD/buildings/1
  geo_location: London

- type: electric
  id: /REDD/buildings/1/electric

- type: appliance
  id: /REDD/buildings/1/electric/appliances/fridge,1
  manufacturer: Bosch
  meters: [4, 5]

or:


/REDD:
  type: dataset
  name: Reference Energy Disaggregation Dataset

/REDD/buildings/1:
  type: building
  geo_location: London

/REDD/buildings/1/electric:
  type: electric

/REDD/buildings/1/electric/appliances/fridge,1:
  type: appliance # or appliance/fridge? or fridge?
  manufacturer: Bosch
  meters: [4, 5]

or:


# The semantics will say that all objects in the root are dataset objects
/REDD:
  name: Reference Energy Disaggregation Dataset

# The semantics know how to interpret the slashes,
# (i.e. as a hierarchy) and know that 'buildings' contains
# building objects
/REDD/buildings:
  1:
    n_occupants: 5

/REDD/buildings/1/rooms:
  kitchen,1:
  lounge,1:
  bedroom,1:
    description: master bedroom
  bedroom,2:
    description: eldest child's bedroom
  study,1:
    description: also used as a spare bedroom

/REDD/manufacturers:
  CurrentCost:
    url: 
    contact:

/REDD/meters:
  EnviR:
    manufacturer: /REDD/manufacturers/CurrentCost
    sample_rate: 6

/REDD/buildings/1/utilities/electric/meters:
  1: 
    parent: /REDD/meters/EnviR
    site_meter: true

  2: 
    parent: /REDD/meters/EnviR
    submeter_of: 1

/REDD/buildings/1/utilities/electric/appliances:

  fridge,1: # using the appliance name as the key will make it more readable
    meters: [5]
    room: kitchen,1

  television,1:
    meters: [6]
    room: kitchen,1
    manufacturer: blah
    model: foo

  television,2:
    parent: /REDD/buildings/1/utilities/electric/appliances/fridge,1
    meters: [7]
    room: bedroom,1
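
One possible reading of the `parent` key, sketched in Python: an object takes any fields it does not define itself from the object its `parent` points at. This is an assumption; the notes above do not pin down the semantics of `parent`. The sketch also assumes the nested collections have already been flattened to full IDs, and `resolve` is a hypothetical helper name.

```python
def resolve(objects, obj_id, _seen=()):
    """Return an object's fields, with gaps filled in from its `parent` chain."""
    if obj_id in _seen:
        raise ValueError(f"circular parent reference at {obj_id}")
    obj = objects[obj_id]
    parent_id = obj.get("parent")
    if parent_id is None:
        return dict(obj)
    merged = resolve(objects, parent_id, _seen + (obj_id,))
    merged.update(obj)  # the instance's own fields win over the parent's
    return merged


objects = {
    "/REDD/meters/EnviR": {
        "manufacturer": "/REDD/manufacturers/CurrentCost",
        "sample_rate": 6,
    },
    "/REDD/buildings/1/utilities/electric/meters/1": {
        "parent": "/REDD/meters/EnviR",
        "site_meter": True,
    },
}
meter = resolve(objects, "/REDD/buildings/1/utilities/electric/meters/1")
print(meter["sample_rate"])  # 6
```

The stored objects are left untouched, so the on-disk form keeps its references and only the in-memory view is expanded.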

Priors could be separate objects, e.g.:

# priors.yaml
--- !Priors
- subject: fridge,1
  variable: on power
  dataset: REDD # optional
  data: foo
  model: blah

- subject: fridge
  variable: on power
  data: bar
  model: foo
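
A sketch of how such priors might be looked up, in Python. It assumes that a prior for a specific instance ("fridge,1") should take precedence over one for the appliance type ("fridge"), and that a prior with no `dataset` key applies to any dataset. Both rules, and the name `find_prior`, are assumptions for illustration and are not stated above.

```python
def find_prior(priors, subject, variable, dataset=None):
    """Pick the most specific prior: exact instance first, then appliance type."""
    appliance_type = subject.split(",")[0]
    for wanted in (subject, appliance_type):
        for prior in priors:
            if prior["subject"] != wanted or prior["variable"] != variable:
                continue
            # A prior without a `dataset` key is treated as dataset-independent.
            if prior.get("dataset") in (None, dataset):
                return prior
    return None


priors = [
    {"subject": "fridge,1", "variable": "on power", "dataset": "REDD",
     "data": "foo", "model": "blah"},
    {"subject": "fridge", "variable": "on power", "data": "bar", "model": "foo"},
]
print(find_prior(priors, "fridge,1", "on power", dataset="REDD")["data"])  # foo
print(find_prior(priors, "fridge,2", "on power", dataset="REDD")["data"])  # bar
```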

Pros and cons? One pro is that we can easily keep 'appliance label data' (manufacturer etc.) separate from measured data (priors).

Also, perhaps we should use separate files for different parts of the dataset e.g. 'redd.yaml', 'redd_building1.yaml', 'redd_building2.yaml' etc. But the software shouldn't enforce the filenames. Instead it loads all the files in the 'metadata' folder.
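
A minimal Python sketch of "load everything in the folder, whatever the files are called". The parser is passed in as a callable so the sketch has no third-party dependency; in practice it would probably be PyYAML's `yaml.safe_load`. The function name `load_metadata_dir`, the `*.yaml` pattern and the choice to reject duplicate IDs are all assumptions.

```python
from pathlib import Path


def load_metadata_dir(folder, parse):
    """Merge every *.yaml file in `folder` into one dict keyed by object ID."""
    merged = {}
    for path in sorted(Path(folder).glob("*.yaml")):
        # An empty file parses to None with most YAML loaders; treat it as {}.
        objects = parse(path.read_text(encoding="utf-8")) or {}
        duplicates = merged.keys() & objects.keys()
        if duplicates:
            raise ValueError(f"{path.name} redefines {sorted(duplicates)}")
        merged.update(objects)
    return merged
```

With PyYAML installed, a call might look like `load_metadata_dir("metadata", yaml.safe_load)`. Because every object carries its full ID, it does not matter which file an object lives in.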

Concatenation is ugly. Let's stop doing that! Instead we look up category later.

Meter metadata: should be shipped with the dataset YAML file. There's little (no?) overlap in meters used in the datasets I'm aware of.
