nilmtk / nilm_metadata

A schema for modelling meters, measurements, appliances, buildings etc
http://nilm-metadata.readthedocs.org
Apache License 2.0

Missing control components #23

Open gjwo opened 9 years ago

gjwo commented 9 years ago

When considering components, I think you are missing some that are incorporated in many appliances and radically affect the fingerprints those appliances produce, namely the control components. These should be in the components module and built up into common combinations for ease of use. Some of them may be built into appliances at a less fundamental level. The main ones are

There may also be sub classes of these such as

This would then give you

etc.

JackKelly commented 9 years ago

So, the schema does currently try to capture many of the control options you describe.

Appliance has a control attribute which can be set to some combination of {‘timer’, ‘manual’, ‘motion’, ‘sunlight’, ‘thermostat’, ‘always on’}
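For example, a thermostatically-controlled heater on a timer might look like this (a minimal sketch; the appliance type, instance and meter numbers are invented):

```yaml
- type: electric space heater
  instance: 1
  meters: [3]
  control: [manual, timer, thermostat]
```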

Dimmable lights are specified by adding a dimmer component. e.g. (adapted from Building1 of UK-DALE):

- type: light
  subtype: ceiling downlight
  original_name: kitchen_lights
  instance: 1
  meters: [8]
  components:
  - type: LED lamp
    count: 10
    manufacturer: Philips
    model: Dimmable MASTER LED 10W MR16 GU5.3 24degrees 2700K 12v
    nominal_consumption: {on_power: 10}
  - type: dimmer
    subtype: TRIAC
    number_of_dimmer_levels: 3

So, the things you suggested which are not yet handled by the schema are:

Also, now that I think about it, perhaps light dimmers should really go into Appliance:control rather than Appliance:components. And perhaps Appliance:control should really be a list of dicts rather than a list of strings, so we can specify more details of each control object (e.g. whether the controller is continuous or discrete). What do you think?
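As a sketch of that idea (purely illustrative; none of these control keys exist in the schema yet):

```yaml
- type: light
  instance: 1
  control:
  - type: dimmer
    subtype: TRIAC
    mode: continuous    # or 'discrete'
  - type: manual
```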

gjwo commented 9 years ago

Jack, I couldn't find the control attribute in appliance.yaml; which file is it in? I am coming at this more from an engineering perspective, where the closer the metadata resembles the construction of the real world, the better the model will be. So I think it is better to define the atomic components, and then group these components into the actual appliances. For example, my bathroom floor heater has 1 heating element, 1 thermostat and 1 seven-day timer, whereas my oven has 5 elements, 2 thermostats and 1 timer, with the elements grouped 3 in the main oven and 2 in the top oven/grill (I haven't yet worked out what the legitimate combinations are!). Similarly, the washing machine has a pump (single speed), a drum motor with speed controller, a heating element with thermostat, and some electronics which include a delay timer and sequence controller. Without sub-meters it is much more important to have an accurate model, and this needn't be overly complex, as common combinations can be built up from the atomic components for ease of use.
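To sketch what I mean in the schema's YAML (the component types and the group key here are my own invention, not part of the current schema):

```yaml
- type: oven
  instance: 1
  components:
  - type: heating element
    count: 3
    group: main oven
  - type: heating element
    count: 2
    group: top oven / grill
  - type: thermostat
    count: 2
  - type: timer
    count: 1
```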

JackKelly commented 9 years ago

I couldn't find the control attribute in appliance.yaml, which file is it in?

It's described in the docs, here: http://nilm-metadata.readthedocs.org/en/latest/dataset_metadata.html#appliance

I think it is better to define the atomic components, and then group these components into the actual appliances.

At the start of my PhD, I was very excited about trying to model appliances at the level of individual components. My aim was to have a library of parameterised component models (heating elements, motors, etc) and then appliances would be constructed of finite state machines. This approach is (briefly) described in my 2012 paper on Disaggregating Multi-State Appliances from Smart Meter Data. This is kind of like the approach physicists use to model a system: they start from equations which describe the behaviour of the system and combine these equations to model their system of interest. But I started to realise that, in 6-second data, it becomes very hard to see individual appliance components (e.g. in 1 second data you can see the overshoot-undershoot-stabilise pattern of power demand a motor shows when it's first turned on but you can't see that pattern in 10-second data).

I'm now much more interested in learning whole-appliance models from power data (both aggregate data and individual appliance data) rather than manually specifying appliance models. i.e. I'm now following a more 'machine learning' approach where we try to train a model on lots of data (tens of thousands of activations across as many appliances as possible).

With respect to the schema, I agree that we should try to model the real world. But I'm also eager to make the schema as easy as possible to use (it's currently too hard to use, IMHO). As far as I'm aware, you and I are the only people to make use of the components feature of the schema!

The schema does allow for components to be assembled into appliances. Does the schema work for your needs or is there a specific modification which would allow you to better express your model?

gjwo commented 9 years ago

Your 2012 paper pretty much describes my mindset, bearing in mind my data has a 1-second granularity. I think the metadata approach is better than the forms-based approach I was using in Java, as it is much more flexible. In terms of what would suit my needs, that requires a little more thought. I certainly find the multiple files more of a hindrance than a help, as it's not always obvious where to look. Being unused to YAML, I was expecting to find all the options in the YAML files rather than in the documentation, which is why I couldn't find the control attribute. I guess ultimately I would like to build the metadata interactively, or pull it from a manufacturer's database, but both of those options are some way off. I will have a think and get back to you.

gjwo commented 9 years ago

When analysing many iterations of data, how much of the underlying patterns will come into better focus when the data is combined day upon day, even for larger granularity data?

JackKelly commented 9 years ago

When analysing many iterations of data, how much of the underlying patterns will come into better focus when the data is combined day upon day, even for larger granularity data?

Well, one important thing for most NILM work is how well our models generalise to unseen appliances. And, as a general rule, you want as many training examples as possible to help generalise. You want a model which distils the universal 'essence' of each appliance, rather than the quirks of each individual make and model.

gjwo commented 9 years ago

From my perspective as a consumer, not a utility, what I would like is an accurate log of what was on, when, for how long and how much power it consumed, as a basis for aggregating by category, derived from a single point of electrical measurement and supported by data about what is in my house. Ideally I wouldn't want to model what is in an appliance, but pull that data from a manufacturer's web site (or preferably some central repository). In the absence of that I am happy to build the supporting data, and to provide supervision to link electrical appliances to patterns of consumption.

As yet I am unsure how the metadata is used in the toolkit to support disaggregation, or how the linking of patterns to names is made (some guidance on this would be appreciated). I am also unclear whether, given the components, I could create new composite appliances using components: and parent: in my building metadata rather than in the central metadata.
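To illustrate the kind of thing I mean (hypothetical; I don't know whether the schema would accept this outside the central metadata):

```yaml
floor heater:
  parent: electric space heater
  components:
  - type: heating element
  - type: thermostat
  - type: timer          # seven-day timer
```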

I envisage a layering-down approach, where the major appliances are identified first and then subtracted, allowing for further analysis of the residual power.

Ultimately this is about changing habits or appliances to optimise consumption.

To answer your earlier question, I would like to see all the atomic components, including any missing control components in a single components file. I would also like a way of describing the patterns of use to further aid recognition (in particular for components with a timer). I haven't seen how to use Prior or how training interacts with this, again some guidance would be helpful.

gjwo commented 9 years ago

I have spotted control in the space heater definition (which I used for underfloor heating). I can see that timer might be a bit too generalised, as there are many types, such as

all of which result in different behaviours when viewed in the consumption data. Of course the most fundamental control is the power switch!

gjwo commented 9 years ago

Whether in control or components, it would be worth thinking about how you would determine an appliance's behaviour patterns by interrogating the appliance metadata. For example

So the question is what is the best way to represent this?

@oliparson you may have some thoughts on this thread as well as Jack & I

Thinking about this is what caused me to introduce human habits in my own schema, such as

Humans represent manual control components, but they also have patterns based on their habits, so any schema should be capable of integrating this kind of control as well. This may lean things towards identifying control separately, since a human is not a component of any particular appliance. It has suddenly occurred to me that a device is controllable by many means, including (normally, but not always) manually, so 'manual' should usually be one of the control options.

Now I understand how your metadata works, perhaps I should produce a schema for this in yaml, would this be of any use?
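Something along these lines, perhaps (entirely my own invention, not the current schema):

```yaml
- type: light
  instance: 2
  control:
  - type: human
    habit: weekday evenings
  - type: motion sensor
```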

JackKelly commented 9 years ago

Hi @gjwo,

From my perspective as a consumer not a utility, what I would like is an accurate log of what was on when, for how long and how much power it consumed as a basis for aggregating by category, derived from a single point of electrical measurement, supported by data about what is in my house. Ideally I wouldn't want to model what is in an appliance, but pull that data from a manufacturer's web site (or preferably some central repository).

So... the aim of many NILM researchers (and certainly my aim) is to develop an algorithm which can estimate when each large appliance is on, and how much energy it uses each time. And to do this from a single meter which measures the whole home's energy demand. In terms of inputs to the algorithm, I'm most interested in algorithms which require no additional input from the user. i.e. all the algorithm requires is the aggregate power data and it will try to figure out which appliances are present, when they are on, and how much energy they have used.

As yet I am unsure how in the toolkit the metadata is used to support disaggregation

Pretty much all approaches that I am aware of learn almost everything from data (not metadata). Certainly none of the current crop of NILMTK algorithms use any of the metadata to guide disaggregation.

To take a step back: a large change in artificial intelligence over the last, say, decade is the observation that you achieve excellent performance (often state of the art performance) when you learn pretty much everything from the data rather than hand-engineering features. For example, the current best-performing approaches for image classification ("is there a dog in that image?") learn almost all the relevant features from the data. Same for automatic speech recognition. Even more strikingly perhaps, there is now good evidence that if you want to build a machine translation tool (e.g. which can translate from French to English) then you probably don't want to invest huge amounts of engineering effort hand-engineering parse trees etc. Instead you should learn the entire thing from data. (If you want a good overview then take a look at the series of articles Nature Magazine published on May 27th on AI).

My own research (which I suspect echoes most people's research on this stuff) is focussed on learning as much as possible from data. That certainly includes the fine-grained 'signatures' of each appliance, as well as the longer-term temporal patterns (such as which appliances will be switched on when the family wakes up and starts cooking breakfast).

how the linking of patterns to names is made (some guidance on this would be appreciated)

During training, we expose the disaggregation algorithm to labelled training data. This is usually in the form of individual appliance traces, along with their name. Hence the algorithm can learn one model per appliance name. During 'test' time, the algorithm tries to match its models to the aggregate data.

Zooming out again, you might ask 'why bother with all this metadata if no NILM algorithms yet make use of the metadata'. Basically, I wanted NILM Metadata to be able to capture as much information as possible, even if there aren't yet uses for that data. Plus I wanted NILM Metadata to be of use beyond NILM (so maybe calling it 'NILM Metadata' was a mistake!). NILMTK certainly makes use of some of the metadata to group appliances.

I envisage a layering down approach, where the major appliances are identified first, then subtracted, allowing for further analysis if the residual power

I think we mostly try to detect maybe the top 5 appliances (in terms of energy consumption).

I haven't seen how to use Prior

I'm not aware of any examples of Prior yet. My intention was that it would be used for defining probability distributions expressing, for example, when each appliance would be used each day (e.g. a toaster would be most likely to be used in the morning). These priors would mostly be learnt from data, not manually specified.
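If Prior were used, I'd imagine something roughly like this (entirely illustrative; none of these field names are defined in the schema, and in practice the numbers would be learnt from data rather than hand-specified):

```yaml
toaster_usage_prior:
  description: probability of the toaster being used, by hour of day
  distribution: categorical
  # one probability per hour, 00:00-23:00
  probabilities: [0.00, 0.00, 0.00, 0.00, 0.00, 0.01, 0.10, 0.25,
                  0.30, 0.15, 0.08, 0.04, 0.02, 0.01, 0.01, 0.01,
                  0.01, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
```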

Humans represent manual control components, but also have patterns based on their habits, so any schema should be capable of integrating this kind of control as well

Again, I'd have a strong preference (in my own research) for learning habits from data, rather than manually defining it in a schema.

So, to zoom out again... my current feeling about NILM Metadata is that it's actually over complicated for 90% of the cases I can think of. All the interesting detail that you describe (automatic control systems, continual versus discrete control, human habits etc) should be largely learnt from data, IMHO. If you really want a schema which describes lots of detail about human habits, control systems etc then it might be best to fork NILM Metadata and then you'd be free to pull your fork in whatever direction you want.

perhaps I should produce a schema for this in yaml, would this be of any use?

That's very kind. When I first started work on NILM Metadata, I carefully created a formal schema using JSON Schema but it was a lot of effort to make large structural changes etc. Now that the schema is vaguely stable, it should be easier. But I'd still suggest that we should probably let the (informal) schema settle down a bit, especially while we're considering new features and a new simplification; and then it might be nice to re-try writing a formal schema. I made some notes a while ago about this.

gjwo commented 9 years ago

I get the point about the AI / big data approach, and I have seen the strides google translate and others have made using that approach. I think what we have here is a bootstrap loader issue (and yes I am old enough to remember the whole switches, paper tape, disk sequence)! i.e. if we had enough knowledge from the data we wouldn't need metadata to help, but unfortunately we are not starting from there.

During training, we expose the disaggregation algorithm to labelled training data. This is usually in the form of individual appliance traces, along with their name. Hence the algorithm can learn one model per appliance name. During 'test' time, the algorithm tries to match its models to the aggregate data.

How do you enter your labelled training data?

Perhaps where we can come together on this is that I think of metadata, including habits, as a way of boxing in where a particular signature might be in the data, in the same way you are using sub-metering. Once a signature has been found and labelled (in order to interact with a human user), the code can supersede the metadata with discovered real data.

In either case, without an underlying database of previously recognised and named appliances, there has to be some kind of interaction with a user to name the appliances that have been found.

JackKelly commented 9 years ago

without an underlying database of previously recognised and named appliances

Ah, sorry, I forgot to mention: there are now over 10 public databases of labelled domestic electricity data. Some are quite large (I think Pecan Street has something on the order of 1,000 homes). The aim would be to create NILM models which can generalise across houses, i.e. generalise to houses where we don't have labelled training data. The end result would be that you'd only have to squirt your aggregate data to the system and it would magically know which appliances are in there. (How achievable this is remains a matter for research!)

oliparson commented 9 years ago

@gjwo you might be interested to read a bit about unsupervised/semi-supervised learning in the context of energy disaggregation. I recently wrote a blog post about some confusion with this definition, and also a paper on how we can learn generalisable models from databases of sub-metered data and apply it to homes with only aggregate data.

gjwo commented 9 years ago

@oliparson Thanks, I had seen the blog post; I am not sure whether I had seen the paper (having read dozens recently), so I will read it.

gjwo commented 9 years ago

@JackKelly Thanks, I have found the lists in one of @oliparson's blog posts: http://blog.oliverparson.co.uk/2012/06/public-data-sets-for-nialm.html . I'm not sure whether any of these are UK-based or applicable to the UK, whether I could access them, or, if I could, how to apply them in this context. But I had always envisaged that such things should be available from the manufacturers or certifiers of electrical equipment. Again, we are probably a couple of years off this being a solution.

Still this does help generalise the issue, there are many places that could source labelled training data, perhaps the toolkit ought to be able to take it from any of these places:

JackKelly commented 9 years ago

These are all good ideas. I suspect we'll get enough data from the first two items on your list, so we might not need detailed 'human-supplied' data; not unless it can scale to hundreds or thousands of homes :)