Interesting Paper - #1 - GitHub issues (serval-snt-uni-lu / dsco)

balajikalluri commented 8 years ago

Hello SerVal team,

Your work is indeed interesting. I recently used SAX-VSM on my time-series dataset, and it was useful for classifying time-series instances that contain recurring local events (motifs). However, it fails to reliably recognize instances that lack distinct events (e.g. a more steady level).

I was wondering whether DSCo could classify such instances reliably?

On that note, I wanted to apply your model to my dataset, but I don't see clear documentation on where to start (e.g. where and how to feed in my TS dataset)?

Any quick help would be much appreciated.

Best, KMB

daoli commented 8 years ago

Hello,

Thanks for your interest. I'm not sure about the scenario you've mentioned, but I encourage you to try it out with DSCo.

I also acknowledge that the current documentation is very limited, but to use it, just look at the following scripts:

prepare_datasets.py: it converts time series into SAX strings
build_corpus.py: it builds language models and saves them to disk
segmentation.py/.sh: it calculates fitness scores
evaluation.py: it assigns class labels and generates classification reports
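
For readers unfamiliar with the first step, here is a minimal sketch of what converting a time series into SAX strings generally involves (z-normalization, PAA, then symbol lookup). It is not the code in prepare_datasets.py: the function names, the breakpoint table, and the handling of flat windows are all assumptions made for illustration.

```python
import numpy as np

# Breakpoints that cut a standard normal distribution into equiprobable
# regions (rounded values commonly quoted in the SAX literature).
# Only two alphabet sizes are listed; this table is illustrative.
BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
}


def znorm(x, eps=1e-8):
    """Z-normalize a window; a (near-)constant window becomes all zeros.

    Mapping flat windows to zeros is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std < eps:
        return np.zeros_like(x)
    return (x - x.mean()) / std


def paa(x, segments):
    """Piecewise Aggregate Approximation: the mean of each segment."""
    return np.array([chunk.mean() for chunk in np.array_split(x, segments)])


def sax_word(x, segments, alphabet_size):
    """Turn one window into a SAX word such as 'abcd'."""
    cuts = BREAKPOINTS[alphabet_size]
    symbols = np.searchsorted(cuts, paa(znorm(x), segments))
    return "".join(chr(ord("a") + int(s)) for s in symbols)


def sax_sliding(series, window, segments, alphabet_size):
    """Slide a window over the series and emit one SAX word per position."""
    series = np.asarray(series, dtype=float)
    return [
        sax_word(series[i:i + window], segments, alphabet_size)
        for i in range(len(series) - window + 1)
    ]


if __name__ == "__main__":
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4))
    print(sax_word([5] * 8, segments=4, alphabet_size=3))
```

In this sketch a perfectly flat window collapses to a single repeated symbol, which illustrates why signals without distinct events carry little information after SAX encoding.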

On the other hand, I'm working to further improve DSCo and more documentation will be available in the coming weeks.

Cheers, Daoyuan

balajikalluri commented 8 years ago

Hi Daoyuan,

I appreciate your earnest and kind response, my friend.

BTW, in connection with your N-gram language modeling for appliance electricity usage profiling, let me reframe my question.

How does your model capture/detect steady states of appliances as well? In my experience of using SAX for feature extraction and characterization of time-series appliance energy data, it is VERY PROMISING at capturing distinct signature patterns that recur in the input energy signature (e.g. periodic compressor cycles in refrigerator use). However, I don't think it can model/capture an appliance operating in its steady state equally reliably (e.g. an LCD monitor which is either ON or OFF), as such energy signatures don't exhibit any distinct patterns.

Does your DSCo distinguish these two appliance states from input time-series energy signatures?

Also, I would like to know the nature of your input time-series dataset. Is it a matrix of rows and columns, with each row being a labelled energy signature? Please shed some light on it.

Cheers & look forward.. KMB

balajikalluri commented 8 years ago

A couple of quick questions about your N-gram language modelling paper:

Given that different training time-series datasets exhibit distinctly dissimilar energy signature patterns, how do you choose appropriate SAX parameters such as sliding window size (w), PAA size (p) and alphabet size (a)? It is cumbersome to do manually. How does your model deal with this? Is it done manually or automatically? If automatically, is a specific optimization algorithm used for parameter selection?
The way you compute the likelihood that unknown symbolized energy data matches one of the trained appliance profiles (M) is described by equations (4) and (5) in your paper. I don't really understand the logic behind them. Could you please clarify? Additionally, could you point me to the lines in your code that implement this part?

Thanks in anticipation.

Best, KMB

daoli commented 8 years ago

Hello,

In our appliance profiling paper, the readings span a specific period (e.g. 200 minutes), and we focus on readings that showed at least some variations. So as long as appliances are consuming energy (turned on), their electric readings will vary over time and then we can profile them. In this sense there is no such "steady phase" from our perspective.

In our MLDM paper, we showed with three different setups that the sliding window size does not have a big impact on the overall classification accuracy. But you're right that it is difficult to settle on a perfect alphabet size. We contacted the author of SAX and were told this is indeed a problem. They usually go with an arbitrary alphabet size or use some optimization method. In our case, this parameter has not really been tackled yet. We're still working to find a good optimization method.

For the calculation of fitness scores, please refer to this page: http://norvig.com/ngrams/. Our algorithm is a modified version of the code provided by P. Norvig.
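
To give a rough idea of what scoring a symbol sequence against per-class language models can look like, here is a minimal sketch. It is not DSCo's implementation and does not reproduce equations (4) and (5) of the paper: the add-one-smoothed character bigram model, the class `BigramModel`, the function `classify`, and the example labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class BigramModel:
    """Character bigram model over SAX words with add-one smoothing.

    Hypothetical helper for illustration; the smoothing scheme is an
    assumption, not something stated in the thread.
    """

    def __init__(self, words, alphabet):
        self.vocab_size = len(set(alphabet))
        self.contexts = Counter()  # how often each symbol occurs as a context
        self.bigrams = Counter()   # how often each (previous, current) pair occurs
        for word in words:
            padded = "^" + word    # '^' marks the start of a word
            for prev, cur in zip(padded, padded[1:]):
                self.contexts[prev] += 1
                self.bigrams[prev, cur] += 1

    def log_prob(self, word):
        """Log10 probability of a word under the smoothed bigram model."""
        padded = "^" + word
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[prev, cur] + 1
            denominator = self.contexts[prev] + self.vocab_size
            total += math.log10(numerator / denominator)
        return total


def classify(words, models):
    """Score the words under every class model and return the best label."""
    scores = {
        label: sum(model.log_prob(w) for w in words)
        for label, model in models.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up training words, purely to exercise the code.
    models = {
        "fridge": BigramModel(["abcd", "abcd", "abcc"], "abcd"),
        "monitor": BigramModel(["bbbb", "bbbc"], "abcd"),
    }
    label, scores = classify(["abcd"], models)
    print(label, scores)
```

The design choice here is to sum log probabilities rather than multiply raw probabilities, which avoids numerical underflow on long sequences.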

Hope this helps.

Cheers, Daoyuan

daoli commented 8 years ago

Hi,

I've updated the repo with a simplified and more accurate version of DSCo. The documentation is also a bit more detailed. Please take a look and feel free to comment on the new version. Thanks!

https://github.com/serval-snt-uni-lu/dsco/tree/v2.0-ng

Cheers, Daoyuan
