patrickzib / SFA

Scalable Time Series Data Analytics
GNU General Public License v3.0
312 stars, 67 forks

WEASEL+MUSE: Datasets Format #26

Open murometz opened 6 years ago

murometz commented 6 years ago

Hello

Thank you for your fascinating work! What structure do the datasets in the datasets folder have? They don't look like the original UCI datasets. What do the columns mean?

For example, DigitShapeRandom:

1 1 1 0.3421972205305417 -1.594004942648406
1 2 1 0.3490627644881473 -1.4250704156116172
1 3 1 0.3353316765729366 -1.2647991976536381
1 4 1 0.3559283084457524 -1.0915330160774444
1 5 1 0.3559283084457524 -0.9225984890406556
1 6 1 0.35249553646694987 -0.7709905801614865
1 7 1 0.3490627644881473 -0.5847294349670782
1 8 1 0.37309216833976566 -0.4634431078637429
1 9 1 0.37309216833976566 -0.35515174437862146

The first column seems to be the class. What are the other columns?

Thank you very much for your time.

Regards Ilja

patrickzib commented 6 years ago

Thank you. That is true: the dataset format is based on the Multivariate Time Series Classification Datasets.

The first column is the sample id, the second column is the time stamp of the observation, the third column is the class label of the sample (it may not change for the same id), and the remaining columns are the observations (one column per dimension of the multivariate time series).

An example:

Sample Id   Time Stamp   Class   Pressure   Temperature   Energy
1           1            1       2.70       80.50         4.50
1           2            1       3.20       78.40         6.70
1           3            1       4.20       67.90         3.40
1           4            1       8.20       89.50         7.20
1           5            1       8.90       85.70         5.70
2           1            3       16.34      97.54         5.02
2           2            3       17.61      99.66         5.01
2           3            3       18.87      101.60        4.90
2           4            3       20.14      103.54        4.95
2           5            3       22.67      107.43        4.95
2           6            3       21.15      106.50        4.97
..          ..           ..      ..         ..            ..
N           1            0       8.90       85.70         5.70
N           2            0       10.01      88.00         5.05
N           3            0       11.28      89.94         5.04
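A minimal Python sketch of reading a file in this format, assuming whitespace-separated columns with no header row (as in the DigitShapeRandom excerpt above). The `read_samples` function is hypothetical and not part of SFA; it groups rows by sample id, orders them by time stamp and rejects a sample whose label changes between rows.

```python
from collections import defaultdict


def read_samples(lines):
    """Parse rows of: sample id, time stamp, class, observation values...

    Hypothetical helper (not part of SFA). Returns a dict mapping
    sample_id -> (label, [observation rows ordered by time stamp]).
    """
    rows_by_id = defaultdict(list)
    labels = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id, time_stamp, label = fields[0], int(fields[1]), fields[2]
        observations = [float(v) for v in fields[3:]]
        # The class label must be identical for every row of one sample.
        if labels.setdefault(sample_id, label) != label:
            raise ValueError(f"sample {sample_id} has more than one label")
        rows_by_id[sample_id].append((time_stamp, observations))

    samples = {}
    for sample_id, rows in rows_by_id.items():
        rows.sort(key=lambda r: r[0])  # order observations by time stamp
        samples[sample_id] = (labels[sample_id], [obs for _, obs in rows])
    return samples


# A few rows from the example table (placeholder rows like ".." are left out).
data = """\
1 1 1 2.70 80.50 4.50
1 2 1 3.20 78.40 6.70
2 1 3 16.34 97.54 5.02
2 2 3 17.61 99.66 5.01
"""
samples = read_samples(data.splitlines())
print(samples["2"])  # ('3', [[16.34, 97.54, 5.02], [17.61, 99.66, 5.01]])
```

Keeping ids and labels as strings avoids assuming they are always numeric; converting them is a one-line change if needed.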
murometz commented 6 years ago

Hi Patrick

Thank you very much for the fast reply, it helps a lot!

Is the time series index just the temporal order of events?

And the time series id refers to, e.g., different sensors, right?

Thanks again. Best regards Ilja

patrickzib commented 6 years ago

Hi Ilja,

Yes, the time index (aka time stamp) is the temporal order of the events.

No, the time series id is the sample id. Sample 1 could be Berlin, sample 2 could be Paris and sample N could be London. Each one has three sensors: temperature, pressure and energy.

So there is no explicit sensor id; it is implicitly encoded by the position of the last columns.
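To illustrate that the sensor is only implied by column position, here is a small sketch that turns one sample's rows into one series per dimension. The helper `split_dimensions` is hypothetical, and the sensor names are supplied by the caller: the file stores no names, so their order is an assumption that must match the column order.

```python
def split_dimensions(rows, names):
    """Transpose a sample's rows (one list of values per time stamp) into
    one series per dimension, keyed by a caller-supplied sensor name.

    Hypothetical helper: the file format has no sensor id, so column
    position is the only link between a value and its sensor.
    """
    if any(len(row) != len(names) for row in rows):
        raise ValueError("every row needs exactly one value per dimension")
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Rows of sample 1 from the example table (Pressure, Temperature, Energy).
rows = [[2.70, 80.50, 4.50], [3.20, 78.40, 6.70], [4.20, 67.90, 3.40]]
series = split_dimensions(rows, ["Pressure", "Temperature", "Energy"])
print(series["Temperature"])  # [80.5, 78.4, 67.9]
```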

murometz commented 6 years ago

Hi Patrick

Great, thanks a lot.

Regards Ilja

murometz commented 6 years ago

Hi Patrick

the third column is the label for the sample (it may not change for the same id)

Does this mean that I can't have different classes for one sample ID?

If I want to detect different activities with several sensor sets installed at different locations, each location (with its own sample ID) would have different classes.

I have one sample as of now, but it contains different classes.

How should the data be structured in this case?

Thank you very much for your time!

Best regards Ilja

patrickzib commented 6 years ago

I am not sure what you mean by "I have one sample as of now, but it contains different classes." Do you mean one person performing different activities?

A sample can be thought of as a single recorded activity, similar to a primary key in a database. For example: Sample 1: Person A jumps. Sample 2: Person A sits. Sample 3: Person A eats. Sample 4: Person A walks.

Here we have a single person doing multiple activities. Each sample can then have multiple sensors attached, e.g. on the wrist, finger, arm, etc.

patrickzib commented 6 years ago

But we could just as well have different persons (A, B, C) doing different activities:

Sample 1: Person A jumps. Sample 2: Person A sits. Sample 3: Person A eats. Sample 4: Person A walks. Sample 5: Person B jumps. Sample 6: Person C jumps.
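A small sketch of how recordings like the ones listed above could be written out in this format, with one sample per recording and the activity as the class. The `to_sample_rows` helper and its input layout are hypothetical; the person is not stored in the file and only tells recordings apart, class ids are assigned in order of first appearance (an arbitrary choice), and the sensor values are made-up placeholders.

```python
def to_sample_rows(recordings):
    """Turn (person, activity, observations) recordings into format rows.

    Hypothetical helper. Each recording becomes its own sample id; each
    observation row gets a 1-based time stamp; the activity becomes the
    class label via a mapping built in order of first appearance.
    """
    class_ids = {}
    lines = []
    for sample_id, (_person, activity, observations) in enumerate(recordings, start=1):
        label = class_ids.setdefault(activity, len(class_ids) + 1)
        for time_stamp, values in enumerate(observations, start=1):
            cols = [sample_id, time_stamp, label, *values]
            lines.append(" ".join(str(c) for c in cols))
    return lines, class_ids


# Two sensors per recording; the values are placeholders.
recordings = [
    ("A", "jumps", [[0.1, 0.2], [0.3, 0.4]]),
    ("A", "sits", [[0.0, 0.1]]),
    ("B", "jumps", [[0.5, 0.6]]),
]
lines, class_ids = to_sample_rows(recordings)
print("\n".join(lines))
# 1 1 1 0.1 0.2
# 1 2 1 0.3 0.4
# 2 1 2 0.0 0.1
# 3 1 1 0.5 0.6
```

Person A jumping and person B jumping end up as different samples with the same class, which matches the description above.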

murometz commented 6 years ago

Hi Patrick

Thank you. I have a set of sensors installed in one apartment. This sensor set records different activities, as you mentioned. I already have these activity classes assigned to different time steps (from a protocol) and want to train a model to detect these activities from the sensor data alone. I would also like to know whether the person is doing something that can be regarded as an anomaly. The entire recording is not separated into samples.
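If the protocol gives one activity per time step, one possible workaround (an assumption here, not a feature of MUSE or SFA) is to cut the continuous recording at every label change, so that each contiguous activity becomes its own sample with a constant class. The helper `segment_by_label` is hypothetical and assumes activities are already encoded as integer class ids. It only prepares labeled training samples: at prediction time the boundaries are unknown, so the stream would still have to be cut another way (e.g. sliding windows), and overlapping activities (the multi-label case) are not handled.

```python
from itertools import groupby


def segment_by_label(labels, observations):
    """Split one continuous recording into samples at every label change.

    Hypothetical helper. labels[t] is the class id annotated for time step t
    and observations[t] holds that step's sensor values. Each maximal run of
    identical labels becomes one sample, returned as rows of:
    sample id, time stamp (restarting at 1), class, values...
    """
    if len(labels) != len(observations):
        raise ValueError("labels and observations must have the same length")
    rows = []
    steps = zip(labels, observations)
    for sample_id, (label, run) in enumerate(groupby(steps, key=lambda s: s[0]), start=1):
        for time_stamp, (_, values) in enumerate(run, start=1):
            rows.append([sample_id, time_stamp, label, *values])
    return rows


labels = [1, 1, 2, 2, 2, 1]                      # e.g. 1 = sleep, 2 = cook
observations = [[0.1], [0.2], [0.9], [0.8], [0.7], [0.1]]
for row in segment_by_label(labels, observations):
    print(*row)
# 1 1 1 0.1
# 1 2 1 0.2
# 2 1 2 0.9
# 2 2 2 0.8
# 2 3 2 0.7
# 3 1 1 0.1
```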

patrickzib commented 6 years ago

I see. This sounds like a multi-label classification problem?

Unfortunately, MUSE does not support this kind of application yet. Are you able to share this data in some way? I would be interested in looking into it, though I cannot guarantee how fast I will be able to do so.