shairoMt opened 3 years ago
The table of the physical dataset has 6 blocks (or stages), and each block has its own sensors. In our case we only need the sensors of the first stage, P1, because P1 is the target of the attacks. In the end we will have 5 numerical columns plus the timestamp column, which can be ignored (it serves as the index). Here is an example of the needed data.
Timestamp | FIT101 | LIT101 | MV101 | P101 | P102 |
---|---|---|---|---|---|
22/12/2015 4:30:00 PM | 0 | 124.3135 | 1 | 1 | 1 |
22/12/2015 4:30:01 PM | 0 | 124.392 | 1 | 1 | 1 |
22/12/2015 4:30:02 PM | 0 | 124.4705 | 1 | 1 | 1 |
22/12/2015 4:30:03 PM | 0 | 124.6668 | 1 | 1 | 1 |
The dataset is divided into three CSV files. The first two contain normal data and will be used for training. The third contains a series of attacks and will be used for testing. Because of the large size of the training dataset (about 631,148 samples), it would not be practical to create a new normalised copy of the data. Instead, we will collect statistical information about the samples, i.e. compute the mean, max, and min of each column, and then use a normalisation function to map the data parsed from the CSV onto its normalised values before fitting the model on them. This step (mapping in order to normalise) can be done sample-wise or batch-wise.
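A minimal sketch of this idea in Python with pandas, under the assumption of min-max normalisation (the actual function and column set may differ); the inline DataFrame stands in for data parsed from the CSVs, and the column names follow the P1 sensors shown above:

```python
import pandas as pd

# Stand-in for a few rows of the training CSV (P1 sensors only).
df = pd.DataFrame({
    "FIT101": [0.0, 0.0, 0.0, 0.0],
    "LIT101": [124.3135, 124.392, 124.4705, 124.6668],
    "MV101":  [1, 1, 1, 1],
    "P101":   [1, 1, 1, 1],
    "P102":   [1, 1, 1, 1],
})

# One pass over the training data to collect per-column statistics.
stats = df.agg(["mean", "min", "max"])

def normalise(batch, stats):
    """Min-max map each column to [0, 1]; constant columns map to 0."""
    span = (stats.loc["max"] - stats.loc["min"]).replace(0, 1)
    return (batch - stats.loc["min"]) / span

# Applied batch-wise here; calling it on a single row works the same way.
norm = normalise(df, stats)
```

The same `normalise` function can then be applied to each parsed sample or batch of the test file, using the statistics collected from the training files.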
Already done.
This ticket should provide a short explanation of the dataset.