shairoMt opened 3 years ago
The table of the physical dataset has 6 blocks (or stages), and each block has its own sensors. In our case we only need the sensors of the first stage, P1, because P1 is the target of the attacks. In the end we will have 5 numerical columns plus the timestamp column, which can be ignored (it serves as the index). Here is an example of the needed data.
Timestamp | FIT101 | LIT101 | MV101 | P101 | P102 |
---|---|---|---|---|---|
22/12/2015 4:30:00 PM | 0 | 124.3135 | 1 | 1 | 1 |
22/12/2015 4:30:01 PM | 0 | 124.392 | 1 | 1 | 1 |
22/12/2015 4:30:02 PM | 0 | 124.4705 | 1 | 1 | 1 |
22/12/2015 4:30:03 PM | 0 | 124.6668 | 1 | 1 | 1 |
The dataset is divided into three CSV files. The first two contain normal data and will be used for training. The third contains a series of attacks and will be used for testing. Because of the large size of the training dataset (about 631,148 samples), it would not be practical to create a new normalised copy of the data. Instead, we will collect statistical information about the samples, i.e. compute the mean, max, and min of each column, and then use a normalisation function to map the data parsed from the CSV onto its normalised values before fitting the model on them. This step (mapping in order to normalise) can be done sample-wise or batch-wise.
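A minimal sketch of this idea in Python with pandas, under the assumption of min-max normalisation (the actual function and column set may differ); the inline DataFrame stands in for data parsed from the CSVs, and the column names follow the P1 sensors shown above:

```python
import pandas as pd

# Stand-in for a few rows of the training CSV (P1 sensors only).
df = pd.DataFrame({
    "FIT101": [0.0, 0.0, 0.0, 0.0],
    "LIT101": [124.3135, 124.392, 124.4705, 124.6668],
    "MV101":  [1, 1, 1, 1],
    "P101":   [1, 1, 1, 1],
    "P102":   [1, 1, 1, 1],
})

# One pass over the training data to collect per-column statistics.
stats = df.agg(["mean", "min", "max"])

def normalise(batch, stats):
    """Min-max map each column to [0, 1]; constant columns map to 0."""
    span = (stats.loc["max"] - stats.loc["min"]).replace(0, 1)
    return (batch - stats.loc["min"]) / span

# Applied batch-wise here; calling it on a single row works the same way.
norm = normalise(df, stats)
```

The same `normalise` function can then be applied to each parsed sample or batch of the test file, using the statistics collected from the training files.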
Already done.
This ticket should provide a short explanation of the dataset.