munhouiani / Deep-Packet

PyTorch implementation of Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning
MIT License

About create_train_test_set.py #17

Closed: Apollo0801 closed this issue 2 years ago.

Apollo0801 commented 2 years ago

Why is test_size=0.2 set in create_train_test_set.py, yet the resulting split (train set : test set) is not 8:2? Moreover, the test set contains far more samples than the training set.

munhouiani commented 2 years ago

Because of undersampling.

We first split the entire dataset into train and test sets at lines 54 to 65. At that point, the train:test ratio should be around 8:2.
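A minimal sketch of this first step, assuming a simple random split in which each sample independently goes to the test set with probability test_size. The function name split_train_test and the toy data are hypothetical and are not taken from create_train_test_set.py; the point is only that such a split gives a ratio of *about* 8:2, not exactly 8:2.

```python
import random


def split_train_test(samples, test_size=0.2, seed=0):
    """Randomly assign each (feature, label) sample to train or test.

    Hypothetical helper for illustration, not the repository's code.
    Each sample lands in the test set with probability test_size, so the
    train:test ratio is only approximately (1 - test_size) : test_size.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for sample in samples:
        (test if rng.random() < test_size else train).append(sample)
    return train, test


# Hypothetical imbalanced dataset: label 0 is common, label 2 is rare.
data = (
    [(i, 0) for i in range(10_000)]
    + [(i, 1) for i in range(1_000)]
    + [(i, 2) for i in range(100)]
)
train, test = split_train_test(data, test_size=0.2)
print(len(train), len(test))  # approximately 8:2
```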

At line 68, if under_sampling_train is set to True, we balance the train set by undersampling.
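A sketch of the undersampling step, assuming that "balance" means randomly shrinking every class in the train set down to the size of its smallest class. under_sample is a hypothetical name, and the repository's actual implementation may choose the per-class sizes differently.

```python
import random
from collections import defaultdict


def under_sample(samples, seed=0):
    """Balance classes by randomly dropping samples from the larger classes.

    Hypothetical helper: assumes every class is cut down to the size of the
    smallest class, so all labels end up with the same number of samples.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for feature, label in samples:
        by_label[label].append((feature, label))
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))  # keep n_min per class
    rng.shuffle(balanced)  # mix the classes back together
    return balanced


# Hypothetical imbalanced train set: 800 / 80 / 8 samples per label.
train = [(i, 0) for i in range(800)] + [(i, 1) for i in range(80)] + [(i, 2) for i in range(8)]
balanced = under_sample(train)
print(len(train), len(balanced))
```

Only the train set passes through this step; the test set keeps its original, imbalanced class distribution.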

That is why the final train set is smaller than the test set.
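A worked example with made-up class counts (hypothetical, not from the dataset in this issue) shows how this happens when classes are imbalanced. The test set keeps about 20% of every class, while the balanced train set is capped at the smallest class's train count times the number of classes. The sketch assumes undersampling to the smallest class and uses rounded expected counts instead of a random split; expected_sizes is a hypothetical helper.

```python
def expected_sizes(class_counts, test_size=0.2):
    """Return (train_size, test_size) after a per-class split followed by
    undersampling the train set to its smallest class.

    Hypothetical helper: uses rounded expected counts, not a random split.
    """
    test_per_class = {label: round(n * test_size) for label, n in class_counts.items()}
    train_per_class = {label: n - test_per_class[label] for label, n in class_counts.items()}
    # Undersampling: every class in train is cut down to the smallest class.
    smallest = min(train_per_class.values())
    return smallest * len(train_per_class), sum(test_per_class.values())


n_train, n_test = expected_sizes({0: 10_000, 1: 1_000, 2: 100})
print(n_train, n_test)  # 240 2220
```

With these counts, the test set has 2,220 samples while the balanced train set has only 240, so the final train:test ratio is far from 8:2 even though test_size=0.2.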

Apollo0801 commented 2 years ago


Thank you very much for your answer.