linwhitehat / ET-BERT

The repository of ET-BERT, a network traffic classification model for encrypted traffic. The work has been accepted as a paper at The Web Conference (WWW) 2022.
MIT License

High accuracy without pre-training #92

Open mob2125 opened 2 weeks ago

mob2125 commented 2 weeks ago

Hello, we did not pass the `pretrained_model_path` parameter during fine-tuning, so the code (`fine-tuning/run_classifier.py`) initializes the model parameters randomly. We fine-tuned this model on the provided .tsv datasets (`datasets/CSTNET-TLS 1.3`) and achieved a high accuracy of 96%. Is this expected? If not, what are we doing wrong?
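For readers unfamiliar with the behavior being described: when no checkpoint path is given, the fine-tuning script falls back to randomly initialized weights instead of loading pre-trained ones. A minimal sketch of that pattern, with `build_classifier` as a hypothetical stand-in (the real script works on PyTorch state dicts, not plain dicts):

```python
import random

def build_classifier(pretrained_state=None, hidden=4):
    """Hypothetical simplification of the load-or-random-init logic:
    load pre-trained weights when a checkpoint is given, otherwise
    fall back to random initialization."""
    # Randomly initialized parameters, as when pretrained_model_path is omitted.
    params = {f"w{i}": random.gauss(0.0, 0.02) for i in range(hidden)}
    if pretrained_state is not None:
        # Overwrite only the keys present in the checkpoint
        # (analogous to load_state_dict with strict=False).
        for key, value in pretrained_state.items():
            if key in params:
                params[key] = value
    return params

random.seed(0)
scratch = build_classifier()                          # random init, the case in this issue
warm = build_classifier(pretrained_state={"w0": 1.0}) # checkpoint overrides matching keys
```

Here `scratch` corresponds to the situation in this issue: training proceeds, but from random weights rather than the pre-trained ones.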

linwhitehat commented 2 weeks ago

> Hello, we did not pass the `pretrained_model_path` parameter during fine-tuning, so the code (`fine-tuning/run_classifier.py`) initializes the model parameters randomly. We fine-tuned this model on the provided .tsv datasets (`datasets/CSTNET-TLS 1.3`) and achieved a high accuracy of 96%. Is this expected? If not, what are we doing wrong?

Hi mob2125,

Thanks for using our code!

If you get the desired results without using a pre-trained model, we think it may depend on the difficulty of the traffic task and on the training settings. That said, our ablation study on the ISCX-VPN-App dataset found a significant performance drop when pre-training was removed.

We hope this answer helps.

mob2125 commented 1 week ago

Hello, thanks for the reply. We tried the same thing with the ISCX-VPN-App dataset provided in the datasets folder. We first converted it into .tsv files using `data_process/main.py` and then fine-tuned a model without pre-training. It still achieved an accuracy of around 98-99%. Do you know where we are going wrong?