Regarding the removal of Ethernet headers, IP headers, and port numbers

linwhitehat / ET-BERT

The repository of ET-BERT, a network traffic classification model for encrypted traffic. The work was accepted as a paper at The Web Conference (WWW) 2022.

MIT License

Regarding the removal of Ethernet headers, IP headers, and port numbers #91

Open OXALICACID9172 opened 1 month ago

OXALICACID9172 commented 1 month ago

The paper states that the Ethernet headers, IP headers, and port numbers are removed. However, in the code that generates the pre-training data (the get_burst_feature() function), I see that only the first 64 bytes of each packet are used (line 109), and I cannot find a line that removes the headers. Am I missing something?
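To make the question concrete, here is a minimal sketch of what header removal before the 64-byte cut could look like. It is not taken from the ET-BERT code: the function name `strip_headers` and the `keep` parameter are hypothetical, and the sketch assumes untagged Ethernet II frames carrying IPv4, and that "removing" the ports means dropping the first four transport-layer bytes.

```python
import struct

ETH_HEADER_LEN = 14        # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
TCP, UDP = 6, 17           # IP protocol numbers


def strip_headers(frame: bytes, keep: int = 64) -> str:
    """Hypothetical helper: hex of up to `keep` bytes after Ethernet/IPv4 headers and ports."""
    if len(frame) < ETH_HEADER_LEN + 20:
        raise ValueError("frame too short for Ethernet + IPv4 headers")

    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("this sketch only handles untagged IPv4 frames")

    ip = frame[ETH_HEADER_LEN:]
    ihl = (ip[0] & 0x0F) * 4   # IPv4 header length in bytes (IHL field * 4)
    if ihl < 20 or len(ip) < ihl:
        raise ValueError("invalid IPv4 header length")
    protocol = ip[9]           # byte 9 of the IPv4 header is the protocol field

    transport = ip[ihl:]
    if protocol in (TCP, UDP) and len(transport) >= 4:
        # Assumption: drop the source and destination port (2 bytes each).
        transport = transport[4:]

    return transport[:keep].hex()
```

Applied to each packet before the bytes are turned into tokens, a step like this would make the pre-training input match the description in the paper. Without it, the first 64 bytes of a raw frame consist mostly of the Ethernet, IP, and transport headers.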

linwhitehat commented 3 weeks ago

Hello, and thank you for your interest in our work. The bias issue does not arise during pre-training, because pre-training involves no supervised task tied to the downstream task.

OXALICACID9172 commented 2 weeks ago

But then the pre-training data and the fine-tuning data have different distributions. Doesn't this affect the model's performance?

linwhitehat commented 2 weeks ago

Because the pre-training phase does not target any specific scenario or task, it tends to learn a general traffic representation and is not subject to the distributional effects that arise under supervised learning.