munhouiani / Deep-Packet

Pytorch implementation of deep packet: a novel approach for encrypted traffic classification using deep learning
MIT License
183 stars, 56 forks

cannot convert float NaN to integer #38

Closed wladradchenko closed 1 year ago

wladradchenko commented 1 year ago

Dear owner,

I wanted to test the code. After running python preprocessing.py -s data/VPN-PCAPS-01/ -t processed_data, I tried to run python create_train_test_set.py -s processed_data -t train_test_data. I used the VPN-PCAPS-01.zip dataset from the README.

I got the error cannot convert float NaN to integer at line 57: min_label_count = int(label_count_df["count"].min()). Of course I can change the line to min_label_count = int(label_count_df["count"].min()) if not label_count_df["count"].empty else 0, but as I understand it, the error means the data extracted from the packets is empty, so the output will still be empty after the script finishes.
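The error is easy to reproduce: .min() on an empty pandas column returns NaN, and int() refuses to convert NaN. Below is a minimal sketch, assuming label_count_df is a pandas DataFrame with a count column as in the quoted line. The helper name min_label_count_or_fail is hypothetical and not part of the repository. It raises a descriptive error instead of silently falling back to 0, which would only hide the empty input:

```python
import pandas as pd


def min_label_count_or_fail(label_count_df: pd.DataFrame) -> int:
    """Return the smallest per-label count, or raise if there are no labelled rows."""
    counts = label_count_df["count"].dropna()
    if counts.empty:
        # With no rows, .min() would be NaN, which int() cannot convert.
        raise ValueError(
            "no labelled rows found; check that preprocessing assigned labels"
        )
    return int(counts.min())


if __name__ == "__main__":
    df = pd.DataFrame({"label": [7, 6, 9], "count": [15991, 127016, 269115]})
    print(min_label_count_or_fail(df))  # prints 15991
```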

If I run the script as python create_train_test_set.py -s processed_data -t train_test_data --under_sampling False (1), then I do get some result:

       label   count
    0      7   15991
    1      6  127016
    2      9  269115
    3      5   40164
    4     10  900984

However, when I use the train_test_data output of (1) with the model, I get the error Please pass `features` or at least one example when writing data at line 6 of evaluation_cnn.ipynb.

How should I run python create_train_test_set.py -s processed_data -t train_test_data after preprocessing.py? The instructions in the current README are incomplete.

Also, it's not clear to me how you set the labels for pcap packets. I saw that you use prefix = path.name.split(".")[0].lower(). I'm not sure that using the file name is a good way to assign classes for training the network. Why don't you use an IP list? How can I verify that the model works correctly if I can't see labels by IP?
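To make the file-name labelling concrete, here is a minimal sketch of how a prefix derived this way could be mapped to a class ID. The table PREFIX_TO_TRAFFIC_ID_EXAMPLE, its entries, its ID values, and the function traffic_label_for are all hypothetical and only illustrate the idea; the repository's actual mapping may differ:

```python
from pathlib import Path
from typing import Optional

# Hypothetical prefix-to-ID table, for illustration only. A real table would
# need one entry for every capture file name in the dataset.
PREFIX_TO_TRAFFIC_ID_EXAMPLE = {
    "vpn_skype_chat1a": 6,
    "facebook_chat_4a": 0,
}


def traffic_label_for(path: Path) -> Optional[int]:
    """Derive a label from the file name, using the prefix rule quoted above."""
    prefix = path.name.split(".")[0].lower()
    # dict.get returns None for an unknown prefix, so packets from that file
    # end up without a label.
    return PREFIX_TO_TRAFFIC_ID_EXAMPLE.get(prefix)


if __name__ == "__main__":
    print(traffic_label_for(Path("data/VPN_Skype_Chat1A.pcap")))  # prints 6
    print(traffic_label_for(Path("data/my_capture.pcap")))  # prints None
```

With a lookup like this, any file whose name is missing from the table produces unlabelled packets, which would be one way to end up with an empty label count later on.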

Thanks.

wladradchenko commented 1 year ago

Also, if I take the model from the README, the results look like the recognition does not work, so I'm not sure the Evaluation Result is accurate. I tested on PCAPs labelled Facebook, Skype, Youtube, and VPN. PCAP for testing: https://drive.google.com/file/d/1LJWRKypmAO-7gheGNFRR3LPYaOJ-nRtE/view?usp=share_link

import torch
import pathlib
from ml.utils import load_cnn_model
from preprocessing import read_pcap, transform_packet
import torch.nn.functional as F
from utils import ID_TO_APP, ID_TO_TRAFFIC

if torch.cuda.is_available():
    print("GPU")

# model path
application = 'model/application_classification.cnn.model'
traffic = 'model/traffic_classification.cnn.model'

model = load_cnn_model(traffic, gpu=True)

print("Processing")

rows = []
batch_index = 0
for i, packet in enumerate(read_pcap(pathlib.Path("data/test/4.pcap"))):
    arr = transform_packet(packet)
    # skip packets for which no feature vector was produced
    if arr is not None:
        # log_softmax yields log-probabilities, so the printed score is <= 0
        y_pred = F.log_softmax(model(torch.Tensor(arr.todense().tolist()).cuda()), dim=1)
        y_hat = torch.argmax(y_pred, dim=1)
        # app_label = ID_TO_APP.get(int(y_hat))
        traffic_label = ID_TO_TRAFFIC.get(int(y_hat))
        print(traffic_label, round(y_pred.tolist()[0][int(y_hat)], 4))

I have got result:

365ms commented 1 year ago

Try checking processed_data: open one of the .json files and look at app_label and traffic_label. If the value is null, you need to add the prefix-to-ID entries for your dataset in utils.py.
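The check can also be scripted rather than done by eye. This is a minimal sketch, assuming each line of a processed file is one JSON object with app_label and traffic_label keys (that layout is an assumption, so adjust it to what your files actually contain). The function name count_null_labels is hypothetical:

```python
import json
from collections import Counter
from typing import Iterable


def count_null_labels(lines: Iterable[str]) -> Counter:
    """Count records whose app_label or traffic_label is null or missing."""
    stats = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        stats["total"] += 1
        for key in ("app_label", "traffic_label"):
            if record.get(key) is None:
                stats[f"null_{key}"] += 1
    return stats


if __name__ == "__main__":
    sample = [
        '{"app_label": null, "traffic_label": null, "feature": [0.1]}',
        '{"app_label": 3, "traffic_label": 7, "feature": [0.2]}',
    ]
    stats = count_null_labels(sample)
    print(stats["total"], stats["null_app_label"], stats["null_traffic_label"])
    # prints: 2 1 1
```

To run it on a real file, pass an open file handle, for example open(path) for plain text or gzip.open(path, "rt") if the file is gzip-compressed. If every record comes back with null labels, the file name prefix is probably missing from the mapping.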