munhouiani / Deep-Packet

PyTorch implementation of Deep Packet: a novel approach for encrypted traffic classification using deep learning
MIT License

Make a prediction #36

Status: closed by alforlea 1 year ago

alforlea commented 1 year ago

Dear owner,

The project provides methods for preprocessing data and for training and evaluating the model. However, it is unclear to me how data should be passed to the trained model to perform inference. Preprocessing takes app_label and traffic_labels as inputs, which would not make sense at prediction time because the labels are what we want to predict.

Could you give some tips, an example, or a Jupyter notebook showing how data should be passed to the model?

Thanks, Álex.

munhouiani commented 1 year ago

Hi Álex.

For inference, you will need to

  1. Transform a packet, i.e., https://github.com/munhouiani/Deep-Packet/blob/7e19bb448f1f6aae2ce7da5d3ace8fa5b5caa09f/preprocessing.py#L61
  2. Load a model, i.e., https://github.com/munhouiani/Deep-Packet/blob/7e19bb448f1f6aae2ce7da5d3ace8fa5b5caa09f/ml/utils.py#L197
  3. Pass the feature you get from step 1 into the model and get a result, i.e., https://github.com/munhouiani/Deep-Packet/blob/7e19bb448f1f6aae2ce7da5d3ace8fa5b5caa09f/ml/metrics.py#L35
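
A rough sketch of how these three steps fit together is shown below. It is not the repository's code: `fake_transform_packet`, `PACKET_LEN`, and the tiny `nn.Sequential` model are hypothetical stand-ins for `transform_packet` and the loaded checkpoint, and the `(batch, 1, length)` input shape is an assumption.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy import sparse
from torch import nn

PACKET_LEN = 1500  # assumed feature length, for illustration only


def fake_transform_packet(raw_bytes: bytes) -> sparse.csr_matrix:
    """Hypothetical stand-in for step 1: bytes -> scaled, zero-padded sparse row."""
    arr = np.frombuffer(raw_bytes, dtype=np.uint8)[:PACKET_LEN] / 255.0
    arr = np.pad(arr, (0, PACKET_LEN - len(arr)))
    return sparse.csr_matrix(arr)  # shape (1, PACKET_LEN)


# Hypothetical stand-in for step 2; the real project loads a trained checkpoint.
model = nn.Sequential(
    nn.Conv1d(1, 4, kernel_size=5),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(4, 3),  # 3 made-up classes
).eval()

# Step 3: densify the sparse feature, add a channel axis, and predict.
features = fake_transform_packet(b"\x45\x00\x00\x28" * 10)
x = torch.from_numpy(features.toarray()).float().unsqueeze(1)  # (1, 1, PACKET_LEN)
with torch.no_grad():  # no gradients are needed for inference
    log_probs = F.log_softmax(model(x), dim=1)
predicted_class = int(torch.argmax(log_probs, dim=1))
```

With the real project, the stand-ins would be replaced by the functions linked in the steps above; the dense conversion matters because convolution layers accept tensors, not SciPy sparse matrices.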

I hope that helps you.

alforlea commented 1 year ago

Thanks for the answer! Apologies for the late reply; I haven't been able to work on this for some weeks.

I captured a .pcap file with Wireshark and then followed those steps, specifically:

    arr = transform_packet(packet)
    model = (CNN.load_from_checkpoint(str(Path(model_path).absolute()), map_location=torch.device(cpu)).float().to(cpu))
    model.eval()
    result = torch.argmax(F.log_softmax(model(arr), dim=1), dim=1)

However, whether I tried a .pcap file with a single packet or with several, I got the following error when passing the transformed packet to the model:

  File "/home/traffic/Deep-Packet-master/inference.py", line 94, in main
    result = torch.argmax(F.log_softmax(model(arr), dim=1), dim=1)
  File "/home/traffic/anaconda3/envs/deep_packet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/traffic/Deep-Packet-master/ml/model.py", line 83, in forward
    x = self.conv1(x)
  File "/home/traffic/anaconda3/envs/deep_packet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/traffic/anaconda3/envs/deep_packet/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/traffic/anaconda3/envs/deep_packet/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/traffic/anaconda3/envs/deep_packet/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 307, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/traffic/anaconda3/envs/deep_packet/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
TypeError: conv1d() received an invalid combination of arguments - got (csr_matrix, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (csr_matrix, Parameter, Parameter, tuple, tuple, tuple, int)
 * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (csr_matrix, Parameter, Parameter, tuple, tuple, tuple, int)

Do you have any idea why this happens? Is it because of how Wireshark returns the packets? Please find attached a screenshot of the example .pcap (I cannot attach the file itself). Best, Álex

alforlea commented 1 year ago

Hello, this is the code I used. Packets are read with an alternative read_pcap function because I already had the pcap files in memory, and the loop handles pcap files that contain more than one packet.

    # Assumes label is a list collecting one prediction per packet.
    label = []
    packets = read_pcap("pcap_file", BytesIO(pcap_file))
    for packet in packets:
        arr = transform_packet(packet)
        if arr is not None:
            # arr is a SciPy csr_matrix (see the TypeError above); conv1d needs a dense tensor.
            x = torch.Tensor(arr.todense().tolist())
            y_pred = F.log_softmax(model(x), dim=1)
            y_hat = torch.argmax(y_pred, dim=1)
            if task == "app":
                label.append(ID_TO_APP.get(int(y_hat)))
            elif task == "traffic":
                label.append(ID_TO_TRAFFIC.get(int(y_hat)))
    # Join the predictions into one comma-separated string.
    return {"labels": ",".join(map(str, label))}
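
The loop above runs one forward pass per packet. A possible variation, sketched here under assumptions, stacks all transformed packets and classifies them in a single batch. The `predict_labels` helper, the `ID_TO_LABEL` map, the random sparse rows, and the small stand-in model are all hypothetical; the sketch also assumes the model accepts input of shape `(batch, 1, length)`.

```python
import numpy as np
import torch
from scipy import sparse
from torch import nn

# Hypothetical label map, standing in for ID_TO_APP / ID_TO_TRAFFIC.
ID_TO_LABEL = {0: "chat", 1: "email", 2: "video"}


def predict_labels(model, rows, id_to_label):
    """Classify a list of (1, n) csr_matrix rows in one forward pass."""
    if not rows:
        return []
    batch = sparse.vstack(rows).toarray()              # dense ndarray, (batch, n)
    x = torch.from_numpy(batch).float().unsqueeze(1)   # tensor, (batch, 1, n)
    with torch.no_grad():
        # log_softmax is monotonic per row, so argmax over the raw
        # model outputs selects the same class.
        class_ids = torch.argmax(model(x), dim=1)
    return [id_to_label.get(int(i)) for i in class_ids]


# Stand-in model and data so the sketch runs on its own.
model = nn.Sequential(
    nn.Conv1d(1, 4, kernel_size=5),
    nn.AdaptiveMaxPool1d(1),
    nn.Flatten(),
    nn.Linear(4, 3),
).eval()
rng = np.random.default_rng(0)
rows = [sparse.csr_matrix(rng.random((1, 1500))) for _ in range(4)]

labels = predict_labels(model, rows, ID_TO_LABEL)
label_string = ",".join(map(str, labels))
```

Batching mainly helps when a capture contains many packets; for a handful of packets the per-packet loop is just as serviceable.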