networked-systems-iith / AdaFlow

AdaFlow: An Efficient In-Network Cache for Intrusion Detection using Programmable Data Planes
MIT License

Test the system on "Covert channel detection" dataset #1

Closed: Sankalp-CS21MTECH12010 closed this issue 1 year ago

Sankalp-CS21MTECH12010 commented 1 year ago

Information Gathering Phase

  1. First, read only about covert channel detection (CCD) in the following paper and its references: https://www.ndss-symposium.org/wp-content/uploads/ndss2021_7C-2_24067_paper.pdf
  2. Read the draft paper (shared in the group)
  3. Do it by Thursday!

ML Phase

To download the dataset, visit the repository referenced in the FlowLens paper.

Follow these steps:

  1. Take the feature space “F” to be the following 11 features:
    • Flow IAT Min
    • Avg Packet Size
    • Subflow F.Bytes
    • Flow Duration
    • Total Len F.Packets
    • Active Min
    • Active Mean
    • Init Win F.Bytes
    • PSH Flag Count
    • SYN Flag Count
    • ACK Flag Count

     If the dataset does not contain all of these features, use only the ones it has in common with “F”.

  2. Train a vanilla binary decision tree on the dataset, using only the features above. Record the accuracy, precision, recall, and F1 score, where a true positive (TP) is a malicious flow correctly classified as malicious.

  3. Find the minimal subset F' of “F” (at most 4 features) that gives almost the same accuracy, precision, recall, and F1 score as the full feature set.

  4. Make sure the trained decision tree also outputs class probabilities (https://stats.stackexchange.com/questions/193424/is-decision-tree-output-a-prediction-or-class-probabilities). Record the probabilities along with the class labels.

  5. Extract the decision tree rules R that classify a flow as malicious, and record them.
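A minimal scikit-learn sketch of steps 1, 2 and 4 might look like the following. It assumes the flows are already loaded as a numeric matrix with label 1 for malicious and 0 for benign, and it uses a 70/30 stratified split; `common_features` and `train_and_evaluate` are hypothetical helper names, and the demo uses synthetic data from `make_classification` in place of the real dataset. If the CSV is loaded with pandas, the selected columns would be passed as `df[feats]`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Feature space F as named in this issue; real CSV headers may be spelled differently.
FEATURES_F = [
    "Flow IAT Min", "Avg Packet Size", "Subflow F.Bytes", "Flow Duration",
    "Total Len F.Packets", "Active Min", "Active Mean", "Init Win F.Bytes",
    "PSH Flag Count", "SYN Flag Count", "ACK Flag Count",
]


def common_features(available_columns):
    """Hypothetical helper: keep only the features of F the dataset provides, in F's order."""
    available = set(available_columns)
    return [name for name in FEATURES_F if name in available]


def train_and_evaluate(X, y, seed=0):
    """Train a vanilla decision tree (assumed labels: 1 = malicious, 0 = benign)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    # predict_proba columns follow clf.classes_, so look up the column for label 1.
    malicious_col = list(clf.classes_).index(1)
    p_malicious = clf.predict_proba(X_te)[:, malicious_col]
    metrics = {
        "accuracy": accuracy_score(y_te, y_pred),
        # pos_label=1: a TP is a malicious flow classified as malicious.
        "precision": precision_score(y_te, y_pred, pos_label=1, zero_division=0),
        "recall": recall_score(y_te, y_pred, pos_label=1, zero_division=0),
        "f1": f1_score(y_te, y_pred, pos_label=1, zero_division=0),
    }
    return clf, metrics, y_pred, p_malicious


if __name__ == "__main__":
    # Synthetic stand-in: pretend the dataset only has the first nine features of F.
    columns = FEATURES_F[:9]
    X, y = make_classification(n_samples=2000, n_features=len(columns),
                               n_informative=4, random_state=0)
    feats = common_features(columns)
    clf, metrics, y_pred, p_mal = train_and_evaluate(X, y)
    print(feats)
    print(metrics)
    # Class label next to the predicted probability of "malicious" (step 4).
    for label, p in list(zip(y_pred, p_mal))[:5]:
        print(label, round(p, 3))
```

A fully grown tree tends to produce probabilities of exactly 0 or 1; limiting `max_depth` or `min_samples_leaf` gives leaves with mixed classes and therefore more graded probabilities.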
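For step 3, an exhaustive search is affordable: with 11 features there are only 11 + 55 + 165 + 330 = 561 subsets of size at most 4. The sketch below assumes a caller-supplied `evaluate(subset)` that retrains the tree on just those columns and returns a single score (for example F1, or the worst of the four metrics if all of them must stay close). "Almost the same" is modeled here as a hypothetical tolerance `tol` below the full-feature baseline; the function name and defaults are illustrative.

```python
from itertools import combinations


def minimal_subset(features, evaluate, baseline, tol=0.01, max_size=4):
    """Hypothetical helper: search subsets of `features` in increasing size.

    Returns (subset, score) for the smallest size whose best subset reaches
    `baseline - tol`, or None if no subset of at most `max_size` features does.
    """
    for size in range(1, max_size + 1):
        best = None
        for subset in combinations(features, size):
            score = evaluate(subset)
            # Keep the best-scoring qualifying subset of the current size.
            if score >= baseline - tol and (best is None or score > best[1]):
                best = (subset, score)
        if best is not None:
            return best  # smallest size wins; larger subsets are never tried
    return None


if __name__ == "__main__":
    # Toy score: each feature contributes a fixed weight (stands in for a retrained tree's F1).
    weights = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
    print(minimal_subset(list(weights), lambda s: sum(weights[f] for f in s),
                         baseline=1.0, tol=0.06))
```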
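For step 5, scikit-learn exposes a fitted tree as parallel arrays on `clf.tree_` (`children_left`, `children_right`, `feature`, `threshold`, `value`). The sketch below walks every root-to-leaf path and keeps the paths ending in a leaf whose majority class is malicious, along with that leaf's malicious fraction, which is the probability `predict_proba` reports for flows reaching the leaf. `malicious_rules` is a hypothetical helper, and label 1 is assumed to mean malicious.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def malicious_rules(clf, feature_names, malicious_label=1):
    """Return one rule per leaf whose majority class is `malicious_label`.

    Each rule holds the (feature, op, threshold) conditions on the root-to-leaf
    path and the fraction of malicious training flows in that leaf.
    """
    tree = clf.tree_
    target = list(clf.classes_).index(malicious_label)
    rules = []

    def walk(node, path):
        left, right = tree.children_left[node], tree.children_right[node]
        if left == right:  # leaf: sklearn sets both children to -1
            counts = tree.value[node][0]
            if int(np.argmax(counts)) == target:
                rules.append({"conditions": path,
                              "p_malicious": float(counts[target] / counts.sum())})
            return
        name = feature_names[tree.feature[node]]
        thr = float(tree.threshold[node])
        # sklearn routes samples with x <= threshold to the left child.
        walk(left, path + [(name, "<=", thr)])
        walk(right, path + [(name, ">", thr)])

    walk(0, [])
    return rules


if __name__ == "__main__":
    # Toy data with one feature: flows with larger values are malicious.
    clf = DecisionTreeClassifier(random_state=0).fit([[0], [1], [2], [3]], [0, 0, 1, 1])
    for rule in malicious_rules(clf, ["Flow Duration"]):
        print(rule)
```

These rule lists are one possible format for R when porting the rules into the existing system code; the exact format that code expects may differ.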

System Analysis Phase

Let me know when you get to this point.

  1. We then need to test the current system with the new dataset and the new decision tree rules. The code (in python3) already exists; we only need to update it with the new rules. Run the current system to obtain accuracy, precision, recall, and F1 score for different numbers of hash table entries.
  2. If performance is very poor, we will need to revise the system design.
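For the system analysis, the four metrics can be computed from confusion counts for each hash table size. The current python3 code is not shown in this issue, so `run_system(entries)` below is a hypothetical hook that would replay the dataset with the given number of hash table entries and return `(tp, fp, tn, fn)`, with malicious flows as the positive class; the sizes and counts in the demo are placeholders, not measured results.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1, treating malicious flows as positive."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision, "recall": recall, "f1": f1}


def sweep_table_sizes(run_system, sizes):
    """Score the system once per hash table size (run_system is a hypothetical hook)."""
    return {entries: metrics_from_counts(*run_system(entries)) for entries in sizes}


if __name__ == "__main__":
    # Placeholder counts standing in for the real system's output.
    fake_counts = {256: (8, 2, 85, 5), 1024: (12, 1, 86, 1)}
    for entries, m in sweep_table_sizes(lambda n: fake_counts[n], [256, 1024]).items():
        print(entries, {k: round(v, 3) for k, v in m.items()})
```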

Once this is done, I will create a new issue accordingly.