Test the system on "Covert channel detection" dataset

Information Gathering Phase

First, read only the parts about the CCD in the following paper and its references: https://www.ndss-symposium.org/wp-content/uploads/ndss2021_7C-2_24067_paper.pdf
Read the draft paper (shared in the group).
Do this by Thursday!

ML Phase

To download the dataset, visit this repository, as mentioned in the FlowLens paper.

Follow these steps:

If you cannot find all of these features, use only the common features.

Train a vanilla binary decision tree on the dataset (mentioned in 1) using only the above-stated features. Note the accuracy, recall, precision, and F1 score (TP is the number of correctly classified malicious flows).
Find the minimal feature subset F′ (max 4 features) of “F” that gives almost the same accuracy, recall, precision, and F1 score.
Make sure that the trained decision tree predicts class probabilities as well (https://stats.stackexchange.com/questions/193424/is-decision-tree-output-a-prediction-or-class-probabilities). Make note of the probabilities along with class labels.
Capture the decision tree rules for only the malicious flows (R) and make a note of them.
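
The steps above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the project's actual pipeline: the data is synthetic (make_classification) and only stands in for the real flow features, label 1 is assumed to mean malicious, and the names fit_and_score, malicious_rules, TOLERANCE, and the f0..f7 feature names are hypothetical. "Almost the same" is interpreted here as an F1 score within TOLERANCE of the full-feature tree, which is an assumption.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real flow features (assumption: 1 = malicious).
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def fit_and_score(cols):
    """Train a tree on the given feature columns; return (tree, metrics)."""
    cols = list(cols)
    tree = DecisionTreeClassifier(max_depth=5, random_state=0)
    tree.fit(X_train[:, cols], y_train)
    pred = tree.predict(X_test[:, cols])
    metrics = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, pos_label=1, zero_division=0),
        "recall": recall_score(y_test, pred, pos_label=1, zero_division=0),
        "f1": f1_score(y_test, pred, pos_label=1, zero_division=0),
    }
    return tree, metrics


# Baseline on the full feature set F.
full_tree, full_metrics = fit_and_score(range(X.shape[1]))

# Exhaustive search over subsets of at most 4 features. Stop at the smallest
# subset size whose best F1 is within TOLERANCE of the baseline.
TOLERANCE = 0.02  # assumption: what counts as "almost the same"
best = None  # (f1, cols, tree, metrics)
for k in range(1, 5):
    for cols in combinations(range(X.shape[1]), k):
        tree, m = fit_and_score(cols)
        if best is None or m["f1"] > best[0]:
            best = (m["f1"], cols, tree, m)
    if best[0] >= full_metrics["f1"] - TOLERANCE:
        break
_, sub_cols, sub_tree, sub_metrics = best

# Class probabilities alongside class labels (columns follow sub_tree.classes_).
X_sub = X_test[:, list(sub_cols)]
proba = sub_tree.predict_proba(X_sub)
labels = sub_tree.predict(X_sub)


def malicious_rules(tree, names):
    """Return (rule, probability) for every leaf whose majority class is 1."""
    t = tree.tree_
    rules = []

    def walk(node, conds):
        if t.children_left[node] == -1:  # -1 marks a leaf
            counts = t.value[node][0]
            idx = int(np.argmax(counts))
            if tree.classes_[idx] == 1:
                rule = " and ".join(conds) or "always"
                rules.append((rule, float(counts[idx] / counts.sum())))
            return
        name, thr = names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conds + [f"{name} <= {thr:.4f}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.4f}"])

    walk(0, [])
    return rules


rules = malicious_rules(sub_tree, [feature_names[c] for c in sub_cols])
print("full:", full_metrics)
print("subset:", [feature_names[c] for c in sub_cols], sub_metrics)
for rule, p in rules:
    print(f"IF {rule} THEN malicious (p={p:.2f})")
```

The exhaustive search is only practical for a small feature set. With many features, a greedy or importance-based selection would be a cheaper alternative.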

System Analysis Phase

Let me know when you get to this point.

We then need to test the current system with this new dataset and the new decision tree rules. The code (in Python 3) is already there; we just need to update it with the new decision tree rules. Test on the current system to get recall, precision, accuracy, and F1 score for different numbers of hash table entries.
In case of very poor performance, we will need to update the system design.
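
The sweep over hash table sizes could be organised along the following lines. This is only a sketch: toy_system is a made-up stand-in and does not reflect how the existing code behaves (it tracks a flow only if its slot is free and treats untracked flows as benign), and the names classification_metrics and sweep are hypothetical. In practice, the existing system would be plugged in where toy_system is passed.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with malicious (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def toy_system(flows, n_entries):
    """Hypothetical stand-in: one flow per slot, untracked flows count as benign."""
    table = {}
    preds = []
    for flow_id, value in flows:
        slot = flow_id % n_entries
        if slot not in table:
            table[slot] = flow_id  # first flow to reach the slot keeps it
        if table[slot] == flow_id:
            preds.append(1 if value > 0.5 else 0)  # toy rule, not a real one
        else:
            preds.append(0)  # collision: flow is not tracked
    return preds


def sweep(run_system, table_sizes, flows, y_true):
    """Run the system once per table size and collect the metrics."""
    return {n: classification_metrics(y_true, run_system(flows, n))
            for n in table_sizes}


# Toy data: odd flow IDs are malicious and carry a high feature value.
flows = [(i, 0.9 if i % 2 else 0.1) for i in range(8)]
y_true = [i % 2 for i in range(8)]
results = sweep(toy_system, [2, 4, 8], flows, y_true)
for n, m in results.items():
    print(n, m)
```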

Once this is done, I can create a new issue accordingly.
