Eval on P2P, DDoS and Covert channel

Sankalp-CS21MTECH12010 commented 1 year ago

Rebuttal Reply: We agree that to make a better case regarding accuracy and generality, AdaFlow should be evaluated on different datasets – P2P App Fingerprinting, Covert Channel, and DDoS detection. We will conduct experiments to evaluate FPR, FNR, malicious flow loss, packet recirculations, and ROC curves and complete these experiments by the camera-ready deadline.

Todo items:

Compare AdaFlow with NetBeacon and *Flow in terms of recall, precision, accuracy, FPR, FNR, and recirculation rate on Covert Deltashaper - ✔
Compare AdaFlow with NetBeacon and *Flow in terms of recall, precision, accuracy, FPR, FNR, and recirculation rate on the Covert Facet - ✔
Compare AdaFlow with NetBeacon and *Flow in terms of recall, precision, accuracy, FPR, FNR, and recirculation rate on P2P App Fingerprinting
Compare AdaFlow with NetBeacon and *Flow regarding the recall, precision, accuracy, FPR, FNR, and recirculation rate on DDoS attack detection.
PR/ROC curve for the above datasets.

Sankalp-CS21MTECH12010 commented 1 year ago

@Sankalp-CS21MTECH12010 Please refer to this first: https://github.com/networked-systems-iith/AdaFlow/issues/10 Done:

Compare AdaFlow with NetBeacon and *Flow in terms of recall, precision, accuracy, FPR, FNR, and recirculation rate on Covert Deltashaper (For 4096 entries)

dtM = 1, dtB = 0: Always prioritize malicious flows and evict benign flows upon collision
dtM = 0.8, dtB = 0.6: Evict (upon collision) malicious flows when the system is more than 80% confident and evict benign flows when the system is more than 60% confident.
dtM = 0.8, dtB = 0.8: Evict (upon collision) flows when the system is more than 80% confident in the prediction.

AdaFlow uses Decision Tree to prioritize flows and XGBoost as an offline ML model.

Key takeaways (First Graph):

AdaFlow with dtM=1 and dtB = 0 gives maximum recall but poor precision because it untimely evicts benign flows to make space for malicious flows, thus leading to high FPs.
NetBeacon with dt = 0.8 equally let all the flows stay in the cache until the model is 80% confident about its prediction. As such, recall reduces but precision and accuracy increase.
AdaFlow with dtM=dtB=0.8 behaves similarly to NetBeacon but still gives slightly better metrics because of model design (two classification levels + aggregate model design -- see section 3.2 in draft paper).
AdaFlow with dtM=0.8 and dtB=0.6 gives recall better that dtM=0.8, dtB=0.6 but worse that dtM=1, dtB=0.
AdaFlow with two levels of classifications give better metrics than when evaluated directly on preliminary classification.
*Flow gives worse metrics because of evict on collision policy which is equivalent to AdaFlow's dtM=dtB=0. Thus it is not able to capture flow distribution correctly.

Observation: When dtM is high, AdaFlow behaves as a malicious traffic filter and gives very good recall at the expense of low precision.

Key takeaways (Second Graph): As we can see, these results are extensions of first graph. Best results are given by dtM = dtB = 0.8. The lowest FNR is given by dtM=1, dtB = 0.

Key takeaways (Third Graph): *Flow has to recirculate a packet every time its buffer gets filled. Thus it has recirculation rate proportional to number of packets -- O(#packets/k) where k is size of narrow buffer. On the other hand, depending on dtM, dtB and dt, in worst case AdaFlow and NetBeacon's recirculation rate will be O(#flows).

Facet Recall: Precision: Accuracy: FPR vs FNR for 8K entries: Recirculations:

In Progress:

Compare AdaFlow with NetBeacon and *Flow in terms of recall, precision, accuracy, FPR, FNR, and recirculation rate on the P2P App Fingerprinting

praveenabt commented 1 year ago

@Sankalp-CS21MTECH12010 --

For each graph (set of related plots), please highlight the reason for why adaflow is better/equal/not better. Followed key takeaways and observations.
Malicious flow loss and ROC curves are not shown -- are you planning to do them later? if so please update the same In Progress.

Sankalp-CS21MTECH12010 commented 1 year ago

@praveenabt

I don't think malicious flow loss is required because: (a) In Strawman flows were getting overwritten. (b) In NetBeacon and AdaFlow, flows are always evicted (even if untimely) so it covers all the flows even if classification is inaccurate.

Sankalp-CS21MTECH12010 commented 1 year ago

Last 5 (important features)

bin75: Say a flow has 100 packets. Then bin75 = 12 for that flow means that out of 100 packets there are 12 packets whose packet lengths lie between [16x75, 16x76 - 1).

praveenabt commented 1 year ago

How do we show the benefits of Adaflow over NetBeacon?

Collision handling in NetBeacon: NetBeacon derives dataset-specific thresholds for small flows and avoids monitoring them.
Identify a dataset that requires small flow monitoring to detect attacks.
Explore how AdaFlow doing compared to NetBeacon.

praveenabt commented 1 year ago

@Sankalp-CS21MTECH12010 For facet/deltashaper, explain the key reason of why NetBeacon recall is giving poor results compared to AdaFlow. -- Is it because of missing small flow monitoring?

Sankalp-CS21MTECH12010 commented 1 year ago

@praveenabt I do not think it is because of missing small flow monitoring. NetBeacon people have clearly showed in the paper that doing that does not affect the metrics much for P2P, Covert and DDoS detection.

I think it is because of ML Model design. NetBeacon uses sequential model while I have used aggregated model design. I wanted to bring this yesterday's meeting but did not because the discussion was already becoming complicated.

So sequential model consumes more switch memory (in terms of SRAM and TCAM) than the aggregated model I used.

We can show that AdaFlow achieves similar metrics (slightly better) than NB for less number of stages (less SRAM and TCAM memory), which is an advantage. The sequential model requires deploying many ML models (depending on a number of inference points).

Another point is that showing that for any CIC-IDS attacks, we cannot miss small flow monitoring could be another advantage.

Lastly, we can also argue that AdaFlow can selectively filter out specific malicious traffic, which NB might not be able to do.

praveenabt commented 1 year ago

Observation: NetBecon dp model consumes more memory (multiple models deployed) interms of SRAM and TCAM, compared to our single aggregated model.

Sankalp-CS21MTECH12010 commented 1 year ago

Important Features for P2P Fingerprinting:

Sankalp-CS21MTECH12010 commented 1 year ago

@praveenabt

Prototype for P2P fingerprinting, Covert Channel, DDoS (715 lines of code) is ready. Features collected are dataset specific:
- P2P fingerprinting: size_min, size_max, size_avg, ipd_max
- Covert channel: bin features (packet length dist): bin66, bin64, bin53, bin71, bin75 -- per flow count of packets that fall in these bins
- DDoS: ipd_min

Uses 10 stages, 8.1% SRAM and 13.9% TCAM

In contrast, NB reported 12 stages, 13.44% SRAM and 34.03% TCAM

For DDoS (500 lines of code): Used 8 stages, 5.7% SRAM and 1.5% TCAM In contrast NB reported 9 stages, 11.1% SRAM and 1.85% TCAM

For P2P: Uses 11 stages, 12.8% TCAM and 9.6% SRAM In contrast NB reported 12 stages, 17.29% of SRAM and 31.25% of TCAM.

praveenabt commented 1 year ago

Todo: Get the accuracy/recall/precision, etc plots and memory usage information for all four datasets for both AdaFlow and NetBeacon.

For P2P and DDoS datasets: get recall, precision, accuracy, fpr, fnr, recirculation rate
For Covertchannel: get switch resource usage
For all datasets: Capture usage of ALUs (or any other with significant improvements compared to NetBecon)
while writing, cover why NB requires more memory than AdaFlow

Sankalp-CS21MTECH12010 commented 1 year ago

Combined prototype to handle covert channels, P2P and DDoS (1000 Lines):

Sankalp-CS21MTECH12010 commented 1 year ago

This is NB output for P2P: Meter ALU is 27.1%. In contrast AdaFlow only consumed 18.4% ALU for P2P.

So there is a decrease, about 8% decrease for P2P and covert. For DDoS about same.

Although I'll have to think why this decrease for meter ALUs.

Sankalp-CS21MTECH12010 commented 1 year ago

@praveenabt For P2P and DDoS, I will directly add the results in the draft paper shared.

networked-systems-iith / AdaFlow

Eval on P2P, DDoS and Covert channel #7

In contrast, NB reported 12 stages, 13.44% SRAM and 34.03% TCAM