Which feature stats function representation is appropriate for the IF?

networked-systems-iith / SecFRR

Repository for research conducted at NETX, a networks research group in the Department of Computer Science and Engineering at IIT Hyderabad, India led by Dr. Praveen Tammana.

https://www.netxiith.in/


Which feature stats function representation is appropriate for the IF? #4

Open divyapathak24 opened 1 year ago

divyapathak24 commented 1 year ago

Task:

Collect and manually analyze flow size and flow duration data for normal, congestion, and link failure windows across pcaps.
Capture this behavior in a model (training part) and use the model to distinguish it from an attack link failure.
Look at x paper to find how they build security models using fsize and fduration.

divyapathak24 commented 1 year ago

Check with Sankalp on point 3

divyapathak24 commented 1 year ago

Function 1: Per instance, feed 64 x (FS, FD) pairs to the IF ML model
Goal: Predict before the attack happens (early detection)
Observations:

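Worked on the bmv2 synthetic setup but not on the CAIDA setup
On the CAIDA setup:
- could not detect 2 of the 4 attack pcaps, out of 10 pcaps in total (4 attack, 4 link failure, 2 normal/congestion). That is a miss rate of 0.2 over all pcaps; measured over the attack pcaps alone, FNR = 2/4 = 0.5. Either way, this is bad.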

Next steps:

Explore other functions for CAIDA trace and redo the experiments for bmv2 setup

praveenabt commented 1 year ago

@prathyush1886
Try new functions and find one that gives better results. Key idea: leverage the characteristics of flows (fs, fd) to differentiate normal from attack. More specifically, try the functions below:

Classification across flows (whole window)
Consider only fs, only fd, and combination of fs and fd.
Per-flow classification (flag the window when the aggregate anomaly count is above a certain threshold)

Input: 64x(FS, FD), black box function, output: distribution stats

Check fig.4 in this paper for more ideas: https://www.ndss-symposium.org/wp-content/uploads/ndss2021_7C-2_24067_paper.pdf
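One possible shape for the "black box function" above, as a minimal sketch only: it takes one instance of (FS, FD) pairs and returns a few distribution stats per feature. The choice of stats (mean, median, standard deviation, max), the function name `distribution_stats`, and the sample values are assumptions for illustration, not something decided in this thread.

```python
import statistics

def distribution_stats(pairs):
    """Summarize one instance of (flow_size, flow_duration) pairs.

    Returns a flat feature vector: mean, median, population stdev, and max
    for flow size, followed by the same four values for flow duration.
    """
    sizes = [fs for fs, _ in pairs]
    durations = [fd for _, fd in pairs]
    features = []
    for values in (sizes, durations):
        features.extend([
            statistics.fmean(values),
            statistics.median(values),
            statistics.pstdev(values),
            max(values),
        ])
    return features

# Hypothetical instance: 64 flows with made-up (bytes, seconds) values.
instance = [(1000 + 10 * i, 0.5 + 0.1 * i) for i in range(64)]
features = distribution_stats(instance)
```

The resulting 8-value vector could be fed to the model in place of the raw 64 pairs; restricting it to the first or last four values gives the fs-only and fd-only variants.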

praveenabt commented 1 year ago

Function 1: vary value 'n'
- calculate similarity scores of instance 'x' with the last 'n' instances (n-tuple)
- Training: m x (n-tuples) where 'm' is the number of instances in a pcap
- Inference:
  - similarity scores of instance 'x' with the last 'n' instances
- Parameter: threshold 'x', last x-tuples are reported as anomaly
Function 2: vary 'x' and 'buckets' per instance
- calculate flow size distribution and provide per-bucket count to the model
- Training: per-bucket count of each instance
- Inference: observed per-bucket count in the current instance
- Parameter 1: threshold 'x', last 'x' instances are reported as anomaly
  - the key is to differentiate between a normal link failure and an attack link failure. The observation is that a normal link failure doesn't impact the flow duration distribution.
- Parameter 2: Number of buckets per instance
Model: isolation forest, ...
Test cases:
- all normal instances (many flows are < 10sec)
- normal followed by link failure (> 15%) followed by normal - highlight one before link failure, one with link failure, and one after
- normal followed by attack (> 20%) - highlight 2 before the attack and 1 during the attack
- normal followed by burst (many flows are < 2 sec in an instance) followed by normal
Success metrics:
- Definitions :
  - FPs = Normal class being classified as attack
  - FNs = Attack class being classified as normal
- FPR: It’s the probability that a positive result will be given when the true value is negative.
- FNR: It’s the probability that a negative result will be given when the true value is positive.
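As a small sketch of the two metrics defined above, FPR = FP / (FP + TN) and FNR = FN / (FN + TP), with attack as the positive class. The function name `fpr_fnr` and the labels below are made up for illustration.

```python
def fpr_fnr(y_true, y_pred):
    """Compute (FPR, FNR) from labels where 1 = attack and 0 = normal."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)  # FP + TN
    positives = sum(1 for t in y_true if t == 1)  # FN + TP
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Made-up labels: 5 attack cases (one missed), 5 normal cases (one false alarm).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
fpr, fnr = fpr_fnr(y_true, y_pred)
```

Note that each rate is normalized by its own class size, not by the total number of cases.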

praveenabt commented 1 year ago

@prathyush1886 update the meeting minutes, especially cover:

the discussion on why simple removal of flows observing retransmissions is insufficient to predict and stop the attack

Updated Meeting Minutes

Analyzed the flow stats distribution graphs to draw conclusions.
Decided on the function to be used to feed the input to the model.
Discussed the importance of considering edge-case scenarios when evaluating the (function x model) combination, to see whether they could be misclassified.
Looking into factors like the threshold and the last 'n' instances can help understand point 4.
Look into why isolation forest is suitable here (why not another model); understand with the help of a concrete example.
Looked into whether a simple solution, such as Blink removing flows based on how long a flow has been experiencing retransmissions, would be sufficient. The reasons it is not included:
- On a link experiencing congestion, there is a high chance that flows retransmitting for genuine reasons get removed, which prevents Blink from rerouting when it actually should.
- This removal mechanism is more of an ignoring mechanism: it appears to prevent attacks, but it also disregards retransmissions that persist for a long time for genuine reasons.
- It could be useful in some Blink cases where genuine retransmissions are not persistent, but that makes it very specific.
- One of the main reasons for choosing an ML-based model over this is its generalizability: the model could be extended to other FRR systems and still produce results, while the simple solution is Blink-specific.

praveenabt commented 1 year ago

@prathyush1886 similarity for buckets with zero flows

ignore such buckets
take a diff with the average flows in that bucket across all instances (instead of last 'n' instances)
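A minimal sketch of one reading of the two points above: count flows per bucket, skip buckets that are empty in the current instance, and for the rest take the absolute difference from that bucket's average across all instances. The bucket edges, the function names, and the sample counts are hypothetical, and combining both points in one function is an assumption.

```python
def bucket_counts(flow_sizes, edges):
    """Count flows per bucket; bucket i covers [edges[i], edges[i + 1]).

    Sizes outside the outermost edges are not counted.
    """
    counts = [0] * (len(edges) - 1)
    for size in flow_sizes:
        for i in range(len(counts)):
            if edges[i] <= size < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def diff_from_average(current, history):
    """Per-bucket |current - average over all instances in history|.

    Buckets with zero flows in the current instance are ignored (None).
    """
    diffs = []
    for b, count in enumerate(current):
        if count == 0:
            diffs.append(None)
            continue
        avg = sum(inst[b] for inst in history) / len(history)
        diffs.append(abs(count - avg))
    return diffs

# Hypothetical bucket edges (bytes) and per-bucket counts of earlier instances.
edges = [0, 1000, 10000, 100000]
history = [[10, 5, 1], [12, 3, 0], [8, 4, 2]]
current = bucket_counts([500, 700, 2000], edges)
diffs = diff_from_average(current, history)
```

Using the average over all instances rather than the last 'n' makes the reference point independent of the choice of 'n'.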

prathyush1886 commented 1 year ago

Meeting Minutes ( 2/9/2023 ) :

We separated the flows based on whether they are retransmission flows or normal flows, and generated distribution plots of the respective flow stats for each instance.
- The stats are flow duration, flow size, and inter-packet time difference.
We cross-analyzed by comparing 4 link failure instances (when the link failure happened) and 4 attack instances (when the attack happened).
We analyzed the flow duration distribution and saw that, in the retransmission-flows-only case, the majority of flows lie in the less-than-12-seconds range.
This gave the idea of a statistics-based approach where we flag instances that have more than a threshold number of flows in the greater-than-12-sec range (retransmission flows only).
Some ideas for threshold selection (retransmission flows only):
- Max over the LF instances (Max is the maximum number of flows > 12 sec in an instance)
- Average number of flows > 12 sec across the link failure instances
- (Min + Max) / 2 of the flows > 12 sec over the link failure instances (Max is the maximum and Min the minimum number of flows > 12 sec in an instance)
Some extra points to mention (additions on top of the Blink logic):
- Flow stats collection Logic
- Flows flagging logic ( experiencing retransmission or not )
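The three threshold ideas above can be sketched as follows. This is only an illustration: the function names and the flow durations are made up, and each inner list stands for the retransmission-flow durations (in seconds) of one link failure instance.

```python
def count_long_flows(durations, cutoff=12.0):
    """Number of flows in one instance whose duration exceeds the cutoff."""
    return sum(1 for d in durations if d > cutoff)

def threshold_candidates(lf_instances, cutoff=12.0):
    """Candidate thresholds derived from link failure (LF) instances."""
    counts = [count_long_flows(inst, cutoff) for inst in lf_instances]
    return {
        "max": max(counts),
        "avg": sum(counts) / len(counts),
        "midrange": (min(counts) + max(counts)) / 2,
    }

# Made-up durations for 4 LF instances (retransmission flows only).
lf_instances = [
    [1.0, 3.5, 13.0, 2.0],
    [0.5, 14.0, 15.5, 20.0],
    [2.0, 4.0, 6.0, 8.0],
    [12.5, 13.5, 1.0, 30.0],
]
candidates = threshold_candidates(lf_instances)
```

The max is the most conservative choice (fewest false alarms on link failures), while the average and midrange flag more instances.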

[Figure: FD_ret]

praveenabt commented 1 year ago

Meeting Minutes ( 7/9/2023 ):

Threshold based approach for detection: Note that the following discussion is with respect to only the retransmission flows:

After analysing the flow duration distribution for 4 LF instances, generate distribution plots for the remaining LF cases
Derive a threshold (t) using the LF instances and the above-discussed approaches
Dump the per-bin flow counts for all attack instances (up to the instance where the attack actually took place)
Run threshold algorithm on all attack instances one by one on a per-pcap basis.

if the number (or fraction) of flows with duration > 12 sec exceeds the threshold (t) --> mark the instance as an attack instance

How many instances are we able to detect as attack before the attack actually took place?
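A minimal sketch of running the threshold check on one attack pcap and answering the question above. It assumes the rule is "flag an instance when its count of flows > 12 sec exceeds t"; the function names, counts, threshold, and attack index are hypothetical.

```python
def flag_instances(long_flow_counts, t):
    """True for each instance whose count of flows > 12 sec exceeds t."""
    return [count > t for count in long_flow_counts]

def flagged_before_attack(long_flow_counts, t, attack_index):
    """Number of instances flagged strictly before the attack instance."""
    flags = flag_instances(long_flow_counts, t)
    return sum(flags[:attack_index])

# Made-up per-instance counts of flows > 12 sec for one attack pcap;
# the attack is assumed to take place at instance index 5.
counts = [1, 2, 1, 4, 5, 9]
early = flagged_before_attack(counts, t=3, attack_index=5)
```

With these made-up numbers, instances 3 and 4 are flagged ahead of the attack instance.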

Evaluation of our detection mechanism:

Metrics: FPR and FNR
Variables: threshold (t) and #consecutive instances
Objective:

How early are we able to detect attacks?
Variable 1: Study the threshold values for which FPR and FNR are the lowest
Variable 2: Vary #consecutive instances as 1, 2, 3, 4, 5 and study the effect on FPR and FNR.
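The evaluation above could be sketched as a sweep over both variables. This assumes a pcap is reported as an attack once k consecutive instances exceed the threshold t, and that FPR and FNR are computed per pcap; those choices, the function names, and the counts below are assumptions for illustration.

```python
def detects_attack(counts, t, k):
    """True if at least k consecutive instances have a count above t."""
    run = 0
    for count in counts:
        run = run + 1 if count > t else 0
        if run >= k:
            return True
    return False

def sweep(pcaps, thresholds, ks):
    """pcaps: list of (per-instance counts, is_attack).

    Returns {(t, k): (fpr, fnr)} computed at pcap granularity.
    """
    n_attack = sum(1 for _, is_attack in pcaps if is_attack)
    n_normal = len(pcaps) - n_attack
    results = {}
    for t in thresholds:
        for k in ks:
            fp = sum(1 for c, a in pcaps if not a and detects_attack(c, t, k))
            fn = sum(1 for c, a in pcaps if a and not detects_attack(c, t, k))
            results[(t, k)] = (
                fp / n_normal if n_normal else 0.0,
                fn / n_attack if n_attack else 0.0,
            )
    return results

# Made-up counts of flows > 12 sec per instance: 2 attack and 2 non-attack pcaps.
pcaps = [
    ([1, 2, 6, 7, 8], True),
    ([0, 1, 5, 1, 6], True),
    ([1, 4, 1, 0, 2], False),
    ([0, 1, 1, 2, 1], False),
]
results = sweep(pcaps, thresholds=[3, 5], ks=[1, 2])
```

In this toy data, raising k removes the false positive at t = 3 but misses the attack whose long-flow counts are not consecutive, which is the trade-off the two variables are meant to expose.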