networked-systems-iith / SecFRR

Repository for research conducted at NETX, a networks research group in the Department of Computer Science and Engineering at IIT Hyderabad, India, led by Dr. Praveen Tammana.
https://www.netxiith.in/

Which feature stats function representation is appropriate for the IF? #4

Open divyapathak24 opened 1 year ago

divyapathak24 commented 1 year ago

Task:

  1. Collect and manually analyze flow size and flow duration data for normal, congestion, and link-failure windows across pcaps.
  2. Capture this behavior in a model (the training part) and use the model to distinguish it from an attack link failure.
  3. Look at x paper to find how they build security models using fsize and fduration.
divyapathak24 commented 1 year ago

Check with Sankalp on point 3

divyapathak24 commented 1 year ago

Function 1: Feed 64 x (FS, FD) pairs per instance to the IF ML model.

Goal: Predict before the attack happens (early detection).

Observations:

Next steps:

praveenabt commented 1 year ago

@prathyush1886
Try new functions and find one that gives better results. Key idea: leverage the characteristics of flows (fs, fd) to differentiate normal from attack. More specifically, try the functions below:

Input: 64x(FS, FD), black box function, output: distribution stats

Check fig.4 in this paper for more ideas: https://www.ndss-symposium.org/wp-content/uploads/ndss2021_7C-2_24067_paper.pdf
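
As a starting point for the "black box function" above, here is a minimal sketch that turns one instance of 64 (FS, FD) pairs into a fixed-length vector of distribution stats. The function name `distribution_stats` and the particular stats chosen (mean, standard deviation, quartiles, max) are assumptions for illustration, not something decided in this thread.

```python
from statistics import mean, pstdev, quantiles


def distribution_stats(pairs):
    """Summarize one instance of (flow size, flow duration) pairs.

    Returns 12 numbers: mean, std, Q1, median, Q3, max for FS,
    followed by the same six stats for FD.
    The choice of stats is an assumption made for this sketch.
    """
    features = []
    for values in zip(*pairs):  # first all FS values, then all FD values
        q1, median, q3 = quantiles(values, n=4)
        features += [mean(values), pstdev(values), q1, median, q3, max(values)]
    return features


# Hypothetical instance: 64 flows with made-up sizes and durations.
instance = [(i, i * 0.1) for i in range(1, 65)]
features = distribution_stats(instance)
```

Each instance then becomes one row of the feature matrix handed to the model, so swapping in a different function only changes how the row is computed.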

praveenabt commented 1 year ago
  1. Function 1: vary value 'n'
    • calculate similarity scores of instance 'x' with the last 'n' instances (n-tuple)
    • Training: m x (n-tuples), where 'm' is the number of instances in a pcap
    • Inference:
      • similarity scores of instance 'x' with the last 'n' instances
    • Parameter: threshold 'x', last x-tuples are reported as anomaly
  2. Function 2: vary 'x' and 'buckets' per instance
    • calculate flow size distribution and provide per-bucket count to the model
    • Training: per-bucket count of each instance
    • Inference: observed per-bucket count in the current instance
    • Parameter 1: threshold 'x', last 'x' instances are reported as anomaly
      • the key is to differentiate between a normal link failure and an attack link failure. The observation is that a normal link failure doesn't impact the flow duration distribution.
    • Parameter 2: Number of buckets per instance
  3. Model: isolation forest, ...
  4. Test cases:
    • all normal instances (many flows are < 10sec)
    • normal followed by link failure (> 15%) followed by normal - highlight one before link failure, one with link failure, and one after
    • normal followed by attack (> 20%) - highlight 2 before the attack and 1 during the attack
    • normal followed by burst (many flows are < 2 sec in an instance) followed by normal
  5. Success metrics:

    • Definitions :

      • FPs = Normal class being classified as attack
      • FNs = Attack class being classified as normal
    • FPR: It’s the probability that a positive result will be given when the true value is negative.

    • FNR: It’s the probability that a negative result will be given when the true value is positive.
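
The two definitions above correspond to FPR = FP / (FP + TN) and FNR = FN / (FN + TP). A minimal sketch of computing both from per-instance labels follows; the encoding (1 = attack, 0 = normal) and the function name `fpr_fnr` are assumptions for illustration.

```python
def fpr_fnr(y_true, y_pred):
    """Return (FPR, FNR) given true and predicted labels.

    Assumed encoding: 1 = attack (positive), 0 = normal (negative).
    """
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal flagged as attack
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attack flagged as normal
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    # Guard against a test case that has no normal (or no attack) instances.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Hypothetical labels: four normal instances followed by two attack instances.
fpr, fnr = fpr_fnr([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0])
```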

praveenabt commented 1 year ago

@prathyush1886 update the meeting minutes, especially cover:

Updated Meeting Minutes

praveenabt commented 1 year ago

@prathyush1886 similarity for buckets with zero flows
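
One possible way to handle empty buckets, sketched below: the thread does not say which similarity measure is used, so cosine similarity over per-bucket counts is an assumption here. Empty buckets contribute zero to the score, and the only special case is an instance whose buckets are all empty. The helper names and the bucket edges are hypothetical.

```python
import math


def bucket_counts(values, edges):
    """Count values per bucket. `edges` are ascending bucket boundaries;
    the last bucket is open-ended."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[idx] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors, defined for empty vectors too."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 and norm_b == 0:
        return 1.0  # convention for this sketch: two empty instances are identical
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: empty vs non-empty is dissimilar
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


# Hypothetical flow durations (seconds) and bucket edges.
counts = bucket_counts([1, 5, 12, 30], edges=[2, 10, 20])
score = cosine_similarity([1, 0, 2], [1, 0, 2])
```

Measures built on ratios or logarithms of bucket counts would need smoothing for empty buckets, which is why a dot-product-based score is used in this sketch.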

prathyush1886 commented 1 year ago

Meeting Minutes ( 2/9/2023 ) :

FD_ret

praveenabt commented 1 year ago

Meeting Minutes ( 7/9/2023 ):

Threshold-based approach for detection (note that the following discussion applies only to retransmission flows):

If the fraction of flows / threshold (t) > 12, mark the instance as an attack instance.

Evaluation of our detection mechanism:

Metrics: FPR and FNR

Variables: threshold (t) and #consecutive instances

Objective:

  1. How early are we able to detect attacks?
  2. Variable 1: Study the threshold values for which FPR and FNR are the lowest
  3. Variable 2: Vary #consecutive instances as 1, 2, 3, 4, 5 and study its effect on FPR and FNR.
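
The two variables can be explored with a small helper like the sketch below. It assumes each instance is already reduced to a single per-instance value (for example, the retransmission-flow fraction discussed above) and raises an alarm once that value exceeds `t` for `k` consecutive instances. The name `first_alarm` and the sample values are hypothetical.

```python
def first_alarm(values, t, k):
    """Return the index of the instance at which an alarm is raised,
    i.e. the k-th consecutive instance whose value exceeds t, or None."""
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > t else 0  # reset the streak on a normal instance
        if run >= k:
            return i
    return None


# Hypothetical per-instance values; an isolated spike at index 1,
# then a sustained rise starting at index 3.
values = [3, 14, 5, 13, 15, 20]
alarms = {k: first_alarm(values, t=12, k=k) for k in (1, 2, 3, 4)}
```

With these made-up values, `k = 1` fires on the isolated spike, while larger `k` ignores it at the cost of a later alarm, which is the trade-off between FPR and detection delay that the objectives above set out to measure.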