nyu-mlab / pcap-parser

MIT License

feature/create network flows #4

Closed Rameen-Mahmood closed 6 months ago

Rameen-Mahmood commented 6 months ago

@crazyideas21 Could you review the method for calculating inter-arrival times between network packets? I'm grouping packets into flows and calling diff() on the timestamp column, so that each packet gets the difference between its timestamp and the timestamp of the previous packet in the same flow:

grouped = df.groupby(['ip.src', 'ip.dst', 'tcp.srcport', 'tcp.dstport', 'udp.srcport', 'udp.dstport', '_ws.col.Protocol'])
df['inter_arrival_time'] = grouped['frame.time_epoch'].diff().dt.total_seconds()
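Two details in the snippet above may be worth checking; the following is a minimal sketch, not the PR's actual code. It assumes that `frame.time_epoch` was loaded as float seconds and that TCP rows carry NaN in the UDP port columns (and vice versa), which is how such an export usually looks but is not confirmed in this thread. The packet values and the names `packets`, `FLOW_KEYS`, and `flows` are made up for illustration. By default pandas `groupby` drops rows whose key contains NaN, so `dropna=False` (pandas 1.1 or later) is used to keep them, and the timestamps are parsed with `pd.to_datetime` so that `.dt.total_seconds()` is available after `diff()`.

```python
import numpy as np
import pandas as pd

FLOW_KEYS = ['ip.src', 'ip.dst', 'tcp.srcport', 'tcp.dstport',
             'udp.srcport', 'udp.dstport', '_ws.col.Protocol']

# Made-up packets: one TCP flow and one UDP flow between the same two hosts.
packets = pd.DataFrame({
    'ip.src': ['10.0.0.1'] * 4,
    'ip.dst': ['10.0.0.2'] * 4,
    'tcp.srcport': [5000, 5000, np.nan, np.nan],
    'tcp.dstport': [80, 80, np.nan, np.nan],
    'udp.srcport': [np.nan, np.nan, 6000, 6000],
    'udp.dstport': [np.nan, np.nan, 53, 53],
    '_ws.col.Protocol': ['TCP', 'TCP', 'DNS', 'DNS'],
    'frame.time_epoch': [1.0, 1.5, 2.0, 2.25],  # assumed: float epoch seconds
})

# Convert float seconds to datetimes so diff() yields timedeltas.
packets['frame.time_epoch'] = pd.to_datetime(packets['frame.time_epoch'], unit='s')

# dropna=False keeps flows whose key contains NaN (e.g. TCP rows without UDP ports).
flows = packets.groupby(FLOW_KEYS, dropna=False)
packets['inter_arrival_time'] = flows['frame.time_epoch'].diff().dt.total_seconds()

print(packets['inter_arrival_time'].tolist())  # [nan, 0.5, nan, 0.25]
```

The first packet of each flow has no predecessor, so its inter-arrival time is NaN. If the timestamps are kept as floats instead, `diff()` already returns seconds and the `.dt.total_seconds()` call is not needed.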

crazyideas21 commented 6 months ago

Looks good. To be extra safe, make sure that the timestamps are sorted. Here's a toy example:


import pandas as pd

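# (packet group, timestamp) pairs, deliberately out of order
rows = [
    ('a', 4),
    ('b', 20),
    ('a', 2),
    ('a', 1),
    ('b', 10),
    ('b', 100)
]

# Sort by group and time so that diff() compares consecutive packets
df = pd.DataFrame(rows, columns=['Packet', 'Time']).sort_values(by=['Packet', 'Time'])

# The first packet in each group has no predecessor, so its value is NaN
g = df.groupby('Packet')
df['Inter-Arrival-Time'] = g['Time'].diff()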

df
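To see why the sort matters, here is a small sketch that reuses the toy data above and computes the gaps with and without sorting. The variable names (`rows`, `unsorted_df`, `sorted_df`, and so on) are made up for this comparison.

```python
import pandas as pd

rows = [('a', 4), ('b', 20), ('a', 2), ('a', 1), ('b', 10), ('b', 100)]
unsorted_df = pd.DataFrame(rows, columns=['Packet', 'Time'])

# Without sorting, diff() follows row order within each group,
# so some gaps come out negative.
unsorted_gaps = unsorted_df.groupby('Packet')['Time'].diff()

# Sorting by group and time first makes every gap non-negative.
sorted_df = unsorted_df.sort_values(by=['Packet', 'Time'])
sorted_gaps = sorted_df.groupby('Packet')['Time'].diff()

print(unsorted_gaps.tolist())  # [nan, nan, -2.0, -1.0, -10.0, 90.0]
print(sorted_gaps.tolist())    # [nan, 1.0, 2.0, nan, 10.0, 80.0]
```

A quick sanity check on real data is that no inter-arrival time is negative after the sort.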