Open majorsylvie opened 11 months ago
Mandatory: Selin is a student in Machine Learning for Systems with Nick Feamster. She has just been given assignment 1 on measuring the quality of Netflix traffic. As part of the assignment, she has to find which DNS requests were made to Netflix CDN domains in order to find the IP addresses of those Netflix CDNs.
After loading the PCAP with `netflix_pcap = netml.pparser.parser.PCAP("path/to/file.pcap")`, running `netflix_pcap.pcap2pandas()`, and accessing the dataframe with `netflix_pcap.df`, she finds a `"dns_transaction_id"` column that holds the transaction ID as a string for any DNS request/response, and `None` for any non-DNS traffic.
Selin, however, does not know what the `dns_transaction_id` actually is. So she consults the NetML documentation at https://pypi.org/project/netml/ or https://github.com/noise-lab/netml, and she sees an example in the Use section titled "Manipulating DNS traffic".
In the "Manipulating DNS traffic" section of the docs, Selin sees a code example which shows how to load a `.pcap` file into NetML and create a dataframe. It then highlights the important columns created in relation to DNS traffic. For each relevant column there is:
After reading the comprehensive section that details all DNS-relevant columns, Selin then sees two code examples which show how to filter the dataframe for only DNS traffic:
`pcap.df`
`pcap.df.dns_traffic`, which automatically returns the dataframe of only DNS traffic
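To make the filtering step concrete, here is a minimal sketch of what "only DNS traffic" could look like in plain pandas. It does not use NetML itself: the dataframe is a hand-built stand-in for `netflix_pcap.df`, the `ip_src` column and all values are made up for illustration, and `dns_only` is a hypothetical helper. It also assumes non-DNS rows hold a real `None` rather than the string `"None"`. The `pcap.df.dns_traffic` accessor in the story above is the behaviour this user story asks for, not something this sketch confirms exists.

```python
import pandas as pd

# Hypothetical stand-in for netflix_pcap.df. The real column set and
# values produced by NetML may differ; only "dns_transaction_id" is
# taken from the story above.
df = pd.DataFrame(
    {
        "ip_src": ["10.0.0.2", "10.0.0.2", "8.8.8.8"],
        "dns_transaction_id": ["0x1a2b", None, "0x1a2b"],
    }
)


def dns_only(frame: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that carry a DNS transaction ID.

    Assumes non-DNS rows store None (a missing value), not the string "None".
    """
    return frame[frame["dns_transaction_id"].notna()]


dns_df = dns_only(df)
print(dns_df)
```

A request and its response share the same transaction ID, so grouping `dns_df` by `dns_transaction_id` would be one way to pair them up.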
These user stories are going to be the driving force in deciding what we want to work on.
I will split them up into mandatory and non-mandatory.
They will be written from the perspective of a new user of NetML (like ourselves for assignment 1).
This will be complete when a wiki page has been created which stores all of these user stories.