liuweireign / dpkt

Automatically exported from code.google.com/p/dpkt

Is dpkt able to parse HTTP responses? #69

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I am currently using dpkt to parse a libpcap-format file. What I want to do is extract the raw content of the HTTP responses. However, I found that my script ONLY works for HTTP responses with a small content length (<1500 bytes). My script looks something like this:

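#!/usr/bin/env python
import dpkt

f = open('test.cap', 'rb')  # pcap files are binary, so open in binary mode
pcap = dpkt.pcap.Reader(f)

for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    tcp = ip.data
    # Only look at packets sent from the HTTP server port
    if tcp.sport == 80:
        try:
            # Parses the payload of this single packet only
            http2 = dpkt.http.Response(tcp.data)
            print http2.headers
        except:
            # Note: this silently skips every payload that fails to parse
            pass
f.close()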

This is just a very simple example that tries to print the HTTP headers of all
the responses.
However, compared to my original test.cap file, I found that the script does
NOT print all the response headers. Only responses with small content (e.g.
<1500 bytes) are printed. I am wondering why it behaves like that, and whether
dpkt is really able to extract ALL the HTTP response content (e.g. HTML files)?

P.S. I am using dpkt-1.7 on Fedora.

Thanks for your comments.
heng

Original issue reported on code.google.com by cuiheng....@gmail.com on 13 Apr 2011 at 1:11

GoogleCodeExporter commented 9 years ago
By the way, my Python version is 2.4.3.

Original comment by cuiheng....@gmail.com on 13 Apr 2011 at 1:28

GoogleCodeExporter commented 9 years ago
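dpkt doesn't do any TCP stream reassembly. You need to do that yourself. A response larger than one TCP segment (roughly 1500 bytes on Ethernet) is split across several packets, so parsing a single packet's payload as a complete HTTP response fails for lack of data.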

Here's an example:

    http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/reasm.py

If you're doing it live, you need to do your own stream parsing instead, e.g. 
dsniff's HTTP stream parser:

    http://code.google.com/p/dsniff/source/browse/trunk/dsniff/lib/http.py
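
To make the reassembly step above concrete, here is a minimal sketch in plain Python. It is not the dsniff code linked above and not part of dpkt; the function name `reassemble` is hypothetical. It assumes you have already collected `(tcp.seq, tcp.data)` pairs for ONE direction of ONE connection, and it ignores sequence-number wraparound, which a real implementation would need to handle.

```python
def reassemble(segments):
    """Join TCP payloads from one direction of one connection.

    `segments` is an iterable of (seq, payload) pairs, for example
    (tcp.seq, tcp.data) collected per connection. Hypothetical helper:
    assumes no 32-bit sequence-number wraparound within the stream.
    """
    stream = b""
    base = None  # sequence number of the first payload byte
    # Sorting by sequence number puts out-of-order segments back in order
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if not payload:
            continue  # pure ACK/SYN/FIN packets carry no data
        if base is None:
            base = seq
        offset = seq - base
        if offset > len(stream):
            break  # gap: a segment is missing from the capture
        # Drop bytes we already have (retransmissions, overlaps)
        stream += payload[len(stream) - offset:]
    return stream
```

In a script like the one in the question, you would probably group segments by (source IP, destination IP, source port, destination port), reassemble each group, and only then hand the resulting bytes to dpkt.http.Response. If the capture has a gap, the function returns just the contiguous prefix, so the HTTP parse can still fail for that stream.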

Good luck!

Original comment by dugsong on 13 Apr 2011 at 1:51

GoogleCodeExporter commented 9 years ago
Thanks dugsong,

In that case, maybe dpkt is not the best option for HTTP response parsing.

Do you have any idea which Python modules might be a better choice for parsing
HTTP from a libpcap file?

Original comment by cuiheng....@gmail.com on 13 Apr 2011 at 2:29