Closed kusoof closed 7 years ago
Open a PR, and if you could provide an example, that would be great!
@kusoof if you can provide a pcap file for this, then we're happy to do some testing.
Hi,
Whoops, I’ve been meaning to get back to this. I have a collection of pcap files to test in this repository:
https://github.com/kusoof/alexa_traces/tree/master/dependency_raw/alexa_raw_traces_8_12_2015
For example:
https://github.com/kusoof/alexa_traces/blob/master/dependency_raw/alexa_raw_traces_8_12_2015/alexa_traces.pcap
Please let me know if you don’t have access, and I can just send you an attachment. Also, if it’s any help, I’ve attached the script I was using to parse the pcap files. The script works after incorporating the fix I’m proposing.
Cheers, Lynne
On Aug 27, 2016, at 10:12 PM, Brian Wylie notifications@github.com wrote:
@kusoof if you can provide a pcap file for this, then we're happy to do some testing on this.
@kusoof when I run the examples/print_http_requests.py code, it seems to run fine:
$ cd dpkt/examples
$ python print_http_requests.py ( I just edited the example to point to your pcap)
...
Timestamp: 2015-12-08 15:51:48.134968
Ethernet Frame: 08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 104.16.107.204 (len=520 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='', uri='/img/favicons-sprite16.png?v=5f1c9ad029b2ea2d9d06ae792ba589ab', headers={'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'cdn.sstatic.net', 'referer': 'http://stackoverflow.com/', 'cookie': '__cfduid=d91dd016837dfb922faa64a88ea1dc9e41436129028'}, version='1.1', data='', method='GET')
Timestamp: 2015-12-08 15:51:48.170114
Ethernet Frame: 08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 104.16.12.8 (len=570 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='', uri='/ados.js', headers={'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'if-modified-since': 'Tue, 09 Jun 2015 19:28:35 GMT', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'static.adzerk.net', 'referer': 'http://stackoverflow.com/', 'if-none-match': '"6388cba9e9b34547e4f2c55e10eff2dc"', 'cookie': '__cfduid=d50664e247a053d09f34fb3fea398080c1434379864'}, version='1.1', data='', method='GET')
Non IP Packet type not supported ARP
Non IP Packet type not supported ARP
$
Looking at the patch above, it appears that 'running fine' ISN'T what you want. Instead, if the request doesn't end with a \r\n, you want dpkt to raise an UnpackError exception. So you basically want dpkt to raise UnpackError when a request spans multiple TCP segments. Is that correct?
Yes, that is right. If '\r\n' is missing, the header is not complete, and an UnpackError should be raised. Then it's up to the calling code to attempt to assemble multiple TCP segments if it encounters this.
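To make the "raise, then let the caller reassemble" idea concrete, here is a minimal sketch that does not use dpkt at all. `IncompleteHeader` is a hypothetical stand-in for the UnpackError discussed above, and `parse_request_head` and `parse_with_reassembly` are made-up names. The sketch also assumes the header is complete only once the blank line (CRLF CRLF) has arrived, which may be stricter than what the attached patch checks.

```python
class IncompleteHeader(Exception):
    """Hypothetical stand-in for the UnpackError discussed in this thread."""


def parse_request_head(buf):
    """Parse the request line and headers; raise if the blank line is missing."""
    head, sep, body = buf.partition(b"\r\n\r\n")
    if not sep:
        raise IncompleteHeader("header block is not terminated by CRLF CRLF")
    lines = head.split(b"\r\n")
    method, uri, version = lines[0].split(b" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, headers, body


def parse_with_reassembly(segments):
    """Append in-order TCP payloads to a buffer until the header parses."""
    buf = b""
    for payload in segments:
        buf += payload
        try:
            return parse_request_head(buf)
        except IncompleteHeader:
            continue  # header still incomplete; wait for the next segment
    raise IncompleteHeader("ran out of segments before the header ended")
```

The caller is assumed to hand in payloads from a single direction of a single connection, already in sequence order; ordering and retransmissions are outside the scope of this sketch.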
@kusoof okay, gotcha. You're correct that the header is supposed to end with a CRLF (\r\n) (RFC 2616, section 4.1), but I'm hesitant to require the \r\n at the end; it might bite us more than help us. My understanding is that some servers might omit the \r part and only send \n, so we might end up rejecting those headers...
I'd like @kbandla and @obormot to weigh in on this.
In the meantime, here's an alternative: because you know exactly what you're looking for, you can simply put a check in your own code.
I've made the following three-line addition to dpkt/examples/print_http_requests.py:
# Check for a header spanning across TCP segments
if not tcp.data.endswith('\r\n'):
    print '\nHEADER TRUNCATED! Reassemble TCP segments!\n'
And now when I run this on your pcap, I get:
...
Timestamp: 2015-12-08 15:03:47.215093
Ethernet Frame: 08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 54.239.17.7 (len=1135 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='path=%2F&queryString=%3Fie%3DUTF8%26%252AVersion%252A%3D1%26%252Aentries%252A%3D0&pageType=Gateway&referer=', uri='/gp/redirection/india.html', headers={'origin': 'http://www.amazon.com', 'cookie': 'x-wl-uid=1r+u9iwZ5U9IRZYFhZL/av+PoAOFlClQcIiAJ8V1GjbOLqoK916vnBk/lAc4ANWxeyB8rlV5ci0U=; session-token=jJTHFpibXMTh2Z5NUH+Bism/C8GfVZhxzNqoDMBoYkoWb2s/lOnfGmfdm1oMVmEhVuZEw4i5y0VI3Kbk9Y09+Rdv7Dmke+hBFrtNwDJRkjkE6/wSrd5jQFlHV0m8CPLyDn5oDF+QzBrHglppq5cU8/MkBfGJfw+VN5fEIkbI2iynbJzCBYgZjzQS9c82zP2NVrkjwnJKKWTnnkxipmu/9WIYWLvjx3LcE990rnCPmrEcMWxX+He2LNrNGWj7RSZH; skin=noskin; ubid-main=183-0508452-9260951; session-id-time=2082787201l; session-id=190-7849816-6285307', 'content-length': '107', 'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'www.amazon.com', 'x-requested-with': 'XMLHttpRequest', 'referer': 'http://www.amazon.com/', 'content-type': 'application/x-www-form-urlencoded'}, version='1.1', data='', method='POST')
HEADER TRUNCATED! Reassemble TCP segments!
Timestamp: 2015-12-08 15:03:47.319649
Ethernet Frame: 08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 54.239.17.7 (len=1136 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='path=%2F&queryString=%3Fie%3DUTF8%26%252AVersion%252A%3D1%26%252Aentries%252A%3D0&pageType=Gateway&referer=', uri='/gp/redirection/canada.html', headers={'origin': 'http://www.amazon.com', 'cookie': 'x-wl-uid=1r+u9iwZ5U9IRZYFhZL/av+PoAOFlClQcIiAJ8V1GjbOLqoK916vnBk/lAc4ANWxeyB8rlV5ci0U=; session-token=jJTHFpibXMTh2Z5NUH+Bism/C8GfVZhxzNqoDMBoYkoWb2s/lOnfGmfdm1oMVmEhVuZEw4i5y0VI3Kbk9Y09+Rdv7Dmke+hBFrtNwDJRkjkE6/wSrd5jQFlHV0m8CPLyDn5oDF+QzBrHglppq5cU8/MkBfGJfw+VN5fEIkbI2iynbJzCBYgZjzQS9c82zP2NVrkjwnJKKWTnnkxipmu/9WIYWLvjx3LcE990rnCPmrEcMWxX+He2LNrNGWj7RSZH; skin=noskin; ubid-main=183-0508452-9260951; session-id-time=2082787201l; session-id=190-7849816-6285307', 'content-length': '107', 'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'www.amazon.com', 'x-requested-with': 'XMLHttpRequest', 'referer': 'http://www.amazon.com/', 'content-type': 'application/x-www-form-urlencoded'}, version='1.1', data='', method='POST')
...
So that seems to work fine: it performs the extra check that you want, and you can take action in your code around flow construction. Perhaps stating the obvious, but basically ALL of the responses will need TCP reassembly. If you're interested in the responses, you may want more general flow construction functionality, so I'll throw out an unsolicited plug for https://github.com/SuperCowPowers/chains, which uses dpkt and does flow reconstruction...
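For readers who only want the gist of flow reconstruction, the following is a rough sketch of the general idea and not how chains implements it. The function name `reassemble_flows` and the `(src, dst, sport, dport, seq, payload)` tuple layout are assumptions made for illustration. In practice those fields would be pulled out of the IP and TCP layers of each packet.

```python
from collections import defaultdict


def reassemble_flows(packets):
    """Group TCP payloads by flow key and join them in sequence order.

    `packets` is an iterable of (src, dst, sport, dport, seq, payload)
    tuples; this layout is an assumption made for the sketch.
    """
    flows = defaultdict(dict)
    for src, dst, sport, dport, seq, payload in packets:
        if payload:
            # Segments are keyed by sequence number, so an exact
            # retransmission simply overwrites the earlier copy.
            flows[(src, dst, sport, dport, "TCP")][seq] = payload
    return {
        key: b"".join(segs[seq] for seq in sorted(segs))
        for key, segs in flows.items()
    }
```

This deliberately ignores sequence-number wraparound, partially overlapping segments, and gaps from lost packets, all of which a real reassembler has to deal with. Each direction of a connection gets its own key, matching the separate CTS and STC flows in the output that follows.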
When I run chains/links/flow.py with your pcap as the input:
$ python flow.py
...
Flow ('10.0.2.15', '216.58.210.3', 37683, 80, 'TCP') (CTS)-- Packets:6 Bytes:1432 Payload: 'GET / HTTP/1.1\r\nHost: www.g...
Flow ('216.58.210.3', '10.0.2.15', 80, 37683, 'TCP') (STC)-- Packets:5 Bytes:1000 Payload: 'HTTP/1.1 302 Found\r\nLocatio...
Flow ('10.0.2.15', '216.58.210.3', 43024, 443, 'TCP') (CTS)-- Packets:6 Bytes:507 Payload: '\x16\x03\x02\x01\x8b\x01\x00\...
Flow ('10.0.2.15', '216.58.210.3', 43025, 443, 'TCP') (CTS)-- Packets:6 Bytes:507 Payload: '\x16\x03\x02\x01\x8b\x01\x00\...
Flow ('216.58.210.3', '10.0.2.15', 443, 43025, 'TCP') (STC)-- Packets:5 Bytes:192 Payload: "\x16\x03\x02\x00p\x02\x00\x00...
Flow ('216.58.210.3', '10.0.2.15', 443, 43024, 'TCP') (STC)-- Packets:5 Bytes:192 Payload: '\x16\x03\x02\x00p\x02\x00\x00...
Flow ('10.0.2.15', '216.58.210.3', 43026, 443, 'TCP') (CTS)-- Packets:28 Bytes:662 Payload: '\x16\x03\x02\x00\xb6\x01\x00\...
Flow ('216.58.210.3', '10.0.2.15', 443, 43026, 'TCP') (STC)-- Packets:34 Bytes:76134 Payload: '\x16\x03\x02\x00X\x02\x00\x00...
Flow ('10.0.2.15', '216.58.210.3', 43027, 443, 'TCP') (CTS)-- Packets:5 Bytes:507 Payload: '\x16\x03\x02\x01\x8b\x01\x00\...
Flow ('216.58.210.3', '10.0.2.15', 443, 43027, 'TCP') (STC)-- Packets:4 Bytes:192 Payload: '\x16\x03\x02\x00p\x02\x00\x00...
...
and
$ python http_meta.py
...
HTTP_REQUEST 10.0.2.15 --> 104.16.12.8
{'_Request__methods': {'BASELINE-CONTROL': None,
'BCOPY': None,
... <long list of methods> ...
'UPDATE': None,
'VERSION-CONTROL': None},
'_Request__proto': 'HTTP',
'body': '',
'data': '',
'headers': {'accept': '*/*',
'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'accept-encoding': 'gzip,deflate,sdch',
'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6',
'connection': 'keep-alive',
'cookie': '__cfduid=d50664e247a053d09f34fb3fea398080c1434379864',
'host': 'static.adzerk.net',
'if-modified-since': 'Tue, 09 Jun 2015 19:28:35 GMT',
'if-none-match': '"6388cba9e9b34547e4f2c55e10eff2dc"',
'referer': 'http://stackoverflow.com/',
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1'},
'method': 'GET',
'uri': '/ados.js',
'version': '1.1'}
HTTP_RESPONSE 104.16.12.8 --> 10.0.2.15
{'_Response__proto': 'HTTP',
'body': '\x1f\x8b\x08c ...<very long data sequence> ... \x1av\x00\x00',
'data': '',
'headers': {'accept-ranges': 'bytes',
'cache-control': 'public, max-age=604800',
'cf-cache-status': 'HIT',
'cf-ray': '2519a23d2cba3518-LHR',
'connection': 'keep-alive',
'content-encoding': 'gzip',
'content-length': '7399',
'content-type': 'application/javascript',
'date': 'Tue, 08 Dec 2015 15:51:48 GMT',
'etag': '"ece0895e9091d0019210eaf143ccd160"',
'expires': 'Tue, 15 Dec 2015 15:51:48 GMT',
'last-modified': 'Tue, 15 Sep 2015 14:54:47 GMT',
'server': 'cloudflare-nginx',
'vary': 'Accept-Encoding',
'x-amz-id-2': 'aw9vnXkDbQVgVoe8+FqQNiRkkXSAzLvBnCK2SV+PHQVYhYELS3a2Fgx+/DqyTOJj',
'x-amz-meta-s3cmd-attrs': 'uid:501/gname:staff/uname:jarrod/gid:20/mode:33188/mtime:1442328886/atime:1442327689/md5:ece0895e9091d0019210eaf143ccd160/ctime:1442328886',
'x-amz-request-id': 'BF92CA5F533BD499',
'x-amz-version-id': '4.EnD3e72Ji1NRRqGMeswl_pK.OnsMCO'},
'reason': 'OK',
'status': '200',
'version': '1.1'}
I'll caveat that Chains is a 'toybox' project and not meant for production use, but might be useful for your use case.
Thanks @brifordwylie. I had already made the change I'm proposing to my local dpkt.http to make my scripts work, but your check also works well without having to alter the dpkt code. I just thought I would bring the issue up here in case checking for an '\r\n' was omitted by accident rather than by design.
My understanding is that some servers might omit the \r part and only send \n.
@brifordwylie I've seen this IRL. Most HTTP parsers I've seen make an effort to handle this by detecting the type of line break sent by the server (or client).
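As a rough illustration of that kind of detection (a sketch only, not taken from dpkt or any particular parser), the hypothetical helper below infers the line break from the first line of the message and then looks for the blank line using that same break. It assumes the peer uses one line-break style consistently.

```python
def find_header_end(buf):
    """Return (offset just past the blank line, line break used).

    Returns (None, eol) if the header block is not complete yet, and
    (None, None) if not even the first line has fully arrived.
    """
    first_lf = buf.find(b"\n")
    if first_lf == -1:
        return None, None
    # CRLF if the first LF is preceded by CR, otherwise a bare LF.
    if first_lf > 0 and buf[first_lf - 1:first_lf] == b"\r":
        eol = b"\r\n"
    else:
        eol = b"\n"
    end = buf.find(eol + eol)
    if end == -1:
        return None, eol
    return end + 2 * len(eol), eol
```

A `None` offset is the "need more data" signal, so a caller could keep appending TCP payloads and retrying. Messages that mix CRLF and bare LF are not handled by this sketch.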
Sometimes an HTTP request is too large to fit in one TCP segment, so when parsing TCP segments from a pcap it is possible to pass an incomplete header to the dpkt.http decoder. This is fixed by the following patch to http.py:
patch.txt