kbandla / dpkt

fast, simple packet creation / parsing, with definitions for the basic TCP/IP protocols

dpkt.http needs to check for '\r\n', since we may have fragmented requests #264

Closed kusoof closed 7 years ago

kusoof commented 8 years ago

Sometimes HTTP requests are too large to fit in one TCP segment, which means on parsing pcap TCP segments it is possible to pass an incomplete header to the dpkt.http decoder. Fixed with the following patch in http.py:

patch.txt

RemiDesgrange commented 8 years ago

Please open a PR. If you could provide an example, that would be great!

brifordwylie commented 8 years ago

@kusoof if you can provide a pcap file for this, then we're happy to do some testing.

kusoof commented 8 years ago

Hi,

Whoops, I’ve been meaning to get back to this. I have a collection of pcap files to test in this repository:

https://github.com/kusoof/alexa_traces/tree/master/dependency_raw/alexa_raw_traces_8_12_2015

For example:

https://github.com/kusoof/alexa_traces/blob/master/dependency_raw/alexa_raw_traces_8_12_2015/alexa_traces.pcap

Please let me know if you don’t have access, and I can just send you an attachment. Also, if it’s any help, I’ve attached the script I was using to parse the pcap files. The script works after incorporating the fix I’m proposing.

Cheers, Lynne



brifordwylie commented 8 years ago

@kusoof when I run the examples/print_http_requests.py code, it seems to run fine:

$ cd dpkt/examples
$ python print_http_requests.py  ( I just edited the example to point to your pcap)
...
Timestamp:  2015-12-08 15:51:48.134968
Ethernet Frame:  08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 104.16.107.204   (len=520 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='', uri='/img/favicons-sprite16.png?v=5f1c9ad029b2ea2d9d06ae792ba589ab', headers={'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'cdn.sstatic.net', 'referer': 'http://stackoverflow.com/', 'cookie': '__cfduid=d91dd016837dfb922faa64a88ea1dc9e41436129028'}, version='1.1', data='', method='GET')

Timestamp:  2015-12-08 15:51:48.170114
Ethernet Frame:  08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 104.16.12.8   (len=570 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='', uri='/ados.js', headers={'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'if-modified-since': 'Tue, 09 Jun 2015 19:28:35 GMT', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'static.adzerk.net', 'referer': 'http://stackoverflow.com/', 'if-none-match': '"6388cba9e9b34547e4f2c55e10eff2dc"', 'cookie': '__cfduid=d50664e247a053d09f34fb3fea398080c1434379864'}, version='1.1', data='', method='GET')

Non IP Packet type not supported ARP
Non IP Packet type not supported ARP
$ 

Looking at the patch above, it appears that 'running fine' ISN'T what you want. Instead, if the request doesn't end with \r\n, you want dpkt to raise an UnpackError exception. So you basically want dpkt to raise UnpackError when a request spans multiple TCP segments. Is that correct?

kusoof commented 8 years ago

Yes, that is right. If '\r\n' is missing, the header is not complete, and an UnpackError should be raised. Then it's up to the calling code to attempt to assemble multiple TCP segments if it encounters this.
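A minimal sketch of that division of labour, assuming nothing about the contents of patch.txt: the parser side raises when the header block is unterminated, and the calling code keeps appending segment payloads until the parse succeeds. `IncompleteHeaderError`, `parse_request_line` and `feed_segments` are hypothetical names used only for illustration; in real code the exception would be dpkt's UnpackError and the parser would be dpkt.http.Request. Note that the sketch looks for the blank line that closes the header block (`\r\n\r\n`) rather than a trailing `\r\n`, since a request that carries a body does not end in CRLF.

```python
class IncompleteHeaderError(Exception):
    """Hypothetical stand-in for dpkt's UnpackError in this sketch."""


def parse_request_line(buf):
    """Return (method, uri, version), or raise if the header block is unterminated."""
    end = buf.find(b'\r\n\r\n')
    if end == -1:
        raise IncompleteHeaderError('header block not terminated')
    request_line = buf[:end].split(b'\r\n', 1)[0]
    method, uri, version = request_line.split(b' ', 2)
    return method, uri, version


def feed_segments(segments):
    """Append TCP payloads in order until the request parses; None if it never does."""
    buf = b''
    for payload in segments:
        buf += payload
        try:
            return parse_request_line(buf)
        except IncompleteHeaderError:
            continue  # header still incomplete: wait for the next segment
    return None


# A request whose headers are split across two segments.
segments = [b'GET /ados.js HTTP/1.1\r\nHost: static.ad', b'zerk.net\r\n\r\n']
result = feed_segments(segments)
```

The sketch assumes the payloads are handed over in sequence order and belong to a single connection; a real caller would first have to sort them by TCP sequence number per flow.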

brifordwylie commented 8 years ago

@kusoof okay, gotcha. You're correct that the header is supposed to end with a CRLF (\r\n) (RFC 2616, section 4.1), but I'm hesitant to require the \r\n at the end; it might bite us more than help us. My understanding is that some servers might omit the \r part and only send \n, so we might end up rejecting those headers...

I'd like @kbandla and @obormot to weigh in on this.

In the meantime, here's an alternative: because you know exactly what you're looking for, you can simply put a check in your own code.

I've made the following 3-line addition to dpkt/examples/print_http_requests.py:

            # Check for a header spanning multiple TCP segments.
            # Bytes literal and print() so this runs on Python 2 and 3.
            if not tcp.data.endswith(b'\r\n'):
                print('\nHEADER TRUNCATED! Reassemble TCP segments!\n')

And now when I run this on your pcap I get:

...
Timestamp:  2015-12-08 15:03:47.215093
Ethernet Frame:  08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 54.239.17.7   (len=1135 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='path=%2F&queryString=%3Fie%3DUTF8%26%252AVersion%252A%3D1%26%252Aentries%252A%3D0&pageType=Gateway&referer=', uri='/gp/redirection/india.html', headers={'origin': 'http://www.amazon.com', 'cookie': 'x-wl-uid=1r+u9iwZ5U9IRZYFhZL/av+PoAOFlClQcIiAJ8V1GjbOLqoK916vnBk/lAc4ANWxeyB8rlV5ci0U=; session-token=jJTHFpibXMTh2Z5NUH+Bism/C8GfVZhxzNqoDMBoYkoWb2s/lOnfGmfdm1oMVmEhVuZEw4i5y0VI3Kbk9Y09+Rdv7Dmke+hBFrtNwDJRkjkE6/wSrd5jQFlHV0m8CPLyDn5oDF+QzBrHglppq5cU8/MkBfGJfw+VN5fEIkbI2iynbJzCBYgZjzQS9c82zP2NVrkjwnJKKWTnnkxipmu/9WIYWLvjx3LcE990rnCPmrEcMWxX+He2LNrNGWj7RSZH; skin=noskin; ubid-main=183-0508452-9260951; session-id-time=2082787201l; session-id=190-7849816-6285307', 'content-length': '107', 'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'www.amazon.com', 'x-requested-with': 'XMLHttpRequest', 'referer': 'http://www.amazon.com/', 'content-type': 'application/x-www-form-urlencoded'}, version='1.1', data='', method='POST')

HEADER TRUNCATED! Reassemble TCP segments!

Timestamp:  2015-12-08 15:03:47.319649
Ethernet Frame:  08:00:27:1c:a7:fe 52:54:00:12:35:02 2048
IP: 10.0.2.15 -> 54.239.17.7   (len=1136 ttl=64 DF=1 MF=0 offset=0)
HTTP request: Request(body='path=%2F&queryString=%3Fie%3DUTF8%26%252AVersion%252A%3D1%26%252Aentries%252A%3D0&pageType=Gateway&referer=', uri='/gp/redirection/canada.html', headers={'origin': 'http://www.amazon.com', 'cookie': 'x-wl-uid=1r+u9iwZ5U9IRZYFhZL/av+PoAOFlClQcIiAJ8V1GjbOLqoK916vnBk/lAc4ANWxeyB8rlV5ci0U=; session-token=jJTHFpibXMTh2Z5NUH+Bism/C8GfVZhxzNqoDMBoYkoWb2s/lOnfGmfdm1oMVmEhVuZEw4i5y0VI3Kbk9Y09+Rdv7Dmke+hBFrtNwDJRkjkE6/wSrd5jQFlHV0m8CPLyDn5oDF+QzBrHglppq5cU8/MkBfGJfw+VN5fEIkbI2iynbJzCBYgZjzQS9c82zP2NVrkjwnJKKWTnnkxipmu/9WIYWLvjx3LcE990rnCPmrEcMWxX+He2LNrNGWj7RSZH; skin=noskin; ubid-main=183-0508452-9260951; session-id-time=2082787201l; session-id=190-7849816-6285307', 'content-length': '107', 'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6', 'accept-encoding': 'gzip,deflate,sdch', 'connection': 'keep-alive', 'accept': '*/*', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'host': 'www.amazon.com', 'x-requested-with': 'XMLHttpRequest', 'referer': 'http://www.amazon.com/', 'content-type': 'application/x-www-form-urlencoded'}, version='1.1', data='', method='POST')
...

So that seems to work fine: it accomplishes the extra check that you want, and you can take action in your code around flow construction. Perhaps stating the obvious, but basically ALL the responses will need TCP reassembly. If you're interested in the responses, you may want more general flow construction functionality, so I'll throw out an unsolicited plug for https://github.com/SuperCowPowers/chains, which uses dpkt and does flow reconstruction...
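To make "flow construction" a bit more concrete, here is a rough sketch. It is not how Chains implements it, just an illustration under stated assumptions: payloads are grouped by a (src, dst, sport, dport) key and concatenated in sequence-number order. The tuple layout, the `assemble_flows` name and the sample packets (including the host name) are made up for the example. Exact retransmissions are dropped by keeping the first copy of each sequence number; overlapping segments, gaps and sequence-number wraparound are ignored.

```python
from collections import defaultdict


def assemble_flows(packets):
    """Group (src, dst, sport, dport, seq, payload) tuples into per-direction byte streams."""
    segments = defaultdict(dict)
    for src, dst, sport, dport, seq, payload in packets:
        if payload:  # skip pure ACKs and other empty segments
            # Keep the first copy seen for each sequence number.
            segments[(src, dst, sport, dport)].setdefault(seq, payload)
    return {
        key: b''.join(by_seq[seq] for seq in sorted(by_seq))
        for key, by_seq in segments.items()
    }


# Made-up packets: the two request segments arrive out of order.
packets = [
    ('10.0.2.15', '216.58.210.3', 37683, 80, 1016, b'Host: www.example.com\r\n\r\n'),
    ('10.0.2.15', '216.58.210.3', 37683, 80, 1000, b'GET / HTTP/1.1\r\n'),
    ('216.58.210.3', '10.0.2.15', 80, 37683, 5000, b'HTTP/1.1 302 Found\r\n\r\n'),
    ('10.0.2.15', '216.58.210.3', 37683, 80, 1041, b''),
]
flows = assemble_flows(packets)
```

Each direction of a connection gets its own key, so the client-to-server and server-to-client streams can be handed to a request parser and a response parser respectively.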

When I run chains/links/flow.py with your pcap as the input:

$ python flow.py
...
Flow ('10.0.2.15', '216.58.210.3', 37683, 80, 'TCP') (CTS)-- Packets:6 Bytes:1432 Payload: 'GET / HTTP/1.1\r\nHost: www.g...
Flow ('216.58.210.3', '10.0.2.15', 80, 37683, 'TCP') (STC)-- Packets:5 Bytes:1000 Payload: 'HTTP/1.1 302 Found\r\nLocatio...
Flow ('10.0.2.15', '216.58.210.3', 43024, 443, 'TCP') (CTS)-- Packets:6 Bytes:507 Payload: '\x16\x03\x02\x01\x8b\x01\x00\...
Flow ('10.0.2.15', '216.58.210.3', 43025, 443, 'TCP') (CTS)-- Packets:6 Bytes:507 Payload: '\x16\x03\x02\x01\x8b\x01\x00\...
Flow ('216.58.210.3', '10.0.2.15', 443, 43025, 'TCP') (STC)-- Packets:5 Bytes:192 Payload: "\x16\x03\x02\x00p\x02\x00\x00...
Flow ('216.58.210.3', '10.0.2.15', 443, 43024, 'TCP') (STC)-- Packets:5 Bytes:192 Payload: '\x16\x03\x02\x00p\x02\x00\x00...
Flow ('10.0.2.15', '216.58.210.3', 43026, 443, 'TCP') (CTS)-- Packets:28 Bytes:662 Payload: '\x16\x03\x02\x00\xb6\x01\x00\...
Flow ('216.58.210.3', '10.0.2.15', 443, 43026, 'TCP') (STC)-- Packets:34 Bytes:76134 Payload: '\x16\x03\x02\x00X\x02\x00\x00...
Flow ('10.0.2.15', '216.58.210.3', 43027, 443, 'TCP') (CTS)-- Packets:5 Bytes:507 Payload: '\x16\x03\x02\x01\x8b\x01\x00\...
Flow ('216.58.210.3', '10.0.2.15', 443, 43027, 'TCP') (STC)-- Packets:4 Bytes:192 Payload: '\x16\x03\x02\x00p\x02\x00\x00...
...

and

$ python http_meta.py
...
HTTP_REQUEST 10.0.2.15 --> 104.16.12.8
{'_Request__methods': {'BASELINE-CONTROL': None,
                       'BCOPY': None,
                      ... <long list of methods> ...
                       'UPDATE': None,
                       'VERSION-CONTROL': None},
 '_Request__proto': 'HTTP',
 'body': '',
 'data': '',
 'headers': {'accept': '*/*',
             'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
             'accept-encoding': 'gzip,deflate,sdch',
             'accept-language': 'en-GB,en-US;q=0.8,en;q=0.6',
             'connection': 'keep-alive',
             'cookie': '__cfduid=d50664e247a053d09f34fb3fea398080c1434379864',
             'host': 'static.adzerk.net',
             'if-modified-since': 'Tue, 09 Jun 2015 19:28:35 GMT',
             'if-none-match': '"6388cba9e9b34547e4f2c55e10eff2dc"',
             'referer': 'http://stackoverflow.com/',
             'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1195.0 Safari/537.1'},
 'method': 'GET',
 'uri': '/ados.js',
 'version': '1.1'}

HTTP_RESPONSE 104.16.12.8 --> 10.0.2.15
{'_Response__proto': 'HTTP',
 'body': '\x1f\x8b\x08c ...<very long data sequence> ... \x1av\x00\x00',
 'data': '',
 'headers': {'accept-ranges': 'bytes',
             'cache-control': 'public, max-age=604800',
             'cf-cache-status': 'HIT',
             'cf-ray': '2519a23d2cba3518-LHR',
             'connection': 'keep-alive',
             'content-encoding': 'gzip',
             'content-length': '7399',
             'content-type': 'application/javascript',
             'date': 'Tue, 08 Dec 2015 15:51:48 GMT',
             'etag': '"ece0895e9091d0019210eaf143ccd160"',
             'expires': 'Tue, 15 Dec 2015 15:51:48 GMT',
             'last-modified': 'Tue, 15 Sep 2015 14:54:47 GMT',
             'server': 'cloudflare-nginx',
             'vary': 'Accept-Encoding',
             'x-amz-id-2': 'aw9vnXkDbQVgVoe8+FqQNiRkkXSAzLvBnCK2SV+PHQVYhYELS3a2Fgx+/DqyTOJj',
             'x-amz-meta-s3cmd-attrs': 'uid:501/gname:staff/uname:jarrod/gid:20/mode:33188/mtime:1442328886/atime:1442327689/md5:ece0895e9091d0019210eaf143ccd160/ctime:1442328886',
             'x-amz-request-id': 'BF92CA5F533BD499',
             'x-amz-version-id': '4.EnD3e72Ji1NRRqGMeswl_pK.OnsMCO'},
 'reason': 'OK',
 'status': '200',
 'version': '1.1'}

I'll caveat that Chains is a 'toybox' project and not meant for production use, but it might be useful for your use case.

kusoof commented 8 years ago

Thanks @brifordwylie. I had already made the change I'm proposing to my local dpkt.http to make my scripts work, but your check also works well without having to alter the dpkt code. I just thought I would bring the issue up here in case checking for a '\r\n' was omitted by accident rather than by design.

obormot commented 8 years ago

"My understanding is that some servers might omit the \r part and only send \n."

@brifordwylie I've seen this IRL. Most HTTP parsers I've seen make an effort to handle this by detecting the type of line break sent by the server (or client).
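One way such detection could look, as a rough sketch rather than what dpkt or any particular parser actually does: take the terminator of the first line as the style for the whole header block, then look for the blank line in that style. `detect_line_break` and `split_headers` are hypothetical names, and the sketch assumes the sender does not mix CRLF and bare LF within one header block.

```python
def detect_line_break(buf):
    """Return the terminator of the first line (CRLF or bare LF), or None if there is none yet."""
    nl = buf.find(b'\n')
    if nl == -1:
        return None
    return b'\r\n' if nl > 0 and buf[nl - 1:nl] == b'\r' else b'\n'


def split_headers(buf):
    """Return (header_lines, rest) once the blank line has arrived, else None."""
    eol = detect_line_break(buf)
    if eol is None:
        return None
    end = buf.find(eol + eol)  # a blank line is two terminators in a row
    if end == -1:
        return None
    return buf[:end].split(eol), buf[end + 2 * len(eol):]


crlf = split_headers(b'GET / HTTP/1.1\r\nHost: a\r\n\r\n')
bare_lf = split_headers(b'HTTP/1.1 200 OK\nServer: x\n\nbody')
partial = split_headers(b'GET / HTTP/1.1\r\nHost: a\r\n')
```

A `None` result means the header block is not complete in the bytes seen so far, which is the case where the caller would need to reassemble further TCP segments before parsing.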