Improper parsing for Windows NT

shon / httpagentparser

Python HTTP Agent Parser

http://pypi.python.org/pypi/httpagentparser/

MIT License


Improper parsing for Windows NT #4

Closed brunobraga closed 13 years ago

brunobraga commented 13 years ago

Hei Shon,

First of all, thanks for sharing this! I was looking for an agent parser and found your lib on python.org.

I don't know if you are aware of this or not, but I was running some tests and found inconsistent parsing of the following user-agent (the last line is the output):

    import httpagentparser
    s = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24"
    print httpagentparser.detect(s)
    {'os': {'version': 'NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24', 'name': 'Windows'}, 'browser': {'version': '11.0.696.60', 'name': 'Chrome'}}

My computer is running Windows XP SP3, using Chrome 11.0.696.60. All other tests (Linux and Mac) went ok, just so you know.

I took the liberty of writing a patch for this, so I will send you a pull request later on.

Thanks,

brunobraga commented 13 years ago

Maybe a good idea would be to use a site like http://www.useragentstring.com/pages/Browserlist/ as a source of user-agent strings for a full unit test of this (maybe in a separate test.py file).
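A minimal sketch of what such a test.py could look like. The helper name `find_failures` and the `CASES` table are hypothetical; the only thing taken from this thread is the shape of the dict that `detect` returns and the one user-agent string from the report. The parser is passed in as an argument so the sketch does not depend on any particular version of the library.

```python
# Hypothetical table of cases: (user-agent, expected OS version, expected browser version).
# Only the first entry comes from this issue; more could be added from a browser list.
CASES = [
    (
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 (KHTML, like Gecko) "
        "Chrome/11.0.696.60 Safari/534.24",
        "NT 5.1",
        "11.0.696.60",
    ),
]


def find_failures(detect, cases=CASES):
    """Run detect() on every case and return a list of mismatches.

    Assumes detect(agent) returns a dict shaped like
    {'os': {'version': ...}, 'browser': {'version': ...}}.
    """
    failures = []
    for agent, os_version, browser_version in cases:
        result = detect(agent)
        got = (
            result.get("os", {}).get("version"),
            result.get("browser", {}).get("version"),
        )
        if got != (os_version, browser_version):
            failures.append((agent, (os_version, browser_version), got))
    return failures
```

In a real test.py this would presumably be called as `find_failures(httpagentparser.detect)`, with the test failing if the returned list is not empty.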

shon commented 13 years ago

Thanks, brunobraga, for reporting this; a patch is certainly welcome. I'm a bit busy at the moment, but I will think about your other idea about test.py, or maybe you can contribute it if possible.

brunobraga commented 13 years ago

Committed. https://github.com/brunobraga/httpagentparser/commit/13ee456d6976ad996822bfdc22f49cfc36033878

The fix for this issue is at line 195:

-        return agent.split('Windows')[-1].split(';')[0].strip()
+        temp = agent.split('Windows')[-1]
+        if temp.find(";") > -1:
+            return temp.split(';')[0].strip()
+        else:
+            return temp.split(')')[0].strip()
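To see the effect of the change in isolation, here is a standalone sketch of the old and new logic. The function names `windows_version_old` and `windows_version_new` are made up for illustration and are not the names used in the library; the bodies follow the diff above.

```python
def windows_version_old(agent):
    # Original behaviour: cut only at the first ';' after 'Windows'.
    return agent.split('Windows')[-1].split(';')[0].strip()


def windows_version_new(agent):
    # Patched behaviour: cut at ';' if one is present, otherwise at ')'.
    temp = agent.split('Windows')[-1]
    if temp.find(";") > -1:
        return temp.split(';')[0].strip()
    else:
        return temp.split(')')[0].strip()


agent = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/534.24 "
         "(KHTML, like Gecko) Chrome/11.0.696.60 Safari/534.24")
print(windows_version_old(agent))  # the long string from the report
print(windows_version_new(agent))  # NT 5.1
```

Note that the patched check looks for a ';' anywhere after 'Windows', so an agent with no ';' inside the Windows parentheses but one later in the string would still be cut at the wrong place.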

brunobraga commented 13 years ago

Hei shon,

I was looking more carefully at the website from my previous comment and found that they have an API: http://www.useragentstring.com/?getJSON=all

Although a library cannot rely on a third-party service, it may be a nice approach to avoid handling all the weird cases out there.

shon commented 13 years ago

Hi brunobraga,

A new version with this fix is now available on PyPI. Your suggestion about useragentstring.com is useful, thanks. Right now I am busy with some other things, so I will think about it later. Or better, if somebody volunteers for it.

Cheers!