selwin / python-user-agents

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.
MIT License

UnicodeDecodeError with non-ascii device names #87

Closed: declension closed this issue 6 years ago

declension commented 6 years ago

With Python 2.7 and user-agents == 1.1.0, we're seeing exceptions with non-ASCII device names, e.g. this minimal code snippet:

import user_agents

ua_string = 'Mozilla/5.0 (Linux; Android 4.4.2; P80 3G四核 (B1KC) Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Safari/537.36'
user_agent = user_agents.parse(ua_string)

results in:

  File "/home/me/.local/share/virtualenvs/MyProject/local/lib/python2.7/site-packages/user_agents/parsers.py", line 254, in parse
    return UserAgent(user_agent_string)
  File "/home/me/.local/share/virtualenvs/MyProject/local/lib/python2.7/site-packages/user_agents/parsers.py", line 135, in __init__
    ua_dict = user_agent_parser.Parse(user_agent_string)
  File "/home/me/.local/share/virtualenvs/MyProject/local/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 227, in Parse
    'device': ParseDevice(user_agent_string, **jsParseBits),
  File "/home/me/.local/share/virtualenvs/MyProject/local/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 310, in ParseDevice
    device, brand, model = deviceParser.Parse(user_agent_string)
  File "/home/me/.local/share/virtualenvs/MyProject/local/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 198, in Parse
    model = self.MultiReplace(self.model_replacement, match)
  File "/home/me/.local/share/virtualenvs/MyProject/local/lib/python2.7/site-packages/ua_parser/user_agent_parser.py", line 179, in MultiReplace
    _string = re.sub(r'\$(\d)', _repl, string)
  File "/home/me/.local/share/virtualenvs/MyProject/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 6: ordinal not in range(128)

Any ideas? Thanks
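
The byte and offset in the error message line up with the device name in the UA string: in UTF-8, the first byte of 四 is 0xe5, and it sits at index 6 of "P80 3G四核". The sketch below reproduces the same message outside ua_parser. It assumes the source file is UTF-8 encoded, and it performs the ASCII decode explicitly (Python 2 presumably does it implicitly when the byte string meets a unicode string inside re.sub), so it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduce the decode failure without user_agents/ua_parser.
# Assumption: the UA string was a UTF-8 encoded byte string, which is what a
# plain '...' literal holds on Python 2 when the source file is UTF-8.

device_name = u'P80 3G四核 (B1KC)'   # device token taken from the UA string
raw = device_name.encode('utf-8')    # the bytes Python 2 would be working with

try:
    # Explicit stand-in for Python 2's implicit str -> unicode coercion.
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The first six bytes ("P80 3G") are plain ASCII; the decode stops at the first byte of the first non-ASCII character.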

declension commented 6 years ago

Solved: this is Python 2, and I forgot to make that a unicode string, so

ua_string = u'Mozilla/5.0 (Linux; Android 4.4.2; P80 3G四核 (B1KC) Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Safari/537.36'
user_agent = user_agents.parse(ua_string)

does in fact work :smile:

Sorry, my mistake, but I'm leaving this here in case anybody runs into the same problem.
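
When the UA string does not come from a literal (for example, it is read from a request header or a log file as bytes), the u'' prefix is not an option and the value has to be decoded first. A minimal sketch of that, assuming the input is UTF-8; `ensure_text` is a hypothetical helper name, not part of user-agents or ua-parser:

```python
# -*- coding: utf-8 -*-
import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def ensure_text(value, encoding='utf-8'):
    """Return `value` as a text string, decoding byte strings with `encoding`.

    Hypothetical helper for illustration; UTF-8 is an assumption about the input.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    raise TypeError('expected bytes or text, got %r' % type(value))
```

The result can then be passed on as `user_agents.parse(ensure_text(raw_ua))`, where `raw_ua` stands for whatever byte string the application received.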