selwin / python-user-agents

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.
MIT License
1.44k stars, 197 forks

is_bot property is not implemented well #35

Open dobestan opened 9 years ago

dobestan commented 9 years ago
@property
def is_bot(self):
    return True if self.device.family == 'Spider' else False

Is it okay to update the current implementation of the is_bot property? Do you have any ideas?

selwin commented 9 years ago

Any suggestions? :)

fabiocaccamo commented 8 years ago

@selwin Here is the full bot list: http://www.robotstxt.org/db/all.txt. Alternatively, there is also this Python library: https://pypi.python.org/pypi/robot-detection

elisarver commented 8 years ago

I have a supporting test from a corpus of 211,000 UA strings. Misses are rare, but the attached file shows cases where the word 'bot' in a user agent string correlates with bots that the parser doesn't catch.

agents_not_is_bot.txt
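
One way to act on that observation would be a fallback check on the raw string. This is only a rough sketch: `looks_like_bot` is a hypothetical helper, not part of python-user-agents, and it takes the parsed device family and the raw UA string as plain arguments so it can run on its own.

```python
import re

# Case-insensitive match on the word 'bot' anywhere in the UA string.
# Assumption: a plain substring match is good enough as a fallback; it can
# also hit unrelated tokens, so expect some false positives.
_BOT_WORD = re.compile(r"bot", re.IGNORECASE)


def looks_like_bot(device_family, ua_string):
    """Return True if the parser reported 'Spider' or the UA contains 'bot'."""
    return device_family == "Spider" or bool(_BOT_WORD.search(ua_string))
```

The existing 'Spider' check still comes first, so anything the parser already catches is unaffected.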

Also, code review on the function:

return True if self.device.family == 'Spider' else False

can be written as:

return self.device.family == 'Spider'
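
For anyone who wants to convince themselves the two forms behave the same, here is a small stand-alone check. `Device` and `UserAgentSketch` are stand-in classes written for this comment, not the library's real ones.

```python
class Device:
    """Stand-in for the parsed device; only 'family' matters here."""

    def __init__(self, family):
        self.family = family


class UserAgentSketch:
    """Stand-in holding both versions of the property side by side."""

    def __init__(self, family):
        self.device = Device(family)

    @property
    def is_bot_verbose(self):
        # Original form: the conditional expression is redundant.
        return True if self.device.family == 'Spider' else False

    @property
    def is_bot(self):
        # Simplified form: for strings, '==' already yields a bool.
        return self.device.family == 'Spider'


for family in ('Spider', 'iPhone', 'Other'):
    ua = UserAgentSketch(family)
    assert ua.is_bot is ua.is_bot_verbose
```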

robcowie commented 7 years ago

Has there been any more thought about this?

Do you (we) consider this to be the appropriate place for building out more accurate bot detection or is it the responsibility of ua_parser (and the regexes in uap-core)?

How about an attempt to integrate https://pypi.python.org/pypi/robot-detection in some way? It should be reasonably simple. The only difficulty I see is reconciling what ua-parser considers a 'spider' with what robot-detection considers a bot; no doubt they will diverge. Is that a problem?
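
To make the divergence question concrete, here is a minimal sketch of how the two verdicts could be kept side by side. Everything in it is hypothetical: `check_bot` and `BotVerdict` are made-up names, and `external_check` is assumed to be any callable that takes the UA string and returns a truthy value for bots. Whether robot-detection exposes exactly that interface is not verified here.

```python
from typing import Callable, NamedTuple, Optional


class BotVerdict(NamedTuple):
    parser_says_bot: bool
    external_says_bot: Optional[bool]  # None when no external check was given

    @property
    def is_bot(self):
        # Treat the UA as a bot if either source says so.
        return self.parser_says_bot or bool(self.external_says_bot)

    @property
    def diverges(self):
        # True only when both sources gave an answer and they disagree.
        return (self.external_says_bot is not None
                and self.external_says_bot != self.parser_says_bot)


def check_bot(device_family: str, ua_string: str,
              external_check: Optional[Callable[[str], bool]] = None) -> BotVerdict:
    parser_says = device_family == "Spider"
    external_says = None
    if external_check is not None:
        external_says = bool(external_check(ua_string))
    return BotVerdict(parser_says, external_says)
```

Keeping both answers around would at least let callers log or count the cases where the two sources disagree before deciding which one to trust.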

cr0mbly commented 3 years ago

Bump on this. It would be nice if this were configurable so we can register our own bots.
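
Roughly what I have in mind, as a sketch only. `BotRegistry` and this `is_bot` function are hypothetical and do not exist in the library today; the registered pattern is a made-up example.

```python
import re


class BotRegistry:
    """Holds user-supplied regex patterns that mark a UA string as a bot."""

    def __init__(self):
        self._patterns = []

    def register(self, pattern):
        # Patterns are compiled once and matched case-insensitively.
        self._patterns.append(re.compile(pattern, re.IGNORECASE))

    def matches(self, ua_string):
        return any(p.search(ua_string) for p in self._patterns)


def is_bot(device_family, ua_string, registry):
    # Keep the current 'Spider' check and fall back to registered patterns.
    return device_family == "Spider" or registry.matches(ua_string)


registry = BotRegistry()
registry.register(r"internal-healthcheck")  # hypothetical in-house bot
```

With that in place, `is_bot("Other", "Internal-HealthCheck/2.1", registry)` would be true even though the parser did not report a Spider.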