Open dobestan opened 9 years ago
Any suggestions? :)
@selwin Here is the full bot list: http://www.robotstxt.org/db/all.txt or alternatively there is also this Python lib: https://pypi.python.org/pypi/robot-detection
I have a supporting test from a corpus of 211,000 UA strings. This is a rare 'miss', but the attached shows cases where a user agent string contains the word 'bot' and the parser still does not flag it as a bot.
Also, code review on the function:
return True if self.device.family == 'Spider' else False
can be written as:
return self.device.family == 'Spider'
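To illustrate the review comment, here is a minimal sketch showing that the two forms behave identically. `Device` and `UserAgent` below are hypothetical stand-ins that only model the `device.family` attribute; they are not the library's actual classes.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical stand-in for the parsed device object."""
    family: str


@dataclass
class UserAgent:
    """Hypothetical stand-in; models only the attribute used by is_bot."""
    device: Device

    @property
    def is_bot_verbose(self) -> bool:
        # Original form: the conditional expression is redundant.
        return True if self.device.family == 'Spider' else False

    @property
    def is_bot(self) -> bool:
        # Suggested form: the comparison already evaluates to a bool.
        return self.device.family == 'Spider'
```

Both properties return `True` for a device family of `'Spider'` and `False` for anything else, so the shorter form is a safe drop-in replacement.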
Has there been any more thought about this?
Do you (we) consider this to be the appropriate place for building out more accurate bot detection, or is it the responsibility of ua_parser (and the regexes in uap-core)?
How about an attempt to integrate https://pypi.python.org/pypi/robot-detection in some way? It should be reasonably simple. The only difficulty I see is reconciling what ua-parser considers a 'spider' with what robot-detection considers a bot; no doubt they will diverge. Is that a problem?
Bump on this. It would be nice if this were configurable so we can register our own bots.
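One way the configurable registration asked for here could look, as a rough sketch only. The names `register_bot`, `is_bot`, and `_extra_bot_patterns` are hypothetical and not part of the existing library; the sketch assumes the caller already has the UA string and the device family reported by the parser.

```python
import re
from typing import List, Pattern

# Hypothetical module-level registry; not part of the existing library.
_extra_bot_patterns: List[Pattern[str]] = []


def register_bot(pattern: str) -> None:
    """Register an extra case-insensitive regex that marks a UA string as a bot."""
    _extra_bot_patterns.append(re.compile(pattern, re.IGNORECASE))


def is_bot(ua_string: str, device_family: str) -> bool:
    """Return True if the parser reported 'Spider' or a registered pattern matches."""
    if device_family == 'Spider':
        return True
    return any(p.search(ua_string) for p in _extra_bot_patterns)
```

For example, after `register_bot(r"examplebot")`, a call such as `is_bot("ExampleBot/1.0", "Other")` would be treated as a bot even though the parser did not report a spider. Keeping the parser's result as the first check means user-registered patterns can only add detections, never remove them.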
Is it okay to update the current implementation of the is_bot property? Any ideas?