omrilotan / isbot

🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string
https://isbot.js.org/
The Unlicense

Lighthouse was not recognised #213

Closed (cortopy closed this issue 1 year ago)

cortopy commented 1 year ago

User Agent String

Mozilla/5.0 (Linux; Android 11; moto g power (2022)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36

Reproduce

  1. Go to https://pagespeed.web.dev/
  2. Run an analysis of a website. The visits I receive come from a browser using the user agent string above.
  3. You can verify the user agent by hovering over the report information, as shown in the attached screenshot.

Screenshot from 2023-07-01 17-39-27

omrilotan commented 1 year ago

In my experience, all tests from https://pagespeed.web.dev/ include the Chrome-Lighthouse substring in the user agent string to identify themselves as a bot.

Examples:

Check example on isbot.js.org:

… AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4590.2 Mobile Safari/537.36 Chrome-Lighthouse

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4590.2 Safari/537.36 Chrome-Lighthouse
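To make the difference concrete, here is a minimal sketch that only checks for the Chrome-Lighthouse marker. The helper `hasLighthouseMarker` is hypothetical and is not how isbot is implemented. It just shows why the reported user agent cannot be told apart from a regular mobile browser, while the marked one can.

```typescript
// Hypothetical helper, not part of isbot: looks for the marker named in this thread.
const LIGHTHOUSE_MARKER = "Chrome-Lighthouse";

function hasLighthouseMarker(userAgent: string): boolean {
  return userAgent.indexOf(LIGHTHOUSE_MARKER) !== -1;
}

// User agent reported in this issue: it carries no marker.
const reportedUa =
  "Mozilla/5.0 (Linux; Android 11; moto g power (2022)) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Mobile Safari/537.36";

// User agent from the example in this thread: it ends with the marker.
const markedUa =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4590.2 Safari/537.36 Chrome-Lighthouse";

console.log(hasLighthouseMarker(reportedUa)); // false
console.log(hasLighthouseMarker(markedUa)); // true
```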

omrilotan commented 1 year ago

Anyway, please refer to the README clarifications:

What does "isbot" do?

This package aims to identify "Good bots". Those who voluntarily identify themselves by setting a unique, preferably descriptive, user agent, usually by setting a dedicated request header.

What doesn't "isbot" do?

It does not try to recognise malicious bots or programs disguising themselves as real users.

If a tool uses a legitimate browser user agent string with no indication of being an automated service, we cannot use this package to identify it.

Continuing in the clarifications section:

...other methods of identification can be added such as reverse dns lookup.
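As a rough illustration of what a reverse DNS check could look like outside of isbot, the sketch below does a forward-confirmed lookup: resolve the IP to hostnames, keep those under an allowed domain, then confirm the hostname resolves back to the same IP. Everything here is an assumption for illustration: the names `Resolver`, `hostnameMatches` and `verifyByReverseDns` are hypothetical, the domain suffixes are supplied by the caller, and only IPv4 is handled.

```typescript
// Minimal resolver shape. Node's dns.promises object is assumed to fit it
// (reverse(ip) and resolve4(hostname) both resolve to string arrays).
interface Resolver {
  reverse(ip: string): Promise<string[]>;
  resolve4(hostname: string): Promise<string[]>;
}

// True when the hostname equals an allowed suffix or is a subdomain of one.
function hostnameMatches(hostname: string, allowedSuffixes: string[]): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, ""); // drop a trailing dot
  return allowedSuffixes.some((suffix) => {
    const s = suffix.toLowerCase();
    return host === s || host.endsWith("." + s);
  });
}

// Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip.
async function verifyByReverseDns(
  ip: string,
  allowedSuffixes: string[],
  resolver: Resolver
): Promise<boolean> {
  let hostnames: string[];
  try {
    hostnames = await resolver.reverse(ip);
  } catch {
    return false; // no PTR record, or the lookup failed
  }
  for (const hostname of hostnames) {
    if (!hostnameMatches(hostname, allowedSuffixes)) continue;
    try {
      const addresses = await resolver.resolve4(hostname);
      if (addresses.indexOf(ip) !== -1) return true;
    } catch {
      // forward lookup failed; try the next hostname
    }
  }
  return false;
}
```

In a Node application the resolver argument would typically be `dns.promises` from the built-in `dns` module. Note that a check like this works on the request IP, not the user agent string, so it only helps when the service publishes the domains its crawlers resolve to.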

omrilotan commented 1 year ago

I can see PageSpeed has decided not to identify itself as a bot in the user agent string. This is also mentioned in discussion #214.