Google-Safety and Google are not considered bots

hisorange / browser-detect

Browser Detection for Laravel by hisorange!

https://browser-detect.com

MIT License

1.08k stars, 143 forks

Google-Safety and Google are not considered bots #194

Closed darkylmnx closed 1 year ago

darkylmnx commented 1 year ago

Hi,

It seems the user agents "Google" and "Google-Safety" do not return true for the ::isBot() method. The same seems to apply to "Mozilla/5.0 scpitspi-rs".

A related but slightly different issue: ::browserFamily() sometimes returns "Unknown". For example, Twitter and Buffer seem to send requests from their frontends when a link is added. Even though the browser family comes back as the string "Unknown", the client is still treated as a browser.

Any idea why? Is this intended? It seems a bit off.

hisorange commented 1 year ago

Hello @darkylmnx

A bare "Google" user agent is not used. The one Google actually sends is Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/106.0.5249.119 Safari/537.36. Google never sends just "Google" as its UA, or at least I could not find any recent docs saying they plan to. I think most of the internet would kinda crash from it :D

The same goes for "Google-Safety". The UA the robot actually sends is:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Google-Safety; +http://www.google.com/bot.html)

Mozilla/5.0 scpitspi-rs is a different topic. It could be identified as a bot, and that could be patched in, but based on a quick search it seems to have no real usage. I was only able to find a single reference to it, which suggests to me that adding it would have no impact on identification.

The Twitter case makes me more curious. Can you provide an example UA string they use?

darkylmnx commented 1 year ago

@hisorange you closed it while the issue is still happening. "Google" is a user agent I got from some of my users. Whether or not it's someone spoofing, it should not be considered a "browser", but that's what your package returns.

Same for Mozilla/5.0 scpitspi-rs, which I've seen multiple times.

Either isBot should cover any kind of bot, not only known bots/crawlers, or there should be something like isNeitherBotNorBrowser.

I have thousands of useless entries in my DB because the package did not flag their UA as a bot.

hisorange commented 1 year ago

Hey @darkylmnx

The problem with that is a user agent like "Google Chrome 102.1.4": we cannot just substring-match a company's name and call it a bot.

If you want to detect and filter bots, I would recommend a WAF. This package was never designed or intended to do a deep dive into what the client is. And as you said, someone sending you "Google" as a UA is just spoofing. I can send it from my desktop browser; does that qualify me as a bot?
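A minimal sketch of the distinction being drawn here, written in Python purely for illustration (it is not the package's PHP implementation, and `naive_is_bot`, `token_is_bot` and `GOOGLEBOT_TOKEN` are hypothetical names). Matching the company name flags any UA that mentions Google. Matching a crawler token such as `Googlebot/2.1` only flags the real crawler UA quoted earlier in the thread.

```python
import re

# Hypothetical rule 1: flag any UA that merely mentions the company name.
def naive_is_bot(user_agent: str) -> bool:
    return "google" in user_agent.lower()

# Hypothetical rule 2: flag only a crawler token such as "Googlebot/2.1".
GOOGLEBOT_TOKEN = re.compile(r"\bGooglebot/\d")

def token_is_bot(user_agent: str) -> bool:
    return GOOGLEBOT_TOKEN.search(user_agent) is not None

if __name__ == "__main__":
    googlebot = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
                 "Googlebot/2.1; +http://www.google.com/bot.html) "
                 "Chrome/106.0.5249.119 Safari/537.36")
    for ua in (googlebot, "Google Chrome 102.1.4", "Google"):
        print(f"{ua[:40]!r}: naive={naive_is_bot(ua)} token={token_is_bot(ua)}")
```

Both rules flag the real Googlebot UA. Only the naive rule also flags "Google Chrome 102.1.4". Note that the token rule lets a bare "Google" through, which matches the behaviour reported in this issue.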

Also bots tend to use desktop browser UAs.

The package is working as intended. Mozilla/5.0 scpitspi-rs is not a bot identifier and has literally no references on the internet. In your case the client could just replace it with Mozilla/5.0 mynewscript-co and it would pass through.

Please keep in mind that this is not a filtering package, nor a personal service. If you want to qualify those as bots, I would advise simply regex-matching them and overriding or skipping the package's result.
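A minimal sketch of the suggested "regex match and override" approach, again in Python for illustration only. `EXTRA_BOT_PATTERNS`, `matches_extra_bot` and `classify` are hypothetical. The `package_is_bot` and `package_family` arguments are assumed to be whatever your code gets back from the package's isBot() and browserFamily(). The third "unknown" state is one way to cover the neither-bot-nor-browser case raised in this thread.

```python
import re

# Hypothetical site-specific list of UAs to treat as bots. fullmatch() requires
# the whole UA to match, so "Google Chrome 102.1.4" is not caught by "Google".
EXTRA_BOT_PATTERNS = [
    re.compile(r"Google"),
    re.compile(r"Google-Safety"),
    re.compile(r"Mozilla/5\.0 scpitspi-rs"),
]

def matches_extra_bot(user_agent: str) -> bool:
    ua = user_agent.strip()
    return any(p.fullmatch(ua) for p in EXTRA_BOT_PATTERNS)

def classify(user_agent: str, package_is_bot: bool, package_family: str) -> str:
    """Return "bot", "unknown" or "browser".

    package_is_bot and package_family stand in for the results of the
    package's isBot() and browserFamily() (assumed inputs, not its API).
    """
    if package_is_bot or matches_extra_bot(user_agent):
        return "bot"       # override the package's verdict with the local list
    if package_family == "Unknown":
        return "unknown"   # neither a known bot nor a known browser
    return "browser"
```

Keeping the override list in your own code means the package's notion of "known bots and crawlers" stays unchanged. Each site can decide which odd UAs it wants to discard.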

darkylmnx commented 1 year ago

I'm not saying you should add Mozilla/5.0 scpitspi-rs or Google as bots if the isBot() method is only supposed to return true for "known bots and crawlers".

What is true, on the other hand, is that Mozilla/5.0 scpitspi-rs is not a known browser either. Yet your package makes browserFamily() return something when the UA is one of those I mentioned, even though they aren't known browsers. That misrepresents what your package does.

> The package is working as intended. Mozilla/5.0 scpitspi-rs is not a bot identifier and has literally no references on the internet

That's plainly false, as the following screenshot shows.

[Screenshot, 2022-11-26 at 01:02:52]

> Please keep in mind that this is not a filtering package

Your description is literally "Browser Detection for Laravel by hisorange!".

So, as I just demonstrated and as you said, the two UAs I gave you are neither browsers nor bots by your definition, but your package acts as if they are browsers. That is the issue here.

darkylmnx commented 1 year ago

@hisorange have you seen my last message?