matomo-org / plugin-TrackingSpamPrevention

GNU General Public License v3.0

[BUG] Check client hints for headless browsers #108

Open peterbo opened 4 months ago

peterbo commented 4 months ago

Option "block headless browsers" is active in TrackingSpamPrevention, but headless browsers are tracked nevertheless:

[Screenshot: block-headless]

snake14 commented 4 months ago

Hi @peterbo. Thank you for taking the time to create this issue. I was unable to reproduce the problem. I tested with Matomo 5.0.2, TrackingSpamPrevention 5.0.0, and Headless Chrome 121.1. With TrackingSpamPrevention enabled, requests from the headless browser were not tracked; with the plugin disabled, they were.

Could you please provide more background information, like your Matomo/plugin version?

peterbo commented 4 months ago

Hi @snake14, we're seeing a lot of these across different instances. The problem seems to be that bots send a "normal" Chrome user agent but identify as Headless Chrome in the client hints:

[20/Feb/2024:10:53:13 +0100] "POST /matomo.php?action_name=Welcome&idsite=2&rec=1&r=577211&h=9&m=53&s=13&url=https%3A%2F%2Fwww.example.com%2Fen%2F&_id=&_idn=1&send_image=0&_refts=0&cookie=1&res=1280x720&dimension4=en&dimension8=content&pv_id=8jAVo2&pf_net=500&pf_srv=747&pf_tfr=366&uadata=%7B%22fullVersionList%22%3A%5B%7B%22brand%22%3A%22Not%20A(Brand%22%2C%22version%22%3A%2299.0.0.0%22%7D%2C%7B%22brand%22%3A%22HeadlessChrome%22%2C%22version%22%3A%22121.0.6167.57%22%7D%2C%7B%22brand%22%3A%22Chromium%22%2C%22version%22%3A%22121.0.6167.57%22%7D%5D%2C%22mobile%22%3Afalse%2C%22model%22%3A%22%22%2C%22platform%22%3A%22Linux%22%2C%22platformVersion%22%3A%225.15.0%22%7D HTTP/1.1" 204 0 "https://www.example.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.57 Safari/537.36"

So the client hints do seem to be reported (in the `uadata` parameter of the request above). TrackingSpamPrevention should therefore also check the client hints to see whether the browser identifies as headless; for comparison, see https://github.com/matomo-org/matomo/blob/de2d14a6dda52c3445cfdcce50dcadaa1e87d7da/plugins/DevicesDetection/Columns/BrowserEngine.php#L40
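A minimal sketch of the proposed check, written in Python purely for illustration (the plugin itself is PHP, and this is not its code). The function names `brands_from_uadata` and `looks_headless` are hypothetical, and the only assumption about the payload is the `fullVersionList` structure visible in the logged `uadata` value:

```python
import json


def brands_from_uadata(uadata_json: str) -> list[str]:
    """Return the brand names found in a decoded `uadata` JSON string.

    Hypothetical helper: it only relies on the `fullVersionList` key
    that appears in the logged request and returns an empty list for
    anything it cannot parse.
    """
    try:
        data = json.loads(uadata_json)
    except (TypeError, ValueError):
        return []
    if not isinstance(data, dict):
        return []
    entries = data.get("fullVersionList")
    if not isinstance(entries, list):
        return []
    return [str(e.get("brand", "")) for e in entries if isinstance(e, dict)]


def looks_headless(user_agent: str, uadata_json: str = "") -> bool:
    """Check the User-Agent first, then fall back to the client hints."""
    if "headless" in user_agent.lower():
        return True
    return any("headless" in b.lower() for b in brands_from_uadata(uadata_json))


# Values taken from the logged request: the User-Agent looks like a
# regular Chrome, while the client hints list "HeadlessChrome".
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/121.0.6167.57 Safari/537.36"
)
UADATA = (
    '{"fullVersionList":[{"brand":"Not A(Brand","version":"99.0.0.0"},'
    '{"brand":"HeadlessChrome","version":"121.0.6167.57"},'
    '{"brand":"Chromium","version":"121.0.6167.57"}],'
    '"mobile":false,"model":"","platform":"Linux","platformVersion":"5.15.0"}'
)

print(looks_headless(USER_AGENT))          # User-Agent only
print(looks_headless(USER_AGENT, UADATA))  # User-Agent plus client hints
```

With the values from the logged request, the User-Agent alone gives no hint, while the client hints do. A substring match on "headless" is a deliberately simple heuristic; a real check might prefer an explicit list of brand names.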

snake14 commented 4 months ago

Thank you, @peterbo. That's very helpful. I'll mark this issue for our Product team to review and prioritise.

matomoto commented 4 months ago

Joining from the Matomo forum ...

The User-Agent contains nothing that identifies the browser as a headless browser: HTTP/1.1" 204 0 "https://www.example.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.6167.57 Safari/537.36" So it is not possible to detect the headless browser from the User-Agent alone.

Matomo uses Device Detector: https://github.com/matomo-org/device-detector/ As far as I understand, Device Detector creates a key=value pair for the query string of the Matomo tracking URL: uadata={"fullVersionList":[{"brand":"Not A(Brand","version":"99.0.0.0"},{"brand":"HeadlessChrome","version":"121.0.6167.57"},{"brand":"Chromium","version":"121.0.6167.57"}],"mobile":false,"model":"","platform":"Linux","platformVersion":"5.15.0"} and one of its entries identifies the browser as a headless browser: {"brand":"HeadlessChrome","version":"121.0.6167.57"}
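To illustrate how the percent-encoded `uadata` value in the access log maps to the JSON quoted above, here is a small Python sketch using only the standard library. The function name `uadata_from_request_target` and the shortened example request are hypothetical; it says nothing about how Matomo itself reads the parameter:

```python
import json
from urllib.parse import parse_qs, urlsplit


def uadata_from_request_target(target: str) -> dict:
    """Decode the `uadata` query parameter of a tracking request.

    Returns an empty dict when the parameter is missing or is not a
    JSON object.
    """
    query = urlsplit(target).query
    values = parse_qs(query).get("uadata", [])
    if not values:
        return {}
    try:
        decoded = json.loads(values[0])
    except ValueError:
        return {}
    return decoded if isinstance(decoded, dict) else {}


# Shortened version of the logged request (most parameters left out).
EXAMPLE_TARGET = (
    "/matomo.php?idsite=2&rec=1&uadata="
    "%7B%22fullVersionList%22%3A%5B"
    "%7B%22brand%22%3A%22Not%20A(Brand%22%2C%22version%22%3A%2299.0.0.0%22%7D%2C"
    "%7B%22brand%22%3A%22HeadlessChrome%22%2C%22version%22%3A%22121.0.6167.57%22%7D"
    "%5D%2C%22mobile%22%3Afalse%7D"
)

data = uadata_from_request_target(EXAMPLE_TARGET)
for entry in data.get("fullVersionList", []):
    print(entry["brand"], entry["version"])
```

`parse_qs` already undoes the percent-encoding, so the value only needs to be passed through `json.loads` afterwards.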

More information about uadata: https://github.com/matomo-org/matomo/issues/20128

A possible solution is for the TrackingSpamPrevention plugin to use this information. But I have no idea how the uadata gets into the URL query string, or whether this data can be used in the TrackingSpamPrevention plugin.